Method for storing data files on a multiple volume media set5579516Abstract A method for storing a set of files on a multiple volume media set supports the ISO/IEC 13346 standard for optical media while minimizing swapping and temporary storage requirements when the multiple volume media set is used in an optical disk autochanger. The method allows the media set (e.g., optical disks) to be self-describing in accordance with the ISO/IEC 13346 standard. However, when the media set is on-line, the meta-data is separated from the data to optimize on-line performance. The method sequentializes the meta-data of the file set before writing it to the media set to achieve these advantages. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
______________________________________
Structure Explanation
______________________________________
D0' ICB for directory D0
F01' ICB for first file in directory D0
F02' ICB for second file in directory D0
D1' ICB for directory D1
D2' ICB for directory D2
F21' ICB for first file in directory D2
F22' ICB for second file in directory D2
D3' ICB for directory D3
D4' ICB for directory D4
D0 Directory file D0
D1 Directory file D1
D2 Directory file D2
D3 Directory file D3
D4 Directory file D4
______________________________________
One approach to writing this information to optical disk would be to first write the ICB for directory D0, followed by directory D0. However, writing directory D0 requires knowing where the ICBs for directories D1 and D2 are going to be written. Writing the ICBs for directories D1 and D2 requires knowledge of the location and size of the directory files for directories D1 and D2. This interrelationship is illustrated in FIG. 2. FIG. 2 illustrates the complex relationship between directories D0, D1 and D2 and their ICBs. For ease of discussion, directories D3 and D4 have been omitted. As indicated by line 202, ICB D0 points to the location of directory D0. Directory D0 includes an entry pointing back to its parent directory, as indicated by line 204. In the case of directory D0, line 204 points to its own ICB, since directory DO is the root directory. Directory DO includes entries for directories D1 and D2. The directory entries for directories D1 and D2 include the addresses of the ICBs for directories D1 and D2, as indicated by lines 206 and 208, respectively. ICBs D1' and D2' then include the extents for the directory files D1 and D2, respectively, as indicated by lines 210 and 212. Note that directory files D1 and D2 include pointers to the ICB (D0') of their parent directory (DO) as indicated by lines 214 and 216. As illustrated by this diagram, the links between the files, directories and ICBs can become quite convoluted when many files are involved in the file set. It is this convoluted series of links that can necessitate that multiple disk swaps occur in performing a simple task such as listing a directory. For example, the ICBs and directories illustrated in FIG. 2 may be spread across multiple surfaces of a disk set. Each time a pointer between an ICB and a file crosses a surface boundary, a swap is required. The present invention is directed towards minimizing a number of swaps required to read this meta-data from or write this meta-data to one or more volumes of a multiple volume disk set. The amount of temporary storage required to build the meta-data is also minimized by the method of the invention by making multiple passes through the file information. Minimizing the amount of required temporary storage can be critical considering the numbers of files and directories which may reside on a group or set of optical disks inside an autochanger. The invention achieves these advantages by sequentializing the meta-data to be stored to the optical disks. In a preferred embodiment, data and meta-data are written to the optical disks in a sequential fashion as discussed below. In an alternate embodiment, the data files are already on the optical disk surfaces and only the meta-data is sequentialized and written to the disks. In accordance with the standard, the header information on one of the disk volumes of the volume set (i.e., normally the last volume in the set) is modified to indicate that the data has been written in a sequential manner. Fully Sequential Write In the embodiment of fully sequentializing the data and meta-data written to the optical disk surfaces, the data and meta-data are written to the disk surfaces such that the ICBs appear first, the directories appear next, and the data appears last. This is illustrated in FIG. 3. The ICBs, directories and data, however, are written in reverse order from how they appear in the sequential data stream. Note, that in FIG. 3, the ICBs, directories and data are illustrated as being sequential with one following the other. This however is not required by the invention. What is meant by "fully sequential" is that the ICBs are all written sequentially in one or more extents, the directories are written sequentially in one group of extents, and the data files are written sequentially in one group of extents. By "partially sequential," what is meant is that the ICBs and directories are written sequentially to the extent possible given that they will likely be written to space on disk as available and will likely be interleaved with extents of data files. The data files will not be written sequentially. The high level method for writing the meta-data and data to optical disk is illustrated in FIG. 4. As illustrated, the data is written first in a step 402. The directories are written next in a step 404, and finally, the ICBs are written in a step 406. Once the ICBs have been written, a list or map of the extents of the ICB table (i.e., where the ICB table is located) is written in step 408 to the volume header or, alternatively, in an implementation specific attribute of the ICB for the root directory. In step 402, the file data is written to the disk surfaces in the proper recursive descent order. That is, starting at the root directory, the data found in each file is written to disk. This allows files which are broken up into multiple extents to span across surface boundaries. It is not required that the file names be stored at this time. Rather, a list of extents is stored in temporary storage (e.g., a solid state cache or other memory device). As described below, a pass will be made through this data when building the ICBs. In step 404, the directories are built and written to the disk in the same recursive descent order as described above. Since directories tie the file names and ICBs together, the ICB addresses must be assigned to each file before the directories can be built. It is not necessary to write the ICBs at this time. However, it is necessary to allocate the space on the disk surfaces in which the ICBs will be later stored. By "allocate," it is meant to reserve the addresses on the disk surfaces for the later storage of the ICBs. This allocated space is called an ICB table. Space for the ICB table may be allocated on disk by determining a total number of files, including both directory files and data files, to be stored to the disk surfaces. Once this is known, a total number of required ICBs can be determined. For files containing a large number of extents, all of the extents may not fit in a single sector. For such files, overflow sectors or allocation extents must also be allocated in the ICB table. In addition, ICBs must be allocated for soft or logical links. A logical link is a file which contains a path to another file. Thus, space for the ICB table may be allocated on disk by determining an amount of space required to store ICBs for all data files and directory files as well as any overflow sectors and logical links in the set of files. As the directories are built to include file names and ICB addresses from the ICB table, the directories are written to disk. As the directories are written to disk, the locations and lengths of the directories are stored in temporary storage. Use of the locations and lengths (extents) of the directories is described below. Finally, in step 406, the ICBs are built and written to disk. Each ICB includes a list or map of the extents into which the corresponding file is written to the disk. The attributes in the ICB are obtained from an existing source. For example, the attributes may be retrieved from an existing file system where the data files originated. The extent information for the data files and directories comes from the temporary storage generated at steps 402 and 404, respectively. Note that an overflow sector, whenever needed, is placed immediately subsequent to the ICB referencing it. The overflow sector will have been accounted for in the original allocation of space required for the ICB table. Partially Sequential Data Oftentimes, the data files will already be stored on the disk surfaces. In this case, the meta-data may still be written according to steps 404 and 406. When this is done, it is assumed that extent information about each file will be available. Thus, the difference between writing fully sequential data and partially sequential data, is that step 402 will be skipped, and in step 406 the extent information about a data file will come from a different source than the temporary storage. The data, being previously present on the disk surfaces, may be scattered randomly across all the disks in the volume set, leaving multiple "holes" for the meta-data to be placed. Alternatively, some room may be available only on the last disk in the volume set. The method of the invention will operate in either case. When a single extent (e.g., on the last disk of the volume set) is available, the meta-data is written sequentially inside the extent as illustrated in FIG. 3. In the case where the data is randomly scattered across the disks in the volume set and a single extent large enough for the meta-data is not available, than each available extent is used in a sequential fashion such that all available extents on a given surface are used before any extents are used on subsequent surfaces. The meta-data will look similar to that illustrated in FIG. 3. However, it will be interspersed with data. In this case, a map of all extents used for the ICB table must be maintained in, for example, the volume header or root directory ICB. Write Example An example of sequential meta-data written according to the present invention is illustrated in FIG. 5. This example illustrates meta-data for the sample directory structure of FIG. 1. Note that no data files are present in this example. The invention begins by counting the total number of directory files and data files in the file set. Overflow extents and logical links are also counted. This will determine the size of the ICB table. The ICB table is indicated by reference number 502 in FIG. 5. In the meta-data structure of FIG. 5, the directories and meta-data structures are identified as set forth in the table above. Once space has been allocated on the disks for the ICB table, the directories are built starting with directory D0. As entries are added to the directories, ICBs are allocated from the ICB table. Each time an ICB is allocated, the ICB location is taken from an address pointed to by a pointer in the ICB table. The pointer is then incremented to the next ICB location. The ICBs are not stored in the corresponding locations in the ICB table at this time. Rather the table addresses are only used to build the directories. Once a directory is built, it is written to the disk surface, starting at the location determined or identified as the end of the ICB table. As illustrated, directory D0 is written to optical disk at a portion directly following ICB table 502. The starting location and length of directory DO is then recorded in temporary storage. Next, the method continues to build the remaining directories in a similar fashion and to write them to the disk set. As illustrated, directories D1, D2, D3 and D4 are written to the disk set in a recursive descent pattern following directory D0. After all the directories are written to the disk set, the ICB table is built and written to the disk set. The ICB table is built as follows. First, the ICB table pointer is set to the beginning of the table (the ICB address for D0' in the example). The directories and their entries are visited in the same order as made in the first pass through the directories, when the directories were written. Each time, an ICB is written to the location pointed to by the ICB table pointer. The pointer is then incremented. If the ICB corresponds to a directory, then the corresponding location and length or extent information is retrieved from temporary storage. If the directories and directory entries are visited in the same order as they were when the directories were written, then the ICB addresses will match those referenced by the directories already written to disk. Fully Sequential Read In the fully-sequential case, the meta-data and data are read through the disk surfaces in the sequential fashion implied by its order on the media. First the ICBs are read, then the directories are read, and finally, the data files are read. This is illustrated in the flow chart of FIG. 6. In a step 602, the location of the ICB table is determined. Preferably, the ICB table extent is stored in the volume header or root directory ICB as discussed above. In a step 604, all ICBs are read in sequential fashion. Note, that this causes the data to be read in the same recursive descent order with which it was stored. This step of the invention likely requires the most temporary storage because the attributes and extents found in the ICBs must be maintained in temporary storage. This will allow any ICB to be retrieved, given its address. In one embodiment, the ICBs are maintained in temporary storage by creating a file for each ICB. The extent list is stored in the data space of the file. The file name is the ASCII form of the ICB address. This allows all pertinent file information to obtained for an ICB by simply opening the proper file using the ICB address as a file name. In step 606, all directories are also read into temporary storage. File names are then associated with the ICB information from step 604. To read in the directories, the location and length of the first directory is obtained from the appropriate ICB. This is always the root directory. Once the root directory is read in and the names are attached to all the files and directories it contains, each subdirectory is visited in a recursive fashion and the same operation is performed. Note that as this basic recursive descent is performed, the physical order of reading directories is sequential. In the embodiment where the ICB information is held in files named by their ICB addresses, the files are simply renamed or changed into directories as appropriate. Once this step is complete, the directories will be in their final state and the files will contain their extent lists. In step 608, the data is read from the optical disks, if desired. Again, this is done in a recursive descent fashion, where all files and all directories are visited and the data is copied from the disk set to a desired storage medium, such as a dedicated magnetic disk. Partially Sequential Read In the case where the data has not been written sequentially because, for example, it already existed on the disk set, step 606 may be interleaved with step 608 in order to sequentialize the read process as much as possible. This involves visiting all the stored ICBs and looking for extents on the current surface. These extents may all be gathered and restored in some optimized sequential and/or bulk fashion such that all data is read from the disk surface before swapping in a new disk surface. Alternatively, the data files may be retrieved as the ICBs are found. In this scenario, extra buffering is required to make the data retrieval as efficient as possible. Note that since the location of all the data cannot be known until all the ICBs have been read in, if an ICB segment spans a surface boundary, it is possible that multiple visits to a disk may be required. Nonetheless, the invention still minimizes the number of visits required to a surface as compared to known data storage techniques. Reading Fully Non-Sequential Data In the case where data and meta-data have not been written sequentially, the method of the invention uses the same recursive descent method as described above. As directories are read as indicated at step 606, each ICB referenced by a directory must be read from the disk surfaces, rather than temporary storage. This involves extra seeks on the disk and may involve swapping. However, this is the price that must be paid for the data which was originally written in a non-sequential fashion. The data may also be retrieved in this same step. However, additional temporary storage will be required. Compatibility The method of the invention is directed toward minimizing file swapping and temporary storage requirements in an optical disk autochanger for multiple volume media sets. The invention accomplishes this by storing data on the media set in a format which is fully compatible with ISO/IEC 13346 file format standard. Preferably the data will be stored according to the invention in a fully sequential format. This method is a hybrid format in that it is sequentially written and optimized for sequential reading and writing. However, all of the links between structures are intact such that applications uninformed of the sequential formatting may read the files sets from the disk volumes. Thus, the media set may be read fully sequentially or fully non-sequentially. In addition, the meta-data may be read sequentially and the data read non-sequentially. In the embodiment where the data is stored non-sequentially and the meta-data is stored sequentially, fully sequential reading will not be supported. If the data is written fully non-sequentially, it may be read only non-sequentially. This is illustrated in the diagram of FIG. 9. FIG. 7 is a detailed flow chart showing the preferred steps of the method for building directories and ICBs and writing them to the disk volumes. Steps 702-712 are directed towards building and writing the directories. Steps 718-728 are directed to building and writing the ICBs. As illustrated at step 702, the method starts at the root directory in accordance with the recursive descent scheme. In a step 704, the root directory is built. In a step 706, the root directory is written to the disk set. In a step 708, the directory entries in the root directory are checked for any subdirectories. If subdirectories exist, then the current directory is changed to a subdirectory at step 710. The method then repeats 704-708 for this subdirectory. The test of step 708 will cause directories to be built and written for all downstream subdirectories. If an additional subdirectory does not exist for the current directory, then step 712 checks to see if the current directory is the root. If it is not, the method changes directory to the parent of the current directory in a step 714 and steps 704-708 are repeated. This method will assure that all directories and subdirectories are built and written to the disk set. Once no more subdirectories are found and the current directory is the root directory, then the method proceeds from step 712 to step 718. Step 718 writes the ICB for the root directory. At step 720, the method searches for subdirectories of the root directory. If a subdirectory is found, then it is selected as the current directory at step 722 and it's ICB is written in step 724. Note that the order that subdirectories are selected in step 722 must be the same as the order the subdirectories are selected in step 710 to assure correspondence of the ICB addresses and their directory files. After step 724, the method then returns to step 720 until ICBs are written for all downstream subdirectories. At step 726, it is determined whether the current directory is the root directory. If the current directory is the root directory then the method completes at step 730. If the current directory is not the root directory, then the current directory is changed to the parent directory at step 728, and the method returns to step 720. This method will assure that an ICB is written for each directory and subdirectory. FIG. 8 illustrates the preferred method according to the invention for reading meta-data from the disk set. In steps 802 and 804, all ICBs are read from the disk sets. Once all ICBs have been read, the directories will be read starting with the root ICB as indicated at step 806. The root ICB will point to the root directory. The root directory is then read at step 808. The method checks for any subdirectories at step 810. If subdirectories are found, then the ICBs for the subdirectories are retrieved (from temporary storage) at step 812. The directory is then read into temporary storage at step 808. Steps 808 and 810 are then repeated until all downstream subdirectories have been read from the disk set. At step 814, it is determined whether the current directory is the root directory. If the current directory is the root directory, then the method ends at step 818. If it is not, then the current directory is changed to the parent directory at step 816, and the method returns to step 810. This method will assure that all ICBs and all directories are read from the disk set. While the invention has been particularly shown and described with reference to several preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
|
Same subclass Same class Consider this |
||||||||||
