Method and system for managing multiple database storage units6829623Abstract The system manages at least a database in multiple database storage units, and the database is duplicated for redundancy. The system maintains the database redundancy for security. The system enables the maintenance and reorganization of the database storage units without interrupting the on-going operations involving the access to the database storage units. The non-stop ongoing database access is available by allocating a pair of a main storage unit and an intermediate storage unit that maintains the substantially identical data. During certain predetermined operations, the main storage unit is being dedicating to the on-going transactions while the intermediate database storage unit is manipulated to contain data that corresponds to a specified time. For example, the content of the intermediate storage is subsequently copied or used to update another database. When the above duplication or incorporation is completed, the intermediate storage unit is updated for the transactions that have taken place during the above operations to be synchronized with the main storage unit. Claims What is claimed is: Description FIELD OF THE INVENTION
CREATE TABLE sales record table
(sales code CHAR (16),
sales item CHAR (16),
territory code CHAR (16),
time code DATE,
sales amount DEC(10))
IN ((area 1) time >='1997-12',
(area 2) time >='1998-12',
(area 3) time >='1999-03',
(area 4) time >='1999-06',
(area 51) time >='1999-07',
(area 52) time >='1999-08',
(area 53) time >='1999-09');
In the above example, no area is defined for the fourth quarter in the year 1999, the areas 51-304, 52-305 and 53-306 for respectively July, August and September, 1999 are considered as a combined fifth storage area. Now referring to FIG. 4, a diagram illustrates a database before and after a change in the database division method is made in September, 1999. For example, the following definition changes the data for July, August and September stored respectively in the areas 51-304, 52-305 and 53-306 to be combined into a single area 5-307 as a third quarter of 1999.
ALTER TABLE sales record table
MERGE AREA (area 51-304, area-52-305, area 53-306)
INTO ((area 5-307) time code >="1999-09');
As a result of the above change, the dictionary 50 now has a single entry for `1999-09` in the area 5, which includes the three previous entries for `1999-07,` `1999-08` and `1999-09` in the areas 51, 52 and 53. FIG. 5 is a flow chart illustrating the steps involved in a preferred process of processing a database management command according to the current invention. The logical processing unit 22 of the database management unit 20 as shown in FIG. 2 processes the following steps. It is determined whether or not a command is an analysis request, a command request or an execution request in step 225. If it is a command request, the process proceeds to a step 224 for analyzing the command. If it is a table definition command in a step 226, a division method and each corresponding area are registered in the dictionary in a step 227 based upon the table definition command. If it is a table modification command in a step 228, a modification for each corresponding area is registered in the dictionary in a step 229 based upon the table modification command. If it is an area mapping command in a step 230, each area and the corresponding main and sub volumes are registered in the dictionary in a step 231 based upon the area mapping command. If it is a main-sub volume command in a step 232, a connection or a disconnection is registered in the dictionary in a step 233 based upon the main-sub volume control command. If the command is not any of the above described command, the process terminates. On the other hand, if the request is determined to be an execution command in the step 225, the code is interpreted for execution in a step 223 and the process terminates. Lastly, the request is determined to be an analysis request in the step 225, an query analysis is performed in a step 220 while an optimization process is executed in a step 221. Based upon the steps 220 and 221, an appropriate code is generated in a step 222, and the process terminates. Now referring to FIG. 6, a block diagram illustrates preferred disk processes of writing data that are used in the process according to the current invention. As shown in FIG. 6A, a disk device containing a pair of a main volume and a sub volume writes data without a predetermined automatic duplicating process. A database buffer unit 231A issues a data write command to both a main volume disk 2-30A and a sub volume disk 2-31A that are not equipped with an automatic double writing or duplicating feature. That is, by using certain software, the two disk volumes maintain the same data without the duplicating feature. On the other hand, as shown in FIG. 6B, a database buffer unit 231B issues a data write command to both a main volume disk 2-30B and a sub volume disk 2-31B that are equipped with an automatic double writing or duplicating feature. With the duplicating feature, a disk area 2-30 of the main volume 2-30B has been corresponded to a disk area 2-31 of the sub volume 2-31B. By using the correspondence, the data to be written to the disk area 2-30 will be automatically also written to the disk area 2-31. In other words, the two disk volumes 2-30B and 2-31B maintain the identical data. The following area volume mapping definition corresponds the area 2-30B of the main volume to the area 2-31B of the sub volume. CREATE AREA 2 as PRIMARY main volume2 SECONDARY sub volume2; By the above area volume mapping definition, the dictionary 50 contains information on the names of the volume such as "main volume 2" and "sub volume 2" for the areas such as "area 2." FIG. 7 is a flow chart illustrating the steps involved in a preferred process of managing a database buffer according to the current invention. The database buffer unit 231 as shown in FIG. 7 processes the following steps. The database buffer management process is initiated when a write or a read operation is requested to work on a page in the database in a step 231. It is determined whether or not a requested page exists in a predetermined database buffer in a step 2310. If the requested page exists, the preferred process reports the buffer in a step 2311. On the other hand, if the requested page does not exists in the database buffer in the step 2310, a page to be purged is initially determined in a step 2312 before reading the requested page. Then, it is determined whether or not a double writing is necessary in a step 2313. If the disk device is equipped with the double writing feature, the data is written to a pair of the disks via an operating system software program in a step 2314. On the other hand, the disk device is not equipped with the double writing feature, the data is initially written only to a main disk volume in a step 2315. Subsequently, the same data is written to a sub disk volume in a step 2316. Finally, the requested data page is written to the database buffer in a step 2317. The preferred process terminates. FIG. 8 is a block diagram illustrating one preferred process of updating disk storage units in the database management system according to the current invention. For example, the disk storage units include a relational database and an OLAP database, and information is updated from the relational database to the OLAP database while the relational database is continuously accessed for transactions. This preferred process is accomplished without the above described double writing feature in the disk storage units. In other words, the data duplication is accomplished by predetermined software. The database management system includes a database buffer management unit 231, a data extraction unit 200, a data update unit 600, a first data storage unit or a main storage volume 2-30, a second data storage unit or a sub storage volume 2-31, a system log unit and a set of third data storage units 700-1 through 700-m. In general, a common database such as a relational database is duplicated and maintained in both the main and sub data storage volumes 2-30 and 2-31 via the database buffer management unit 200. The system log unit 40 maintains the changes that are made to the database stored in the data storage units 2-30 and 2-31. Still referring to FIG. 8, the database management system follows the following three steps. In a specified time data generation phase (i), the main storage volume 2-30 and the sub storage volume 2-31 are not disassociated, but a mode is set to write only to the main storage volume 2-30 to update its database for the incoming transactions. The database buffer management unit 231 continues to update the data in the main volume storage 2-30 as well as the transaction data in the system log unit 40 in response to continuous transactions. Subsequently, the sub storage volume 2-31 will be rolled back or adjusted to contain complete data as the result of all transactions that have taken place up to a specified point in time. To complete the data in the sub storage volume 2-31 according to the specified time, the data extraction unit 200 extracts a relevant portion of the transaction data in the system log unit 40, and the data in the sub storage volume 2-31 is modified. In the data extraction /Update phase (ii), the updated or rolled back data in the sub storage volume 2-31 now will be incorporated into a third database such as an OLAP database that is stored in a plurality of disk storage units 700-1 through 700-m. To synchronize the data in the sub storage volume 2-31 and that of the third database, the data extraction unit 200 extracts the relevant portion of the data to be incorporated from the sub storage volume 2-31. Based upon the extracted data, the data update unit 600 updates corresponding portions of the data that are possibly stored in multipack disk storage units of the third database. To determine a disk storage space, a predetermined division schema or method is used. After the incorporation the relevant new data of the second database into the third database, in the main-sub volume reconnection phase (iii), the second database will be reconnected and synchronized to the first database. Since the first database in the main storage volume 2-30 is being continuously handling all transactions and the two databases are being disconnected during the above described phases (i) and (ii), the second database in the sub storage volume 2-31 is no longer synchronized with the first database in the main storage volume 2-30 by the end of the phase (ii). In the phase (iii), after the main storage volume and the sub storage volume are reconnected, the data extraction unit 200 now extracts relevant information from the system log unit 200 so as to roll forward the data in the sub storage volume 2-31. That is, the relevant information accounts for the transactions that had taken place in the first database in the disconnected main storage volume 2-30 during the phases (i) and (ii) so that the first database in the main storage volume 2-30 and the second database in the sub storage volume 2-31 are again synchronized. FIG. 9 is a block diagram illustrating a second preferred process of updating disk storage units in the database management system according to the current invention. For example, the disk storage units include a relational database and an OLAP database, and information is updated from the relational database to the OLAP database while the relational database is continuously accessed for transactions. This preferred process is accomplished with the above described double writing feature in the disk storage units. In other words, the data duplication is accomplished with a help of a certain hardware unit such as a differential information unit 3000. The database management system includes a database buffer management unit 231, a data extraction unit 200, data update unit 600, a first data storage unit or a main storage volume 2-30, a second data storage unit or a sub storage volume 2-31, a system log unit, a differential information unit 3000 and a set of third data storage units 700-1 through 702-m. In general, a common database such as a relational data base is duplicated and maintained in both the main and sub data storage volumes 2-30 and 2-31 via the database buffer management unit 200. The system log unit 40 as well as the differential information unit 3000 maintain the changes that are made to the database stored in the data storage units 2-30 and 2-31. Still referring to FIG. 9, the database management system follows the following three steps. In a specified time data generation phase (i), the main storage volume 2-30 and the sub storage volume 2-31 are disassociated, and the sub storage volume 2-31 is disconnected to update its database for the incoming transactions. The database buffer management unit 231 continues to update the data in the main volume storage 2-30 as well as the transaction data in the system log unit 40 in response to continuous transactions. In addition, in response to certain disk operations, the main storage volume 2-31 also updates the information in the differential information unit 3000. Subsequently, the sub storage volume 2-31 will be rolled back or adjusted to contain complete data as the result of all transactions that have taken place up to a specified point in time. To complete the data in the sub storage volume 2-31 according to the specified time, the data extraction unit 200 extracts a relevant portion of the transaction data in the system log unit 40, and the data in the sub storage volume 2-31 is modified. In the data extraction /Update phase (ii), the updated or rolled back data in the sub storage volume 2-31 now will be incorporated into a third database such as an OLAF database that is stored in a plurality of disk storage units 700-1 through 700-m. To synchronize the data in the sub storage volume 2-31 and that of the third database, the data extraction unit 200 extracts the relevant portion of the data to be incorporated from the sub storage volume 2-31. Based upon the extracted data, the data update unit 600 updates corresponding portions of the data that are possibly stored in multiple disk storage units of the third database. To determine a disk storage space, a predetermined division schema or method is used. After the incorporation the relevant new data of the second database into the third database, in the main-sub volume reconnection phase (iii), the second database will be reconnected and synchronized to the first database. Since the first database in the main storage volume 2-30 is being continuously handling all transactions and the two databases are being disconnected during the above described phases (i) and (ii), the second database in the sub storage volume 2-31 is no longer synchronized with the first database in the main storage volume 2-30 by the end of the phase (ii). In the phase (iii), after the main storage volume and the sub storage volume are reconnected, the differential information unit 3000 rolls forward the data in the sub storage volume 2-31 based upon the relevant information. That is, the relevant information accounts for the transactions that had taken place in the first database in the disconnected main storage volume 2-30 during the phases (i) and (ii) so that the first database in the main storage volume 2-30 and the second database in the sub storage volume 2-31 are again synchronized. Now referring to FIG. 10, a block diagram illustrates a third preferred process of updating disk storage units in the database management system according to the current invention. The third preferred process is accomplished without the above described double writing feature in the disk storage units. In other words, the data duplication is accomplished by predetermined software. The database management system includes a database buffer management unit 231, a data extraction unit 200, a data update unit 600, a first data storage unit or a main storage volume 2-30, a system log unit and a set of additional data storage units 700-1 through 700-m In general, a database such as a relational data base is maintained in the main data storage volume 2-30 via the database buffer management unit 200. The system log unit 40 maintains the changes that are made to the database stored in the data storage unit 2-30 as well as a system log point which indicates a time of the last update of a database in the additional data storage units 700-1 through 700-m based upon the data stored in the system log unit 40. Upon specifying a new time, the data extraction unit 200 extracts a relevant portion of information from the system log unit 40, and the data update unit 600 updates the data in the additional data storage units 700-1 through 700-m for a period between the last update time and the currently specified new time. The data update unit 600 updates the additional database according to a predetermined division schema. Now referring to FIG. 11, a flow chart summarizing the steps involved in the above described first, second and third preferred processes of managing a plurality of databases according to the current invention. The data extraction unit 200 and the data update unit 600 as shown in FIGS. 8, 9 and 10 process the following steps. It is determined in a step 2001 whether or not a current update is a system-log-based data update as described with respect to FIG. 10. If the current update is indeed a system-log-based data update, the last system log update point is reported in a step 2002. Subsequently, the data corresponding to a period between the last system log update point and a currently specified time is updated in a step 2003 and the preferred process terminates. On the other hand, if the step 2001 determines that the current update is not a system-log-based data update, it is further determined whether or not the disk devices are equipped with the above described double writing feature in a step 2004. Still referring to FIG. 11, if the disk devices are equipped with the double writing feature, the main storage volume 2-30 and the sub storage volume 2-31 are disassociated in a step 2005. At the same time, the differential information unit 3000 is placed in a predetermined mode to record the information that reflects the changes to the main storage volume 2-30 in a step 2006. Based upon the information in the system log unit 40, the data in the sub storage volume 2-31 is rolled back or adjusted to a specified time in a step 2007. The adjusted or rolled back data in the sub storage volume 2-31 is incorporated into a third database in a step 2008. Upon completion of the step 2008, based upon the differential information unit 3000, the data in the sub storage volume 2-31 is rolled forward to be synchronized with the database in the main storage volume 2-30. Finally, the main storage volume 2-30 and the sub storage volume 2-31 are associated again in a step 2010, and the preferred process terminates. On the other hand, if the disk devices are not equipped with the double writing feature, the operating system is set in a mode to write only to the main storage volume 2-30 in a step 2011. In other words, the sub storage volume 2-31 is left intact from the future incoming transactions. Based upon the information in the system log unit 40, the data in the sub storage volume 2-31 is rolled back or adjusted to a specified time in a step 2012. The adjusted or rolled back data in the sub storage volume 2-31 is incorporated into a third database in a step 2013. Upon completion of the step 2013, based upon the system log unit 40, the data in the sub storage volume 2-31 is rolled forward to be synchronized with the database in the main storage volume 2-30. Finally, the operating system is set in an original mode to write to both the main storage volume 2-30 and the sub storage volume 2-31 in a step 2015, and the preferred process terminates. FIG. 12 is a flow chart illustrating steps involved in a preferred process of incorporating the extracted data according to the current invention. The data update unit 600 as shown in FIGS. 8, 9 and 10 processes the following steps. In a step 6000, the data update unit 600 receives data from a data extraction unit 200, and based upon the received data, the data update unit 600 initializes a new record and incorporates the data. The incorporated data is now stored in a physical area in a storage device associated with the third database, and the physical area is determined by a predetermined division schema or method in a step 6001. If the third database is an OLAP database system, a summary/management database for the OLAP database is also updated based upon the newly incorporated data in a step 6002, and the preferred process terminates. Referring to FIGS. 13A and 13B, a comparison is made between a relational database system and a hybrid relational-OLAP database system. In particular, FIG. 13A illustrates a pair of a first relational database 301 and a second relational database 302 that contains a duplicate set of data. As described above, based upon a new set of transactions, the data in the first database 301 is first updated. Subsequently or simultaneously, the newly updated data is incorporated into the second database 302. Since the first database 301 and the second database 302 share the same format, the data in the two databases is substantially identical. Because of the identical data, when an inquiry or an access is made to the second database 302, the format for the inquiry or the access is also the same as that of the first database 301. On the other hand, referring to FIG. 13B, a first relational database 301 is functionally paired to an OLAP database 701 along with an information management unit 9000. Based upon a new set of transactions, the data in the first database 301 is first updated. Subsequently or simultaneously, the newly updated data is incorporated into the OLAP database 701. To initially incorporate data in the OLAP database from the relational database, all the history data is downloaded from the relational database, and the OLAP database generates the initial data based upon the down loaded data. To later update or reflect the new data in the relational data on the OLAP database, the OLAP database incorporates a portion of newly added history data. In other words, the newly added portion is reflected upon the OLAP database. These two types of the database service must be provided without any interruption to the database access. Still referring to FIG. 13B, in the relational-OLAP database system, when an inquiry is made to the OLAP database 701, a combination of the data in the OLAP database 701 and the data in the information management unit 9000 is used to respond to the inquiry. The information management unit 9000 contains data regarding an update period, time, geographical classifications, and client classifications. While the relational database manages data on facts, the OLAP system yields a trend analysis from the relational database with the help of a particular set of business requirements that is stored in the information management unit 9000. For example, for customer analyses, it is necessary to analyze the sales data in various manners such as chronological analyses including time and frequency, regional sales amounts for sales items. For the chronological analyses, in order to analyze the sales data, a unit of time comparison such as a year, a quarter, a week and a day, the data 701 in the OLAP system needs to be organized from the fact data 301 in the relational database according to the above time analysis units. In other words, a time-related OLAP inquiry is correctly responded only when the information management unit 9000 and the OLAP database are maintained for the requested time units. The information management unit 9000 maintains data on the updated status of the OLAP database as well as data on the organization of the OLAP database. As a result, in response to a summary inquiry is made to the relational database 301, the OLAP system and the information management unit 9000 output an appropriate summary/trend analysis results. For example, a typical sales record database access is speculated to be 7.7 million times per day based upon the actual sales retail data for a major retail chain. Each sales record ranges from 200 bytes to 1 K bytes. The amount of a monthly sales database ranges from 40 Giga bytes to 200 Giga bytes while that of an annual sales database ranges from 2 Tetra bytes to 4 Tetra bytes. The above sales databases form a star schema and are used for generating statistical summaries based upon SQL inquiries. Furthermore, in order to analyze the sales data in diversified manners, an OLAP system is used. After 18 to 24 months of sales data are built, it becomes necessary to analyze the sales data for each of the sales territories, a comparison of sales data of the same item among the months and a predetermined length of periods. For some cases, since 80% of the data access is concentrated only on three months, it is necessary to increase the resolution in understanding these data. Now referring to FIG. 14, a flow chart illustrates steps involved in a preferred process of generating a response to an OLAP inquiry according to the current invention. The OLAP system as described above with respect to FIG. 13B will perform the following steps. In response to an OLAP inquiry in a step 900, the OLAP system initially analyzes the inquiry itself to determine which a portion of the database 701 will be accessed in a step 901. In a step 902, the OLAP system then determines the update status of the OLAP database and generates a portion of a response to the inquiry. Lastly, the OLAP system accesses the above determined relevant portion of the database 701 and merges the accessed data with the already generated partial response before returning the merged response in step 903. The preferred process then terminates. Referring to FIG. 15, exemplary content of the system log is illustrated in a table. The exemplary system log includes information on a log sequence number, a time stamp, a log type and log information. The log sequence number is a unique number to be used in the relational database. The time stamp records time of a logged event. The log types include a start event (event_start) of the data extraction and incorporation process; a suspension event (event_suspend) suspending the extraction or incorporation process; a restart of the start event (event_restart) restarting the suspension event; a data commit event (event_commit) committing the data for a process; a data update event (event_update) updating data; and a data extraction reflect event (event_reflect) reflecting or incorporating the data. The log information records information on a history of updates of the database. Based upon the above described system log, the multiple database management unit extracts data from the relational database and updates the OLAP database based upon the extracted data. The above described current invention is implemented in many forms. One implementation is a large-scale computer network while another implementation is a group of small-scale distributed processors. A single processor is also optionally used for a distributed system. It is also optional to share a plurality of disk storage units among the processors. Similarly, the software that implements the above described process such as illustrate in FIG. 2 is stored in a floppy disk and a hard disk that are accessed through a network. As mentioned above, it is necessary to provide the database services without interrupting the access to relational or OLAP systems. In order to provide efficient non-stop database access, the following two issues are considered. A first issue is that since the relational and OLAP databases access an overlapping memory area, the relational database system generally has a lowered performance level. A second issue is that approximately 80% of OLAP access is generally focused on recent data or a small portion of the data. To alleviate the above issues, one solution is to have physically separate areas in a database by disk special features or OS double-writing feature since the most access areas in the database are either added some new data or read for updating the corresponding database in the OLAP server. These database areas are synchronized for writing and physically isolated. For example, the physically isolated areas include a main storage volume for adding data and reading as well as a sub storage volume for rolling forward or backward to a specified time. A second solution is to use a system log that is originally designed to provide a database recovery mechanism. Since the system log records every database transaction, the transaction information is isolated to a particular time point. In order to incorporate the data in the OLAP database, desired data is generated based upon an isolated portion of the system log data. A third solution is to localize an area for incorporating new data in the OLAP server. By multiple layers of divisions, the recent data that is 80% of access is further divided in order to increase row effect. The chronologically concentrated data is thus further divided by sorting methods such as a Hashing algorithm and a key value to lower the access load. In is necessary to make the relational database available to the OLAP system, and the relational database has been adjusted to a specified time prior to the availability. This is because when data from the relational database is incorporated into the OLAP database, it is necessary to analyze the OLAP database that has been adjusted to a specified time. In order to accommodate the above issue, the following solutions are provided. A first solution is to manage the relational database in chronologically divided units. Because of the chronological division, the range of access for adding new data is limited to a narrow area. In the relational database side, by keeping track of how far the update in the OLAP system has been accomplished, a next range of access is easily determined. A second solution is to update the OLAP database based upon a system log in which information on updating the OLAP database is kept. The system log information includes the history data of incorporating data from the relational database into the OLAP database in a table format, and a certain entry is selected in the table to use during the data incorporation. In combination with the above described SAN solution, servers for processing a database and storage units for storing the databases are connected via a high-speed network such as optical fibers. By modifying mapping information between the storage units and the servers, the servers and the storage units are easily and flexibly operated. Furthermore, since each storage unit employs the RAID structure, the redundancy is secured for the database operation and the necessary storage areas are used as flexible devices. The storage access is not interrupted without stopping the normal operation when the database system is reorganized or restructured. One concrete example of the database reorganization is to add a server and or a disk unit in order to enlarge the database system, and this example also shows that the system is scalable. It is to be understood, however, that even though numerous characteristics and advantages of the present invention have been set forth in the foregoing description, together with details of the structure and function of the invention, the disclosure is illustrative only, and that although changes may be made in detail, especially in matters of shape, size and arrangement of parts, as well as implementation in software, hardware, or a combination of both, the changes are within the principles of the invention to the full extent indicated by the broad general meaning of the terms in which the appended claims are expressed.
|
Same subclass Same class Consider this |
||||||||||
