Method and apparatus for fast and comprehensive DBMS analysis
6584474
Abstract
A fast and comprehensive analysis of a database table is performed by reading a header block describing the location of blocks storing data information of a database table. The data blocks of the database table are read using direct asynchronous IO into memory. Data read in from the data blocks is analyzed to determine information regarding the health or condition of the database table. The analysis is performed by spawning separate processes, each process being assigned an approximately equivalent number of data blocks to analyze. Once each process completes analysis of the assigned data blocks, the analysis by each thread is summarized and averaged to provide information to the DBA relating to the condition of the database table. The information gathered regarding the database table is more comprehensive than that provided by currently available database management systems and tools.
Claims
What is claimed is:
Description
BACKGROUND OF THE INVENTION
TABLE 1
Comparison of Fast Analysis and Oracle Information Gathered

Name                              Fast Analysis  Oracle  Description
ROW COUNT                         X              X       No. of Rows
NORMAL ROW COUNT                  X                      No. of Simple Rows
MIGRATE ROW COUNT                 X                      No. of Migrated Rows
CHAINED ROWS                      X              X       No. of Chained Rows
DELETED ROWS                      X                      No. of Deleted Rows
BLOCK COUNT                       X              X       No. of Blocks allocated for table
USED BLOCK COUNT                  X              X       No. of Blocks used by a table
EMPTY BLOCK COUNT                 X                      No. of Blocks initialized but empty
FREE BLOCK COUNT                  X              X       No. of Blocks allocated but free
BLOCKS ON FREELIST                X              X       No. of Blocks on the freelist
AVG SPACE IN FREELIST BLKS        X              X       Average available space in freelist blocks
AVG FREE SPACE                    X              X       Average free space [across blocks in a table] in a block
AVG UNUSED FREE SPACE             X                      Average free space in a block never used
AVG RELEASED FREE SPC             X                      Average released free space in a block
ROW LENGTH                        X              X       Min, Max, Average Row length
AVG MIGRATED ROW LEN              X                      Average row length of migrated rows
BLOCK READS                       X                      No. of Blocks to be read for normal, chained and migrated rows
BLOCK FILL PERCENTAGE             X                      Measure of Density of the blocks used
BLKS MORE THAN PCTFREE            X                      No. of Blks exceeding PCTFREE
BLOCKS LESS THAN PCTUSED          X                      No. of Blks less than PCTUSED
FILESPAN COUNT                    X                      No. of chained rows spanning datafiles
BLOCKSPAN OFFSET                  X                      Min, Max, Average offset of chained rows spanning blocks
AVG BLOCK HEADER                  X                      Avg. Block header size
AVG BLOCK DATA                    X                      Avg. space available in a block
MIGRATED AND CHAINED CNT          X                      No. of rows migrated and chained
MIGRATED OR CHAINED CNT           X                      No. of rows either migrated or chained
AVG ROW DIR ENTRIES               X                      Reusable offsets to row data
TOTAL ROW DIR ENTRIES             X                      No. of row directory entries
AVG NORMAL ROW LEN                X                      Avg. length of a normal row
AVG CHAINED ROW LEN               X                      Avg. length of chained rows
AVG MIGRATED AND CHAINED ROW LEN  X                      Avg. length of migrated and chained rows
HOME ROW READS                    X                      No. of initial blocks to read all rows of a table
MIGRATED ROW READS                X                      No. of additional blocks to read for migrated rows
CHAINED ROW READS                 X                      No. of additional blocks to read for chained rows

In order to determine the information in Table 1, the structure of the database tables is first established. For example, in an Oracle database, table structure is based on an Oracle block, which is defined as a unit of data storage (e.g., an Oracle block storing rows of data for a table is an Oracle data block). Block size in bytes is defined by an instance parameter set by the DBA at the time of configuration, and different types of Oracle blocks store different types of information. As the tables are built, groups of blocks, known as extents, are allocated. Every Oracle table has an initial allocation of blocks which constitutes the first extent for an Oracle table. Subsequent extents (contiguous sets of blocks) are allocated to accommodate growth of the table and are allocated based on a next extent parameter specified at table creation.

Every table has an extent segment header which maintains an extent list in the form of file number (identifying a file), block number (identifying an offset within the file), and length for each extent of the table. In Oracle, the extent segment header is the starting block for a given table. The extent list and a free list are stored in this block. The address of this block (extent segment header) is available as part of information stored in a table which is part of a system dictionary (Oracle system dictionary, for example). In Oracle, a unique block address is a combination of a file number and a block number. The file number is a unique number that is assigned to each data file managed by the DBMS. A block number is relative to a file and relates to an offset in the file. Therefore, a file offset can be calculated given the block number and the block size for the instance.

Although the above structural elements (blocks, extents, extent segment header, etc.) are described in terms of an Oracle database and tables, the same structure applies to other databases and tables as well (Sybase and Informix, for example).
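The offset calculation mentioned above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that block numbers are zero-based and that no file-level header precedes the first block; a real data file layout may differ. The names `block_offset` and `read_block` are hypothetical, and the read uses ordinary buffered file I/O rather than the direct asynchronous I/O described later.

```python
def block_offset(block_number: int, block_size: int) -> int:
    """Byte offset of a block within its data file.

    Assumption (not from the source): blocks are numbered from 0 and
    the file has no header in front of block 0.
    """
    if block_number < 0:
        raise ValueError("block_number must be non-negative")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return block_number * block_size


def read_block(path: str, block_number: int, block_size: int) -> bytes:
    """Read exactly one block from a data file using plain buffered I/O."""
    with open(path, "rb") as data_file:
        data_file.seek(block_offset(block_number, block_size))
        data = data_file.read(block_size)
    if len(data) != block_size:
        raise EOFError("block lies beyond the end of the file")
    return data
```

For example, with an 8192-byte block size, block 3 would start at byte `block_offset(3, 8192)`, that is, 24576 under the stated assumptions.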
In addition, other database structures can also be appropriately described utilizing the same or related terminology. For example, although a block is a typical unit of data storage, other database implementations may be based on pages. In this case, an extent would be a group of pages rather than blocks, and the extent itself may be described using other terminology. Therefore, an extent can be generically described as a group of contiguous pages, blocks, or any unit of data storage. In addition, the extent segment header or extent segment header block can be generically defined as any file or other storage where header information about a table can be found, i.e., any location or locations where various extents and their respective sizes used in building one or more database tables may be considered an extent segment header.

Consistent with the above discussion, it is important to note that although Fast Analyzer as described herein provides information described in terms of specific parameters consistent with the structure of an Oracle database (blocks, for example), these terms are applicable in a generic sense to all database applications and therefore Fast Analyzer should not be limited to any specific database implementation.

The following is an element by element explanation of the table information gathered by Fast Analyzer. Each piece of information (element) is defined along with a clarifying explanation including implications where applicable. Information gathered by Fast Analyzer is typically stored in a repository, such as the DAO repository, for example, as defined hereinbelow. The information gathered by Fast Analyzer is divided into the following categories: block level information, block space usage information, row level information, row level space information, and information to measure block and file I/O.

BLOCK LEVEL INFORMATION
1. Used_Block_Cnt: Total number of Blocks that have one or more rows for a given Oracle table.
This may be less than the number of blocks allocated for the table.
2. Empty_Block_Cnt: Total number of Blocks that are included in the extent and block high water mark, but have no rows in them. The blocks are initialized for use and are potentially the next candidates to be filled when new rows are inserted for the given table. The extent and block high water indicators are stored in the segment header block of an Oracle table to indicate the current allocation usage of an Oracle table. The sum of USED_BLOCK_CNT and EMPTY_BLOCK_CNT is the value for BLOCKS in the DBA_TABLES view.
3. Free_Block_Cnt: Total number of Blocks which are part of the allocation for the table but have never been initialized. Same as Oracle's EMPTY_BLOCKS in the DBA_TABLES view.
4. Emb_Cnt: Total number of blocks used by the segment header to hold an unlimited extent list. When a table is created with the unlimited extents option, the list of extents stored in the segment header may exceed the space available in a block; in that case, the extent segment header entries point to additional blocks which contain extent list and/or freelist information.
5. Freelist_Block_Cnt: Total number of blocks on the freelist. Oracle maintains a list of blocks available for new row inserts. If an Oracle block is filled less than PCT_FREE, this block is on the freelist. If rows in an Oracle block get deleted after reaching the PCT_FREE limit and the block usage falls below the PCT_USED limit, the block is put on the freelist.

BLOCK SPACE USAGE INFORMATION
1. Avg_Released_Free_Space: Released free space in a block is defined as the space previously used and now available due to row deletions or migrations. Released free space need not be contiguous in the block. Average released free space is the average of the released free space in bytes for the blocks of the table.
2. Avg_Unused_Free_Space: Unused free space is defined as the free space in the block not yet used by the block for any rows.
Unused free space is contiguous. Given the fact that Oracle stores rows bottom up, it is the space between the row directory and the last row inserted. Average unused free space is the average of the unused free space in bytes of the blocks for the given table.
3. Avg_Free_Space: Free space is the sum of the unused free space and the released free space for a block. Average free space, in bytes, is the average of the free space available in the blocks for the given table. This statistic is the same as the AVGSPC statistic in Oracle 7.x and the AVG_SPACE statistic in Oracle 8.x found in the DBA_TABLES view.
4. Avg_Space_Freelist_Blocks: Average free space in bytes found in Oracle blocks that are on the freelist. This statistic is the same as AVG_SPACE_FREELIST_BLOCKS in Oracle 8.x found in the DBA_TABLES view. This information is not reported by Oracle in versions prior to 8.x.
5. Used_Block_Fill_Percentage: This statistic is a measure of the block fill for blocks that have one or more rows in them. The percentage is a measure of the amount of space used for data in a block versus the amount of space available for data in a block, not the block size.
6. Blocks_Fill_More_Pctfree: The number of blocks which are found to be filled more than the PCT_FREE limit assigned to the table. This occurs when rows in the block are updated and get migrated within the block.
7. Block_Fill_Less_Pctused: The number of blocks which are found to be filled less than the PCT_USED limit assigned to the table. This occurs when rows in the block are either deleted or migrated and the space occupied by rows falls below the PCT_USED limit.
8. Avg_Block_Header_Bytes: Each data block contains a block header which contains the table directory, row directory and transaction information storage areas. AVG_BLOCK_HEADER_BYTES is the average space used in bytes to store block header information for the Oracle blocks of a given table.
9. Avg_Block_Data_Bytes: The space available for row data storage in a block.
AVG_BLOCK_DATA_BYTES is the average space available, in bytes, for row data storage in the Oracle blocks of a given table.

ROW LEVEL INFORMATION
1. Row_Cnt: The total number of rows found in the blocks for the given table. This statistic is the same as the NUM_ROWS statistic in Oracle, found in the DBA_TABLES view.
2. Normal_Row_Cnt: The total number of rows that are neither migrated nor chained and are found as one contiguous piece in an Oracle block.
3. Migrate_Row_Cnt: The total number of rows that have been found to be migrated for a given table. Migration of a row normally occurs when a row being updated cannot be updated in place; hence Oracle stores a forwarding pointer and relocates the entire row in the same or a new block.
4. Chain_Row_Cnt: The total number of rows that are chained for a given table. A row is defined to be chained when the entire row does not fit into an Oracle block and hence is stored as row pieces in separate blocks. Each row piece has forward and backward pointers linking the row pieces together.
5. Migrate_And_Chain_Row_Cnt: The total number of rows that are migrated and chained as per the previous definitions.
6. Chained_Or_Migrated: The sum of all rows that are either chained or migrated or both. Oracle reports this statistic as CHAIN_CNT in the DBA_TABLES view.
7. Delete_Row_Cnt: The total number of rows that are marked as being deleted in the blocks for a given table. A row, when deleted, is just marked as being deleted in the row directory of an Oracle block and the space occupied is not reclaimed. DELETE_ROW_CNT is a count of such entries.
8. Avg_Row_Dir_Entries: Row directory entries are entries in an Oracle block that contain offsets that point to row data. These entries are created as new rows are inserted but never deleted. These entries may be reused.
9. Total_Row_Dir_Entries: The total number of row directory entries found in all the Oracle blocks for a given table.
This number may be more than the number of rows if deletions and updates occur on the given table.

ROW LEVEL SPACE INFORMATION
1. Avg_Row_Len, Min_Row_Len, Max_Row_Len: The average, minimum and maximum row length in bytes found in the Oracle blocks for a given table. The AVG_ROW_LEN statistic is the same as the AVG_ROW_LEN statistic reported by Oracle in the DBA_TABLES view for a given table.
2. Avg_Normal_Row_Bytes: The average length in bytes of the normal rows of a table; see the normal row definition.
3. Avg_Migrated_Row_Bytes: The average length in bytes of the migrated rows of a table; see the migrated row definition.
4. Avg_Chained_Row_Bytes: The average length in bytes of the chained rows of a table; see the chained row definition.
5. Avg_Mig_And_Chained_Row_Bytes: The average length in bytes of the migrated and chained rows of a table; see the migrated and chained row definition.

INFORMATION TO MEASURE BLOCK AND FILE I/O
1. Min, Avg, Max Chain_Blockspan_Offset: Chained rows span blocks. In an Oracle data file, block numbers are relative to the start of the data file. The blockspan offset is a measure of the distance between blocks containing chained row pieces. The minimum, average and maximum are collected for a given Oracle table that has chained rows.
2. Chained_Rows_That_Span_Files: The total number of chained rows that have row pieces in more than one Oracle datafile.
3. Home_Row_Reads: The total number of initial blocks to be read if all the rows of the table were requested. If the table did not have any migrated or chained rows, then the total number of blocks to read to access all rows would be equal to HOME_ROW_READS.
4. Migrated_Row_Reads: The total number of additional blocks to be read if all migrated rows of the table were requested. A row can be migrated more than once and it is possible to have more than two block I/Os for one row.
5. Chained_Row_Reads: The total number of additional blocks to be read if all chained rows of the table were requested.
A row can be chained over more than one block and it is possible to have more than two block I/Os for one row.

The above described information provides the DBA with a comprehensive set of data for determining the condition of a database table. Each element is determined by reading the blocks of a database table, examining the contents of the blocks, and deriving the information needed to determine the element. Proper examination of the contents of a block requires knowledge regarding the structure of the block. Block structure may be obtained either from a specification describing the structure or by investigation of a block already created.

For example, FIG. 3 illustrates the structure of an Oracle data block. Each data block of an Oracle table is made up of a block header 90, block transaction information area 92, and a table and row directory 94 containing pointers to row data 96. The row data itself is filled bottom up.

The above described elements are divided into block level information, block space usage information, row level information, row level space information, and information to measure block and file I/O, each present in some form or another in the block itself. For example, block space usage information is present in the block header 90. The number of rows, number of row directory entries, and deleted rows for the block can be found in the table and row directory area 94. The table and row directory provides pointers to find actual row data. Row types to determine migrated, chained, and normal rows can be decoded from a row type that is stored within each row. Each row is examined to compute row size, which provides the row level space information. Within each row, pointers are provided if the row is incomplete (i.e., is migrated or chained). Following these pointers is necessary to determine actual row size information and related elements.
Block level information is typically gathered from the extent segment header for the table and information provided in the block header itself.

Most of the information is computed in various ways to enable future modeling. Therefore, FASTANAL must collect the following for most metrics:
Minimum
Maximum
Average
Standard Deviation (Optional)

Minimum and Maximum are determined for each piece of information (metric). The easiest way to compute the Average is to keep a running total of the sum of the metric, as well as a count of the number of data points collected. Then, use the formula below for the standard mean (Average):

$$A = \frac{\sum X}{n}$$

where:
X = data value
n = number of points in each series

Standard Deviation is then computed with a second pass by comparing the deviation of each data value with the Average computed above. Since Standard Deviation requires a second pass, it must be specifically requested.

$$\sigma = \sqrt{\frac{\sum (X - A)^2}{n}}$$

The present inventor has also realized that, in addition to extending the analysis performed on database tables, there is a need to improve the speed of the analysis. Fast Analysis provides an extended information set (metrics), and improves the speed at which the metrics are retrieved and processed.

Conventional database management systems utilize SQL or related commands to provide information for a limited analysis of tables in the database. However, this process is slow because of the overhead costs incurred by the database management system (particularly relational database management systems). For example, the Oracle Analyze command utilizes the SQL call UPDATE_STATISICS_FOR TABLE, which invokes all the overhead associated with the Oracle DBMS. Therefore, conventional DBMS analyses are restricted to a few items of information and are relatively slow. The Fast Analysis process is significantly faster than the Oracle analyze command.
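The minimum, maximum, average, and optional standard deviation collection described earlier in this passage can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `MetricSummary` and `standard_deviation` are hypothetical, and the divisor n in the second pass (a population standard deviation) is an interpretation of the formula rather than something the text states explicitly.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MetricSummary:
    """Single-pass running minimum, maximum, sum, and count for one metric."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, value: float) -> None:
        # Update the running total and count used for the average,
        # and track the extremes seen so far.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def average(self) -> float:
        if self.count == 0:
            raise ValueError("no data points collected")
        return self.total / self.count


def standard_deviation(values: Iterable[float], average: float) -> float:
    """Second pass: deviation of each value from a precomputed average.

    Assumption: divides by n (population form), matching sqrt(sum((X-A)^2)/n).
    """
    data = list(values)
    if not data:
        raise ValueError("no data points collected")
    return math.sqrt(sum((x - average) ** 2 for x in data) / len(data))
```

A caller would feed each observed value (a row length, for instance) to `add` during the first pass, and call `standard_deviation` only when that statistic has been requested, since it needs the values a second time.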
The speed enhancements are achieved by using direct asynchronous I/O, performing block read aheads when reading the database data files, and by parallelizing the processing and analysis of data contained in the files.

By using direct asynchronous I/O to read database files, Fast Analysis is able to directly access specific blocks of a file, thereby bypassing the overhead of typical DBMS routines (SQL shared memory techniques, and other parameters required to access the database tables, for example). The additional overhead required of typical DBMS calls occurs because SQL and other shared retrieval techniques form a generalized language implemented to provide flexibility in retrieval of database information. In contrast, Fast Analysis is more singular in purpose and directly retrieves the information needed.

Fast Analysis operates on the following assumptions:
1. The Fast Analysis process has DBA privileges or read permissions to necessary system files (system dictionary, for example).
2. The Fast Analysis process has read permissions to the data files of the DBMS and particularly any specific tables to be analyzed.

Considering the DBMS block structure as discussed above (blocks or pages, extents, and extent header or equivalent structure), the Fast Analysis process is described. Referring to FIG. 4, a user (database operator, DBA, or other mechanism inquiring as to the condition of a database table) first identifies the table of which the condition is to be determined. The table is normally identified by providing the table name (step 100); this step includes determining the location of the extent segment header block for the given table (from the system dictionary, for example). The location of the extent segment header block can normally be found in another table that specifies the location of the header block.

At step 105, the extent segment header block is read and an extent list is created.
The extent list includes identifying information for each set of blocks allocated for the table. For example, in one embodiment of the present invention, the extent list includes a file number, block number, and length for each extent in the list. Other types of identifying information such as pages, location or size of the data to be processed, depending on the structure of the database files, are also appropriate for construction of the extent list. In a more basic description, the extent list need only identify the location(s) of all space allocated for the identified table.

At step 110, the Fast Analysis process determines a number of threads to spawn. The number of threads to spawn can be based upon the number of blocks, or size of the table to be analyzed. This includes reading the extent list to determine how many blocks need to be processed. Physical limitations on the number of useful threads (for example, the number of separate processors available for spawned processes) may be considered. In addition, a specific physical condition or an optimal number of blocks that may be processed in a single thread (100 megabytes of data or 10 blocks, for example) may limit or guide determination of the number of threads to spawn.

Included in step 110 is the creation of one or more sub-lists from the list created at step 105. Each sub-list is preferably created using either an equal number of blocks or proportionately as close to equal as possible to maximize efficiency of the separate parallel processes (threads) to which the lists will be assigned. If necessary, an item may also be split from the original list, and unequal lists of blocks may be utilized. For example, if 1001 blocks are to be processed, sublists containing 500/501, 300/300/401, and 50/800/151 may be spawned, the latter combinations only having less efficiency assuming equal processing power in each thread.
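One way the sub-list creation of step 110 might be realized is sketched below in Python. It is an illustrative assumption rather than the patented implementation: the `Extent` tuple and the `split_extents` function are hypothetical names, the number of parts is supplied by the caller, and an extent is split whenever it straddles a boundary so that block counts differ by at most one.

```python
from typing import List, NamedTuple


class Extent(NamedTuple):
    """One run of contiguous blocks: file number, starting block, length."""
    file_number: int
    start_block: int
    block_count: int


def split_extents(extents: List[Extent], parts: int) -> List[List[Extent]]:
    """Divide an extent list into `parts` sub-lists of near-equal block count."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if any(e.block_count < 0 for e in extents):
        raise ValueError("block_count must be non-negative")

    total = sum(e.block_count for e in extents)
    base, extra = divmod(total, parts)
    # The first `extra` sub-lists take one additional block each.
    quotas = [base + 1 if i < extra else base for i in range(parts)]

    sublists: List[List[Extent]] = [[] for _ in range(parts)]
    part = 0
    for file_number, start, count in extents:
        while count > 0:
            # Move on to the next sub-list once the current quota is used up.
            while quotas[part] == 0:
                part += 1
            take = min(count, quotas[part])
            sublists[part].append(Extent(file_number, start, take))
            quotas[part] -= take
            start += take
            count -= take
    return sublists
```

With 1001 blocks spread over two extents and two parts requested, this sketch yields sub-lists of 501 and 500 blocks, splitting the first extent where the boundary falls. Each resulting entry carries the file number, starting block, and number of blocks that a thread would need.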
Each sublist contains the following information for each item in the list: file number, starting block, and number of blocks to process from the starting block number. Again, similar pertinent information may be described in terms of page count, bytes, or other units of data storage and not depart from the scope of the present invention.

Once the threads are spawned, each thread is assigned a number of blocks to process (step 115). Preferably, the assigned blocks are contained in one of the sublists created in step 110. Alternatively, the sublists may be created upon (or contemporaneously with) assignment. Other variations regarding the order or responsibility of each step may be made, again, without departing from the scope or intent of the present invention.

At step 120, the assigned blocks are read into memory or other storage location using direct asynchronous I/O. As discussed above, the use of direct asynchronous I/O provides direct and immediate access to the data stored in the database table.

At step 125, the blocks read are examined to determine the contents of each block. In this manner, block and row statistics are gathered from the blocks read.

Once all threads have finished processing their respective assigned set of blocks, summation and averaging of information gathered from each thread is performed (step 130).

At step 135, the gathered information is either displayed or utilized to update a condition table or provided to another mechanism for using the information gathered. The process may be repeated for any number of tables that may be contained in the database.

Thus, Fast Analysis provides comprehensive information regarding the condition of database tables and increases the speed at which database information is retrieved and processed.

Table 2 illustrates a DDL definition of a DAO repository that may be utilized to store Fast Analysis information extracted from the database tables.
TABLE 2
DDL Definition of DAO Repository
/
CREATE TABLE DAO.COMMON_DICT (
    OBJECT_ID NUMBER
        CONSTRAINT TABLE_DICT_ID
        PRIMARY KEY
        USING INDEX TABLESPACE DAO2,
    COLLECT_ID NUMBER
        CONSTRAINT TABLE_DICT_COLLECT_ID
        REFERENCES DAO.COLLECT,
    PCT_FREE NUMBER,
    PCT_USED NUMBER,
    CACHE NUMBER(1),
    TABLE_LOCK VARCHAR2(8),
    NESTED NUMBER(1),
    NUM_ROWS NUMBER,
    BLOCKS NUMBER,
    EMPTY_BLOCKS NUMBER,
    AVG_SPACE NUMBER,
    CHAIN_CNT NUMBER,
    AVG_ROW_LEN NUMBER,
    AVG_SPACE_FREELIST_BLOCKS NUMBER,
    NUM_FREELIST_BLOCKS NUMBER,
    SAMPLE_SIZE NUMBER,
    IOT_NAME VARCHAR2(30),
    IOT_TYPE NUMBER(1),
    TABLE_TYPE_OWNER VARCHAR2(30),
    TABLE_TYPE VARCHAR2(30),
    PACKED NUMBER(1),
    MIN_EXTENT_SIZE NUMBER,
    MAX_EXTENT_SIZE NUMBER,
    AVG_EXTENT_SIZE NUMBER,
    STD_EXTENT_SIZE NUMBER
)
/
CREATE TABLE DAO.TABLE_FAST_ANAL (
    TABLE_ID NUMBER
        CONSTRAINT TABLE_FAST_ANAL_ID
        PRIMARY KEY
        USING INDEX TABLESPACE DAO2,
    TRUE_CHAIN_CNT NUMBER,
    TRUE_MIGRATE_CNT NUMBER,
    FREE_LIST_LENGTH NUMBER,
    EMB_CNT NUMBER,
    SEGMENT_HEADER_HIGH_WATER NUMBER,
    BLOCK_HEADER_SIZE NUMBER,
    BLOCK_DATA_BYTES NUMBER,
    MIN_NORMAL_ROWS_BYTES NUMBER,
    MAX_NORMAL_ROWS_BYTES NUMBER,
    AVG_NORMAL_ROWS_BYTES NUMBER,
    STD_NORMAL_ROWS_BYTES NUMBER,
    MIN_DELETE_ROWS_BYTES NUMBER,
    MAX_DELETE_ROWS_BYTES NUMBER,
    AVG_DELETE_ROWS_BYTES NUMBER,
    STD_DELETE_ROWS_BYTES NUMBER,
    MIN_MIGRATED_ROWS_BYTES NUMBER,
    MAX_MIGRATED_ROWS_BYTES NUMBER,
    AVG_MIGRATED_ROWS_BYTES NUMBER,
    STD_MIGRATED_ROWS_BYTES NUMBER,
    MIN_CHAINED_ROW_PIECE_BYTES NUMBER,
    AVG_CHAINED_ROW_PIECE_BYTES NUMBER,
    STD_CHAINED_ROW_PIECE_BYTES NUMBER,
    MIN_NEW_ROW_PIECE_BYTES NUMBER,
    MAX_NEW_ROW_PIECE_BYTES NUMBER,
    AVG_NEW_ROW_PIECE_BYTES NUMBER,
    REFERENCES DAO.COLLECT
)
/
Table 2 is not intended to be a working copy, but merely an example repository structure.

Fast Analysis may be conveniently implemented using a conventional general purpose or a specialized digital computer or microprocessor programmed according to the teachings of the present specification, as will be apparent to those skilled in the computer art. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art. The invention may also be implemented by the preparation of application specific integrated circuits or by interconnecting an appropriate network of conventional component circuits, as will be readily apparent to those skilled in the art.

The present invention includes a computer program product which is a storage medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present invention. The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVDs, CD-ROMs, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions.

Stored on any one of the computer readable media, the present invention includes software for controlling both the hardware of a general purpose or specialized computer and for enabling the computer to interact with a human user or other mechanism utilizing the product of the present invention. Such software may include, but is not limited to, device drivers, operating systems, and user applications. Ultimately, such computer readable media further include software for determining the condition of a database or of specific database tables, as described above.
Included in the programming (software) of the general purpose or specialized computer are software modules for implementing the teachings of the present invention, including, but not limited to, identification and retrieval of database table structures, reading database files and tables, performing analysis of data retrieved from database tables and other related information, and the display, storage, or communication of results as determined by the present invention. Obviously, numerous modifications and variations of the present invention are possible in light of the above teachings. It is therefore to be understood that within the scope of the appended claims, the invention may be practiced otherwise than as specifically described herein.
