System for accessing cobol data files by generating a dictionary of NF.sup. 25432930Abstract A system and method enabling use of a COBOL compatible structured query language (CCSQL) to manipulate data stored in a COBOL data file, which is in a nonfirst normal, hierarchical format. Initially, a dictionary generator subsystem creates a corresponding dictionary for each COBOL data file. In the dictionary, the NF.sup.2 hierarchical logical relationship between the items and attributes (hierarchical nature) of each item of a COBOL data file are defined. Thereafter, whenever a new COBOL data file is produced, a corresponding dictionary is generated. When a user enters a request in the CCSQL format to query a COBOL data file, the request is parsed. The definitions in the dictionary are used by the CCSQL kernel to manipulate the data in response to the user's query. A command interpreter processes the request to determine the appropriate data manipulation to apply and initiates a CCSQL kernel request that carries out the intended action, with reference to the corresponding dictionary. For example, the request may require that certain items from a COBOL data file be extracted to produce a report. Because the hierarchical logical relationships between the items in the COBOL data are defined in the corresponding dictionary, a CCSQL report generator subsystem can produce the report using data extracted from the COBOL data file in accordance with criteria provided by the user, but without converting the data in the COBOL data files to a 1NF. COBOL programs with embedded CCSQL commands are preprocessed to convert the CCSQL commands to a COBOL compatible structured language that uses the dictionaries to access data referenced in the COBOL data files. Claims The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows: Description FIELD OF THE INVENTION
______________________________________
typedef struct DIC-REC {
char fname [.sub.-- FSIZE];
/* field name
*/
short level;
/* level of the field in FILE
* section of COBOL source code
*/
char pic[.sub.-- PICSIZE];
/* picture clause which describes
* the features of this field
*/
int length; /* length of the field
*/
.... /* auxiliary information for
* query processing
*/
} DIC.sub.-- REC;
______________________________________
Another data structure referred to as "TREE.sub.-- SPE" is used to represent the relation of the fields in a dictionary, thereby more efficiently referencing the fields. When a dictionary is opened, the definition of each field is read into an array of the DIC.sub.-- REC structure. The "dic" field in TREE.sub.-- SPE identifies the subscript of this field in the DIC.sub.-- REC array. The type definition structure for TREE.sub.-- SPE is as follows:
______________________________________
typedef struct TREE.sub.-- SPE {
int dic; /* index in DIC.sub.-- REC array which
* stores detailed information
* of each field
*/
struct TREE.sub.-- SPE*brth;
/* next field subordinate to
* the same group item
*/
/* if the present node is group item,
struct TREE.sub.-- SPE*son;
* it points to items subordinate to
*this group item; otherwise it is
*null.
*/
.... /* auxiliary information for
* query processing
*/
}TREE.sub.-- SPE;
______________________________________
For an example of a relation that may help to clarify how a dictionary is configured, refer to FIG. 8, wherein a graphical relationship for a table STAFF 192 is disclosed by a tree structure 190. In this example, the items: NAME 194, DEPARTMENT 196, CHILDREN'S NAME (CNAME) 202, AGE 204, FAMILY NAME (FNAME) 208, and PHONE NUMBER 210 are elementary items, while CHILDREN 198, COURSE 206, and FRIEND 200 are group items. Similarly, ID 212 and GRADE 214 are elementary items, because they represent subdivisions of a record that cannot be subdivided. In contrast, the group items listed above comprise a named sequence of elementary items or group items. It should be evident that in tree 190, each of the leaf nodes identify elementary items, while other nodes are associated with group items. A STAFF record could also be represented by the following outline, which illustrates the hierarchical format in which the items comprising the record are stored as COBOL data:
______________________________________
01 STAFF
02 NAME PIC X(20)
02 DEPARTMENT X(10)
02 CHILDREN OCCURS 3 TIMES
03 CNAME X(20)
03 AGE PIC N(2)
03 COURSE OCCURS 3 TIMES
04 ID PIC
04 GRADE X(2)
02 FRIEND OCCURS 2 TIMES
03 FNAME x(20)
03 PHONE.sub.-- NO N(7)
______________________________________
In the above outline, the relative level of an item to the other items is indicated by the prefixes 01-04 assigned to it. The above representation of a COBOL record is described by a corresponding dictionary based on the type definition structure TREE.sub.-- SPE. The form of the dictionary definition is easier to understand in connection with the representation of FIG. 8. The dictionary is developed as shown in FIG. 9 by a graphical description 200 of the fields in the record STAFF. Identical reference numbers with primes added are used to show the correspondence between the graphical representation of the COBOL data in FIG. 8 and the dictionary fields of FIG. 9, which comprise the dictionary. A dictionary field 194'0 represents NAME 194, which as shown in FIG. 8, has no "son" relationship with other data items. Similarly, a dictionary field 196' corresponding to DEPARTMENT 196 also has no son. However, both dictionary fields 198' and 200', corresponding to CHILDREN 198 and FRIEND 200, respectively, both have sons. Dictionary field 198' has a son, comprising a dictionary field 202'. Dictionary field 202', which corresponds to CHILD'S NAME 202, has no son, nor does a dictionary field 204', corresponding to AGE 204. However, a dictionary field 206', corresponding to COURSE 206, has two sons, including a dictionary field 212' and a dictionary field 214', corresponding respectively to ID 212 and GRADE 214. Neither of these latter two dictionary fields have sons. Dictionary field 200' includes two sons, corresponding to dictionary fields 208' and 210', representing FIRST NAME 208 and PHONE NUMBER 210, respectively. These dictionary fields define the attributes of each item in the COBOL hierarchical data, indicating the relationship of that item to all others so that CCSQL can handle the data items in a logical manner. The DG thus represents a bridge between the COBOL data files and CCSQL. As indicated in a block diagram 150, shown in FIG. 6, CCSQL can access and even modify the COBOL data file after the corresponding dictionary for that data file is created, yet COBOL applications can continue to access the same data file in their normal manner because the format of the COBOL data file has not been changed. As a result, both the CCSQL applications and the COBOL applications can readily access and modify the same COBOL data file without requiring any conversion in data format. Details of the logic implemented by the DG are illustrated in a flow chart 170 in FIG. 7, beginning at a start block 172. In a block 174, the logic initiates a search for a file section in the COBOL source program that is to be scanned in order to generate the corresponding dictionary. In a block 176, the first or 01 level item description is obtained from the record. In the example presented, the level 01 record would correspond to STAFF 192. Then, in a block 178, a dictionary field like that represented in FIG. 9 is created for each COBOL data item. The dictionary containing these fields is then added into the VOC file(s) in a block 180. In a decision block 182, the logic determines if there is any additional items at the 01 level remaining to be processed and if so, returns to block 178 to create a dictionary field for the remaining COBOL data items at that level. Otherwise, the logic completes its processing in end block 184. By extracting each COBOL data item from its non-first normal relational structure and defining it in a dictionary field using the method described above, the CCSQL can access the item with knowledge of its relationship to other items. The CCSQL query subsystem derives a resultant table (i.e., a formatted representation) by selecting specific items from one or more COBOL data files, subject to specified criteria. In order to facilitate this process, the conventional syntax of SQL has been extended to include some additional operations such as UNNEST, which is used to "flatten" non-first normal format records, have been introduced to manipulate COBOL data files. When a query statement is input by the user, it is parsed to define a SQL SFW.sub.-- EXPRESSION (a query statement that contains Select, From, and Where clauses). This operation involves applying the UNNEST command to the query statement with reference to the items appearing in a Select clause. The SFW.sub.-- EXPRESSION is evaluated by selecting items from one or more COBOL data files meeting any Where conditions that are provided, and these items are grouped to form a class in which all items meet the specified condition. Once a CCSQL statement has been parsed, useful information derived from it is placed in a data structure referred to as SQL.sub.-- FRAME, which is defined as follows:
______________________________________
typedef struct SQL.sub.-- FRAME {
int type; /* QUERY normal query
* UNNEST query with
unnest * clause
*/
union {
struct QUERY /* query;
struct UNNEST * unnest;
struct SETOP * setop;
} union;
} SQL-FRAME
______________________________________
In the definition of SQL.sub.-- FRAME, three additional structures are referenced including QUERY, UNNEST, and SETOP. These three structures are defined as follows:
__________________________________________________________________________
typedef struct QUERY {
struct COLUMN
*column;
/* field or expression fist.
* each node in list represents
* a selected item, which could
be
* a field name or expression
*/
struct TABLE
*table; /* table fist.
* each node in fist could be
* a table name or a table with
* UNNEST operation on it
*/
struct CSTACK
*where; /* where condition stack.
* conditions are interrelated
* by logic operators.
* each item in condition stack
* is a relation expression
* in which every operand could
* be a field name, expression, or
* subquery
*/
struct GROUP
*group; /* group by clause
*/
struct CSTACK
*having;
/* having condition stack.
* conditions related to group
* by items
*/
} QUERY;
typedef struct UNNEST {
char field[FSIZE]; /* name of the field to be
* unnested
*/
char **rename; /* rename list
* after unnest operation,
* several fields will be
generated, so
* rename them
struct SQL.sub.-- FRAME
*frame;
/* query expression to be
* unnested
*
} UNNEST;
typedef struct SETOP {
int type; * UNION, DIFFERENCE or
*INTERSECT
*/
struct SQL.sub.-- FRAME
*left;
/* left operand
*/
struct SQL.sub.-- FRAME
*right;
/* right operand
*/
} SETOP;
__________________________________________________________________________
It is important to carry out a frame-to-node conversion at this point to convert SQL.sub.-- frame structure to the NODE structure. During this conversion, all dictionaries of any referenced COBOL data files are opened. TREE.sub.-- SPE and DIC.sub.-- REC arrays for each table are created and a REL.sub.-- TYPE structure, which describes a relation between data items, is established for each table. The select list of the final query and subqueries are also treated as a relation and are described by REL.sub.-- TYPE structures. If "views" (virtual data files, which are derived by executing one or more CCSQL statements) are used in the query or subqueries, they are processed first. To the user, a view appears to be a real data file; however, a view does not actually contain data items. Instead, it contains pointers to selected data items in one or more COBOL data files. A view can be used in carrying out almost any type of data manipulation operation that is normally applied to data items, such as a Join, a Union, a Cartesian Product, etc., as discussed above. A semantic check is made of the field names and expressions, to ensure that each field points to an appropriate node in the dictionary tree (TREE.sub.-- SPE). Any flags for a Join, Cartesian product, repeated query, or other operation are set. The condition stacks defined by the query are optimized by using a cost tree in which each condition is assigned a cost according to predefined rules. Based upon the relative levels of these costs, the evaluation is carried out in a corresponding order. Any information related to a query or temporary result is referenced through NODE, and operations actually applied on a NODE to generate the final result. The NODE structure is defined as follows:
__________________________________________________________________________
typedef struct NODE {
int type; /* QUERY, UNNEST, SETOP
*/
int subi; /* index to SUBQHEAD array,
* which registers the
* intermediate file names,
* file pointers, etc., for query
* (or subquery) on this NODE
*/
int hsubi; /* index to SUBQHEAD array
for
* condition check
*/
int old.sub.-- rel;
/* after unnest operation, used to
* keep the original relation
* entry in REL.sub.-- TYPE array
int process; /* denotes the processing status
* of this NODE
* START start processing
* REPEAT repeated query
* PASSED after repeated
* query being processed first
* time
* FINISHED finish
processing
union {
struct N.sub.-- QUERY
*query;
/* SFW.sub.-- expression
*/
struct NODE
*unnest;
/* NODE to be unnested.
* unnest is processed by
creating
* a new relation according to
* the old relation and unnest
* requirements provided by
* UNNEST structure
*/
} union;
.... /* other flags and auxiliary data
* structures for query, unnest,
etc.
*/
}NODE;
__________________________________________________________________________
Included in the definition of NODE structure are structural definitions for N.sub.-- SETOP and N.sub.-- QUERY. These structures are defined as follows:
__________________________________________________________________________
typedef struct N.sub.-- SETOP {
int type;
/* UNION, DIFFERENCE,
* INTERSECT
struct NODE *left;
/* left operand
*/
struct NODE *right;
/* right operand
*/
} N.sub.-- SETOP
typedef struct N-QUERY
struct N.sub.-- COLUMN
*column
/* fields can be referenced by
* pointers to TRE-SPE and
* DIC-REC array. Expressions
* organized as a tree in which
each
* node can be a constant or a
* pointer to reference certain
fields
*/
struct N.sub.-- TABLE
*table
/* tables to be selected;
*/
struct N.sub.-- CSTACK
*where
/* where condition stack
*/
struct N.sub.-- CSTACK
*having
/* having condition stack
*/
struct N.sub.-- GROUP
*group
/* group by columns
*
struct VIR.sub.-- STACK
*vhstack
/* virtual stack for having
condition
*/
struct GFUNC.sub.-- LIST
*gfunc
/* store intermediate and final
* results of group functions
*/
} N.sub.-- QUERY;
__________________________________________________________________________
The definitions of the above two structures include references to additional structures, which are defined as follows:
__________________________________________________________________________
typedef struct N.sub.-- TABLE
int index; /* index in REL.sub.-- TYPE array
*/
int joined /* flag indicating whether the
table
* has been used in a join
int oldrel; /* index in REL.sub.-- TYPE array
before
* unnest operation
*/
struct N.sub.-- TABLE
*next /* next table in FROM clause
*/
}N.sub.-- TABLE;
typedef struct REL.sub.-- TYPE
/* data structure for keeping
* complete information of a
relation
*/
char relname[.sub.-- NAMESZ];
/* the name of relation;
temporary
* relation has no name
char realname[.sub.-- NAMESZ];
/* the name registered in VOC
file
*/
char label[.sub.-- NAMESZ];
/* reference name in FROM
clause
* or unnest clause
*/
int ftype; /* flag indicating if the file is:
* 1. COBOL line sequential
file;
* 2. COBOL record sequential
file;
* 3. COBOL indexed file; or
* 4. COBOL relative file
*/
int reclen; /* the length of a record
*/
int minreclen; /* the minimum length for the
* records when variable length
* records exist
*/
struct TRE.sub.-- SPE
*tree /* dictionary tree
*/
struct DIC.sub.-- REC
**dic; /* DIC.sub.-- REC array for keeping
* detailed information of each
field
*/
short count; /* total number of fields for
index
* file, file management, and
other
* temporary structure, etc.
*/
.... /* other information for index
file,
* file management, and other
* temporary structure, etc.
*/
} REL.sub.-- TYPE;
__________________________________________________________________________
When SQL.sub.-- FRAME is passed to a query subsystem, the main query routine is called and undertakes the logical steps disclosed in a flowchart 220, as shown in FIG. 10. Beginning at a start block 222, the logic proceeds to an initialization step, in a block 224. The initialization sets up the query environment, and processes the views involved in the query, as described below in FIG. 11 and as indicated in a block 226. Next, in a block 228, the logic converts the SQL.sub.-- FRAME structure to a NODE structure, and retrieves items that satisfy each of the query conditions (see FIG. 12). This step, which is shown in a block 230, corresponds to the normal processing of a query. After the step is completed, the logic proceeds to an end block 232. Referring now to FIG. 11, details of the logic involved in processing views is illustrated in a flow chart 240, beginning at a block 242. A decision block 244 determines if an existing view has already been processed, and if so, a block 246 determines the temporary file information for the view and sets at least some of the fields in the NODE structure. Thereafter, the logic proceeds to an end block 256. However, if the view has not been processed, the logic proceeds to a decision block 248, which determines whether it is necessary to unnest a table. Since UNNEST could appear in a FROM clause, it is treated just as a view would be; for example, if an item in a FROM clause is a transformed file, an expression for R is represented as: UNNEST (UNNEST) (R) ON . . . ) ON . . . , . In the above expression, R can also be a view. In dealing with this expression, a SQL.sub.-- FRAME structure is generated, representing a CCSQL query statement of the form: UNNEST (UNNEST) SELECT*FROM R) ON . . . ) ON . . . , which is equivalent to that in the FROM clause. (See block 250.) This query statement is then processed as a normal view in a block 254. The only difference is that at the SQL.sub.-- FRAME is directly generated instead of resolving it from a view definition. In addition, the information of this assumed view is not stored in a view definition file. If the inquiry in decision block 248 produces a negative response, the logic proceeds to a block 252, which reads in the view definition from a view file, resolving it into a SQL.sub.-- FRAME. After processing the view, a new table is created as a relation and the query results of the view are saved in a temporary file. This relation can be referenced as a normal relation in further processing. The logic then terminates at an end block 256. The main logic routine for query processing is illustrated in a flow chart 260, shown in FIG. 12. Beginning with a start block 262, the logic proceeds to a decision block 264, which determines if the NODE is of the SETOP type. If so, in a block 266, the left and right node operands are processed following the above steps and are then combined according to the type of Set operation in a block 268. The logic then proceeds to an end block 292. However, returning to decision block 264, if the NODE is not of the SETOP type, a decision block 270 then determines if the NODE is of the UNNEST type. If so, in a block 272, a new relation is created based upon the resultant relation of the query. The UNNEST information is added on that relation, each time an item is generated by the query. This UNNEST operation is applied directly to the item without putting it into a temporary file. Thereafter, the logic again proceeds to end block 292. If the NODE is not of the UNNEST type, the logic proceeds from decision block 270 to a decision block 274, which determines if a subquery exists. If so, the subquery is processed in the same way as a normal query would be, in a block 276; then, the logic returns to a decision block 278. Similarly, if no subquery exists, the logic proceeds directly to decision block 278, which determines if the query is for a data manipulation language operation, It should also be noted that in block 276, subqueries that contain a "where condition" are evaluated after the subquery is processed in accordance to the procedures applied to a normal query. If a "group function" or a "group by" clause appears in a query, the group information is stored in predefined data structures. For a "group by" clause that includes an "if having condition," all groups that do not satisfy the condition are eliminated. An affirmative response to decision block 278 leads to a block 280 to evaluate the condition contained in the query and to apply the data manipulation language operation to each item that satisfies the query condition. Thereafter, the logic again proceeds to end block 292. Finally, if the query is not related to any data manipulation language operation, a block 290 evaluates the condition and passes each item that satisfies it to the report subsystem. Thereafter, the logic concludes at end block 292. Data manipulation language subsystem 138 carries out various operations in accordance with query statements, including commands for INSERT, DELETE, and UPDATE, each of which operate on a data item of the COBOL data files. If a data manipulation language statement is executed without a condition, it simply calls a specific routine to carry out the required operation. If a condition is stated, however, the data manipulation language statement is embedded in a query process and starts the query routine that applies the appropriate operation on each item that satisfies the condition. The INSERT command inserts items into a selected table. If it is called with a query statement, it inserts one item into the named table for each item that it obtains from the query; otherwise, it inserts an item with the specified value at the indicated column. The value of the i.sub.th column of the item in the derived table, or the i.sub.th value in the VALUES clause is assigned to the i.sub.th column of the table. Omitting a reference to a column is equivalent to specifying all the columns of the table, in ascending order. When INSERT occurs in a query statement, the query process routine is called and instead of passing the resultant item to the report subsystem, the logic rebuilds an item according to the INSERT requirement and then inserts the item into the table. The DELETE command deletes any items from a named table that satisfy a "Where condition." If the Where condition is omitted, all items are deleted from the named table. When a Where condition is stated, the query process routine is called and any items that satisfy the condition are deleted. The UPDATE command updates items in a named table. Each column identified corresponds to a column that is to be updated. If a Where condition is omitted from the statement, all items are updated. However, when a Where condition is included in a statement, a query process routine is called and any items that satisfy the condition are updated. The CCSQL file subsystem comprises functions for reading from and writing to a COBOL file and carries out operations related to file control such as locking the file. There are four types of COBOL files that are supported in CCSQL including line sequential, record sequential, indexed, and relative. When reading an item, it is locked, its validity is checked, and an appropriate routine is called according to the type of file. After the item is read, the lock is released. Conversely, when writing an item to a data file, if the file is indexed, the item is added to a free block. Otherwise, it is added to the end of the file. The logic employed in reading and writing to COBOL data files in the CCSQL are illustrated in FIGS. 13 and 14, respectively. Turning now to FIG. 13, a flow chart 300 illustrates the logic for reading from a COBOL data file. Beginning at a start block 302, a decision block 304 determines if the COBOL file is a sequential file type and if so, seeks the end of the data last read in a block 306. Thereafter, the logic proceeds to a block 318, wherein the item last read is locked, the item is read, and the lock is released. The logic then proceeds to an end block 320. Returning to decision block 304, if the file type is not a sequential, a decision block 308 determines if it is indexed and if so, proceeds to a decision block 310. In decision block 310, the logic determines if the reading mode is sequential and if so, proceeds to a block 312 to get the next item or the primary key in the indexed file. Following block 3 12, the logic again proceeds to block 318 to read the item. If the reading mode is nonsequential in decision block 310, the logic proceeds to a block 314 to look up a key on the index file according to the indexed data item. Thereafter, the item is read in block 318 as described above. Assuming that in decision block 308 the file type is not indexed, the logic proceeds to a block 316, to determine the item position according to its number. Thereafter, the item is read in accordance with block 318. Turning now to FIG. 14, the logic for writing to a COBOL file begins with a start block 331. A decision block 332 determines if the file is indexed and if so, identifies free space in the data file and locks the item, in a block 334. Following that operation, a block 336 provides for inserting all indexing data items into the index file. Thereafter, the item is written to the data file in a block 340, leading to the conclusion of the process in end block 342. If, based upon decision block 332, the file is not indexed, a block 338 provides for seeking the end of the file and locking the item before writing the item to the data file in accordance with block 340. The report subsystem is used for printing out the result of a query. It provides a set of commands to format query results into useful and meaningful reports. The report subsystem implements a number of CCSQL commands, which are generally similar to SQL commands. Specifically, the report commands can Set, Clear, and List a report environment data structure, such as column, break, computation, TTITLE, BTITLE, etc. After the environment data structures are set, they remain until Clear commands delete them. Among the report-related commands is the COLUMN command that redefines a column format, conversion, display name, etc., a column can have an alias name and the column definition can copy to a different column by name. The BREAK command used in the report subsystem defines the break fields and the operations on those break points. It defines the number of lines to skip before or after breaking and the contents or value to be applied on a break line. The COMPUTE command defines computations to be carried out on specified fields. Computations include options for: SUM, AVERAGE, NUMBER, COUNT, MAXIMUM, AND MINIMUM. A TITLE command defines how the header and footer should look and is comprised of text, blanks, and values of certain fields. These values can also be formatted. If a field in the title does not appear in a Select list, the field name is not replaced by the value. Title items comprise: COLUMN, TAB, LEFT, CENTER, and RIGHT options that are used to determine the position of the title. After all report environments are set, the report is output one item at a time. For each item, there can be values in the header or footer that need to be saved in a specific data structure for use by the header or footer generator. If there is a break field, it should be checked to determine if there is a break point. If so, the processing of the break field proceeds according to the break options and calculations that are set. The current item break fields' value for checking is then set for the next item. After all items are printed, a final break is processed to complete the output of the report. While the preferred embodiment of the invention has been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention. Accordingly, it is not intended that the present invention be limited by the Disclosure set forth above. Instead, the scope of the invention should be determined entirely by reference to the claims that follow.
|
Same subclass Same class Consider this |
||||||||||
