Dictionary for encoding and retrieving hierarchical data processing information for a computer system5778223Abstract A data transmission dictionary is provided, which is adapted for use by a computer system for encoding, storing, or retrieving hierarchically related data transmission information. The dictionary is comprised of a group of one or more computer searchable definition trees relating to transmission information of the computer system. The trees are derived from a first definition group which includes characteristics of commands, replies or data usable by the computer system. The characteristics include structure and value properties and restrictions, if any, applying to the commands, replies or data. Each tree represents, respectively, a definition of a the command, reply or data to which it relates. Each tree includes a root node identified by name, e.g., a codepoint. The root node includes information describing the type of definition tree concerned (i.e., whether it relates to a command, reply or data), and may include one or more internal or terminal descendant nodes. These nodes represent components of the definition represented by the tree. The descendent nodes include level information describing the level of the node within its tree. The nodes may include attribute information, and may include value requirements relating to transmission information represented by the nodes. Claims What is claimed is: Description FIELD OF THE INVENTION
______________________________________
Newdef LCF.sub.-- Construct (IN Codepoint)
(*LCF Method for constructing Definition*)
Search for the file identified by the Codepoint
Scan for all its parameters (or instance variables),
if any
If There Are Some Then
Do;
Scan file for instance variables
Do for all the Instance Variables
Definition = Definition +
LCF.sub.-- Construct (Codepoint)
End Do;
End If;
End LCF.sub.13 Construct;
______________________________________
To illustrate the LCF flow and provide some insight with regard to the impact of Dictionary access and recursion on path length consider the example illustrated in FIG. 7 which depicts the definition tree to be built. LCF maintains 13 files for this tree. To illustrate the LCF flow and provide some insight with regard to the impact of Dictionary access and recursion on path length consider the example as depicted in FIG. 8. Hence LCF retrieves each file, sequentially searches for parameters in each file (the search argument is a variable length character string, or DDM Mnemonic, such as RDBNAM in the example above), and then for each parameter found, gets the file and extracts its parameters. This is a recursive method. This recursive step is done at run time, each time one wants to generate or parse a DDM stream. This means that the methods to construct a DDM Dictionary definition is an exhaustive search that goes through the entire file: Hence, in order to build the definition, LCF requires m retrievals and with each retrieval there is a sequential search to locate the parameters. LCF Storage Methodology LCF stores each DDM definition in a file, in the format shown in FIGS. 5a and 5b. This means that each term is stored in a separate file with information that is not needed by the parsing and generation processes. Also each of its instance variables are stored in the same fashion, etc. The storage requirements for LCF are approximately 1000+100 m bytes per term in the dictionary, i.e., assuming 1000 bytes head and tail overhead plus 100 bytes per internal node. Hence, the storage requirements for the entire dictionary are approximately: (1000+100 m) N. Root Storage Method The Root Storage Method (RSM) approximates or simulates the recursion aspects of DDM object definition construction by an appropriate implementation technique (nested CASE statements, CASE within CASE within CASE). Given this direction, the objects defined within the DDM dictionaries can be entirely eliminated or restricted to objects of a given type. RSM restructures the DDM Dictionaries by first eliminating the dictionary identifier as an element in the search argument, and hence all dictionaries are merged together. Then, the dictionary search arguments are changed from character strings to codepoints. The character strings are still maintained within the dictionary. Finally, objects defined within the dictionaries are restricted to root nodes only. Thus, only DDM commands, command data, reply messages and reply data are defined. However, the constituent instance variables of any given DSS (or parameters), collection or scalar are not defined. RSM Retrieval Methodology Once the object has been identified to satisfy a request, then for each root level object, a unique root level object generator exists, which will generate one complete object. The object generator non-recursively constructs the instance variables (collections and scalars) which constitute the object. Consequently, the Generator must simulate the recursion inherent in the generation of all instance variables comprising that object. FIG. 9 depicts the CASE within CASE method. FIG. 10 depicts the flowchart of RSM object construction. With this approach, the DDM dictionaries are partitioned such that objects are defined within static data structures and the constituent instance variables are hardcoded. Note that in this method, the definitions of the various parameters are hardcoded multiple times, and that this method is not extendible to all possible variations of DDM. For example, it has the limitation in the number of levels of nesting that CASE statements are allowed. To construct the definition for ACCRDBRM (as depicted in FIG. 7 ), RSM undertakes the steps depicted in FIG. 11. To construct a definition, one must execute one retrieval with cost proportional to Log N to the base 2, and m CASE statements. Thus, RSM retrieves the root term definition. Thereafter, the parameters' expansions are hard-coded into the procedure. This method approximates the recursion aspects of DDM Object Generation by an implementation technique (e.g., CASE within CASE . . . etc.). Due to limitations in programming languages, there are only so many levels of nesting of case statements that are possible, hence making the method not expandable. This appears to be a hard limitation. If DDM expands to have more levels, the RSM will exhaust its usefulness. If DDM strings reach a depth exceeding the nesting limit, then redesigning of the code will have to be done. In addition, this method is not well suited to parsing, because DDM is not static. When parsing DDM Strings the parameters at each level of DDM term in the tree can appear in any order. The CASE within a CASE . . . does not provide all possible combinations of parameter ordering. Also, for each occurrence of the parameter in the dictionary, the semantic procedure associated with it is duplicated. The programs are hardcoded, and therefore difficult to maintain. Due to the increased size, the programs are more complex. In order to maintain the program, recompilation is performed each time. Hence, in order to obtain the definition of the DDM term, there is one retrieval necessary and one sequential search in the top level file. Then, a series of embedded CASE statements provide the rest of the DDM definition. RSM Storage Methodology RSM stores only the root or "top level" definitions. The constituent instance variables of the parameters are not defined. This means that only the top level codepoint definitions are stored as data. All the parameters derived through the root are hardcoded in the program. This results in the loss of information, including some of the necessary information required to parse and generate a DDM string. That is, all the information about the structure of the parameters is not available as data. If there are changes in the dictionary, this may result in consistency problems. While LCF stored all the information for all the codepoints, this method only stores the structural information for the top level codepoints. The storage requirements for RSM are approximately 1000+100 m per top level term assuming 1000 bytes for head and tail overhead plus 100 bytes per internal node. Hence, there are about (1000+100 m)k for the entire dictionary. The rest of the information for the structure of the parameters is hardcoded in the program as depicted in FIG. 9. Assuming there are N/10 top level objects, then the cost of storage is (1000+100 m) N/10 bytes. Drawbacks of the LCF and RSM Methods LCF maintains a set of files without constructing the definition. This means that each time a definition of an object is required, LCF has to reconstruct it using the methods described above. There is no added value to reconstructing the definition each time it is required since the same definition will be required over and over again. In addition, LCF does not keep a very compact form of each of the definitions of each of the parameters; it remembers information that is not needed, i.e., information that is not essential for parsing and generating. The invention herein overcomes these drawbacks by expanding the definition of a DDM command inside the data structure, and therefore not requiring its reconstruction each time it is accessed and by defining a short form of the data to describe the essence of the definition in a few bytes. RSM only stores the top level node definition of the tree. The rest of the definition is hardcoded in the program. While this saves on space compared to the LCF method, RSM does not record the information of the root node in a compact fashion. RSM maintenance may be difficult due to hard coding of each parameter and duplication of code for each instance of the parameter in the dictionary. RSM is also subject to the limitations of programming languages such as the level of nesting of CASE statements. The invention herein overcomes these problems. SUMMARY OF THE INVENTION Inconveniences of other methods discussed above and elsewhere herein are remedied by the means and method provided by the instant invention which is described hereafter. In accordance with one aspect of the invention a data transmission dictionary is provided, which is adapted for use by a computer system for encoding, storing, or retrieving hierarchically related data transmission information. The dictionary is comprised of a group of one or more computer searchable definition trees relating to transmission information of the computer system. The trees are derived from a first definition group which includes characteristics of commands, replies or data usable by the computer system. The characteristics include structure and value properties and restrictions, if any, applying to the commands, replies or data. Each tree represents, respectively, a definition of the command, reply or data to which it relates. Each tree includes a root node identified by name, such as a codepoint. The root node includes information describing the type of definition tree concerned (i.e., whether it relates to a command, reply or data), and may include one or more internal or terminal descendant nodes, which nodes represent components of the definition represented by the tree. The descendent nodes include level information describing the level of the node within its tree. The nodes may include attribute information, and may include value requirements relating to transmission information represented by the nodes. The root node of each definition in the dictionary may include information relating to length restrictions of transmission information represented by its definition tree. The attribute information may include a requirement as to whether data transmission information represented by a node is required, optional or ignorable. The attribute information also may include information on length, repeatability or non-repeatability of data transmission information represented by the node. Advantageously, the root node of each of the definition trees may be made-the sole accessible entry for the tree. As their size tends to be compact the definition trees may be stored in main memory of the computer system using them for use by parsing or generating programming to process data transmission for the computer system. Advantageously the definition trees are stored in a compact linear form preferably expressed in a depth first search form. In accordance with another aspect of the invention there is provided a method of creating the data transmission dictionary, above, by deriving a group of one or more computer searchable definition trees from a first definition group of nodes defining portions of commands replies or data usable by a computer system, compacting each of the nodes by retaining only information necessary for the processing of data transmission streams according to the definition trees; assembling each definition tree by sequencing the compacted nodes in a linear form, starting with the root node of each of the definition trees, by placing information included in each compacted node in a resulting implemented dictionary; and by assembling each child node of said definition tree in turn. The process of assembling each child node involves placing information included in the child node in the resulting implemented dictionary and assembling each of the child's child nodes in turn. The process of assembling a terminal node involves placing information included in the terminal node in the resulting implemented dictionary. In accordance with still another aspect of the invention means is provided for constructing the data transmission dictionary described above which comprise an extractor for deriving a group of one or more computer searchable definition trees from a first definition group of nodes defining portions of commands replies or data usable by a computer system. A compactor is provided for compacting each of the nodes while retaining only information necessary for the processing of data transmission streams according to the definition trees. An assembler is provided for assembling each definition tree starting with the root node for each tree. The assembler can place information included in each compacted root node in the resulting implemented dictionary and assemble each of the compacted node's child nodes, if any, in turn. The assembler is adapted to place information included in each child node in the resulting implemented dictionary and to assemble each of said child's child nodes, if any, in turn. In accordance with a further aspect of the invention the dictionary described above is incorporated into a computer system for use by it for encoding, storing, or retrieving hierarchically related data transmission information for use by said computer system internally or in communication with another computer system. In accordance with another aspect of the invention there is provided a method of encoding and decoding a data transmission of one or more computer systems using the dictionary described above using the following steps: separating the data transmission into command, reply, or data parts corresponding to individual definitions in the dictionary, and ensuring that the parts conform to required specifications of the data transmission protocol used by the system; for each of the parts, retrieving a corresponding definition tree from the dictionary, and stepping through the data transmission ensuring that required information is present and that relevant rules are obeyed for the tree structure for each of said nodes encountered in the data transmission; and also ensuring that structural and value rules relating to the nodes, as described in the definition corresponding to the node are adhered to. Advantageously, in the above method when used for encoding the data transmission the dictionary definitions serve as a roadmap for the translation of internal data structures of the computer system into a data transmission which conforms to requirements of the definitions. Advantageously as well in the aforementioned method when used for decoding a data transmission the dictionary definitions serve as a roadmap for the verification of the data transmission according to the definition requirements and the translation into internal data structures of the computer system. In accordance with another aspect of the invention there is provided a distributed computer system comprising a source system and destination system. The source system includes an application requestor, a parser and a generator supporting the application requestor. The destination system includes a server and a parser and generator supporting the server. The parsers and generators have access to one or more dictionaries constructed in accordance with the dictionary described above for the purpose of processing data transmissions between the source and destination systems. The distributed computer system described above may contains the destination and source systems within one or a local computer system. In accordance with yet another aspect of the invention a data processing dictionary is provided, which is adapted for use by a computer system for encoding, storing, or retrieving hierarchically related data processing information. The dictionary is comprised of a group of one or more computer searchable definition trees relating to data processing information of the computer system. The trees are derived from a first definition group which includes characteristics of commands, replies or data usable by the computer system. The characteristics include structure and value properties and restrictions, if any, applying to the commands, replies or data. Each tree represents, respectively, a definition of a the command, reply or data to which it relates. Each tree includes a root node identified by name. The root node includes information describing the type of definition tree concerned (i.e., whether it relates to a command, reply or data), and may include one or more internal or terminal descendant nodes, which nodes represent components of the definition represented by the tree. The descendent nodes include level information describing the level of the node within its tree. The nodes may include attribute information, and may include value requirements relating to data processing information represented by the nodes. It may prove advantageous for some of the nodes of the tree to be linked to data stored by the data processing system for representing or accessing the data stored. BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 depicts DDM Objects. FIG. 2 depicts a DDM Object Interchange Format. FIG. 3 depicts a flowchart illustrating depth first searching. FIGS. 4a,b illustrate an example DDM Object: Root Node as defined in the architecture. FIGS. 5a,b illustrate an example of the Root Node OPNQRY. FIG. 6 comprises a diagram representing a method of constructing the definition for loosely coupled files. FIG. 7 illustrates a tree for the Command portion of ACCRDBRM. FIG. 8 depicts an example of retrieving a definition for the LCF method. FIG. 9 depicts a CASE method as used in RSM. FIG. 10 comprises a diagram representing the construction of a DDM definition by the root storage method. FIG. 11 depicts an example of retrieving a definition for the RSM method. FIG. 12 depicts an ADDG Flowchart. FIG. 13 depicts a flowchart for step 1 of ADDG; generate DDMTXT. FIG. 14 depicts a flowchart for step 2 of ADDG; create DDM definitions. FIG. 15 depicts a flowchart for step 3 of ADDG; assemble DDM definitions. FIG. 16 depicts ADDG tool pseudocode. FIGS. 17a-1 depict an implemented DDM dictionary and retrieval method in accordance with the instant invention. FIG. 18 comprises a representation of a DDM Command in the form of tree. FIG. 19 illustrates the DDM Dictionary Definition Syntax. FIG. 20 depicts parsers and generators in a Distributed System. FIG. 21 illustrates a tree for the Command portion of OPNQRY. FIG. 22 illustrates a tree for the Command Data portion of OPNQRY. FIG. 23 illustrates a tree for the Reply Data portion of OPNQRY. FIG. 24 depicts the method of parsing and generation employed by the instant invention. DETAILED DESCRIPTION OF THE INVENTION In the invention described herein below the definitions of DDM commands, replies, and data are stored in command, reply, and data trees, respectively. This invention which will be termed the DDM Dictionary Structure Optimizer (including method and means) (DDSO) compacts the definition of nodes of the DDM command and reply data trees by retaining only the information necessary for parsing and generation of the DDM data streams. DDSO also assembles the definition of a DDM command, reply, or data by sequencing the compacted nodes in the corresponding tree in a depth first search manner. Definitions are created by first scanning the DDM Architecture document (which may be on line advantageously) and then by extracting the necessary information. Then, each of the definitions is assembled. In order to explain DDSO, it is first described how to create the DDM Dictionary structure of the invention from the DDM architecture document, then what the storage and retrieval methodologies are, and the formal specification of the definition syntax. Finally, we discuss the advantages and disadvantages of DDSO are discussed. Creating the DDM Dictionary Data Structure The DDM Dictionary Data Structure is a compact form of definitions derived from selections of the dictionary defined by the DDM architecture document. Each definition is expressed as a tree (having one or more nodes) in a linear form, and preferably expresses it in depth first search form, with each of the nodes defined in a compact form. In general, the steps are the following: Step 0: (Extraction Stage) Get all the codepoints (identifiers of the nodes) for the trees required in the forest. The DDM architecture provides a network of nodes that are pointing to each other. This stage extracts the nodes needed for the trees of the application. Only the root nodes are given to the Extraction Stage. This step calculates which nodes are needed for the definitions. Step 1: (Compaction Stage) Scan all the DDM files created in step 0 for essential information, i.e., the top level codepoint for each node and all node parameters. Retain the information in DDSO form for the parameter. The specifics of the DDSO form are described below. An example of DDSO form is: "RN1: 2401,*255", which indicates attributes (RN), level in the tree (1), unique identifier (2401) and length attribute (*255). Step 2: (Assembly Stage) This step assembles (expands) each of the parameters. This means that if a parameter itself has parameters (i.e., it is a parent) then the children are added in a depth first search manner, and they are given one level higher than that of the parent. ADDG (Automated DDM Dictionary Generator) is a convenient tool which can be used to create one or more DDM Dictionary data structures (dictionaries) from the DDM architecture document. ADDG has three steps, as depicted in FIG. 12: 1. Generate DDMTXT This exec steps through the DDM architecture document extracting the information required by the user. This includes the root nodes specified by the user, as well as all the nodes required in the expansion of the root nodes. Each of these nodes is extracted into a file with filename equal to the DDM mnemonic term and a file type of DDMTXT. Other files are generated, such as DDM FLVL which provides a list of all DDM terms which are going to be expanded; EXPCDPT FILE which provides a list of all valid part specifications (a part specification specifies whether the DDM object is a command, reply, or data object) and their corresponding DDM codepoints and DDM HEX which provides a list of all DDM mnemonics with corresponding codepoints. The generate.sub.-- DDMTXT high level flowchart is depicted in FIG. 13. 2. Create DDM Definitions The Generate.sub.-- DDMTXT exec must be run before the Create.sub.-- DDM-Definitions exec. Create.sub.-- DDM Definitions creates the DDM.sub.-- DEF FILE which contains a DDM definition for each DDM Term. It follows the specific rules that were setup in the DDSO form for the dictionary. Create.sub.-- DDM.sub.-- Definitions is depicted in FIG. 14. 3. Assemble DDM Definitions The Generate.sub.-- DDMTXT and Create.sub.-- DDM.sub.-- Definitions execs must have been executed before this exec is run. This exec assembles all top level DDM terms by assembling parts of several DDM definitions. It also contains the source language specific statements in order to store each definition. The definitions are stored in a file. Pseudocode for the Assemble DDM.sub.-- Definitions is depicted in FIG. 15. The pseudocode for the ADDG tool is shown in FIG. 16. There are therefore two main operations involved in constructing the definition and these are compaction and assembly. Compaction involves storing each parameter in the compacted form, while assembly is an expansion process that reassembles a complete definition of a root node in depth first search format. It is possible to compact the definitions of each parameter without performing the assembly. Resulting storage savings over LCF will occur. However, the performance overhead of LCF to create the definition will have to be incurred, since the definition will have to be created at run-time as opposed to creating the definition before runtime, as is done in the instant invention. It is also possible to assemble the definition without compacting it. Due to the duplication of certain internal nodes, and large storage requirements for each node, this alternative may not prove attractive. However, if compaction and assembly are both done then maximum benefits may be obtained from the instant invention. Storage Methodology DDSO stores the DDM definition files in the format shown by the example depicted in FIGS. 17a-l. A DDM definition is a linear expression of a tree, assembled in depth first search manner, and contains information required, namely: information required for the root node and information stored for non-root nodes. The root node requires 6 bytes for its definition and each non root node requires 11 bytes. If there are m nodes in the tree then the tree requires 11 m+6 bytes. Hence, for N trees in a dictionary, 11 mN+6N bytes are required. In addition, a small search table requires 6 bytes per tree, hence 6N bytes. Therefore the total implementation requires 11 mN+12N bytes. Note that in the example, the constants 11 and 6, i.e., the number of bytes per internal and root nodes respectively are slightly higher. Certain additional characters ( "/"'s) and punctuation (",") were added to improve human readability. For the example application, approximately 5088 bytes of data are required for the dictionary itself and a small lookup table of about 510 bytes for the purposes of searching. Since the definition is already constructed, the cost of retrieval reduces to the cost of a search through the lookup table, e.g., the cost using binary searching. 1. Information Stored for Root Node The following attribute information is stored for the root node: (a) Carrier Type: i.e., whether it is a request, reply, or data object. In DDM there is one general format for the request data stream structure. The request envelope (RQSDSS) fields must be specified in a certain order because they are not self-defining structures. Only one command can be carried by a RQSDSS. Similarly, in DDM there is one general format for the reply data stream structure. All fields must be specified in the order required because the reply envelope (RPYDSS) is not a self-defining structure. Similarly, the data object envelope (OBJDSS) has a pre-specified format, and carries all objects except the commands and reply messages. An OBJDSS however may carry multiple objects; (b) The codepoint of the root node; (c) The length characteristic: The length characteristic includes descriptions for fixed length objects, variable length objects, objects with a maximum length, and objects with a minimum length. 2. Information Stored for Internal Nodes and Leaves (terminal nodes): The following attribute information is stored for non-root nodes: (a) whether the node is Required, Optional, or Ignorable; (b) whether the node (and its descendents) are repeatable or not; (c) the level or depth of the node in the tree; (d) the length characteristic of that node. The first attribute stored is the Required, Optional, or Ignorable attribute. A Required attribute specifies that support or use of a parameter is required: when a parameter is specified as being required in a parameter list for a command, the parameter must be sent for that command. All receivers (of transmissions) supporting the command must recognize and process the parameter as defined. When specified in the parameter list of a reply message, the parameter must be sent for that reply message. All receivers must accept the parameter. An Optional attribute specifies that support or use of a parameter is optional. When a parameter is specified as being optional for a parameter in a parameter list for a command, the parameter can optionally be sent for that command. All receivers supporting the command must recognize and process the parameter as defined and use the default value if it is not sent. When specified in the parameter list of a reply message, the parameter can optionally be sent for that reply message. All receivers must accept the parameter. An Ignorable attribute specifies that a parameter can be ignored by the receiver of a command if the receiver does not provide the support requested. The parameter can be sent optionally by all senders. The parameter codepoint must be recognized by all receivers. The receiver can ignore the parameter value. Next is the Repeatable or Not Repeatable attribute. A Repeatable attribute specifies that a parameter can be repeated. If it is specified as Not Repeatable it can't. There are no requirements that the elements of the list be unique, or that the elements of the list be in any order. The information stored for root and non root nodes is logically depicted in FIGS. 21-23. For example, a top level node with the description "1,200C,**** " has a carrier of 1 (request), codepoint of hex`200C` and variable length (i.e., up to an unspecified limit). In addition, a parameter, or internal node, with the following description: "RN2:2408,*255" means that the parameter is required, non-repeatable, has a codepoint of hex`2408` and has variable length of up to 255. Ordering of the Parameters In the embodiment described the parameters for each full tree are listed in a linear fashion; for example, for the tree depicted in FIG. 18, the ordering of the parameter definitions in the tree for depth first search is: N0, N1, L1, N2, N2.1, L2, N2.2, L3, N3, L4, N4, N4.1, N4.1a, L5, N4.1b, L6, N5, L7, where: N stands for Node, and L stands for Leaf. The order of the tree is maintained. The tree can be reconstituted in a hierarchical form, since depth first search order was used, and depth information was maintained. Other Parameter Orderings: Because all the valid orderings in which DDM parameters sent-are all of the orderings of depth first search (not just those limited to the left-to-right notation convention) it is more convenient to store the definition in this manner. It would be possible, but more expensive to store them in another order. Additional information, e.g., parent information, would have to be added to the definition, so that the tree may be reconstructed from the linear form. Retrieval Mechanism In the embodiment of the invention described the retrieval mechanism is based on a simple search technique, a binary search. However, other suitable search methods can be used depending on the range of the codepoints, the values of the codepoints, the size of the forest to be implemented, etc. DDM Dictionary Syntax FIG. 19 depicts DDM dictionary definition syntax for commands, replies, and data using the embodiment of the invention described herein. Interpretation Rules The rules describing DDM Dictionary syntax can be interpreted as follows: 1. ":=" means "is defined by", e.g., A :=B means that A is defined by B. 2. ".vertline." means logical or, e.g., A :=B .vertline. C, means that A is either defined as B or C. 3. Lower case characters represent terminal nodes of the definition and are defined as literals. 4. Upper case characters represent non-terminal nodes and are defined as a collection of terminals and non-terminals. 5. quotes : Items in quotes are literals. For example `B` means the letter B. Acronyms & Syntax used in FIG. 19 Carrier indicates the DSS carrier 0 indicates the DSS carrier used for partial replies 1 indicates the DSS carrier field RQSDSS (request DSS), used for commands; 2 indicates the DSS carrier field RPYDSS (reply DSS), used for replies; 3 indicates the DSS carrier field OBJDSS (object DSS), used for objects; Codept indicates the DDM codepoint: identifier for the DDM term; Maxlen indicates the maximum length of the DDM term; Minlen indicates the minimum length of the DDM term; Level indicates the level of the DDM tree, i.e., indicates the level of nesting with the parameter; Length is the length of the DDM parameter; **** means variable length; $ signals the end of the definition; LOWERA indicates a lower level architecture used by DDM. This allows for DDM to include other architectures. The formal specification of the definition basically means the following (still referring to FIG. 19): DDM.sub.-- ENTRY: Line 1 is the top level entry and defines the root node. The root node can have either a request, reply or data object envelope and this is specified by the Carrier. A carrier for the specific application has four possible values, 0 through 3, but this can be extended for other types of envelopes. In addition to the carrier, the root node information includes the codepoint, Codept of the node and the length specification of the root node (the length specification of the root node is usually variable length although this is not required. The length specification can specify a fixed length field, a maximum length field, a minimum length field or a variable length field). The root node can be composed of DDM objects, referred to as DDM.sub.-- PARMS (first line in the formal specification) or can be composed of objects of a lower level architecture and can either have itself a lower level data value (Line 2) or can be a collection of lower level objects (Line 3). DDM.sub.-- PARMS: If the root node contains a collection of DDM objects and lower level objects, then this DDM definition is followed. The DDM object can either be (a) a terminal object (Line 4), with information such as required/optional/ignorable, repeatable/non-repeatable, level of the terminal object in the tree (with root node being level 1), the codepoint and length characteristic; (b) A terminal object with lower level object contents, with the same characteristics as the terminal object above (Lines 5-6); (c) Two DDM.sub.-- PARMS objects. This allows a DDM.sub.-- PARMS object to recursively define itself in order to allow more than one terminal object and more than one depth in the tree (line 7); (d) One DDM.sub.-- PARMS object. This is a syntactic trick to allow for the `$` which indicates the end of the object, and is required in the definition (Line 8). LOWOBJ: Allows for the same structure as a DDM object and hence allows nesting and terminal nodes. The terminal nodes contain the same basic information as a DDM terminal node (Lines 9-11). Line 12: A carrier can have values ranging from `0` to `3`. This can be expanded to more values as the need arises. Line 13: The level of the parameter in the tree. The root has level 1 and its children have level 2. If a node has level i then its children have level i+1 . Line 14 : Codept indicates any valid DDM codepoint. Line 15 : Length characteristic for DDM: For example, it may take on the following values: (a) dddd, such as 1233, which means fixed length of 1233, (b) ****, which means variable length, (c) *maxlen, such as *255 which means that the DDM object has a maximum length of 255, (d) minlen*, such as 255*, which means that the DDM object has length of at least 255. Note that there are only four characters for length. This can easily be expanded as needed Lines 16 and 17 : Specification of minlen and maxlen Line 18 : "roi" means that the parameter is either required, optional, or ignorable. Line 19 :"rn" means that the parameter is either repeatable or not. Line 20 :"d" is any valid digit from 0 to 9. It is possible to modify the formal specification of the syntax in various ways, without changing the intent and the meaning of the invention. Various ways of modifying it include: (a) adding more carrier types, (b) adding more attributes to the root node, or to the parameter nodes; as more attribute characteristics are added to the architecture, more attribute place holders or more valid values may be added to describe DDM; (c) length specifications could change such as to add more digits to one length specification, or to add a parameter which has both minimum and maximum length restrictions. As DDM evolves, the formal specification for the dictionary syntax will evolve as well. EXAMPLE The files depicted in FIGS. 5a,b can be stored as follows: Request 1,200C,****/ON2:2110,0022/RN2:2113,0068/RN2:2114,0008/ON2:2132,0006$ Command Data 3,200C,****/ON2:2412,****,LOWERA/RR3:0010,****/OR3:147A,****$ There are two degenerate cases one can look at to compare DDSO with LCF and RSM. These are: (a) a tree with one node: while DDSO stores the node in compact form, LCF stores one node in one file; LCF still needs to scan the file, but does not need to perform the assembly. RSM in the case of the tree with one node reduces to LCF, since there are no CASE statements associated with one node. Hence in the case of the tree with one node, DDSO still maintains its advantage of storage compaction, but is still slightly better than LCF and RSM in performance. (b) A forest with one tree; in this case, DDSO avoids the binary search. LCF and RSM still have to construct the definition. Hence, in the case of a forest with one tree, the invention has advantages. How DDSO Definitions are Used The DDSO definitions are retrieved in both the parsing and the generation processing of DDM strings. Parsing means receiving a DDM string, checking its syntactic correctness and building the equivalent internal data structure for use by the local processor. Generation means receiving an internal data structure and building the DDM string using the definition tree. FIG. 20 depicts the parsing and generation process in a requester-server distributed system. An application program first submits a request in internal format. (Step 1) The request is translated into the DDM string by the generation process (the generator consults the DDM Dictionary to do this). (Step 2) Then, the request is sent to the server, which receives; it. The parser translates the request into internal format by consulting the DDM dictionary for syntax verification. (Step 3) Then, the internally formatted request is executed by the server. This can be one of various different suitable types, of servers such as file servers, or database servers. (Step 4) The server issues one or more replies in internal format, which are translated by the generator (Generator consults the DDM Dictionary) into a DDM string or strings. (Step 5) DDM reply is sent to the source system. (Step 6) Finally, the source system's parser translates DDM reply into internal format (Parser consults DDM Dictionary) and returns to the application program. Conceptual Layering of DDM In the specific embodiment described the parser and generator advantageously share a common design which stems from partitioning DDM data streams (DDM strings) into a series of layers. The first, or topmost layer, Layer Zero, consists of a DDM Command or a DDM Reply, which constitutes a logical object. A request for parsing or generating must always come at layer 0. Next is Layer One, which is derived from breaking up this logical object into one or more Data Stream Structures, or DSSs (or data communications envelopes) which are linked to each other. For example, the DDM Command to execute an SQL Statement is accompanied by various parameters as well as command data (the SQL statement). DSSs can include a command part and zero or more command data parts; or, a reply part and zero or more reply data parts; or, one or more reply data parts. Layer Two consists of the structural properties of a tree without looking at the specific values of the nodes within that tree. An example of a structural property of the tree is the length value at each node which is the sum of its children's length plus a constant (for its own length field and codepoint, or identifier). Finally Layer Three: consists of each node of the DDM Tree. Each node has structural properties in the tree and valid required values. Examples of the structural properties within the tree include whether the node is required, optional, ignorable, repeatable, a collection, or a scalar. ("Collection" refers to an internal node, and "scalar" refers to a leaf node). Examples of values of the nodes: Leaf nodes carry values and these values carry certain restrictions. For example, leaves may be of certain data types, such as enumerated value data types or they may have certain length restrictions, such as maximum length. Non leaf nodes don't have values but have length restrictions. SOFTWARE ARCHITECTURE FOR DDM PARSING AND GENERATION METHODS There are three major levels of the DDM Parsing/Generation Process which correspond to the three layers mentioned above, and are depicted in FIG. 24. The first level deals with the processing of a DDM Entry (Multiple Related Data Stream Structures): or relating two logical DDM Objects together. For example, a command must always be followed by command data if it has any. The "links" between the two Data Stream Structures (DSSs) (command, command data objects) are established by the processing of the DDM Entry. This level takes care of linking DSSs together, through various continuation bits, and ensures that the rules as defined by DDM architecture for linkage are enforced. The second level involves processing one Data Stream Structure at a time. This level takes one of the DSSs and looks at its internal structure. Each DSS is composed of a tree. This level obtains the definition of the relevant DDM object from the DDM Dictionary, and then proceeds to step through the definition, and starts comparing it to the actual data received (parsing), or, uses it as a roadmap to generate the appropriate data stream (generation). While level 1 was concerned with the relationship between DSSs, level 2, the DDM layer, takes care of the relationships between the nodes within a DDM tree, with such activities as length checking for collection objects, etc. The third level (the action level) concerns itself with individual nodes which include: Action Execution, Action Specifics, and a Link to a Lower Level Architecture. The Action Execution sublevel is the next natural level down and deals with individual nodes. These nodes have properties, such as: required, optional, ignorable, repeatable, etc. It is the responsibility of the Action Execution sublevel to ensure that required nodes are parsed or generated and that other structural properties of the codepoints are obeyed. The Action Specifics sublevel deals with the values in individual nodes. The nodes are either collection objects, (i.e., internal nodes: in which case they are composed of other DDM nodes), or they are scalars (i.e., leaf nodes ). The collection objects have no specific values associated with them. The scalars do, and it is the responsibility of this sublevel of the hierarchy to ensure that the values parsed or generated are the correct ones. The length attribute is also verified against its corresponding definition in the dictionary. The third sublevel or the lower level architecture sublevel deals with more complex scalar objects defined in another architecture, such as the Formatted Data Object Content Architecture developed by IBM Corporation. The common Parser and Generator design provides the following advantages including maintainability, generality, and non-recursive methodology. Maintainability is due to the fact that changes in the syntax of DDM are only limited to the action specifics portion. For example, if a parameter changes, it is very easy to locate the unique instance of its action in the code. Also, the common logic makes it easier to maintain the code. The Parsing and Generation processes use common data structures, such as the Length Tree Data Structure. The code is very general, in that changes in the dictionary are localized to the action specifics (Generality). One could merely change the action specifics part and have a Parser and Generator for a Distributed File System Application, for example. The structure of DDM is followed and hence changes can easily be incorporated. The actions described above are for a Data Base Application. However, it would be relatively easy for a person skilled in the art herein to build a set of actions for another application of DDM and substitute the new set to achieve the intended results. Another advantage of the use of the dictionary of the invention is that the method of use simulates recursion by having a completely expanded dictionary. That is, the DDM Tree is expanded in a depth-first search manner. Therefore, the method has the advantages of a recursive solution without the overhead of the actual recursion. Advantages of DDSO In terms of storage requirements DDSO shows useful advantages. The efficient utilization of storage is due to the fact that only essential information is retained. The dictionary is encoded into a specific format so that it will contain the definition in its most minimal form while still including information about all the nodes in the tree of the definition including the optionality information about the node, the node's length information, and the node's level information. Also there is only one dictionary access per top level DDM definition. One dictionary access gives access to the entire definition as opposed to the definition of the node only. By comparison, LCF requires as many accesses as the number of parameters in the tree. RSM requires one access per top level node, but only provides structural information for the top level node and not the entire definition tree. In addition to being more storage efficient and requiring only one dictionary access to obtain the full definition, DDSO constructs the definition prior to compile time. Since the definition has been expanded prior to compilation, the recursive step is not done at run time which would be at the expense of the user. DDSO incurs the cost once per definition prior to compiling the code. DDSO uses binary searches into a table of top-level nodes. DDSO could also utilize other search methods, such as hashing etc. LCF and RSM appear to be limited to sequential search methods. DDSO code is less complex. DDSO has a unique action for the same node and hence does not duplicate code unnecessarily. DDSO is independent of the programming language. Also, DDSO can use a table driven method while RSM has hardcoded programs. DDSO encodes the definitions as data. A change in DDM architecture would require RSM to change the program rather than just the data. For clarity, maintenance, and simplicity, the table driven approach has advantages. Also, the method is expandable for future use. DDSO appears to be independent of programming language, while RSM appears limited to the number of nestings of CASE statements allowed in the implementation of programming languages. DDSO compacts the definitions, and defines a grammar to describe DDM. The expansion of the trees is done before compile time, and hence the recursive step of LCF need not be done for each DDM tree parsed or generated. DDSO is a table-driven method, in which the table contains the node identifier followed by a pointer to the already expanded definition. DDM Dictionary Data Structure Example An example of a DDM dictionary according to the invention herein is depicted in FIGS. 17a-1. Some points to note about the example are: 1. Data Structures Used: In this example, a DDM Dictionary data structure and retrieval mechanism are discussed. It is composed of the following declarations: TABLE : a table containing: Specification and codepoint: used to search for a root level codepoint concatenated with the specification, which indicates: CD-command data, CP-command part, RD-reply data to distinguish between carrier types. Length of definition Pointer of definition: this points to the definition of the tree. This table is used for binary search. The specification and root level are listed in alphabetical/numerical order. TBLBASE: a pointer to the table used to remember the starting location of the table. TBL.sub.-- PTR: a pointer used to search through the table DDM.sub.-- TBL: a template used in conjunction with TBL.sub.-- PTR to search in the table and obtain the necessary fields. 2. Specific Method to Retrieve the Data (a) Find out part specification and codepoint in last four character positions. (b) Do a binary search in the table to match desired codepoint. When found, then move to the definition buffer area. The retrieval mechanism depicted in FIGS. 17k,l is based on a simple binary search. However, other search methods can be used to fit the particular application. The above-described embodiments are merely illustrative of the application of the principles of the invention. Other arrangements may be devised by those skilled in the art without departing from the spirit and scope of the invention.
|
Same subclass Same class Consider this |
||||||||||
