Method and apparatus for extracting data from files
Patent number: 6507855

Abstract
A system and method are disclosed for creating or modifying a documentation output object that describes a portion of computer code. A documentation input object within a code file that is associated with a first documentation information object is provided. The first documentation information object is extracted based on the documentation input object. The first documentation information object is output to the documentation output object. A method is also disclosed for creating a data structure. A computer readable medium containing program instructions for creating or modifying a documentation output object that describes a portion of computer code is also disclosed.

Description
BACKGROUND OF THE INVENTION
/* edt: * function my_function()
* Argument: void *inarg
* A pointer to a buffer containing input arguments. May
* be NULL.
* Argument: ulong_t func
* The size, in bytes, of the inarg buffer.
* Use: All status and control functions for an interface are
* accessed through this vector.
*/
In the above example, the tags are separated from the code because they are located within comment lines. Additionally, each tag is a text object followed by a colon (:), which distinguishes tags from ordinary comment text. The tag "edt:" marks the beginning of a listing of tags (also referred to as "fields" or "field tags") that are associated with a particular code portion, such as a macro or function. This listing includes two "Argument:" tags, each associated with a text object that describes an input to the code portion, and a "Use:" tag associated with a text object that describes how to use the code portion.

The tags may be associated with a plurality of specifications that control various aspects of how the code is documented and/or that are themselves included within the code documentation. In the above example, the specification "*" of the "edt:" tag indicates that the following tagged documentation information will be incorporated into a previously defined default chapter; the specification "function" indicates that the associated code portion is a function; and the specification "my_function()" gives a reference name for the function that will be included within the code documentation. The specifications "void *inarg" and "ulong_t func" of the "Argument:" tags give the types and names of the function's arguments. Note that the "Use:" tag has no specifications. A tag may also be followed by a text object that is to be included within the documentation; for example, the first "Argument:" tag is followed by the text object "A pointer to a buffer containing input arguments. May be NULL." Various types of tags, specifications, and text objects are explained further below in reference to FIGS. 2 through 4.
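As a rough illustration of this tag convention (a tag is a word followed by a colon, its specifications follow on the same line, and other lines are plain remark text), the following C sketch classifies a single comment line. The `doc_line` type, the `parse_doc_line` function, and the rule that a tag name is a run of letters, digits, or underscores are assumptions for illustration, not the patented implementation; a real tool would probably also check the name against its list of known tags so that ordinary text such as "Note:" is not mistaken for a tag.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical result of classifying one comment line. */
typedef struct {
    int  is_tag;      /* 1 if the line begins with a "Name:" tag */
    char tag[32];     /* tag name without the colon, e.g. "Argument" */
    char rest[128];   /* specifications after a tag, or the remark text */
} doc_line;

/* Skip leading white space and one comment marker (slash-star,
   star-slash, or a lone star), then any white space after it. */
static const char *strip_comment_prefix(const char *s)
{
    while (isspace((unsigned char)*s)) s++;
    if ((s[0] == '/' && s[1] == '*') || (s[0] == '*' && s[1] == '/'))
        s += 2;
    else if (s[0] == '*')
        s += 1;
    while (isspace((unsigned char)*s)) s++;
    return s;
}

/* Returns 1 and fills out->tag if the line starts with a tag, else 0. */
int parse_doc_line(const char *line, doc_line *out)
{
    const char *p = strip_comment_prefix(line);
    const char *q = p;

    /* Assumed tag syntax: letters, digits, or underscores followed by ':'. */
    while (isalnum((unsigned char)*q) || *q == '_')
        q++;
    size_t n = (size_t)(q - p);
    out->is_tag = (n > 0 && *q == ':' && n < sizeof out->tag);

    if (out->is_tag) {
        memcpy(out->tag, p, n);
        out->tag[n] = '\0';
        p = q + 1;                          /* skip the colon */
        while (isspace((unsigned char)*p)) p++;
    } else {
        out->tag[0] = '\0';
    }

    /* Copy what remains, minus trailing white space, truncating if needed. */
    size_t len = strlen(p);
    while (len > 0 && isspace((unsigned char)p[len - 1])) len--;
    if (len >= sizeof out->rest) len = sizeof out->rest - 1;
    memcpy(out->rest, p, len);
    out->rest[len] = '\0';
    return out->is_tag;
}
```

For the example comment, a line such as `" * Argument: void *inarg"` yields the tag `Argument` with specifications `void *inarg`, while `" * be NULL."` is returned as remark text.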
One advantage of the present invention is that the tags may be entered in any suitable manner as long as they are distinguishable from the code. For example, tags may be entered into a code file's comment lines using any editor, so a programmer can use the code documentation capabilities of the present invention without learning another editor.

The tags may serve any code documentation purpose. For example, they may indicate what documentation information to include within the code documentation, where to find that information, how to format it, and/or the type of code documentation to produce (e.g., output to a file and/or to a computer display). In one embodiment, the tags are divided into three general categories: control, engineering, and documentation tags. Control tags indicate how to extract and/or interpret documentation information and/or the associated code portions. Engineering tags identify documentation information, describing characteristics of the code, to include within the code documentation. Documentation tags indicate how to sort, filter, and direct a tagged portion of documentation information to the correct audience or code documentation. FIGS. 2 through 4 provide example lists for each tag category.

FIGS. 2A and 2B are lists of control tag examples in accordance with one embodiment of the present invention. An example of a control tag is the "edt:" tag, which marks the beginning of a group of tags (hereafter referred to as a "template"). A template includes tags (hereafter referred to as "fields") that are used to document a particular prototype. A prototype may be any type of code component, such as a function, macro, structure, or variable. These lists of tags and prototypes are merely illustrative and are not meant to restrict the scope of this invention.
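The three tag categories can be pictured as a simple lookup table. In this sketch the table holds only tag names mentioned in this description (the full lists of FIGS. 2 through 4 are not reproduced), placing "Use" among the engineering tags is an assumption, and matching is case-insensitive because the examples mix spellings such as "Default_Chapter:" and "default_chapter:". The names `tag_category` and `classify_tag` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum {
    TAG_UNKNOWN,
    TAG_CONTROL,        /* how to extract or interpret, e.g. "edt", "include" */
    TAG_ENGINEERING,    /* what to document, e.g. "Argument", "Return" */
    TAG_DOCUMENTATION   /* how to sort, filter, direct, e.g. "audience" */
} tag_category;

/* Illustrative subset only; "Use" as an engineering tag is an assumption. */
static const struct {
    const char  *name;
    tag_category cat;
} tag_table[] = {
    { "edt",             TAG_CONTROL },
    { "include",         TAG_CONTROL },
    { "Argument",        TAG_ENGINEERING },
    { "Return",          TAG_ENGINEERING },
    { "Use",             TAG_ENGINEERING },
    { "default_chapter", TAG_DOCUMENTATION },
    { "audience",        TAG_DOCUMENTATION },
};

/* Case-insensitive string equality. */
static int same_name(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Look up a tag name (without its colon) in the table. */
tag_category classify_tag(const char *name)
{
    for (size_t i = 0; i < sizeof tag_table / sizeof tag_table[0]; i++)
        if (same_name(name, tag_table[i].name))
            return tag_table[i].cat;
    return TAG_UNKNOWN;
}
```

A table keeps the category lists in one place, so extending the tool with the remaining tags of FIGS. 2 through 4 would only mean adding rows.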
Another example of a control tag is the "include" tag, which specifies another documentation input file that contains a portion of the template. That is, the "include" tag indicates that some documentation input is located outside the code file, and when the documentation procedure is run on the code file, documentation information is also extracted from the "include" file. Preferably, the documentation procedure is recursive: if the first "include" file contains further "include" tags, documentation information is also extracted from a second "include" file.

FIGS. 3A and 3B are lists of engineering tag examples in accordance with one embodiment of the present invention. These tags describe portions of the code and identify text that will be incorporated into specific sections of the code documentation. For example, the field "Argument:" identifies text objects that describe the arguments of a particular function; these text objects may then be extracted and included in the code documentation.

FIGS. 4A and 4B are lists of documentation tag examples in accordance with one embodiment of the present invention. Documentation tags identify meta-information about the templates that is used to manage the templates and direct output. By way of example, "default_chapter:" identifies a default chapter; when a template does not define a chapter, its documentation information may be included within this default chapter of the code documentation. By way of another example, "audience: private" marks a portion of the documentation information as private, and private documentation information is filtered out during certain types of code documentation.

In operation 104, documentation input may be in the form of invocation flags that the user sets, for example, when invoking the documentation procedure.
These invocation flags allow the user to select options for generating the code documentation, such as its format and how it is output. By way of example, the programmer may input a flag that selects a "private audience" option so that code documentation is generated only for certain public portions of the templates and/or code (e.g., documentation information that is tagged as private is not output). By way of another example, the user may select a type of output, such as sending the code documentation to a computer display. FIG. 5 is a list of invocation flags in accordance with one embodiment of the present invention. For example, the user may invoke a documentation procedure "docx" as follows:

docx (-flags) (infile) (outfile)

In this example, the user invokes the program docx and inputs a plurality of flags (-flags), a documentation input file (infile), and a code documentation output file (outfile). Of course, this example is merely an illustration and is not meant to limit the scope of the invention. Operation 104 is also optional, and the user may be prohibited from selecting options; in other words, the code documentation format and/or output type may depend on invocation flags (or other forms of documentation input) or may be preset and unchangeable by the user.

Documentation Procedure
The documentation procedure (e.g., operations 106 through 110) may be performed at any point during or after the entry of documentation input. For instance, the code documentation may be generated even before all relevant documentation information has been entered, and the programmer may then use it as a check as each code section is completed. In operation 106, documentation information is extracted from the code file. For example, documentation information in the form of templates, tags or fields, text, and/or prototypes is extracted from the code file.
One embodiment of this extraction process is further explained below in reference to FIGS. 6A and 6B. Preferably, as the documentation information is extracted, it is also organized and arranged in a data structure. Any suitable type of data structure that allows easy access to the extracted documentation information may be used. For example, in one embodiment the data structure is a binary tree, which is an internal representation of the templates, template fields, and text objects (remarks) that are related to code portions within the code file.

[Figure: binary tree of templates (T), fields (F), and remarks (R); not reproduced]

In this binary tree, each item (e.g., template (T), field (F), or remark (R)) includes a right link and a left link. Each template is linked to the right of the previous template, so that as documentation information is extracted from the code file, a chain of templates is built, each linked to its predecessor. Likewise, as fields are extracted for a particular template, they are chained to the left of that template, and any text objects or remarks associated with a particular field are chained to the right of that field. Because the templates, fields, and remarks are linked in this specific way, each of them may be readily located and accessed within the binary tree.

Generating a data structure from the code file has many advantages. For example, each time a particular code file is modified, a new data structure may be generated and compared with the previous one, so that documentation changes can be incorporated into the code documentation without creating new documentation each time the code is modified. The data structures may also be used by the programmer to determine what changes have recently been made to the code.

FIGS. 6A and 6B are flowcharts that further illustrate operation 106 of FIG. 1, extracting documentation information from the code files, in accordance with one embodiment of the present invention. Initially, in operation 202, any global tags are found. Global tags may be applied to all templates in a particular code file, or only to templates that do not specify a corresponding local tag and/or that specify that a global tag should be applied. For example, global default tags are applied to each template that does not specify a corresponding local default tag and/or that specifies a default setting. The global tags may be found by any suitable technique that can distinguish them from the computer code. For example, a global tag may be located within a C program's comment lines near the top of the program file.
/*
* Default_Chapter: Drivers
*/
In the above example, a default chapter (Drivers) is defined, so that documentation information within a template is included within that default chapter of the code documentation whenever the template selects the default chapter (e.g., with a "*"). Operation 202 is optional and may be skipped if there are no global tags. Alternatively, global tags may be located within an individual template and applied to subsequent templates, for example.

A current template or end-of-file (eof) mark is then found in operation 204. The current template may be found by any suitable technique that can distinguish templates from the computer code. For example, the code file may be scanned sequentially, line by line, until either a template or the end of the code file is reached; a template is recognized by the tag that designates its beginning (e.g., "edt:"). Whether the current template has been found is then determined in operation 206. In this embodiment, a template is defined as a set of related fields and associated documentation information; for example, a template may include all fields within a single comment section that precedes and describes a section of code. The following comment section includes tagged documentation information that describes a function type called "con_control_ft" that follows the comment section. The template includes three tags: a beginning-of-template tag (edt:), a tag associated with the return value of con_control_ft (Return:), and a tag associated with an argument of the function (Argument:).
/*
* edt: drivers function con_control_ft
* Typedef for the interface control function.
* Return: bool
* Status of request, TRUE for success, FALSE otherwise.
* If FALSE, then errno will be set to indicate reason for
* failure.
* Argument: connector_st *connector
* Pointer to the connector of the interface to control.
*/
typedef bool (*con_control_ft) (connector_st *connector);
If the current template has been found, the template specifications are then determined in operation 208 using any suitable search technique. In the above example, when a beginning-of-template tag (e.g., "edt:") is found, the template specifications (e.g., "drivers function con_control_ft") are located on the same line as that tag, and the specification keywords (e.g., drivers, function, and con_control_ft) are expected to be separated by white space. The template specifications may include one or more keywords that control how documentation information is extracted from the template and/or output to the code documentation. In the above example, the keyword "drivers" indicates that any documentation information extracted from the current template is to be included within the "drivers" chapter of the code documentation. By way of another example, the keyword "function" indicates that the template describes a code portion that is a function. Thus, when the code portion that follows the template is extracted later, the extraction is facilitated by knowing what type of code portion to expect: since the code portion's format is known, the relevant code can be readily identified and extracted for the code documentation.

After the template specifications are determined, it is determined in operation 210 whether the template is a remark type. A remark type template inhibits the extraction of template fields and of any prototype that may follow the template; instead, the contents of the template (its remarks) are determined in operation 214 and simply added to the code documentation in operations 108 and 110 of FIG. 1. If, however, the template is not a remark type, the fields and prototype are analyzed in operation 212 of FIG. 6A.
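The specification handling of operations 208 and 210 might look like the following C sketch, which splits the keywords after "edt:" on white space, substitutes a global default chapter when the chapter keyword is "*", and tests for a remark type template. It assumes the keyword order chapter, kind, name used in both examples; the `template_spec` type, the function names, and the literal keyword "remark" are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of the keywords after "edt:". */
typedef struct {
    char chapter[32];   /* e.g. "drivers", or the default chapter for "*" */
    char kind[32];      /* e.g. "function", "macro", "remark" */
    char name[64];      /* e.g. "con_control_ft"; may be empty */
} template_spec;

/* Operation 208 (sketch): split white-space-separated keywords, assuming
   the order chapter, kind, name. Returns 0 on success, -1 on bad input. */
int parse_template_spec(const char *specs, const char *default_chapter,
                        template_spec *out)
{
    out->name[0] = '\0';                /* name is optional */
    int n = sscanf(specs, "%31s %31s %63s", out->chapter, out->kind, out->name);
    if (n < 2)
        return -1;                      /* need at least chapter and kind */
    if (strcmp(out->chapter, "*") == 0) {
        if (default_chapter == NULL)
            return -1;                  /* "*" used but no global default */
        snprintf(out->chapter, sizeof out->chapter, "%s", default_chapter);
    }
    return 0;
}

/* Operation 210 (sketch): the keyword "remark" is an assumed spelling. */
int is_remark_template(const template_spec *t)
{
    return strcmp(t->kind, "remark") == 0;
}
```

With the global `Default_Chapter: Drivers` setting shown earlier, `"* function my_function()"` resolves to chapter `Drivers`, kind `function`, and name `my_function()`, while `"drivers function con_control_ft"` keeps its explicit chapter.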
After the fields and prototype are analyzed, a new current template or eof mark is found in operation 204.

FIG. 7 is a flowchart further illustrating operation 212 of FIG. 6A, analyzing the fields and prototype, in accordance with one embodiment of the present invention. Initially, a current field or the end of the template is found in operation 402. Whether the current field has been found is then determined in operation 404. If it has, the field type, specifications, and associated text objects (remarks) are determined in operations 408 through 412, respectively. Any suitable technique may be used to determine the field type, specifications, and associated remarks. For example, the field type corresponds to the field tag value; the field specifications are located on the same line as the field tag; and the associated remarks are found on the lines that follow the field tag and field specifications.
/*
* edt: drivers function con_control_ft
* Typedef for the interface control function.
* Return: bool
* Status of request, TRUE for success, FALSE otherwise.
* If FALSE, then errno will be set to indicate reason for
* failure.
* Argument: connector_st *connector
* Pointer to the connector of the interface to control.
*/
typedef bool (*con_control_ft) (connector_st *connector);
In the above example, the template may be scanned line by line to find the current field tag "Return:". The field type is given by the field name "Return:", and the field specification "bool" is read from the same line as the tag. Here, "bool" describes the format of the return value of con_control_ft as a Boolean. The field remarks are then read from the lines following the field tag and field specification; in this example they describe the meaning of the return value: "Status of request, TRUE for success, FALSE otherwise. If FALSE, then errno will be set to indicate reason for failure." The field tag "Argument:" and its specifications and remarks are determined similarly.

When the end of the template is found, the prototype is determined in operation 406, using any suitable technique. For example, the prototype may be extracted directly from the code line that follows the template, as in the above example, where the declaration "typedef bool (*con_control_ft) (connector_st *connector);" would be extracted from the code following the template. Preferably, a lexical approach is used: for example, the file type (e.g., header file) and the template keyword "function" are used to locate the prototype within the code lines following the template. Alternatively, the prototype may be specified and identified by field tags within a portion of the template and/or code. For example, the field tag "prototype:" may be used to identify a portion of text as the prototype; the prototype is then extracted from the lines following the "prototype:" tag up to the end of the template or the start of the next field, and the code portion after the template is ignored. Alternatively, a "prototype_end:" field may be inserted into a comment line after the template.
In this alternative, the lines spanning from the end of the template to the "prototype_end:" field are extracted and used as the prototype in the code documentation. Thus, the prototype may be part of the documentation input within the template and/or part of the code.

Returning to FIG. 6A, operations 204 through 212 are repeated for each template until an eof mark is found. Preferably, as discussed above, an internal data structure that provides ready access to the documentation information for each template is also generated. When an eof mark is found, it may then be determined in operation 302 (continued from "A" in FIG. 6B) whether policy rules, if any, have been followed. Any suitable policy rules may be enforced to direct the programmer to include appropriate documentation input (e.g., templates, template specifications, tags or fields, field specifications, and/or text objects) for each type of prototype. For example, a programmer may be required to include "Argument:" fields within a template preceding a function type prototype. If the programmer does not follow the predetermined policy rules, any appropriate action may be taken. In this embodiment, an error message is output in operation 304 and the documentation procedure continues; alternatively, the documentation procedure may halt while the programmer enters the required documentation input. After the error message is output, or after it is determined that the policy rules have been followed, the fields and associated remarks may be reordered in operation 306, and the templates may then be reordered in operation 308. The order may be predefined, set by the user with the invocation flags, and/or based on the order of the field tags. The reordered templates, fields, and remarks may then be output in operation 310 in the form of an internal data structure, such as the previously discussed binary tree.
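The binary tree described earlier can be built with ordinary left and right pointers. The following C sketch follows one reading of that layout: templates are chained through their right links, a template's fields hang off its left link and are chained through left links, and each field's remarks hang off its right link and are chained through right links. It also counts a template's fields that begin with a given tag, which is one simple way to check a policy rule such as requiring "Argument:" fields for a function. All names (`doc_node`, `add_template`, `add_field`, `add_remark`, `count_fields`, `free_tree`) are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Each node is a template (T), field (F), or remark (R) with two links. */
typedef enum { NODE_TEMPLATE, NODE_FIELD, NODE_REMARK } node_kind;

typedef struct doc_node {
    node_kind kind;
    char *text;               /* template specs, field line, or remark text */
    struct doc_node *left;
    struct doc_node *right;
} doc_node;

static doc_node *new_node(node_kind kind, const char *text)
{
    doc_node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    size_t len = strlen(text) + 1;
    n->text = malloc(len);
    if (n->text == NULL) {
        free(n);
        return NULL;
    }
    memcpy(n->text, text, len);
    n->kind = kind;
    n->left = NULL;
    n->right = NULL;
    return n;
}

/* Walk a chain through left or right links and attach n at its end. */
static void append_chain(doc_node **link, doc_node *n, int via_right)
{
    while (*link != NULL)
        link = via_right ? &(*link)->right : &(*link)->left;
    *link = n;
}

/* Templates: chained to the right of the previous template. */
doc_node *add_template(doc_node **root, const char *specs)
{
    doc_node *t = new_node(NODE_TEMPLATE, specs);
    if (t != NULL)
        append_chain(root, t, 1);
    return t;
}

/* Fields: chained to the left of their template. */
doc_node *add_field(doc_node *tmpl, const char *field_line)
{
    doc_node *f = new_node(NODE_FIELD, field_line);
    if (f != NULL)
        append_chain(&tmpl->left, f, 0);
    return f;
}

/* Remarks: chained to the right of their field. */
doc_node *add_remark(doc_node *field, const char *remark)
{
    doc_node *r = new_node(NODE_REMARK, remark);
    if (r != NULL)
        append_chain(&field->right, r, 1);
    return r;
}

/* Policy-check helper: count a template's fields whose text starts with tag
   (case-sensitive comparison). */
int count_fields(const doc_node *tmpl, const char *tag)
{
    int count = 0;
    size_t n = strlen(tag);
    for (const doc_node *f = tmpl->left; f != NULL; f = f->left)
        if (strncmp(f->text, tag, n) == 0)
            count++;
    return count;
}

/* Release every node; each node has exactly one parent link. */
void free_tree(doc_node *node)
{
    if (node == NULL)
        return;
    free_tree(node->left);
    free_tree(node->right);
    free(node->text);
    free(node);
}
```

Because every node is reachable through exactly one parent link, a single recursive walk over both links visits (and, in `free_tree`, releases) every template, field, and remark once.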
After the tree is created, operation 106 ends and process 100 proceeds to operation 108 of FIG. 1, in which the documentation information is formatted; it is then output in operation 110. Returning to FIG. 1, after the documentation information is extracted from the code file, it is formatted in operation 108 based on the values of the invocation flags and/or tags. Any suitable format for arranging the documentation information may be used to generate the code documentation, and the format may be predefined or set by the user, for example through invocation flag values set when executing the documentation procedure. As discussed above, FIG. 5 lists invocation flags that the user can define. By way of example, the user may select the "-c" flag to cause the lines of code that follow a template to be printed after that template's documentation information within the code documentation. By way of another example, the code documentation may be output as an HTML or RTF file. After the documentation information is formatted, in operation 110 it may be output in a form based on the values of the invocation flags; for example, the user may set an invocation flag to send the formatted documentation information to the display screen. Alternatively, the output format or type may be predefined and not alterable by the user.

FIG. 8 illustrates a typical, general-purpose computer system suitable for implementing the present invention. A computer system 530 includes at least one processor 532, also referred to as a central processing unit (CPU), that is coupled to memory devices. Processor 532 may be part of a network computer or may be in communication with a network computer.
The memory devices may generally include primary storage devices 534, such as a read only memory (ROM), and primary storage devices 536, such as a random access memory (RAM). ROM 534 acts to transfer data and instructions uni-directionally to CPU 532, while RAM 536 is typically used to transfer data and instructions to and from CPU 532 bi-directionally. Both primary storage devices 534, 536 may include substantially any suitable computer-readable media. A secondary storage medium 538, which is typically a mass memory device, may also be coupled bi-directionally to CPU 532. In general, secondary storage medium 538 provides additional data storage capacity and may be a computer-readable medium used to store programs including computer code, computer program code devices, data, and the like. In one embodiment, secondary storage medium 538 may be a system database that is shared by multiple computer systems. Typically, secondary storage medium 538 is a storage medium such as a hard disk or a tape, which may be slower than primary storage devices 534, 536. Secondary storage medium 538 may take the form of a well-known device including, but not limited to, magnetic and paper tape readers. The information retained within secondary storage medium 538 may, in appropriate cases, be incorporated in a standard fashion as part of RAM 536, e.g., as virtual memory. A specific primary storage device 534, such as a CD-ROM, may also pass data uni-directionally to CPU 532. CPU 532 is also coupled to one or more input/output devices 540 that may include, but are not limited to, video monitors, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, and other well-known input devices, such as other computers.
Finally, CPU 532 may be coupled to a computer or telecommunications network, e.g., an Internet or intranet network, using a network connection as shown generally at 512. With such a network connection 512, it is contemplated that CPU 532 may receive information from the network and may output information to the network. Such information, which is often represented as a sequence of instructions to be executed using CPU 532, may be received from and output to the network, for example, in the form of a computer data signal embodied in a carrier wave. The above-described devices and materials will be familiar to those of skill in the computer hardware and software arts.

Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. There are many alternative ways of implementing both the process and apparatus of the present invention. For example, the extraction operations may be performed in any suitable order; e.g., the global default tags may be extracted after the templates and prototypes are extracted. By way of another example, the internal data structure of the documentation information may be created before the fields and templates are reordered. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.