Structures and methods for representing and processing documents4959769Abstract A document processing system including a document structure and a library of routines for manipulating the document structure. The components of the document structure are made up of individually-locatable blocks. The components include a chain of text blocks which contains at least one document page and includes at least one block, one or more chains of reference blocks, each chain containing a reference and including at least one reference block, information attributes in the text blocks which relate locations in the text of the document to item numbers referring to references, a page index which relates page numbers to the text blocks at which the pages begin, and a reference index which relates each item number to the first reference block in the chain containing the reference. The document structure may only be manipulated by means of routines in the document manager library. The routines in the library are accessible to programs such as editor programs and printing programs which manipulate documents. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
______________________________________
GRAPHIC DELETE COLUMN
CURSOR REPLACE SAVE
SCREEN VISUAL RECALL
PAGE INFORMATIONAL HELP
GOTO PAGE FORMAT SUPER SEARCH
INSERT MARK SUPER COPY
SEARCH COMMAND SUPER REPLACE
COPY GLOSSARY SUPER COMMAND
MOVE PRINT DEFAULT
EXECUTE NAME VIEW
CANCEL
______________________________________
ST 72, as described above, contains information relating machine state and keystroke class to corresponding routine vectors and is arranged as a set of rows wherein each row contains, in order by keystroke class, the vectors for each keystroke class for a particular state. ST 72 may thereby be indexed by state, to select a corresponding state row of vectors, and by keystroke class, to select a vector for that keystroke class in that state. Considering now the operation of the keystroke processing and state machine, KS 70 receives information regarding keystroke inputs from KB 46 through CPU 44. This information identifies both keystroke class and the specific keystroke within the class. As previously described, information regarding current machine state resides in VARS 66. The keystroke class, keystroke and state information is provided, as indicated in FIG. 3, to the keystroke handling routines of KS 70. The keystroke handling routines in turn generate a corresponding input to ST 72 to index the state tables by state and keystroke class. ST 72 responds by providing as an output a vector identifying the appropriate SR 28 or OR 30/OAR 42 routine for the keystroke and machine state. As previously described, the vector output of ST 72 is loaded into EP 64 and the appropriate action, depending upon routine Type, is initiated by SR 28. As also indicated in FIG. 3, and as described further below. ST 72 concurrently provides the resulting vector as inputs to SR 28's stack mechanism. 3. Stack Mechanism As previously described, SR 28 provides a stack mechanism performing three primary functions, which are identifying which overlay should currently reside in OAR 42, identifying which routine is currently being executed, and storing the memory image of saved overlays. The saving of overlays, that is, OAR 42 routines, by SAVES 62, which is a part of the ST 28 stack mechanism, has been previously described with reference to the operation of AOR 42. The remaining stack mechanism functions are performed by Reload Stack (RLDS) 76, previously mentioned, and Module Stack (MODS) 78 which as indicated in FIG. 3 and previously described, receive inputs from the vector output of ST 72. RLDS 76 receives and stores vectors from ST 72 and the top of RLDS 76 always contains the vector of the overlay which should currently be in OAR 42. RLDS 76 allows the vectors of interrupted routines to be saved so that interrupted routines may be returned to at the completion of execution of the interrupting routines. In this respect, RLDS 76 is the primary means of saving routines when it is not necessary to save the actual routine, for example, by stacking the vector of an interrupted overlay routine. As previously described, SAVES 62 is provided to save routines in their entirety, that is, the actual code, when necessary. As indicated in FIG. 3, RLDS 76 provides an input to EP 64 to allow the loading into EP 64 and subsequent reinitiation of interrupted routines. MODS 78 receives and stores only the Type field of the vector of the currently executing routine. The information residing in MODS 78 is used by SR 28 in determining the appropriate handling of interrupted and returned routines and may be updated as the Type of a routine changes, for example, from overlay to resident. SM 74 includes certain routines which are of interest in understanding the operation of SR 28's stack mechanisms; these routines include FREE, LOAD, PUSH, POP, RELOAD and ENTRY and will be described below in that order. The primary function of FREE is to free the overlay area, that is, AOR 42, for the loading of another overlay, by setting a flag indicating that AOR 42 is to be `reloaded` with the proper overlay. Other routines in SR 28 detect the state of this flag and initiate the appropriate operation to reload AOR 42. For example, if the information residing in the top of RLDS 76 indicates that the current overlay must be saved. SR 28 will initiate an operation to save that overlay in SAVES 62 before initiating a request to load AOR 42 with the new overlay. LOAD is used to initiate overlay routines and is called after EP 64 is loaded with a vector to the new overlay routine. If the routine must be overlayed, that is, loaded into AOR 42. LOAD will in turn call FREE. In addition, LOAD will save, on RLDS 76, the vector of the routine calling LOAD for subsequent use by PUSH or ENTRY, described below. The function of PUSH is to stack information concerning the last loaded overlay so that the overlay may be recovered if destroyed in some manner. PUSH first pushes the Type field of the last loaded overlay onto MODS 78. If the routine is not resident, PUSH will also push the routine's entire vector onto RLDS 76. If the routine is of the type which must be saved, PUSH will push the routine onto SAVES 62 and place the FRSN of the routine's location in SAVES 62 into the FRSN/Address field of the routine's vector on RLDS 76. In addition, PUSH will change the vector's Type field from `saved` to `internal` to reflect the change in Type of the routine. The function of POP is to `throw away` the top entry of MODS 78. If this routine is not of the resident Type, POP will also throw away the top entry of RLDS 76. If the routine is of the saved Type, POP will also delete the entry in SAVES 64. Finally, POP will reset a `reload` flag to indicate to SR 28 that the correct overlay is not resident in AOR 42. The function of RELOAD is to ensure that the routine specified by the top vector of RLDS 76 is currently resident in AOR 42. If the reload flag is set and the current routine, as indicated by the top entry in MODS 78, is a nonresident Type. RELOAD will load the correct overlay into AOR 42. ENTRY operates in conjunction with LOAD to provide the entry point of the last LOADed routine. Having described the structure and certain aspects of the operation of SR 28 and, in particular, the keystroke processing and stack mechanisms of SR 28, the fundamental operating sequence of SR 28 as a whole will be described next below. 4. Basic Operating Sequence of SR 28 The primary functions of SR 28 and the state machine implemented therein are, as described in part above, to maintain and operate the state machine, to overlay routines as required, to handle critical displays, for example, messages and menues, and to accept and process keystrokes. To perform these functions, SR 28 and the state machine repeatedly execute, in order, a sequence of four phases of operation. These phases are referred to, in the order executed, as the Overlay, Reload, Display and Keystroke phases and will be described next below in that order. a. Overlay Phase Overlay Phase is responsible for ensuring that the selected routine is in AOR 42, and for executing the selected routine. This phase begins with the vector of the selected routine residing in EP 64. If the routine is resident in memory, that is, in AOR 42 or SR 28, the machine skips to execution of the routine. If the routine is not resident, that is, is an overlay routine not resident in AOR 28, SR 28 calls LOAD by loading LOAD's vector into EP 64 and the overlay is loaded into AOR 28. With the routine resident in memory, SR 28 proceeds to execution of the routine by first calling ENTRY to determine the entry point of the routine and then executing the routine. The Overlay Phase is usually completed at this point, that is, when the execution of the selected routine is completed. In certain cases, however, the selected routine may invoke routines residing in other overlays. In such cases, EP 64 is loaded with the vector to the invoked routine and the Overlay Phase is restarted. Reload Phase Reload phase is provided and initiated to ensure that the currently active routine is resident in AOR 42 in certain cases when the Overlay Phase does not perform this function. The first such case is that of certain overlay routines which invoke other routines that, when completed, return control to the general SR 28 routines rather than to the invoking routine. The second case occurs when a routine, when completed, calls POP rather than returning to the invoking routine. In both cases, the function of Reload Phase is to reload the correct overlay into AOR 42 and does so by calling RELOAD, described above. c. Display Phase SR 28 performs all critical display functions to Display 48 during this phase. Such displays include providing prompts and messages to the user, displaying menu choices available to the user, and updating the display of attributes, described further below. d. Keystroke Phase During this phase, SR 28 performs the keystroke processing operations previously described. That is, SR 28 receives a keystroke from KB 46 and state information from VARS 66, indexes ST 72 for the class of keystroke and current state to obtain the correct vector for the selected routine, and loads the vector into EP 64. At this point, the machine has returned to the initial condition of the Overlay Phase and the four phases are repeated in the order and as described above. Having described the control structure of the present system, the document structure of the present invention will be described next below. B. Document Structure (FIG. 4) The document structure of the present invention, that is, DS 32, is, as previously described, designed for efficient use of memory capacity while providing the flexibility required to generate very complex documents and to support advanced editing features. The primary function of the document structure is the storage and ready access of sequential text, organized into logical user specified pages of arbitrary length. The structure allows fast and efficient character and page editing and allows for the application of a large number of visual attributes, or enhancements, to the characters of a text. Certain of these editing features include visual attributes, such as underlining, bold type and various fonts, and information attributes, such as notes, footnotes and voice. The document structure also allows the application of character related information which is not primary visual in nature, such as optionally printed text, table of content and index generation, and temporary markers used for editing aids. Additional features allow the user to assign names to various portions of a document and to access and operate upon named portions through those names. The basic element of a document is a fixed size block of information, the size of which is determined by a convenient and efficiently sized unit of memory space in which the document is created and operated upon. In the present embodiment in System 10, the block size is determined to be two Disc 18 sectors, that is, 512 bytes. In another system, for example, a centralized system based upon a general purpose computer, the block size may be determined by the size of the data blocks transferred between the computer main memory and a cache memory, or a multiple thereof. As will be described below, a document structure is constructed of several different types of blocks, each having a unique internal structure and serving different, specific purposes and assembled as required to create a document. Of these blocks, certain blocks are required in any document while others are used as required. Certain blocks are always located at fixed points in the document and others are located through the pointers which form an integral part of the document structure. In addition, certain blocks, for example, blocks containing text, may be chained together as required. The document structure is thereby flexible and expandable, occupying no more memory capacity than is required for particular document but capable of accommodating very large and complex documents, and provides fast and easy access to any part of a document. 1. Basic Block Structure All blocks in the present document structure have a fixed internal structure comprised of a Header area and a Data area. The Header area in turn has a standard, fixed structure while the structure of the Data area depends upon the block type. The Header area includes a Block Type field identifying the block type, Forward and Backward Pointer fields used to chain together blocks of the same type, and Top and Bottom Offset fields identifying the location of the block data within the Data area. Other Header fields include a Number Of Items field used in data compression and recovery operations, a Document ID field used to identify the document to which the block belongs, and certain Checksum information for error detection. Not all blocks require the use of all of the fields defined within the standard block Header area; in such cases the unused fields are undefined and are not used but are not deleted from the Header area. 2. Basic Block Types As described above, each document s comprised of a combination or assembly of various types of blocks, which can be divided into three major functional categories. Management Blocks, Indexing Blocks and Text/Data Storage blocks, referred to as Information Item Blocks. Certain blocks are required in any document while other blocks may appear only in complex documents and the document structure allows the addition of further block types as required. Management blocks are required in any document and contain printing and statistical information and user defined editing parameters for the document. Presently defined Management Blocks include an Administrative/System Block, a Style Block and a Free Block Bit Map Block. Indexing Blocks are used to locate the various Information Item Blocks which contain the actual text and information of the document. Presently defined Indexing Blocks include a Document Table, a Named Item Index, and Primary and Secondary Indexes. The Document Table is located at a fixed point in the document and is used to locate the Named Item Index and the Primary Indexes. The Primary Indexes are used in turn to locate the Secondary Indexes and the Secondary Indexes are used to locate Information Item Blocks. Certain Information Item Blocks, and the Named Item Indexes, may be chained together through the Forward and Backward Pointers contained in their Header areas, thus providing yet another level of linking of blocks. It should be noted that when a document does not contain more Information Item Blocks of a given type than can be identified within the capacity of a single Secondary Index, the Primary Indexes for that block type are not created and the Document Table entry for that type points directly to the single Secondary Index for that block type. Finally, the Information Item Blocks contain, as described above and described in detail below, every type of information appearing in a document. Most Information Item Blocks having text can have that text enhanced by visual attributes, such as color and font, and can contain references to information attributes, such as format lines and footnotes. The presently defined types of Information Item Blocks, each of which will be described in further detail below, include:
______________________________________
Text Formats
Headers/Footers Pictures
Free Form Regions Text Shelves Footnotes
Notes Equation Regions
Voice Messages Merge Data
Data Shelves
______________________________________
Certain embodiments of the present invention may also provide Matrices Blocks and External Data Blocks, as described below. As described above, additional Information Item Block types may be defined as required and incorporated within the doCument structure in the same manner as the types listed above. Other types of references which may be inserted into a document include, in addition to attributes, described below, Text Insertion References and Named Marks. The document structure described below also includes, as described below, means for handling text appearing in column form. 3. Minimum Document Blocks As described above, certain of the blocks described above are required in any document. In the present embodiment of the document structure, these blocks include, for a minimum document, the:
______________________________________
Document Table Secondary Text Index
Administrative/System Block
Text Block
Style Block Secondary Format Index
Free Block Bit Map
Format Information Item Block
______________________________________
It should be noted, with regard to the two Secondary Indexes entries listed above, that, as previously described, a minimum document may contain a single Secondary Index for a particular Information Item Block and the Secondary Index may be located directly through the corresponding Document Table entry. Having described the major categories of block type, and briefly the types of block within each category, each of the block types will be described in further detail below. 4. Management Blocks The Administrative/System Block contains keystroke interpretation and administrative information and may be chained to other Administrative/System Blocks for very complex documents. The Style Block contains user definable defaults concerning, for example, document character style to be used if the user defaults, that is, does not define a different style. The Free Block Bit Map Block contains information identifying, for each block in a document, whether a particular block is currently in use. Bit Map Blocks are used by the system to efficiently allocate and deallocate blocks, that is, memory space. Bit Map Blocks may be chained, thereby allowing a complete physical mapping of every block or, in the present embodiment, disc sector. 5. Indexing Blocks The following descriptions of the Indexing and Information Item Blocks will refer to FIG. 4, which illustrates the document structure of the present invention and the relationships between Indexing and Information Item Blocks. As previously described, the Indexing Blocks include the Document Table, Primary Indexes and Secondary Indexes. Referring to FIG. 4, each document contains a single Document Table (DT) 80, which contains a pointer to a Primary Index (PI) for each type of Information in Block type appearing in a particular document. Each PI 82 in turn contains pointers to one or more Secondary Indexes (SIs) 84 for that Information Item Block type and each SI 84 contains, in turn, pointers to the Information Item Blocks (IIBs) 86 of that type appearing in the document. As previously described, in those cases wherein the number of IIBs 86 of a certain type is less than the number of pointers which may be accommodated in a corresponding single SI 84, the corresponding PI 82 is not used and the DT 80 entry points directly to the SI 84 for that IIB 86 type. It should be noted that, in the present embodiment, the pointers used in the Indexing Blocks, that is, in DT 80, PIs 82 and SIs 84 are comprised of File Reference Serial Numbers, that is, the logical as opposed to physical addresses of the elements pointed to. As will be described further below, IIBs 86 of certain types may be chained together with other IIBs 86 of the same type through the Forward and Backward Pointers in the IIB 86 Header areas. In such cases, an SI 84 pointer to a chain of IIBs 86 may point to the first IIB 86 of the chain and the remaining IIBs 86 of the chain may be located through the Forward and Backward Pointers. a. The Document Table The DT 80 is always located at a fixed point in the document structure, that is, at the start of the document, and there is only one DT 80. The Header area of DT 80 is of the standard, fixed structure previously described. The Data area contains a space or location for a pointer to the PI 82 or SI 84 for each possible type of IIB 86. If a particular type of IIB 86 does not appear in a document, the DT 80 entry for that type is null entry, for example, zero. In the present embodiment, the DT 80 Data area contains the following pointers: Named Item Index Primary (or Secondary) Text Index Primary (or Secondary) Format Index Primary (or Secondary) Note Index Primary (or Secondary) Free Form Region Index Primary (or Secondary) Footnote Index Primary (or Secondary) Header Index Primary (or Secondary) Footer Index Primary (or Secondary) Matrix Index Primary (or Secondary) Picture Index Primary (or Secondary) Voice Index Primary (or Secondary) External Data Index Primary (or Secondary) Merge Data Index Primary (or Secondary) Equation Region Index Text Insertion Index Named Marks Index Binary Indexes As previously described, there is a PI 82 for each IIB 86 type appearing in a document and the Data area of each PI 82 contains pointers to the SIs 84 for the corresponding block type. In the Header area of a PI 82, the Number Of Items field will contain the number of SIs 84 referenced from the PI 82. There will be, in the present embodiment, only one PI 84 for each block type; in other embodiments, for example, PIs 82 may be chainable within each block type. When a document is first created there will be, as previously described, only SIs 84 and probably only two such SIs 84, one for a Text Page IIB 86 and one for a Format Line IIB 86. As the document grows in complexity, the capacity of single SIs 84 will be exceeded and further SIs 84 will be created. As a second such SI 84 is created for a particular block type, a PI 82 for that type will also be created with pointers to the SIs 84 of that type, and the DT 80 entry for that type will be changed to point to the PI 82 for that type. c. Secondary Indexes The general structure of SIs 84 is similar to that of LIs 82 described above. As previously described, an SI 84 is pointed to by an entry in a corresponding PI 82 and contains pointers to the IIBs 86 of that block type. There may be multiple SIs 84 for a particular block type and, if so, the Header area will contain a flag indicating this fact, SIs 84 may not, however, be chained in the present embodiment, but may be chained in other embodiments. The SI 84 Data area contains a pointer to each IIB 80 referenced through the SI 84, and for each such pointer, information as to whether the particular information item, that is, IIB 86, is named, the number of times it is referenced, and whether it is referenced from another IIB 86. 1. Secondary Text Page Indexes Although the structure of a SI 84 for a Text Page IIB 86 is the same as any other SI 84, such SI 84s are unique in that the index contained therein is continuous, that is, no vacant entries are allowed. This restriction provides for a special property of Text Page IIB 84s; that is, that the number of a document page, which as illustrated in FIG. 4 is comprised of one or more IIBs 84, is always the same as that of the IIB 84. For example, the entry of the 45th page in a document is always the 45th entry within the first SI 84 Text Page Index. The Secondary Text Page Index may therefore always be used to fine the first Text Page Block of a document's page. A document page can be comprised of any number of Text Page Blocks chained together by the Forward and Backward Pointers in the Block Header areas. 2. Secondary Header and Footer Indexes Secondary Header and Footer Indexes have the same structure as all other SI 84s except that all item numbers must be assigned on even boundaries when new Header and Footer IIB 86s are created. This restriction provides space in the indexes to allow for the generation of either primary or first and second alternate Headers and Footers. d. Named Item Index The Named Item Index, which appears as a PI 82 in FIG. 4, provides a parallel access path to IIBs 86 which have been assigned names by the user. That is, an IIB 86 can be located by its name as well as by its Item Number, described below, or FRSN. The Named Item Index Data area contains an entry for each IIB 86 which has been assigned a name. Each entry includes the IIB 8s's type, name and Item Number, Text Shelves, a type of IIB 86 described below, are identified by their FRSNs rather than by their Item Numbers, Entries are maintained in ascending order by type and name, no blank entries are permitted in the index, and Named Item Indexes may be chained through their Forward and Backward Pointers. 6. Information Item Blocks As previously described, the actual text or other information of a document is contained in Information Item Blocks (IIBs) 86 and there is a type of IIB 86 for each type of information that appears or may appear in a document. An IIB 86 may, for example, contain test and/or attributes, text and/or attributes to be interpreted as columns or rows of columns, file names for information stored externally to the document, and any other form of information. Each IIB 86 has an associated Item Number that is used to locate the IIB 86 within the Index Blocks described above. For Information attributes, described below, the Item Number is arbitrary. For text pages, however, which have been previously described as comprised of one or more IIBs 86, the item number is implied and is the same as the page number. In all cases, however, the Item Number leads to the first IIB 86 of an information item of arbitrary length and the blocks may be chained together through the Forward and Backward pointers residing in their Header areas. The general structure of an IIB 86 is similar to that of the Index Blocks described above, that is, with a standard Header area and a Data area. The Data area differs, however, and may contain text or attributes or both. Text is entered from the top to the bottom of the Data area and attributes are entered from the bottom to the top. A typical Data area may therefore have text in its upper portions, attributes in its lower portion and free area between, which becomes filled as text and/or attributes are entered. Either text or attributes may occupy the entire Data area, or as much of the Data area as is not occupied by, respectively, attributes or text. In addition to the Forward and Backward pointers and other Header elements, the Top and Bottom Offset fields of the Header are used to point. respectively, to the last valid character in the Data area and the last valid attribute in the Data area. Having described the general structure of IIBs 86, the individual types of IIBs 86 of the present embodiment will be described next below. It should be noted that further types may be added as required and that a type described below need not appear in a particular document or implementation. a. Text Blocks The most common form of IIB 86 is the Text Block which contains the text of the document and the attribute information, described further below, pertaining to the text contained therein. Text Blocks contain the actual body of a document text, including all visual and descriptive attributes and all information comprising references. Text Blocks can be chained together or can exist as independent blocks with the main body of a document's text existing as a single chain of blocks, beginning with the first block of the first page of the document and ending with the last block of the last page. Document pages wherein the text occupies more than one Text Block are created, to any arbitrary length, be chaining together Text Blocks. As described above, text occupies the Data area from top to bottom and attribute information from bottom to top. The last text character appearing in a block is always an End of Text Character to identify the end of a page. Any number of Text Blocks may be chained and a Text Block is referenced either through a Text SI 84 by Item Number or through a Secondary Named Text Index by page number or name. b. Format Blocks A Format Block contain data pertaining to format lines, that is, lines defining the physical layout characteristics of a text line, for example, the locations of Tabs. All documents must contain at least one format line and a format line may be referenced any number of times from an location within the document and may be named. As described above, a format reference is used to specify data to control text display, formatting, and printing characteristics, as well as the width of a single or multiple columns. A format reference will be found at the beginning of every text page, at the start of every distinct column region, and at other arbitrary user specified locations within text pages. In addition, a format reference is required at the beginning of item chains for all notes, footnotes, headers and footers and may be found at other locations within such items. A format reference is a `forced-break` reference, that is, the attribute character, described below, with which the reference is associated is always the first character in the text block in which it is found. If a new format line is inserted into a Text Block, the block is split into two blocks at the point of insertion and an End Of Text Character inserted at the end of the text in the block before the inserted format line. This feature allows text to be easily inserted before format lines and page breaks. Format references are also used to control the placement and configuration of column regions and to specify special conditions, such as the presence of soft or hard page breaks. c. Text Shelf Blocks Text shelves are named storage areas used during editing to same and retrieve portions of text and are not normally printed. A text shelf contains both the text and the attributes pertaining thereto and is a permanent part of the document but cannot be referenced as are other IIBs 86. A Text Shelf Block may be referenced only through the Named Item Index and no SI 84 exists for Text Shelf Blocks. d. Note Blocks A Note Block contains the text and any applicable attributes of notes appearing in the document and a single note may be comprised of several chained Note Blocks. e. Free Form Region and Equation Blocks A Free Form Region of a document may contain any non-wordwrapped text or any graphic that can be entered through KB 46 and any attributes applicable thereto. Every space in a Free Form Region is defined, that is, it does not contain any `white space`, and graphics and text may be entered at any point in the region. Examples of Free Form Regions include scientific equations and charts. Free Form Region Blocks may be chained to create as large a Free Form Region as required. An Equation Block is similar to a Free Form Region Block, or a Graphics Block, but is particularly designated to contain information in the form of equations. f. Footnote Blocks A Footnote Block contains the text and applicable attributes of a footnote and a single footnote may be comprised of chained Footnote Blocks. g. Header/Footer Blocks Headers and Footers are restricted attributes, that is, they can be placed only at the top of a page, immediately after the format line. There are three types of Headers and Footers. A Primary Header/Footer is printed on every page of the document, a First Alternate Header/Footer is printed on every other page, and a Second Alternate is printed on the pages interleaved with the pages having First Alternate Header/Footers. Headers and Footers contain options which may pertain to specific Headers and Footers, such as print styles, lines printed on, and page numbering. The Header area of a Header/Footer Block contains unique information pertaining to these options. h. Matrix Element Text Blocks A matrix is a two dimensional table, or array, of areas of wordwrapped text with each such area being referred to as a cell. The text and attributes of a single such cell are contained in a corresponding Matrix Element Block, a type of IIB 86. Format lines defining the columns of the matrix are contained in Format IIB 86s are treated as elements of the matrix. The first element of a matrix column is always a format line, there is always a format line for each column of a matrix and a format line may be referenced by any number of Matrix Element Blocks. This restriction on the assignment of format lines, that is, one for each column, allows the columns and rows of the matrix to be easily rotated or interchanged. The text within a cell is unique in that it cannot be modified by any other format line than that appearing with reference to the column containing the cell. The Matrix Element Blocks and Format Blocks of a particular matrix are located through a Matrix Description Table, which also contains the definition of the matrix. Matrix Description Tables are in turn located through Primary and Secondary Matrix Indexes. A Matrix Description Table has the same structure as the blocks previously described and contains, as described, the information necessary to completely define a matrix. The Data area contains FRSNs pointing to the text blocks and format lines of the matrix with each FRSN pointing to the beginning of a Matrix Element Block, the smallest unit of a matrix. In addition to the standard information, the Header area identifies the number of rows and columns of the matrix. Each Matrix Element Block contains normal wordwrapped text and any applicable attributes of a cell of the matrix and are referenced in the Matrix Description Table in row order from left to right. i. Picture Blocks A Picture Block contains the name of a file containing, in turn, a graphic, that is, picture and may contain additional information identifying the area of the document to be occupied by the picture. As previously described, Picture Blocks will normally be used with system having bit mapped display and printing capabilities. j. Voice Blocks Voice Blocks may contain the names of files containing voice messages, for example, in Digital Voice Store and Forward (DVX) systems. k. External Data External Data Blocks may contain the names of files external to the system which contain programs or data operating upon data within the system or used by the system. The provision of External Data Blocks allows, for example, programs residing in external files to be overlayed to operate upon data within a file in the document. External data may also be incorporated into a document through an attribute reference, as described below. l. Merge Data Blocks A Merge Data Block is a chain of text which contain encoded instructions for performing merge operations between an external text source and the document. The position of a merge attribute character in a text chain specifies the position at which the merging is to occur. The instructions indicate how to perform the merge operation and there is no restriction on the contents of the merge data chain. Merge data text may contain additional references to other formats, so that columns may be placed in merge chains. m. Text Insert A text insert reference is a temporary local reference attribute which does not bear an item number and which consists only of a reference attribute character and a reference word, as described below. The purpose of a Text Insert is to create a forced block break at a point where text is to be inserted. n. Named Marks Named Marks are user specified permanent position markers. When applied, the character to be marked is moved to the beginning of a new block and the occurrence of a Named Mark s indicated in the header of the new block, resulting in a forced block break. The block or item number of the new block is then placed in the Named Item Index. o. Columns Parallel columns of text appearing in a document are treated as a special case of normal word-wrapped text. The text in a column consists of a portion of a text page chain containing text, visual attributes, and reference attributes. Each column begins with a format line controlling the display of text therein, and has essentially unlimited length. A column may be interrupted by a format break or page break. A column is terminated by another format, which may in turn contain one or more columns and may be at a page break. It is therefore possible to have, in a single page, a region of three columns followed by a region of two columns, and so on. In addition to format data, columns require block linking pointers to connect columns together, if necessary. Format line and data specifications of columns appearing in a single page are all included into a single format line with multiple codes to delimit the extent of each column. Column text is stored in a text page chain in sequential form, with the text of the first column in a multi-column region following immediately after the text of the preceding region. The last block of text of the preceding region is chained to the first block of text in the column region, which contain a reference to the formats for the column regions. The last block of the first column is chained to the first block of the next column, and so on to the end of the column region, wherein the last block of the last column is changed to the next succeeding block. In order to easily perform whole-column operations, the top blocks of each column in a column region are linked together by side pointers located in the format attribute words found at the start of each column. Having described the various types of IIB 86, the relationship between text and attributes, referred to in the above descriptions, will be described next below. 7. Text and Attributes As previously described, any IIB 86 may contain, in the Data area, both text and attributes. Attributes, which appear as words written in the lower part of the block Data area may, as previously described, effect the visual appearance of the text, may be descriptive in indicating that a character is to be optionally printed or is to be used in generating a table of contents or an index, or may contain information pertaining to the text, for example, footnotes. Visual and descriptive attributes are always applied to a range of characters, which may be as short as one character. There may be a number of distinct visual/descriptive attributes appearing in a single block. If the same visual or descriptive attribute is applied to characters separated by at least one characters, to attributes will be present; if, however, the same attribute is applied to consecutive characters, a single attribute will result. Informational attributes usually appear as units of text or data existing between two text characters and are referenced or incorporated into the text through a reference to a block containing the informational text or data. Attribute words occupy space in an IIB 86 Data area only when defined. In an IIB 86 containing only text with no assigned attributes, therefore, the text may occupy the entire Data area. Conversely, it is possible to have an IIB 86 wherein the entire Data area is occupied by attribute words. Attribute words are defined only within a Text Block and have meaning and are applicable only within the Text Block; attributes cannot span over two or more Text Blocks. 1. Visual/Descriptive Attributes Visual/Descriptive attributes are applied by the user over a range of characters appearing in the text, from one character to all characters appearing in the Text Block. Whether or not certain visual attributes are displayable, depends upon the capabilities of Display 48. A visual/descriptive attribute word will contain information identifying whether the attribute is visual or informational, the position of the first character in the Text Block affected by the attribute, and the position of the last character in the Text Block affected by the attribute. Also included is information identifying the attribute to be applied. Only one attribute is specified by each attribute word and, if text characters have more than one visual attribute, multiple attribute words are required. Attributes implemented in the present embodiment of the document structure include, but are not limited to, the following:
______________________________________
Underline Color Change
Double Underline Revision Mark
Superscript Subscript
Bold Table of Contents Mark
Font Change Index/Occurrence Mark
Merge Hyphen
Character Set Change
Table Of Contents
No Break Strike-Through
Optional Text Index Generation
______________________________________
2. Informational Attributes As described above, informational attributes are units of text of data that exist between two text characters. Informational attributes are represented by a unique, unprintable character and by informational attribute words appearing in the attribute area of the Text Block Data area. Only one informational attribute may be associated with the informational character in a single occurrence of the informational character and each informational word may define only one informational attribute. The data associated with the information character is, for each occurrence, kept in IIB 86s and are located through the Indexing Blocks through their Item Numbers. An informational attribute word contains information identifying the word as referring to an informational attribute, the type of attribute, and the Item Number of the attribute. The word also contains information identifying the location within the text where the informational attribute takes effect and, in the case of, for example, Picture or Free Form Regions, may identify the horizontal and vertical space required in the document for the attribute. The forms of informational attribute implemented in the present embodiment include, but are not limited to:
______________________________________
Format References Matrix References
Note References Picture References
Free Form Region References
Voice References
Footnote References
External Data References
______________________________________
3. Attribute Sorting Order The attribute words stored in the attribute area of a Text Block are maintained in a specific order to provide ready and logical access to the words while fetching characters and associated attributes. If two or more attributes begin or are located at the same point in the text, their order is determined first by attribute type, that is, reference attributes, such as informational attributes, will occur prior to visual or descriptive attributes. Additional Description of a Preferred Embodiment The present additional description of a preferred embodiment provides more detailed descriptions of certain aspects of the document structure described in U.S. Ser. No. 538,644. The additional description will begin with a discussion of the relationship between disk files, document files, and the document structure of the present invention, will then present overviews of the fully-developed document structure and the document manipulation programs of the present invention, and will finally discuss salient details of the document structure and of document manipulation. I. Overview of the Document Structure As pointed out in U.S. Ser. No. 538,644, the document structure is implemented in a document file. FIG. 5 shows the logical structure of the document file in which the present invention is implemented. Document file 501 consists of a set of equal-sized blocks 503. Each block 503 has a number, indicated in FIG. 5 by a value in parentheses. Within a document file, a given block 503 may be located by specifying the block's number. In a present embodiment, each block 503 contains 512 bytes of data and corresponds to two 256-byte disk blocks. In other embodiments, there may be other relationships between a block 503 and a disk block. The document structure of the present invention is made up of blocks 503 in a document file 501. FIG. 6 presents an overview of the document structure as it is implemented in document file 501. Each block 503 contains a header (H 602). As will be explained in more detail hereinafter, the header contains information including at least the type of its block 503, and may also contain the numbers of other blocks 503. By means of H 602, blocks 503 may thus be linked into chains. The contents of the remainder of block 503 depends on the type of the block. In some blocks 503 (for example, PRIX 611), the remainder contains pointers to other blocks; in others (for example, TB 621), the remainder contains the text 629 of the document represented by the document structure and attributes 631 further describing the text 629. A. Administrative Blocks The block types used in the document structure fall into three functional classes: management blocks, index blocks, and text/data blocks. In FIG. 6, the management blocks are represented by bit map (BM) block 601, administration block 1 (AD1) 603, administration block 2 (AD2) 605, and Document Table (DT) 80, These blocks contain information required for the management of the document represented by the document structure. In a present embodiment, they occupy blocks having the same block numbers in all documents. BM 601 contains a bit map which includes a bit corresponding to each block 503 in document file 501 containing the document. The state of the bit indicates whether that block 503 is currently being used in the document structure. If the number of blocks in a document file 501 exceeds the capacity of a single BM 601, additional BMs 601 are chained to the first. Each additional BM 601 has a predetermined block number. AD1 603 and AD2 605 contain information such as the title and author of the document, the date of creation, the date of last access, the location currently being edited in the document, and default values for formatting the document. DT 80 contains pointers to the index blocks 503 by which the remaining blocks 503 may be located. In a preferred embodiment, there are four pointers, each one pointing to a different type of index. Each pointer specifies the index by specifying the block number of the first block in the index. Here and elsewhere in FIG. 6, the block pointed to by a pointer is represented in the pointer by a letter representing the block's number. Thus, NIP 609 contains the letter a, indicating that it points to block 503(a), which bears the legend NIX 607. NIX 607 is the first block of an index by which names consisting of text strings may be associated with components of the document structure. RIP 613 points to PRIX 611, the first block 503 of an index by which references specified by attributes may be located. PIP 617 points to SPIX 619, the first block of an index by which pages of the document may be located. FIP 629, finally, points to FIX 619, the first block of an index by which the display fonts available for a document may be specified. B. Index Blocks Turning to the indexes and beginning with the name index, that index is made up of at least one NIX block 607. NIX block 607 contains NPs 610. Each NP 610 establishes a relationship between a user-supplied character-string name (NA) referring to a component of the document and a value which locates the first block 503 of the component. The value may be a block number, an item number, or a page number. Item numbers are used in the reference index, and page numbers are used in the page index. Thus, a component identified by a name may be located by searching for a NP 610 containing the name in NIX 607, and when it is found, using the value associated with the name in NP 610 to locate the first block 503 of the component. If the number of named components exceeds the capacity of a single NIX 607, an additional NIX 607 is chained to the first, as shown in FIG. 6. In order to simplify searching in a present embodiment, the NPs 610 are arranged by the type of component they point to, and within a type, they are arranged alphabetically by name. In FIG. 6, two NP 610s are shown, one referring to block 503(d) and the other to block 503(f). As will be explained in more detail later, certain components of the document structure may be referred to only by name and are therefore accessible only by means of NIX 607. Most components, however, are accessible through either PRIX 611 or SPIX 619 and may additionally have a name and be accessible through NIX 607. The reference index establishes a relationship between an item number identifying a reference and the block number of a block 503 containing the reference. A reference is information which is required in addition to the text of a document to produce the document. One example of a reference is the format specifier for a section of a document. The format specifier specifies the form of the document's page, including such matters as the left and right margins, the number and width of columns, and the intervals between tab stops. The reference index in FIG. 6 contains three blocks 503: primary reference index, (PRIX) 611 and two secondary reference index blocks (SRIX) 615. PRIX 611 is used when there are more references in a document than can be contained in a single SRIX 615. PRIX 611 contains secondary reference index pointers (SRIP) 637. Each SRIP 637 associates the high-order bits of an item number (HIN) with the block number of SRIX 615 which contains the block number of the reference itself. SRIX 615 associates the low-order bits of an item number (LIN) with the number of the block 503 containing the reference. Thus, between them, PRIX 611 and SRIX 615 associate an item number specifying a reference with the number of the block 503 containing the reference. For example, in FIG. 6, the SRIP 637 associates the high-order bits with c, the block number of block 503(c), which contains a SRIX 615, and that SRIX 615 associates the low-order bits with d, the block number of REFB 617, which actually contains the reference corresponding to the item number. The page index establishes a relationship between a page number identifying a page of the document and the block 503 which contains the beginning of the page. Like the reference index, the page index may have primary and secondary index blocks. However, the document of FIG. 6 does not have enough pages to require more than one index block, and thus, there is only one secondary page index block (SPIX) 619. SPIX 619 contains page pointers (PP) 620. Each PP 620 associates a page number (PN) with the number of the block 503 containing the beginning of the page. In FIG. 6, page 627 begins at block 503(f), and thus, PP 620 contains block number f. Within the page index, the PPs 620 are ordered consecutively by page number. If a page is deleted or added, the page numbers in the document are changed and the PPs 620 in SPIX 619 are rearranged accordingly. In order to facilitate renumbering of pages, the SPIXs 619 in documents having more than one SPIX 619 are linked into chains. The font index, finally, is an index of the type fonts used in the document. The font index supplements a list of fonts contained in AD2 605. When the number of fonts used exceeds the capacity of the list in AD2 605, the overflow is placed in FIX 633; if there are more than can be contained in a single block 503, another FIX 633 is chained to the first. Each font is represented in the font list and the index by a font descriptor (FD) 635. There is an integer corresponding to each FD 635, and a given font is represented in the document by means of the integer corresponding to that font's FD 635. C. Text Blocks and Text Pages The text of a document is contained in text blocks (TB) 621. The text blocks making up a document are chained together in sequential order. If the document is subdivided into pages, there is for each page a PP 620 in the page index containing the block number of the first TB 621 in the page. As described in U.S. Ser. No. 538,644, each TB 621 contains two parts in addition to H 602: text 629 and attributes (ATTR) 631. Text 629 contains the characters which will actually appear to a person editing the document. ATTR 631 contains attribute words (AW) 625 which specify additional material to be included in the text when it is printed or displayed, different display fonts, and modifications to the appearance of the text such as underscoring, highlighting, or varied spacing. The additional material may include items such as page headers and footers, footnotes, index entries, graphs, or pictures. As will be explained in more detail hereinafter, there are three kinds of AWs 625: visual/descriptive AWs 625, font AWs 625, and informational AWs 625. Visual/descriptive AWs 625 specify modifications such as underlining or highlighting and contain all of the information required to specify the modification. Font AWs 625 specify a display font and the range of characters to which it applies. The display font is indicated in the font AW 627 by the number of the font in the font list. Informational AWs 625 specify the additional material. The additional material is contained in a reference, and the informational AWs 625 contain a type value indicating the kind of reference they represent and an item number (IN) identifying the specific reference. As explained in the discussion of the reference index above, the IN is used to locate a block or chain of blocks 503 which contains the information in the reference referred to by the informational AW 625. For many types of references, the position in text 629 at which the reference applies is marked by an attribute character (AC) 623. The AC 623 has a code different from that of any character code, and during an editing session, a character corresponding to AC 623 appears on the user's terminal. D. References and Reference Blocks As mentioned in the discussions of reference indexes and informational AW 625, a reference is information which is referred to by an item number in an informational AW 625 and is consequently locatable by means of the reference index. A given reference has only a single item number, but may be referred to by any number of AWs 625. Each of the AWs 625 referring to the reference will contain the same item number. A field in SRIP 637 for a reference indicates how many AWs 625 refer to the reference. The value in the field is incremented each time an AW referring to the reference is added to the document and decremented each time such an AW is removed. SRIP 637 for a reference and its REFBs 617 will be deleted only if the reference count is 0. The information belonging to a reference is stored in reference blocks. REFB 617 in FIG. 6 is such a reference block. Like TBs 621, REFBs 617 may contain text 629 and ATTRs 631. However, to avoid circular references, there are certain limitations on the use of informational AWs 625 in REFBs 617. A REFB 617 may also contain an external reference, i.e., a reference to another document or to a file containing an item such as a graph or a picture. If the information required for an informational AW 625 exceeds the capacity of a single REFB 617, additional REFBs 617 are chained to the first REFB 617. However, only the location of the first REFB617 is included in the reference index. E. Named References and Pages As mentioned above in the discussion of the name index, any reference or text page may be referred to by a name as well as by its item number or page number. Such a reference or text page has an entry in the name index as well as in the reference index or page index and is accessible via the name index as well as via the reference index or page index. F. The Indexes and Document Restructuring As may be seen from the above discussion, neither a TB 621 or a REFB 617 contains a reference by block number to a block 503 which is contained in a different chain of blocks 503. The only references to blocks in other chains are via item numbers in AWs 625. Further, references by block number to other blocks within a chain are found only in Header 602 and, as will be explained in more detail later, in a certain type of AW 625. The references in the header are only to the immediately preceding and following blocks 503 in the chain, and those in the AW 625 are limited to blocks 503 in the same page 627 as that containing the block 503 with the AW 625. Similarly, names in the name index refer to pages 627 and references by page number and item number, not by block number. A major advantage resulting from the use of page numbers and item numbers to reference components of a document outside of a given chain of blocks 503 is that the blocks 503 in a chain of TBs 621 or REFBs 617 may be rearranged without affecting chains of REFBs 617 referred to within that chain. Further, the effects of any rearrangement of TBs 621 in a page 627 are limited to the TBs 621 in that page and to the TB 621 in the chain immediately preceding the first TB 621 in the page and the TB 621 immediately following the last TB 621 in the page. The above-described limitations on the effects of rearrangement makes it easy to compact a document in the present invention. Such an operation is advantageous because the process of editing a document tends to produce chains in which many TBs 621 or REFBs 617 occupy adjacent positions in the chain but widely separate positions in the document file 501 containing the document and many TBs 621 or REFBs 617 contain relatively little text. The compaction operation takes such a document and produces a copy in which as many TBs 621 and REFBs 627 as possible are full and in which blocks 503 which occupy adjacent positions in a chain tend to occupy adjacent positions in document file 501. The compacted document thus requires a smaller document file 501 than the original document and may be accessed more quickly for operations such as display on a terminal or printing. The properties of text and reference chains and indexes described above permits compaction of a document by page 627 or reference. Rearrangement or reduction of the number of blocks within a page 627 or reference affects only that page 627 or reference. Once rearrangement of the page or reference is complete, only the PP 620 for the page or the RP 614 for the reference and any NPs 610 referring to the page or reference need be reset to point to the new first block 503 of the page or reference. II. Overview of the Document Manipulation System Document structures of the present invention are manipulated by a documentation manipulation system. The discussion now turns to that system, beginning with the manner in which the routines which make up the system are organized, continuing with a discussion of certain data structures used by these routines, and ending with a discussion of the general mode of operation of the document manipulation system. A. Organization of the Document Manipulation System As pointed out in U.S. Ser. No. 538,644, the document structure of the present invention is manipulated by a single library of routines termed the Document Manager Library or DMLIB. The use of a single DMLIB by all programs which access a document represents a significant departure from the prior art shown in FIG. 7. In the prior art, each application program (AP) which altered or read the contents of a document 701 had its own group of document manipulation routines (DMR) for manipulating the document. For example, an editing program for writing a document 701 had one group of such routines, while a program for printing a document had another group of such routines. Thus, as shown in FIG. 7, there could be APs 703(A) . . . (N), each with its own different corresponding DMR 705(A) . . . (N). Such an approach has numerous disadvantages. First, there is a great deal of duplication of programming effort. Most of the APs 703 will perform functions like reading characters from document 701, and each of the DMRs 705 for these APs will have its own routine for reading characters. Second, each essentially duplicate routine in the various DMRs requires its own storage, and thus increases the storage requirements for the application programs. Third, maintenance and enhancement of the document structure used in documents 701 is made more difficult. Any of the DMRs 705 may have a bug which damages the document structure, and when such a bug occurs, the DMR 705 in each of the APs 703 must be checked for the error. Further, any change in the document structure requires modification of all of the DMRs 705 which manipulate the document. In the present invention, as shown in FIG. 8, all manipulation of documents 805 having the present invention's document structure is done by invoking routines in DMLIB 801. The various APs 803(A) . . . (B) no longer have their own DMRs. For example, if an editing AP 803 wants to display a page of a document on a CRT screen, it invokes the same routine in DMLIB 801 to read the page from the document as does a printing AP 803 which wishes to print the page. By thus combining all of the document manipulation routines into DMLIB 801, the present invention eliminates duplication of programmer effort, reduces the amount of storage required for APs 803, and makes it easier to maintain and enhance the present invention's document structure. While the basic principle of using a single DMLIB 801 for all APs 803 is not limited to document structures like those of the present invention, the present embodiment of DMLIB 801 is organized into groups of routines which correspond to the components of the present invention's document structure. These groups include the following: Buffer Routines: These routines manage buffers 54 in which blocks 503 are stored when they are being manipulated. File Initialization Routines: These routines initialize a document file 501 as required for a document. Administration Block Routines: These routines insert information into and retrieve information from the administration blocks in the present invention's document structure. Character Routines: These routines retrieve characters from and place them in text 629 of a TB 621 or a REFB 617. They further return attribute information about a character. Sequential Access Routines: These routines take advantage of the fact that TBs 621 are chained together to provide fast sequential access to the text of the document. The access includes reading characters and their attributes. Visual Attribute Routines: These routines set, read, and delete visual/descriptive AWs 625 and the ACs 623 associated with them. Reference Attribute Routines: These routines set, read, and delete informational AWs 625 and the ACs 623 and REFBs 617 associated with them. Page Routines: These routines locate, make and delete pages and page-related information such as headers and footers. Format routines: These routines get, set, and otherwise manipulate document format information. Column routines: These routines create, delete, and manipulate columns in a document. Header and Footer Routines: These routines permit an AP 803 to add, delete, and modify page headers and footers. External Reference Routines: These routines permit AP 803 to make or read an external reference, i.e; a reference to information located elsewhere than in the document. Examples of external references are other documents, pictures, voice documents, drawings, and graphs. Shelf Routines: These routines permit an AP 803 to add, delete, or modify text shelves. Text shelves are chains of blocks 503 used as temporary storage in a document. Editing Routines: These routines permit an AP 803 to delete text from a document, move text from one position in a document to another, copy text from one position to another, move or copy text between a other parts of a document and a shelf, and create a place in a document at which text may be inserted. Name Routines: These routines permit an AP 803 to name an item or page by creating a NIP 610, delete a name, retrieve the item or page number identified by a name, and determine the name corresponding to an item or page number. They also permit an AP 803 to insert a named mark identifying a location in the text of a document into or remove it from a document. Font Routines: These routines permit an AP 803 to add or remove fonts from the font list and determine a font from the font number in a font AW 625. Document Management Routines: These routines permit an AP 803 to create a compacted copy of a document. Compaction involves combining the contents of chained blocks 503 so that the fewest number of blocks 503 possible are required for the contents and using blocks 503 which are adjacent to each other in document file 501 as adjacent blocks 503 in chains of blocks 503. One of the particular features of the present invention is the availability of text editing routines such as character insertion routines (in the Character Routines), deletion, move, and copy routines (in the Editing Routines) to any AP 803. In prior art document processing systems, such routines were generally part of an editing AP 803 and were not available to other APs 803. In consequence, the easy and powerful text handling interface of the editing AP 803 was not available to other APs 803 which required some text handling capability. For example, in an interactive AP 803 in prior art systems, the user only had limited editing capabilities available for editing his input to the interactive program; in the present invention, an interactive AP 803 may use any of the editing capabilities available from DMLIB 801. B. Data Structures Employed in Operation of the Document Manipulation System As mentioned in U.S. Ser. No. 538,644, the document processing system in which the present invention is embodied provides data structures stored in DAS 60 to the routines of DMLIB in order to manipulate documents and obtain information about them. These data structures include the File Reference Block (FRB), which contains information about the document file 501 which contains the document currently being edited, the Buffer Table (BT), which is a table of the buffers in Buffers 54 being used to hold blocks 503 belonging to the document being edited, the Document Control Block (DCB), which contains information about the document currently being processed, and one or more position blocks (PB), which specify positions within a document In the following, the DCB, BT, and PB are discussed in detail. All three are shown in FIG. 9. 1. DCB 901 DCB 901 relates the document currently being manipulated by an AP 803 to document file 501 in which it is contained and to the BT which describes the buffers in Buffers 54 which contain blocks 503 from the document. DCB 901 contains the following fields: FRB PTR 903, which contains the lo | ||||||
