Document processing using frame-based templates with hierarchical tagging
5845303
Abstract
A system and method for manipulating and displaying information in a computer system includes a display screen, a processor, a storage device, and a data input device. Input data is received in the system through the input device. The system determines a display format of the data. The display format includes a number of constraints on the display. The system associates the input data with the appropriate display frame and flows the data into the frame. Constraints on the display are solved as the data is flowed (or moved) into a frame or frames. Upon resolution of constraints, the display frame is sized to accommodate the input data and the frame is displayed on the computer display screen.
Claims
What is claimed is:
Description
BACKGROUND OF THE INVENTION
TABLE I
______________________________________________________
FLOW TAGS         CODE    FLOW TAGS            CODE
______________________________________________________
Abstract          ab      ISBN                 IS
Addresses         as      Keywords             ky
Address           ad      Legend               lg
Animation         an      Level 1              L1
Arrows            ar      Level 2              L2
Author            au      Level 3              L3
Banner            ba      Level 4              L4
BCC               bc      Level 5              L5
Bibliography      bb      List                 L2
Body              BD      List 2               L3
Border            br      List 3               L4
Byline            by      List 4               L5
Callout           ca      Logo                 lg
Caption           cp      Masthead             MH
CC                cc      Music                ms
Chart             ch      Page Number          pg
Colophon          co      Phone Number         pn
Copyright         cr      Picture              pc
Credit            ct      Postscript           ps
Cross Reference   cx      Preface              pf
Data              dt      Publication Title    pt
Dedication        dd      Publisher            pb
Email Address     em      Pullquote            pl
Fax #             fx      Rules                ru
Figure            fg      Salutation           sa
Footer            ft      Sidebar              sb
Footnote          fn      Signature            sg
Foreword          fw      Sincerely Line       sn
Graphic           gr      Story                st
Header            HD      Subject              SB
Identifier        id      Sublist              sl
Illustration      il      Table                tb
Index             ix      Table of Contents    yc
Ink               ik      Video                vd
Introduction      in      Voice                vc
______________________________________________________
The second two-character field is referred to as a style tag 106. The style tag 106 identifies any modifications to the paragraph type that might be required by a potential conflict of structure. As an example, a portion of a document might be tagged as BD (body) with a style tag 106 indicating that the content of the body is a list, or that the body should be emphasized (i.e., structured) in a particular way (such as with boxing, outdents, or the like). Example style tags 106 are listed in Table II.
TABLE II
______________________________________
STYLE TAGS        CODES
______________________________________
Force Just        fj
Indented          in
Justified         ji
Large             lg
Ragleft           rl
Ragright          rr
Small             sm
______________________________________
The third two-character field of the tag 100 is a substyle tag 108. This two-character field identifies any modifications to be made to individual character formatting within a particular paragraph. Typical substyle tags include bl (bold), cn (condensed), du (double underline), it (italic), and the like. Other substyle tags 108 are shown in Table III.
TABLE III
______________________________________________________
SUBSTYLE          CODE    SUBSTYLE             CODE
______________________________________________________
Bold              bl      Larger               lr
Condensed         cn      Light                li
Double underline  du      Marked               mr
Dropfirstcap      dc      Smaller              sr
Expanded          ex      Strikethru           st
Fat               ft      Subscript            s1
Heavy             hv      Superscript          s2
Indexed           in      Underline            un
Italic            it
______________________________________________________
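The three two-character fields described above (flow tag 104, style tag 106, substyle tag 108) can be illustrated with a minimal sketch. It assumes a tag body is exactly six characters, and it uses only a small subset of the codes from Tables I to III. The names `Tag`, `split_tag`, and `describe` are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass

# Illustrative subsets of Tables I-III; not the complete tables.
FLOW_TAGS = {"BD": "Body", "HD": "Header", "ft": "Footer", "cp": "Caption"}
STYLE_TAGS = {"in": "Indented", "ji": "Justified", "rl": "Ragleft", "rr": "Ragright"}
SUBSTYLE_TAGS = {"cn": "Condensed", "du": "Double underline", "it": "Italic"}


@dataclass(frozen=True)
class Tag:
    flow: str      # first two-character field (flow tag 104)
    style: str     # second two-character field (style tag 106)
    substyle: str  # third two-character field (substyle tag 108)


def split_tag(code: str) -> Tag:
    """Split a six-character tag body into its three two-character fields."""
    if len(code) != 6:
        raise ValueError(f"expected 6 characters, got {len(code)}")
    return Tag(code[0:2], code[2:4], code[4:6])


def describe(tag: Tag) -> str:
    """Return readable names, falling back to the raw code when unknown."""
    return " / ".join([
        FLOW_TAGS.get(tag.flow, tag.flow),
        STYLE_TAGS.get(tag.style, tag.style),
        SUBSTYLE_TAGS.get(tag.substyle, tag.substyle),
    ])


print(describe(split_tag("BDjiit")))  # prints "Body / Justified / Italic"
```

Keeping the lookup tables separate from the parsing step mirrors the idea that the codes themselves are fixed while their human-readable names may vary.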
A depiction of an input file 120, formatted for display by a computer system 40 utilizing the present invention, is shown in FIG. 3C. Note that the format of the file 120 shown in FIG. 3C differs from the format of the file 30 shown in FIG. 1B only in that tags and graphics information are included. The tags 100 are used in applications software 78 to direct (or "flow") text or graphics information to a particular frame. For example, in file 120 of FIG. 3C, the tag 100 ("\LG....\") informs applications software that the data which follows is graphics data to be flowed to a logo frame 82 for display on display device 44. The use of a pair of periods ("..") for the style and substyle tags 106, 108 indicates that default styles are to be used.
Specifically, in one particular implementation of the present invention, if the flow tag 104 contains "..", the system assumes that the tagged content belongs in the current frame (i.e., that no frame change is required). If the style tag 106 contains "..", the system assumes that the tagged content is to take on the character formatting of the previous paragraph. And when substyle tag 108 contains "..", the tagged content will assume the prevailing character formatting of the current paragraph.
Graphics tags (e.g., LG, or logo tags) are slightly different from text tags. Binary graphics data typically cannot be parsed into a text stream. Therefore, a graphics tag, in one specific implementation, is followed by text indicating a path to the graphic file (which may be stored in any standard graphics format, such as GIF). The path to the graphics file may then be followed by a backslash character ("\") to alert the system that the end of the graphics tag has been reached.
The tag 100 of the present invention has several characteristics. The tag language used is generic.
That is, the same tag may be assigned several different human-readable names so that the names may be properly descriptive when stored with a particular metaform. For example, a newspaper metaform and a report metaform may both use the same flow tags 104 (e.g., the flow tags for headline, subhead level 1 and subhead level 2). However, the tags (listed in human-readable form) associated with the newspaper metaform may be termed "headline", "subhead" and "section head" to more descriptively refer to the frames used in particular applications. The human-readable tags associated with the report metaform may be termed "title", "chapter" and "section". This ability to map tags 100 of the present invention permits easy internationalization of metaforms, as only the human-readable names of the tags, not the tags themselves, need to be localized.
Further, the hierarchical tagging scheme of the present invention permits ready importation of HTML-formatted data. Data formatted using certain other SGML dialects and HTML variants (such as HTML+) may also be readily imported or easily translated for use in the present invention.
This simplified and efficient tagging scheme enables easy formatting of files for use in the present invention. The scheme minimizes the amount of tag information required to format a file for use in the present invention, thus ensuring that file size is kept to a minimum. File size is frequently a concern when files are transferred to small computers such as personal communicators. These systems operate most efficiently with small files. Further, users are transmitting more and more files over wired and wireless media (such as cellular or wired telephone systems). Transmission costs and transfer times are reduced when smaller files are used. In addition, storage and archival costs are minimized when files are kept as lean as possible.
Other features of the present invention, which will now be described by first referring to FIG. 4, allow dynamic sizing of frames displayed on display device 44. FIG. 4 shows a specific metaform 80 displayed on display device 44. Again, metaform 80 is a sample newsletter metaform displaying the information from the newsletter 10 of FIG. 1A. Metaform 80 contains a plurality of frames 82-92 in which specific pieces of information are displayed.
Metaforms for use in the present invention are created, in one specific embodiment, using Microsoft Visual C++. In defining a metaform, the frames are first defined using a base grid. The base grid, or page grid, provides a regular division of the overall display area. Frames are established and defined to present a specific presentation of data, such as the newsletter metaform 80 of FIG. 4. Once the frames 82-92 are initially defined, each frame is tagged to define the type of data it is to accept. For example, frame 82 is tagged to accept logo (or .lg) data. Frames may include several tags. Frame 82, being a graphics frame, may also be defined as accepting a graphic (.gr) or even video data (.vd), for example. As will be discussed, by providing several acceptable tags for each frame, the system may dynamically port incoming data to alternative frames for display.
After each frame 82-92 has been defined and tagged, the next step in creating a metaform 80 is to define constraint relationships between each of the frames. FIG. 4 includes a number of links 130, 132, and 134 which demonstrate constraint relationships for each of the frames 82-92. A constraint describes a relationship that must hold between multiple variables. For example, a constraint can be defined which will maintain an alignment between two objects, despite the ability of the two objects to expand or contract. In the present invention, constraints are employed to maintain consistent relationships between frames as the frames change size and/or location on the display device 44.
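The frame tagging described above, in which each frame lists the flow tags it accepts so that data can be ported to an alternative frame, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Frame` record, its `full` flag, and the `find_frame` helper are hypothetical, and the search order (first matching frame that is not full) is one possible policy rather than the one the specification prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    number: int           # reference numeral of the frame, e.g. 82
    accepts: tuple        # flow tags this frame is tagged to accept
    full: bool = False    # hypothetical flag: frame cannot take more data
    content: list = field(default_factory=list)


def find_frame(frames, flow_tag):
    """Return the first frame that accepts flow_tag and is not full, else None."""
    for frame in frames:
        if flow_tag in frame.accepts and not frame.full:
            return frame
    return None


# Frame 82 accepts logo, graphic, or video data; frames 88 and 90 accept body text.
frames = [
    Frame(82, ("lg", "gr", "vd")),
    Frame(88, ("BD",)),
    Frame(90, ("BD",)),
]

target = find_frame(frames, "BD")
target.content.append("First column text")
target.full = True                        # pretend the first body column filled up
print(find_frame(frames, "BD").number)    # prints 90
```

Listing several accepted tags per frame is what lets the lookup fall through to a second suitable frame instead of failing.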
Links 130, 132, and 134 demonstrate the constraint relationship between various frames of metaform 80. For example, links 130a-g (displayed as thick lines) indicate fixed relationships or required constraints. Link 130a represents a required constraint between logo frame 82 and the edge of display device 44. This relationship may be defined, e.g., to maintain a fixed size border along the outside of the display 44. Thus, if the size of the logo in frame 82 is increased, the fixed relationship identified by link 130a will not change. Other links 130b, 130c, 130g representing margins may also be fixed using required constraints. Similarly, distances between columns 130d or widths of columns 130e, 130f may also be defined using required constraints.
Not all relationships between frames are fixed. For example, certain relationships may be overridable to allow repositioning of one or more frames as input data is flowed into the frame. If the header frame 84 expands to accommodate additional data, for example, the title frame 86 may need to be repositioned in the Y-direction. Such relationships, in this particular example, are indicated by links 132a-d.
Still other constraints may be classified as grow constraints. Examples are indicated by links 134a-h. In frame 90, e.g., when additional data is flowed into the first column of the newsletter, the body column frame 90 must expand to accommodate the additional data.
Thus, a constraint system is employed in the present invention to dynamically size and position frames as data is flowed into them. The constraint system is also used to define basic screen layouts for different display devices 44. For example, if display device 44 is a six-inch diagonal screen for a PDA, individual frame sizes will be more severely constrained than if display device 44 were a fifteen-inch diagonal monitor on a desktop computer system.
Constraints may be one-way or multi-way. A one-way constraint operates in only one direction.
As a simple example, a constraint establishing that a variable "B" will be set equal to the value of a variable "A" when "A" changes in value is a one-way constraint. A multi-way constraint system ensures that "B" will change as "A" changes, and vice versa. The constraint system of the present invention is a multi-way constraint system. In one specific implementation, the present invention utilizes binary constraints (i.e., constraints limited to two variables) to avoid synchronization issues.
The newsletter metaform 80 includes both one- and multi-way constraints. For example, link 132c is a multi-way constraint. That is, if the size (or position) of title frame 86 changes, the position of body column frame 90 must also change. Likewise, if the position of body column frame 90 changes, the position of title frame 86 may also be affected. The constraint system of the present invention, in one specific embodiment, solves each constraint relationship sequentially before the final image is displayed on display device 44.
In one specific embodiment of the present invention, a variety of constraint types are used which are subclasses of an abstract base class termed "MBConstraints". The abstract base class contains variables which identify the constrained objects and whether the constraint is one-way or multi-way. Subclasses override critical methods, fill in variables, and initialize subclass variables. New types of constraints may be added to the system by overriding execution methods of existing constraints.
Constraints may be used to allow an infinitely scrolling display or they may be used to ensure that single pages of information are displayed together. For example, referring again to the newsletter metaform 80, if body columns 88 and 90 are not constrained to a single page, scrolling via user input device 54 may be required to view all the information contained in those frames.
However, if the newsletter metaform 80 demands that all frames be constrained to a single page (or a single screen on display device 44), a user may need to page down using input device 54 to view additional pages of information. For instance, in the sample newsletter metaform 80 of FIG. 4, the footer frame 92 has a fixed constraint 130g tying it to the bottom of the page (or screen). Fixed stand-off constraints 135a and 135b ensure that body column frames 88, 90 do not encroach upon the footer frame 92. If either of the body column frames 88 or 90 increases in size such that constraints 135a or 135b are violated, the system of the present invention commences flowing data to a second page, or a second metaform for the new information. This ensures that incoming data is flowed in a coherent and logical manner into predefined frames. Alternatively, or additionally, the system may first locate another appropriate frame on the current page for the overflow information. An appropriate frame is one which is tagged to accept the same or similar data (e.g., another frame that is tagged to accept .BD or body information).
The solution of constraints and the general operation of the present invention will now be described by referring to the flow diagram of FIG. 5. For the purposes of this sample description, it will be assumed that the specific metaform of FIGS. 3 and 4 has been selected for use (i.e., the newsletter metaform 80) and that the input data resides in file 120 of FIG. 3C.
Operation commences as the applications software 78 receives input data 140. This input data 140 may be received from information source 62 via transmission channel 60 or from any other channel and source. Input data may, e.g., be in the form of a retargetable data stream (RDS) file containing no formatting information other than tags 100. Applications software 78 then, in step 142, functions to identify the first set of tagged information from input file 120.
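The tag conventions described earlier (a backslash-delimited tag such as \LG....\, with ".." meaning that a field falls back to a default) can be sketched as a small tokenizer for text content. The sketch assumes every tag is a backslash, six characters, and a closing backslash, and it does not handle the graphics-path form of graphics tags. The function name `parse_stream` and the sample stream are hypothetical.

```python
import re

# Assumed tag shape: backslash, six non-backslash characters, backslash.
TAG_RE = re.compile(r"\\([^\\]{6})\\")


def parse_stream(stream):
    """Return a list of (flow, style, substyle, text) tuples.

    A ".." flow tag keeps the current frame, a ".." style tag keeps the
    previous paragraph's style, and a ".." substyle tag means no character
    override (reported here as None).
    """
    pieces = TAG_RE.split(stream)
    # pieces = [text_before_first_tag, code1, text1, code2, text2, ...]
    records = []
    flow, style = None, None
    for code, text in zip(pieces[1::2], pieces[2::2]):
        field_flow, field_style, field_sub = code[0:2], code[2:4], code[4:6]
        if field_flow != "..":
            flow = field_flow
        if field_style != "..":
            style = field_style
        substyle = None if field_sub == ".." else field_sub
        records.append((flow, style, substyle, text.strip()))
    return records


sample = r"\HD....\Spring News\BDji..\First paragraph.\....it\Second paragraph."
for record in parse_stream(sample):
    print(record)
# ('HD', None, None, 'Spring News')
# ('BD', 'ji', None, 'First paragraph.')
# ('BD', 'ji', 'it', 'Second paragraph.')
```

Because the only markup is the short tag itself, the content between tags passes through untouched, which is consistent with the goal of keeping tagged files small.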
Tagged information in an RDS file is identified by locating double backslash characters "\\" which set tagged information apart from text or graphics information. Therefore, the first flow tag 104 identified in this example would be "LG". Once a flow tag 104 has been identified, applications software 78 operates to determine the proper frame within a metaform into which data is to be flowed.
Applications software 78 also functions to compose both text and graphics for insertion into a frame. Specifically, composition of the data for a frame includes modifying text or graphics as indicated by style and substyle tags 106, 108. In one specific embodiment, applications software 78 performs the composition and flow of data to frames using an object-oriented composition engine responsive to specific flow, style, and substyle tags 104, 106, 108. As required, applications software 78 grows or shrinks the frame as new lines of text or graphics are composed.
In this particular example, applications software 78 will determine that the logo data identified by the flow tag LG is to be flowed to frame 82 of metaform 80. Default style and substyle tags will be used. As applications software 78 flows the information to the target frame (frame 82), the system determines whether the size of the frame will be affected. When a particular frame of a metaform is changed in size in any way, a constraint solver is invoked. As shown by decision block 146, if no change in frame size is required, applications software 78 returns to receive more input data 140.
If a change in frame size occurs, applications software 78 determines whether any constraints are affected 148. Each metaform includes a list of constraints to be managed for that particular metaform. When it is determined that a frame of a metaform requires a change in size, applications software 78 determines which constraints require solving as a result of that change in size.
This includes determining those constraints which are directly affected and those constraints which may be affected downstream 152. For example, when logo data is flowed into frame 82 of metaform 80, five constraints may be directly affected depending upon the size of the image imported. Two of the constraints are fixed constraints 130a, 130h which maintain proper spacing of logo frame 82 on the screen of display device 44. Applications software 78 will not allow expansion of the logo frame 82 beyond those initial fixed spacings. Two of the constraints are grow constraints 134a and 134b which allow expansion of the image in two directions. A fifth constraint 132a is an overridable constraint which may be overridden if necessary. Thus, if the logo input in file 120 is larger than the default size of logo frame 82 of metaform 80, a number of direct constraint relationships must be solved.
Before solving a constraint, applications software 78 first determines whether the constraint has already been solved in the current solution cycle 150. This avoids repetitive solution of constraints for a frame which is affected by more than one constraint relationship. Constraints are then executed in step 154. Thus, for an expanding frame which has a number of constraints (immediate and downstream) to be solved, each constraint is solved sequentially until all relationships are resolved. Once all the constraints have been solved for a particular frame, applications software 78 returns control to step 140 to receive further input data.
For instance, for the example metaform 80 of FIG. 4, one possible constraint solution sequence would be 134a, 134b, 130h, 132a, 134c, etc., until each constraint of the metaform 80 is solved. Typically, in English-language applications, the constraints will be solved from the top of the form to the bottom, and from the left of the form to the right. It is also possible, however, to utilize other solution sequences.
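The solution cycle described above (find the constraints affected by a change, skip any already solved in the current cycle, execute the rest sequentially, and follow downstream effects) can be sketched with binary constraints built on an abstract base class. This is a minimal sketch under stated assumptions, not the specification's implementation: the class names `BinaryConstraint` and `Offset`, the variable names, and the queue-based ordering are all hypothetical.

```python
from abc import ABC, abstractmethod


class BinaryConstraint(ABC):
    """Constraint between two named variables; one-way unless multi_way is set."""

    def __init__(self, a, b, multi_way=False):
        self.a = a
        self.b = b
        self.multi_way = multi_way

    def triggered_by(self, name):
        # A one-way constraint reacts only to changes in `a`.
        return name == self.a or (self.multi_way and name == self.b)

    @abstractmethod
    def execute(self, values, changed):
        """Update `values`; return the name of the variable written, or None."""


class Offset(BinaryConstraint):
    """Keeps values[b] equal to values[a] + gap."""

    def __init__(self, a, b, gap, multi_way=False):
        super().__init__(a, b, multi_way)
        self.gap = gap

    def execute(self, values, changed):
        if changed == self.a:
            target, new_value = self.b, values[self.a] + self.gap
        else:
            target, new_value = self.a, values[self.b] - self.gap
        if values[target] == new_value:
            return None
        values[target] = new_value
        return target


def solve(constraints, values, changed):
    """Execute each affected constraint at most once per solution cycle."""
    solved = set()          # indices of constraints already solved this cycle
    pending = [changed]     # variables whose change may affect constraints
    while pending:
        name = pending.pop(0)
        for index, constraint in enumerate(constraints):
            if index in solved or not constraint.triggered_by(name):
                continue
            solved.add(index)
            written = constraint.execute(values, name)
            if written is not None:
                pending.append(written)   # downstream constraints may follow
    return values


constraints = [
    Offset("header_bottom", "title_top", 10),               # one-way
    Offset("title_top", "body_top", 50, multi_way=True),    # multi-way
]
values = {"header_bottom": 130, "title_top": 110, "body_top": 160}
print(solve(constraints, values, "header_bottom"))
# {'header_bottom': 130, 'title_top': 140, 'body_top': 190}
```

Tracking solved constraints by index is one simple way to honor the "already solved in the current cycle" check; it also guarantees that the loop terminates even when multi-way constraints form a cycle.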
The input of data 140, identification of tags 142, composition and flow of information 144, and constraint execution 154 are repeated until all information is generated and flowed for a particular metaform. Thus, the actual image displayed on a display device 44 depends upon the nature and amount of data input to each frame. The title frame 86 may, e.g., be larger than body column frames 88, 90. Footer frame 92 may not be displayed in certain situations where no footer data exists in an input file. Further, depending upon the use or non-use of a fixed constraint at the bottom of the metaform (such as constraint 130g of metaform 80), the input data may be displayed over several different screens which may be viewed by scrolling using a user input device 54.
Each specific metaform, such as the newsletter metaform 80, may include definitions for the second and subsequent pages of the newsletter. Thus, when constraints are executed in step 154, if a new page of the metaform is affected, a pagination step 156 must also be performed. The newsletter metaform 80, for example, may be defined as displaying only two columns of text on all pages after the first. As text is flowed into each column and the end of a page or screen is reached (i.e., the page bottom constraints are violated), the applications software determines the format of the next page, and flows the text into any appropriate frame(s) available. The present invention may also incorporate criteria which first attempt to locate another suitable frame on the current page for the extra information. For example, when a frame has grown such that a pagination should occur, applications software 78 may first review other frames on the current page to determine if any of them may accept the overflow data from the current frame.
Those skilled in the art will appreciate the number and variety of combinations of metaforms which may be utilized using techniques of the present invention.
Further, specific metaforms may be developed for particular screen and display types to effectively utilize available space and size. Thus, a system 40 may include a variety of metaforms. Applications software 78 may be augmented with additional capabilities to effectively manage a library of metaforms. Conversely, a system 40 may (due to storage or other limitations) include only a few commonly-used metaforms.
To allow a system with a limited number of metaforms to accept a wide array of data, applications software 78 may include the ability to reflow incoming data to those frames which are available. For example, if the particular metaform for which a document was originally formatted is not available on the computer system 40 that receives it, applications software 78 may be equipped with an ability to substitute another metaform from those metaforms which are available in system 40. If, for example, a newsletter formatted for the particular newsletter metaform 80 discussed above is received in a PDA that does not have a newsletter metaform, applications software 78 may be used to reflow the data from the input file to, e.g., a newspaper metaform.
To achieve this substitution, metaforms according to the present invention may be identified by a type code for each form, indicating a particular metaform type. One specific format of a type code for use in the present invention comprises three fields, including: a field indicating the "style" of the metaform; a field indicating the "form" of the metaform; and a field noting the output format for the metaform. For example, the style field may indicate whether the form is used with graphics, text, or the like. Hexadecimal values may be used to note the style of each metaform. Hex values may also be used to indicate the form of the metaform, e.g., a newsletter metaform may be noted by a hex value of "D" and a report metaform noted by a hex value of "F0".
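The three-field type code described above can be sketched as a small record, together with one possible way of choosing a substitute when the requested metaform is missing. The distance measure (sum of absolute differences of the style and form values), the preference for a matching output format, and the names `TypeCode` and `nearest_metaform` are assumptions made for illustration. Apart from the newsletter form value of hex D, the numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeCode:
    style: int   # hex value noting the style of the metaform
    form: int    # hex value noting the form of the metaform
    output: str  # output format, e.g. "print", "fax", "small screen"


def nearest_metaform(requested, available):
    """Return the requested code if present, else the closest available one.

    `available` must be a non-empty list of TypeCode values.
    """
    if requested in available:
        return requested
    # Assumption: prefer metaforms designed for the same output format.
    same_output = [m for m in available if m.output == requested.output]
    candidates = same_output or available
    return min(
        candidates,
        key=lambda m: abs(m.style - requested.style) + abs(m.form - requested.form),
    )


newsletter = TypeCode(style=0x1, form=0xD, output="small screen")
available = [
    TypeCode(style=0x1, form=0xC, output="small screen"),   # hypothetical newspaper form
    TypeCode(style=0x1, form=0x40, output="small screen"),  # hypothetical other form
]
print(hex(nearest_metaform(newsletter, available).form))  # prints 0xc
```

Storing the fields as integers makes "nearest" a plain numeric comparison; a real system might weight the style and form fields differently.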
If a particular form or style of metaform is not available on a system, substitution may be made by utilizing an available form which has the nearest style and form values (in hex). The output format of a particular metaform is a user-proffered selection, and may include options such as print, fax, small screen, or large screen. These fields, and others, may be used by applications software 78 to select the most appropriate available metaform 80 when the requested metaform is not present in the system. Use of such criteria allows the system to provide the most suitable alternative format for the display of particular information.
While the invention is described in some detail with specific reference to a specific preferred embodiment and certain alternatives, there is no intent to limit the invention to that particular embodiment or those specific alternatives. For example, a sample utilizing a newsletter metaform has been referred to throughout the specification. However, those skilled in the art will realize that the present invention may be employed for the display and use of a variety of data and information. Further, although the sample discussed in the specification showed only one specific page of information, the present invention is well-suited to the display and flow of multiple pages of information. The present invention may also be used for the input of user data from a keyboard or other input device. Thus, the true scope of the present invention is not limited to any one of the foregoing exemplary embodiments but is instead defined by the following claims.
