Document pagination based on hard breaks and active formatting tags6789229Abstract Pagination of a document is achieved through the use of an index of predetermined hard breaks within the document. When a selected portion of the document is identified, an immediately prior hard break relative to the selected portion is identified. Active formatting tags applicable to content following the identified hard break are identified to determine proper layout for any intervening pages between the identified hard break and the selected portion. In this manner, a complete and reproducible page can be associated with the selected portion, independent of page number determination. To determine a page number, page counts between hard breaks are stored. A sum of page counts between hard breaks is calculated up to the identified hard break. The final page number is this sum plus the number of pages determined between the identified hard break and the reproducible page. Claims What is claimed is: Description TECHNICAL FIELD
TABLE 1
Selected Portion Prior Hard Break
204 202a
205 202a
206 202b
207 202b
208 202b
209 202b
210 202b
211 202c
212 202c
213 202c
214 202c
In the context of the present invention, a selected portion of a document is that portion of a document chosen by a user for immediate viewing and will typically constitute that amount of content (text, images, etc.) capable of display within a single display screen. Where the document is divided into block each comprising content capable of being displayed over more than one display screen, designation of a given block as the selected portion will result in a predetermined section of the blocks being displayed, e.g., the first reproducible page of the block. Of course, other sections could be used for this purposes as a matter of design choice, e.g., the last reproducible page of the block. The significance of a selected portion of a document will be described in further detail below. As known in the art, the formatting tags illustrated in FIG. 2 are used to control the structure and format of the underlying content. As used throughout, the term tags is meant to include any corresponding attributes, if any. In accordance with the principles of well-formed XML, the formatting tags included in the document 200 comprise, where possible, paired start and end tags. That is, for each of the start tags shown (i.e., <HTML>, <HEAD>, <BODY>, <DIV>, <H1>, <P> and <B>), there is a corresponding end tag shown (i.e., </HTML>, </HEAD>, </BODY>, <DIV>, </H1>, </P> and </B>). As known in the art, any content encompassed by a given pair of start and end tages is processed in accordance with the meaning of those tags. For example, any text between a pair of <B> and </B> tags will be formatted as bold text. Furthermore, tags can be nested within other tag pairs, with the result that multiple formatting instructions will apply to text encompassed by the nested tags. For example, in FIG. 2, a <B> and </B> tag pair are nested within several other tag pairs. Regardless, at any point within the document 200, any hard break within the content will be subject to any number of active formatting tags. In the context of the present invention, an active formatting tag relative to a given hard break is any tag having an effect on the formatting of at least some of the content after the hard break. Table 2 below illustrates active formatting tags relative to each of the exemplary hard breaks identified by corresponding reference numerals in FIG. 2.
TABLE 2
Hard Break Active Formatting Tags
202a HTML
202b HTML, BODY, DIV
202c HTML, BODY, DIV
As shown in Table 2, each hard break is subject to any number of active formatting tags. Note that even distally related hard breaks can be subject to the same active formatting tags (e.g., those hard breaks identified by reference numerals 202b and 202c). FIG. 3 illustrates a data structure 300 that supports pagination in accordance with the present invention. In particular, the data structure 300 is representative of a document and includes a root storage 302, a data storage 304, a hard break index (pb1) 306, an active formatting tag index (pb2) 307, a content storage 308 and a content stream 310 arranged as shown. Although a particular structure of storages and streams is illustrated in FIG. 3, other structures encompassing the same functionality may be equally employed. The data structure 300 may be stored in a computer-readable format in a suitable computer-readable medium, such as a computer hard drive, magnetic disk, etc. The data structure 300 preferably forms a part of a larger document data structure. In the parlance of shared objects programming and, in particular, the so-called Object Linking and Embedding (OLE) framework, a storage object is similar to a directory and may contain other storage and stream objects. In the same vein, a stream object is similar to a sequential file and comprises unformatted or unstructured data. Thus, the content 201 illustrated in FIG. 2 comprises one or more streams that would be stored within the content storage 308, e.g., content stream 310. In a preferred embodiment, the content 201 comprises a numerically-encoded equivalent of document content that includes formatting tags as described above. As part of the preferred embodiment, the hard break and active formatting tag indices 306, 307 are streams stored within the root storage 302. As illustrated in FIG. 3, storages at any given level of the hierarchy may comprise additional storages and/or streams not shown in FIG. 3. The dotted arrows shown in FIG. 3 indicate the existence of references between the elements shown. Thus, the hard break index 306 includes references to the active formatting tag index 307, and both indices 306, 307 include references to one or more content streams 310 encompassed by the content storage 308. A preferred embodiment of the indices 306, 307 is illustrated in greater detail in FIG. 4. An example of a hard break index 402 and an active formatting tag index 404 is illustrated in FIG. 4. Any given document will have its own uniquely corresponding hard break and active formatting tag indices 402, 404. The hard break index 402 comprises a plurality of fixed-length records 406 uniquely corresponding to each hard break within the document. The active formatting tag index 404 comprises a plurality of variable-length records 408 also uniquely corresponding to each hard break within the document. Each record 406 within the hard break index 402 comprises a hard break offset value 410 that indicates, relative to the beginning of a document's content, the location of a particular hard break. Referring to the example of FIG. 2, such an offset 215 is illustrated indicating the location of the hard break labeled with reference numeral 202b. As noted above, the content of a document may be stored in multiple content streams. Thus, the document may be viewed as a concatenation of multiple content streams. Where multiple content streams are used, hard break offset values each corresponding to the start of a content stream are included in the hard break index 402 in the preferred embodiment. Because each hard break offset value is relative to the beginning of a document, as opposed to the beginning of a single content stream, the hard break offset values may point to different content streams, i.e., they are calculated as if all of the content streams were concatenated into one larger stream. As an alternative to a unified index covering multiple content streams, and as a matter of design choice, it is also possible to having multiple indices having a one-to-one correspondence with multiple content streams. Additionally, each hard break index record 406 comprises an offset 412, relative to the beginning of the active formatting tag index 404, to a corresponding portion of the active formatting tag index 404, as illustrated by the dashed arrows. Each record 408 within the active formatting tag index 404 comprises a count value 414 indicative of the length of the remainder of that record. Additionally, each active formatting tag index record 408 comprises at least one offset value 416 that indicates, relative to the beginning of a given content stream, the location of one or more active formatting tags applicable to the hard break. Referring again to the example of FIG. 2, such an offset 216 is illustrated indicating the location of the active <DIV> tag applicable to the hard break labeled with reference numeral 202b. Because the number of active formatting tags applicable to hard breaks varies according to each hard break's position within the content of the document, each record 408 likewise varies in length. The hard break and active formatting tag indices 402, 404 described herein may be used to efficiently paginate a document, as described in greater detail with reference to FIG. 5. Although particular structures and relationships have been described above for the hard break and active formatting tag indices 402, 404, the present invention is not limited in this regard. For example, the count values 414 could be replaced by a process of taking the differences between the sequential offset entries 412 found in the hard break index 402. In another alternative, the information encompassed by the active formatting tag index 404 may be incorporated directly into the document content such that active formatting tags may be determined "on the fly" based on the information incorporated into the document content. For example, up pointers may be associated with start and/or end tags that point to enclosing tags. In this manner, a chain of pointers may be followed to ascertain the active formatting tags. In yet another alternative, such up pointers may be stored in a separate structure apart from the content stream. Although particular alternatives to the structure shown in FIG. 4 have been described herein, those having ordinary skill in the art will recognize that other implementations within the scope of the present invention may be readily devised. FIG. 5 is a flowchart illustrating a method for paginating a document in accordance with the present invention. In a preferred embodiment, the method illustrated in FIG. 5 is implemented using stored computer-readable instructions executed by a suitable processing platform, although any combination of such software instructions and hardware-based implementation may be used. At step 501, an indication of a selected portion of a document (e.g., any of the selected portions 204-214) is received. This may be achieved in any manner, such as where a user opens a document to a previously-marked portion of the document, or where the user quickly flips or scrolls through the document before finally selecting a portion to view. At a minimum, the indication of the selected portion points to or otherwise identifies a location within the document to be viewed by the user. At step 502, the immediately prior hard break relative to the selected portion is determined. For example, referring to FIG. 2, if an indication of the selected portion labeled 206 is received, the immediately prior hard break labeled 202b would be identified at this step. Where the indication of the selected portion is represented as an offset relative to the beginning of the document's content, this step may be carried out using the hard break index 402. That is, the indication may be compared with the set of hard break offset values 410 to find the greatest hard break offset value that is less than the offset value in the indication is determined. The hard break corresponding to the record 406 comprising this hard break offset value is therefore deemed to be the immediately prior hard break relative to the selected portion. At step 503, a reproducible page encompassing the selected portion is determined. As its name would imply, a reproducible page is a portion of formatted content that is unchanging relative to the hard break identified at step 502 and therefore may be consistently reproduced for display. To determine such a reproducible page, all content starting from the identified hard break up to and at least including the selected portion is formatted in accordance with any active formatting tags beginning at the identified hard break and any additional tags intervening between the hard break and the selected portion. In a preferred embodiment, the active formatting tags for the hard break are identified by accessing the hard break's corresponding record 406, following the offset 412 within that record 406 to a corresponding record 408 in the open formatting tag index 404 and, for each active formatting tag offset 416 included in the record 408, accessing the content to identify the relevant tag. For example, assume that an indication of the selected portion labeled 214 is received and the hard break labeled 202c is subsequently identified. Beginning at the hard break 202c, initial content after the hard break 202c and before the selected portion 214 will be formatted in accordance with the active <HTML>, <BODY> and <DIV> tags. However, some of the content between the hard break 202c and the selected portion 214 will not be subject to the <DIV> tag's active formatting, because that content-portion follows the </DIV> end-tag. Additionally, at least some content between the hard break 202c and the selected portion 214 will also need to be formatted in accordance with the intervening <I>, </I> tag pair. So long as the occurrence of formatting tags within the content is static (i.e., tags cannot be added or removed or their attributes modified between accesses to the document), the above-described procedure will ensure that the content encompassing the selected portion will always be reproducibly formatted, thereby leading to a reproducible page. Because this formatting process is done from a known hard break, rather than the beginning of the content, it takes considerably less time to perform, particularly where the selected portion occurs relatively close to the end of the content. In a preferred embodiment, the formatting process of step 503 is performed such that at least a portion of the content subsequent to the selected portion is also formatted and determined to be acceptably formatted before the final determination of a reproducible page is made. For example, if a complete page corresponding to the selected portion has been formatted, at least one more line of text on a page immediately after the formatted selected portion is also formatted and checked to make sure it is acceptably formatted. For example, it is generally considered inappropriate to format a page of text such that a widow (the last line of a paragraph printed by itself at the top of a page) or orphan (the first line of a paragraph printed by itself at the bottom of a page) occurs on the subsequent page. Of course, the relative acceptability of a subsequently formatted page need not be based solely on the occurrence of widows and the like; other criteria readily apparent to those having ordinary skill in the art may be equally applied. At step 504, the reproducible page may be optionally displayed, typically via a display device such as a computer screen. If displayed at this point, the reproducible page will not include a page number since no such number has yet been determined. In a preferred embodiment, the reproducible page is displayed prior to or, at the least, concurrent with the processing required to determine a page number for the reproducible page. As an alternative, the page number can be calculated prior to the step of displaying the reproducible page, at which point the reproducible page will be displayed along with the page number. Regardless, at step 505, a number of pages inclusively between the hard break identified at step 502 and the reproducible page determined at step 503 is determined. In a preferred embodiment, this is done by incrementing the number of pages as successive pages are formatted during the processing of step 503. In this manner, the number of pages between the hard break and the reproducible page is determined at the same time the reproducible page is determined. Continuing at step 506, a sum of page counts between hard breaks prior to and including the identified hard break is calculated. To this end, it is assumed that the entire document has been paginated at least once at some point in time prior to the execution of the process illustrated in FIG. 5. When such prior pagination occurs, it is therefore possible to determine the number of pages normally existing between hard breaks (assuming that the formatting of the content, i.e., font size, font face, page size, etc., has not changed). Referring to FIG. 2, and assuming that the content illustrated therein has been fully paginated at least once before, there should be a static number of pages (page count) between the first hard break 202a and the second hard break 202b. This is illustrated in FIG. 2 by the .DELTA..sub.l symbol. A similar static page count between the second and third hard breaks and static page counts between any subsequent pairs of hard breaks may also be determined and stored in memory. In a preferred embodiment, the page counts described above are stored separately from the information regarding hard breaks and active formatting tags, i.e., the hard break index 402 and active formatting tag index 404. Furthermore, it is anticipated that that multiple sets of page-counts may be associated with a publication to accommodate, for example, different format settings that may be selected by a user. That is, separate page counts for each setting are stored so that if a user has read the book in accordance with one setting, switches to another setting, and then later switches back, the appropriate page count information will continue to be available. Assuming that the page counts are reliable, the page counts between the beginning of the content and the identified hard break may be summed together to arrive at a page total up to the identified hard break. By adding, at step 507, this sum of page counts to the number of pages inclusively between the hard break and the reproducible page, a page number for the reproducible page may be readily calculated. Because this method does not require each page from the beginning of the content to the reproducible page to be first formatted and numbered, the present invention allows page numbers to be efficiently determined, particularly in those cases where the reproducible page occurs relatively close to the end of the document's content. Finally, at step 508, the page number thus determined may be displayed. Again, this may be subsequent to, or concurrent with, the step of displaying the reproducible page as described above. In the foregoing specification, the present invention has been described with reference to specific exemplary embodiments thereof. Although the invention has been described in terms of a preferred embodiment, those skilled in the art will recognize that various modifications, embodiments or variations of the invention can be practiced within the spirit and scope of the invention as set forth in the appended claims. All are considered within the sphere, spirit, and scope of the invention. The specification and drawings are, therefore, to be regarded in an illustrated rather than restrictive sense. Accordingly, it is not intended that the invention be limited except as may be necessary in view of the appended claims.
|
Same subclass Same class Consider this |
||||||||||
