System for altering elements of a text file to mark documents
5388194
Abstract
A system for generating documents. A text file representing a document is encoded using a marking code. This marking code is encoded into the document by altering elements of the contents of the text file. Preferably the spaces between words are altered to be short and long spaces. The marked version therefore indicates, in an encoded fashion, the marking code that is applied.
Claims
I claim:
Description
TECHNICAL FIELD
TABLE 1
______________________________________
TABLE OF CHARACTER CODES
______________________________________
SLSLSLSLSL A SSSLSLSLSL 0
SLSLSLSLLS B SSSLSLSLLS 1
SLSLSLLSSL C SSSLSLLSSL 2
SLSLSLLSLS D SSSLSLLSLS 3
SLSLLSSLSL E SSSLLSSLSL 4
SLSLLSSLLS F SSSLLSSLLS 5
SLSLLSLSSL G SSSLLSLSSL 6
SLSLLSLSLS H SSSLLSLSLS 7
SLLSSLSLSL I LLLSSLSLSL 8
SLLSSLSLLS J LLLSSLSLLS 9
SLLSSLLSSL K LLLSSLLSSL #
SLLSSLLSLS L LLLSLSLSLS SPACE
SLLSLSSLSL M
SLLSLSSLLS N
SLLSLSLSSL O
SLLSLSLSLS P
LSSLSLSLSL Q
LSSLSLSLLS R
LSSLSLLSSL S
LSSLSLLSLS T
LSSLLSSLSL U
LSSLLSSLLS V
LSSLLSLSSL W
LSSLLSLSLS X
LSLSSLSLSL Y
LSLSSLSLLS Z
LSLSSLLSSL *
LSLSSLLSLS +
LSLSLSSLSL ,
LSLSLSSLLS
LSLSLSLSSL .
LSLSLSLSLS /
______________________________________
As can be seen from Table 1, the marking codes for most characters have been chosen to ensure a maximum of two consecutive long or short spaces, except that the codes for the characters 0-9, # and space begin with three long spaces or three short spaces. This scheme allows the beginning of the characters starting with three long spaces or three short spaces to be located unambiguously even if the starting position of the coding is not known, for example when only a fragment of the document has been recovered. For this reason it is desirable that marking strings include at least one space or numeric character.
It is another feature of this embodiment that spaces following punctuation are ignored by the marking function, this being desirable to avoid ambiguity arising from multiple spaces commonly used after punctuation.
This encoding scheme will be appreciated fully by reference to the following example in which the name "J Smith" is encoded into a document, shown in its original form in FIG. 2. The marking codes (as per Table 1) corresponding to the characters to be encoded are:
______________________________________
J SLLSSLSLLS
SPACE LLLSLSLSLS
S LSSLSLLSSL
M SLLSLSSLSL
I SLLSSLSLSL
T LSSLSLLSLS
H SLSLLSLSLS
SPACE LLLSLSLSLS
J SLLSSLSLLS
SPACE LLLSLSLSLS
S LSSLSLLSSL
M SLLSLSSLSL
I SLLSSLSLSL
T LSSLSLLSLS
H SLSLLSLSLS
(repeat for entire document).
______________________________________
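As an illustration only, the sequence above can be reproduced by transcribing Table 1 into a lookup. The sketch below also checks the run-length property described earlier and shows how a fragment might be resynchronised on a code that begins with three identical spaces. The function names, the upper-casing of the marking string and the trailing space in "J Smith " are assumptions made for the example, and the one row of Table 1 whose character is not legible is left out.

```python
# Table 1 transcribed by hand ("L" = long space, "S" = short space).
# The row LSLSLSSLLS has no legible character in the table, so it is omitted.
TABLE_1 = {
    "A": "SLSLSLSLSL", "B": "SLSLSLSLLS", "C": "SLSLSLLSSL", "D": "SLSLSLLSLS",
    "E": "SLSLLSSLSL", "F": "SLSLLSSLLS", "G": "SLSLLSLSSL", "H": "SLSLLSLSLS",
    "I": "SLLSSLSLSL", "J": "SLLSSLSLLS", "K": "SLLSSLLSSL", "L": "SLLSSLLSLS",
    "M": "SLLSLSSLSL", "N": "SLLSLSSLLS", "O": "SLLSLSLSSL", "P": "SLLSLSLSLS",
    "Q": "LSSLSLSLSL", "R": "LSSLSLSLLS", "S": "LSSLSLLSSL", "T": "LSSLSLLSLS",
    "U": "LSSLLSSLSL", "V": "LSSLLSSLLS", "W": "LSSLLSLSSL", "X": "LSSLLSLSLS",
    "Y": "LSLSSLSLSL", "Z": "LSLSSLSLLS", "*": "LSLSSLLSSL", "+": "LSLSSLLSLS",
    ",": "LSLSLSSLSL", ".": "LSLSLSLSSL", "/": "LSLSLSLSLS",
    "0": "SSSLSLSLSL", "1": "SSSLSLSLLS", "2": "SSSLSLLSSL", "3": "SSSLSLLSLS",
    "4": "SSSLLSSLSL", "5": "SSSLLSSLLS", "6": "SSSLLSLSSL", "7": "SSSLLSLSLS",
    "8": "LLLSSLSLSL", "9": "LLLSSLSLLS", "#": "LLLSSLLSSL", " ": "LLLSLSLSLS",
}
CHAR_FOR_CODE = {code: char for char, code in TABLE_1.items()}
SYNC_CHARS = "0123456789# "  # characters whose codes begin with three equal spaces


def encode(marking_string):
    """Concatenate the 10-symbol codes of each character (upper-cased here)."""
    return "".join(TABLE_1[char] for char in marking_string.upper())


def longest_run(code):
    """Length of the longest run of identical symbols in a code."""
    best = run = 1
    for previous, current in zip(code, code[1:]):
        run = run + 1 if current == previous else 1
        best = max(best, run)
    return best


def find_sync(stream):
    """Index of the first code starting with three equal symbols, or -1.

    Such codes start SSSL or LLLS, so the start is the position where three
    equal symbols are followed by a different one.
    """
    for i in range(len(stream) - 3):
        if stream[i] == stream[i + 1] == stream[i + 2] != stream[i + 3]:
            return i
    return -1


def decode_fragment(stream):
    """Decode complete 10-symbol codes from the first sync point onwards."""
    start = find_sync(stream)
    if start < 0:
        return ""
    return "".join(
        CHAR_FOR_CODE.get(stream[i:i + 10], "?")
        for i in range(start, len(stream) - 9, 10)
    )


sequence = encode("J Smith ")
print(sequence[:20])  # the codes for "J" and SPACE: SLLSSLSLLSLLLSLSLSLS
```

Dropping the first seven symbols of the sequence for "J SMITH J SMITH " and passing the remainder to `decode_fragment` recovers " SMITH J SMITH ", because the first SPACE code supplies the synchronisation point.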
Applying this sequence of long and short spaces to a document yields a marked document as illustrated in FIG. 3, in which for clarity the asterisks indicate the positions of long spaces. The marked document is illustrated in FIG. 4. Operation of this embodiment of the invention will be understood fully by referring to the flow diagram of FIG. 5, in which the following abbreviations are used:
______________________________________
I.F. Input file
M.S.F. Marking string file
E.O.R. End of record
CHAR Current character
PREV Previous character
E.O.T. End of text
PATTERN 10 bit marking code pattern
BIT COUNT Number of bits of PATTERN remaining to be used
DEC Decrement
M.S.B. Most significant bit
______________________________________
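In terms of these abbreviations, one marking pass of the FIG. 5 flow might be sketched roughly as follows. This is an approximation under stated assumptions rather than the patented implementation: the punctuation set is a guess, the Table 1 lookup is reduced to the characters of the "J SMITH" example, a space that triggers loading a new PATTERN is assumed to be marked with that pattern's first bit, and the input is assumed to have single spaces between words. The helper `read_spaces` is hypothetical and only reads the pattern back for checking.

```python
import re

# Excerpt of Table 1 covering only the characters used below.
TABLE_1_EXCERPT = {
    "J": "SLLSSLSLLS", " ": "LLLSLSLSLS", "S": "LSSLSLLSSL", "M": "SLLSLSSLSL",
    "I": "SLLSSLSLSL", "T": "LSSLSLLSLS", "H": "SLSLLSLSLS",
}
# Assumption: the source does not say which characters count as punctuation.
PUNCTUATION = frozenset(".,;:!?")


def pattern_for(char):
    """10 bit PATTERN for a marking character: L -> 1 (long), S -> 0 (short)."""
    code = TABLE_1_EXCERPT[char]
    return int(code.replace("L", "1").replace("S", "0"), 2)


def mark(text, record):
    """One pass over the input text using one non-empty M.S.F. record."""
    output = []
    prev = ""        # PREV
    pattern = 0      # PATTERN
    bit_count = 0    # BIT COUNT
    next_index = 0   # read position within the marking record
    for char in text:  # CHAR
        if char == " " and prev not in PUNCTUATION:
            if bit_count == 0:
                if next_index == len(record):  # E.O.R.: repeat the record
                    next_index = 0
                pattern = pattern_for(record[next_index])
                next_index += 1
                bit_count = 10
            if pattern & 0b1000000000:         # M.S.B. set: long space
                output.append(char)            # the extra write
            pattern = (pattern << 1) & 0b1111111111
            bit_count -= 1                     # DEC
        output.append(char)                    # main-loop write
        prev = char
    return "".join(output)


def read_spaces(marked_text):
    """Read back L/S symbols, skipping spaces that follow punctuation."""
    symbols = []
    for match in re.finditer(r" +", marked_text):
        before = marked_text[match.start() - 1] if match.start() > 0 else ""
        if before in PUNCTUATION:
            continue
        symbols.append("L" if len(match.group()) >= 2 else "S")
    return "".join(symbols)


# One marked copy per record of the marking string file.
copies = [mark("a b c d", record) for record in ["J", "S"]]
```

With the record "J" the first three marked spaces are short, long, long, so the first copy is "a b  c  d"; with the record "S" they are long, short, short.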
Referring to FIG. 5, the process begins with the creation of the first output file. The first character is then read from the input file, and a test performed to determine whether the character read (CHAR) is the space character. If not, CHAR is written to the output file, CHAR is stored in a temporary location PREV, and a test is performed to see if the input file is at end of text. If not, the next character is read from the input file, and this process is repeated until a space character is encountered.
When a space is encountered, PREV is tested to see whether the previous character was a punctuation character. If it was, the space is ignored. If not, a test is performed to see if BIT COUNT has reached zero, indicating that the end of the 10 bit marking code pattern has been reached. If not, the most significant bit of PATTERN is then tested, and if it is set (=1) a long space is required to be written to the output. In this embodiment, the long space is generated by using two consecutive space characters. This is achieved by writing the space (CHAR) to the output before returning to the main loop which writes it a second time. If the M.S.B. of PATTERN is not set, this extra write is not performed. The BIT COUNT is then decremented and the PATTERN shifted one bit to the left, in preparation for the next iteration.
If, on encountering a space in the input file, the test BIT COUNT=0 is true, the next byte of the marking string file is read. If the marking string file is at the end of a record, the file is reset to the beginning of the same record, so that the marking string will be repeated throughout the pass of the input file. If it is not at the end of the record, the byte read from the marking string file is used to obtain a 10 bit marking pattern using a lookup table containing the data of Table 1, and BIT COUNT is reset to 10.
When the test "I.F. AT E.O.T.?" returns true, the M.S.F. is set to the beginning of the next record, so that the next marking string will be used to mark the next file generated. A test is then performed to see if the M.S.F. is at E.O.T. A true result indicates that all required marked copies have been generated, in which case the process ends. If M.S.F. is not at E.O.T., another output is created, and the process repeats, generating an output file with the markings dictated by the next marking string of the M.S.F.
The marking of a particular document can be decoded manually, by observing the pattern of long and short spaces in the document and looking up the corresponding character codes as per Table 1. Alternatively, the source of a marked document can be identified by optically comparing the document with a set of copies made prior to circulating the document and identifying the one which matches. For identification, the copies should be labelled with the name of the recipient. For better security, it may be desired not to keep copies of the circulated documents, but to generate a new marked set for comparison purposes should the need for identification arise. Visual comparison can be aided by producing a transparency of the document which can be used to overlay the copy to be compared. If desired, the decoding process can be automated, using a document scanner to input the document to be decoded to a computer which can then decode the marking, for example by ascertaining the spacing between words.
In some embodiments of the invention, certain additional features are provided for the purpose of streamlining the production of multiple marked documents. One such feature is additional software which provides the ability to command the computer to automatically use a recipient list (marking string file of the above-described embodiment) as a source of names to be inserted into a prescribed merge field of a document. This is useful for automatically annotating each marked copy with an identifying message.
For example, it may be desired to print the message "THIS DOCUMENT IS MARKED AND UNIQUE TO J. SMITH" at the head of each version. In this case, the name (J. SMITH) would take the form of a merge field in the original document, the name being automatically inserted in the output text as part of the marking process of the invention.
In another embodiment of the invention, the basic marking scheme of the invention is used to encode a sub-text of arbitrary length within the spaces between words of a document. Such an embodiment is shown schematically in FIG. 6. Referring to FIG. 6, a marking function 63, as described in the context of the embodiment above, is applied to a text file 61 to produce a marked document 64, except that in this case the marking corresponds to the characters of sub-text file 62. Sub-text file 62 contains a message of arbitrary length which is encoded into the marked document, instead of individual short strings used in the marking string file 6 of FIG. 1. Using this or similar embodiments, the invention can be used to convey information within the spaces between words, at a density of approximately one character per ten words.
Another useful extension of the invention is its application to word-processing systems in which a plurality of users may have access to a document. One example of such an application is a multi-terminal word-processing system incorporating electronic mail facilities whereby a document can be circulated to a number of people in electronic form, that is, without printing on paper. Another example is a computer to which a number of users have access. In these and other cases, the invention can be used to deter those with access to the document from printing a copy and disclosing it to unauthorised persons.
To achieve this, the invention can be adapted so that text files used by the word-processing system carry within them an indicator that the document which they represent is to be marked when printed, this indicator being accessible only to the author of the document. The printing software of the data processing system is arranged so that if any person instructs that the document be printed, the document will be marked, for example with the name of the person requesting the print. The name can for convenience be automatically retrieved from a file containing a correspondence between the password of a user of the system and that user's name. Alternatively, the marking can be determined by a marking string designated by the original author of the document at the time it is electronically mailed to each person, the string and the indicator that the document is to be marked when printed being linked to the file when mailed.
While the invention has been described with reference to particular embodiments thereof, it will be understood by those skilled in the art that changes in the form and detail may be made without departing from the scope and spirit of the invention. The marking process of the invention is independent of the nature of the document originating means and document printing means and it is anticipated that the invention can be realised in many ways other than those specifically mentioned herein. In particular, the invention can be realised as an integral part of a word-processing system, by adding suitable software to the word-processing software, or it can be realised as a stand-alone device interposed between a source of text data, such as a word processor, and a printer, or it can be realised as a printer adapted to carry out marking according to the invention.
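The print-time arrangement described at the start of this passage could be sketched along the following lines. Everything named here (`StoredDocument`, `mark_on_print`, `designated_marking`, `marking_for_print`, the password table) is hypothetical and serves only to show the selection of a marking string; the marking itself would then be applied by the marking function described earlier.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StoredDocument:
    """A text file plus the indicator set by its author (hypothetical layout)."""
    text: str
    mark_on_print: bool = False               # indicator: mark when printed
    designated_marking: Optional[str] = None  # string linked to the file when mailed


def marking_for_print(document: StoredDocument,
                      password: str,
                      names_by_password: Dict[str, str]) -> Optional[str]:
    """Choose the marking string for a print request, or None if unmarked."""
    if not document.mark_on_print:
        return None
    if document.designated_marking is not None:
        # Alternative described in the text: the author designated the string.
        return document.designated_marking
    # Default described in the text: the name of the person requesting the print.
    return names_by_password[password]


# Stand-in for the file relating each user's password to that user's name.
users = {"secret1": "J SMITH", "secret2": "A JONES"}
report = StoredDocument(text="Confidential report", mark_on_print=True)
requested_by = marking_for_print(report, "secret1", users)
```

Here `requested_by` is "J SMITH", the name that would be encoded into the printed copy.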
It will also be understood that the scheme for relating a given marking code to a given sequence of inter-word space variations utilised by the embodiments described above is exemplary only and many other schemes, obvious to those skilled in the art, can be used without departing from the scope of the invention. Furthermore, whereas the inventor believes that the spacing between words is the most suitable characteristic of a document to modify for the conveyance of the marking of the invention, it is envisaged that other characteristics of the formatting or visual presentation can be varied in like fashion without departing from the scope of the invention. For example, the spacing between letters can be varied as well as or instead of the spacing between words, or the typestyle or weight of individual characters or words can be varied according to the principle of the invention.
It is also envisaged that in cases where it is desired to make it readily apparent that a document has been marked, one or more printing characters can be used instead of or as well as variations of spacing between words. For example, in the case of the embodiment described above in which two consecutive spaces are used to generate a long space, a space followed by an asterisk can be used, yielding a marked document similar in appearance to the example of FIG. 3.
A variety of other extensions of the invention are envisaged to accommodate special formatting requirements, such as right justification of text, in which case special care must be taken to ensure that the marking process does not adversely affect the appearance of the document. In the case of right justification specifically, some spaces between words can be shortened to ensure that the total line length remains unchanged.
It is further envisaged that whereas the embodiments described above utilise marking information provided by an operator, other adaptations of the invention can be provided whereby the marking codes can be automatically generated by the invention, for example by forming an ascending number sequence, or a sequence of random numbers, thereby assuring that each copy of a document is distinctively marked, without requiring the operator to provide specific marking information. In such cases it is desirable to maintain a set of duplicates of the documents before circulation for identification purposes.
Other embodiments are possible in which the marking information is taken from a data field already serving another purpose within the memory of the data processing apparatus. For example the invention can be made to use the time and date information commonly resident in memory as the marking information, with the result that each copy of a marked document generated will be marked with the time and date at which the document was generated. Other fields such as the name of the author of the document, operator's password, or addressee's name can also be used in like manner.
Whereas the embodiments described herein refer to the document as being marked throughout its text, it is also possible to use the invention to mark only a selected portion of the text.
It will also be understood that whereas the exemplary embodiments described herein refer to the marking process taking place immediately prior to printing a document, the invention can also be beneficially applied for marking documents in electronic form, that is, documents in the form of files of data which may or may not be printed to form hard-copy at a later time.
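The automatically generated marking information mentioned above (an ascending number sequence, random numbers, or the time and date) might be produced as in the sketch below. The function names, field widths and the date format are assumptions; the format is chosen only so that every character has a code in Table 1, which has no colon. The alphabet omits the one Table 1 row whose character is not legible.

```python
import random
from datetime import datetime

# Characters that have a code in Table 1 (the illegible row is left out).
TABLE_1_ALPHABET = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789# *+,./")


def serial_markings(count, width=4):
    """Ascending number sequence, one string per copy: '0001', '0002', ..."""
    return [str(number).zfill(width) for number in range(1, count + 1)]


def random_markings(count, width=6, seed=None):
    """Distinct random digit strings, one per copy."""
    if count > 10 ** width:
        raise ValueError("not enough distinct strings of this width")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < count:
        chosen.add("".join(rng.choice("0123456789") for _ in range(width)))
    return sorted(chosen)


def timestamp_marking(moment):
    """Date and time written with characters available in Table 1 only."""
    return moment.strftime("%Y/%m/%d %H.%M")


def is_encodable(marking):
    """True if every character of the marking string has a Table 1 code."""
    return all(char in TABLE_1_ALPHABET for char in marking)


stamp = timestamp_marking(datetime(1993, 2, 7, 14, 30))
print(serial_markings(3), stamp)
```

Because these strings consist mostly of digits, each of their codes begins with three identical spaces, which suits the synchronisation property of Table 1 noted earlier.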
INDUSTRIAL APPLICABILITY
The invention is particularly beneficial when used as part of a word-processing system, in which case the operator can request a number of copies of a confidential document to be printed, each being uniquely marked so as to identify the recipient. The invention provides a means for reducing the incidence of unauthorised distribution of confidential documents. The invention can also be used to provide marking of any computer-printed information, such as business reports. The invention also finds application in marking of documents to deter breach of copyright. The invention is also useful for encoding messages within the formatting of a document. For example, using this invention, a book could be produced which conveys within the formatting of the words a sub-text which is only readable by those possessing the knowledge of the method of decoding the marking, while not detracting from the readability of the text.
