Document marking system employing context-sensitive embedded marking codes
5467447
Abstract
A system for generating documents which, as well as presenting the usual alpha-numeric text information, contain a distinctive marking. The marking scheme of the invention can convey identifying information which, in the event of a document coming into the possession of an unauthorized person, allows a particular copy of a document to be traced to its source. The invention can also be used to distinctively identify each of a number of photocopies made from an original document.
Claims
I claim:
Description
TECHNICAL FIELD
TABLE 1
______________________________________
TABLE OF CHARACTER CODES
______________________________________
SLSLSLSLSL A SSSLSLSLSL 0
SLSLSLSLLS B SSSLSLSLLS 1
SLSLSLLSSL C SSSLSLLSSL 2
SLSLSLLSLS D SSSLSLLSLS 3
SLSLLSSLSL E SSSLLSSLSL 4
SLSLLSSLLS F SSSLLSSLLS 5
SLSLLSLSSL G SSSLLSLSSL 6
SLSLLSLSLS H SSSLLSLSLS 7
SLLSSLSLSL I LLLSSLSLSL 8
SLLSSLSLLS J LLLSSLSLLS 9
SLLSSLLSSL K LLLSSLLSSL #
SLLSSLLSLS L LLLSLSLSLS SPACE
SLLSLSSLSL M
SLLSLSSLLS N
SLLSLSLSSL O
SLLSLSLSLS P
LSSLSLSLSL Q
LSSLSLSLLS R
LSSLSLLSSL S
LSSLSLLSLS T
LSSLLSSLSL U
LSSLLSSLLS V
LSSLLSLSSL W
LSSLLSLSLS X
LSLSSLSLSL Y
LSLSSLSLLS Z
LSLSSLLSSL *
LSLSSLLSLS +
LSLSLSSLSL ,
LSLSLSSLLS -
LSLSLSLSSL .
LSLSLSLSLS /
______________________________________
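The codes of Table 1 appear to follow a regular structure, which the following minimal sketch reproduces. The construction rule used here is an observation drawn from the table, not something the text states: each of the 32 letter and symbol codes reads as five pairs, "SL" for a 0 bit and "LS" for a 1 bit of the character's position in the list, and the digit and # codes reuse the codes of A to K with the first three symbols replaced by "SSS" or "LLL". The SPACE entry is copied verbatim from the table rather than derived. The names `letter_code`, `build_table` and `max_run` are illustrative only.

```python
from itertools import groupby

# Order of the 32 letter/symbol entries as listed in Table 1.
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ*+,-./"


def letter_code(index):
    """Five pairs, one per bit of index: 0 -> 'SL', 1 -> 'LS' (observed rule)."""
    return "".join("LS" if bit == "1" else "SL" for bit in format(index, "05b"))


def build_table():
    """Return a dict mapping each Table 1 character to its 10-symbol code."""
    table = {ch: letter_code(i) for i, ch in enumerate(LETTERS)}
    # Digits 0-7 reuse the codes of A-H with the first three symbols set to SSS.
    for i, ch in enumerate("01234567"):
        table[ch] = "SSS" + letter_code(i)[3:]
    # 8, 9 and # reuse the codes of I, J, K with the first three symbols set to LLL.
    for i, ch in enumerate("89#"):
        table[ch] = "LLL" + letter_code(8 + i)[3:]
    table[" "] = "LLLSLSLSLS"  # SPACE, copied as printed in Table 1
    return table


def max_run(code):
    """Length of the longest run of identical symbols in a code."""
    return max(len(list(group)) for _, group in groupby(code))
```

With this table, `max_run` is at most 2 for every letter and symbol code and exactly 3 for the digits, # and SPACE, which is the property the marking scheme relies on to locate character boundaries.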
As can be seen from Table 1, the marking codes for most characters have been chosen to ensure a maximum of two consecutive long or short spaces, except that the characters 0-9, # and space begin with three long spaces or three short spaces. This scheme allows the beginning of the characters starting with three long spaces or three short spaces to be located unambiguously even if the starting position of the coding is not known, for example, when only a fragment of the document has been recovered. For this reason it is desirable that marking strings include at least one space or numeric character.

The marking can be applied by means of varying the spaces between words, characters or lines of text, according to the nature of the document. For example, for plain text in the usual paragraph form, varying spacing between words is generally the most visually acceptable means of marking. However, where text takes the form of tabulated columns, such variations can spoil the appearance of the text, and in such circumstances it may be desired to use variation of another class of text elements, such as inter-character spacing or inter-line spacing. The selection of which type of space variations are used can be made responsive to the content of the surrounding document. For example, inter-line space variation can be selected automatically if the line of text being marked contains tabulation characters or if the positioning of words on the page is determined to be characteristic of tabular text. Alternatively, selection of class of text elements to which spacing variations are applied can be controlled by directives embedded in the document. When varying inter-word spacing, this embodiment of the invention is arranged so that spaces following punctuation are ignored by the marking function, in order to avoid ambiguity arising from multiple spaces commonly used after punctuation.

By way of illustration, FIGS. 3 and 4 show the appearance of the different types of marking provided by this invention when applied to the sample document shown in FIG. 2. As seen in FIG. 2, the sample document comprises a paragraph of text followed by a table. This document is typical of that produced by a conventional word-processor, where the spacing between characters, words and lines is constant. As seen in FIG. 3, the document has been marked using the invention. The paragraph of text has been marked by varying the inter-word spacing, and the table has been marked by varying the inter-line spacing. As seen in FIG. 4, the paragraph of text has been marked by varying the inter-word spacing, and the table has been marked by varying the inter-character spacing.

Operation of this embodiment of the invention will now be further explained with reference to a number of flow diagrams, in which the following abbreviations are used:
______________________________________
I.F. Input file
M.S.F. Marking string file
E.O.R. End of record
CHAR Current character
PREV Previous character
E.O.T. End of text
PATTERN 10 bit marking code pattern
BIT COUNT Number of bits of PATTERN remaining to be used
DEC Decrement
M.S.B. Most significant bit
L.F. Line feed character
______________________________________
Referring to FIG. 5, the process begins with the creation of the first output file. The first record of the marking string file (M.S.F.) is then read. Each record of the M.S.F. corresponds to one string that is to be marked into each copy of the document output, that is, the number of records in the M.S.F. determines the number of marked copies that will be generated. The input file (I.F.) is then read, marked with the current marking string, and output as a marked file. This process is repeated until the M.S.F. reaches end of file (E.O.F.).

Within the process shown as "Mark I.F." are a number of sub-processes arranged to vary the type of marking applied to the document according to user instructions or according to the format of the text being marked. For example, in one preferred embodiment, variation of spacing between words, characters or lines of text can be used to convey the marking, the most appropriate method being controlled by directives placed within the text. Examples of such directives are:
______________________________________
<<on>> turn marking on
<<off>> turn marking off
<<char>> use inter-character spacing for marking
<<word>> use inter-word spacing for marking
<<line>> use inter-line spacing for marking
______________________________________
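As a minimal sketch of how directives such as those listed above might be handled, the following splits an input text into segments, each tagged with the marking state in force, and drops the directives themselves so that they never reach the output. The function name `split_directives` and the initial state (marking on, inter-word mode) are assumptions made for illustration; the text does not specify a default.

```python
import re

# The five directive names listed above, in their <<...>> form.
DIRECTIVE = re.compile(r"<<(on|off|char|word|line)>>")


def split_directives(text, mode="word", enabled=True):
    """Return a list of (enabled, mode, chunk) tuples with directives removed.

    The starting mode and enabled state are assumed defaults.
    """
    segments = []
    pos = 0
    for match in DIRECTIVE.finditer(text):
        if match.start() > pos:
            segments.append((enabled, mode, text[pos:match.start()]))
        name = match.group(1)
        if name == "on":
            enabled = True
        elif name == "off":
            enabled = False
        else:
            mode = name  # "char", "word" or "line"
        pos = match.end()
    if pos < len(text):
        segments.append((enabled, mode, text[pos:]))
    return segments
```

Each segment can then be passed to whichever marking routine its mode selects, or copied through unchanged when marking is off.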
These directives are not written to the output file, but instead are used as directives to the marking routines to switch to the requested mode from the point at which the directive occurs in the text. Details of the various marking processes of this embodiment will now be given with reference to FIGS. 6-9.

The flow diagram of FIG. 6 corresponds to the marking process which varies the inter-word spacing of the text. The M.S.F. contains a number of records each of which corresponds to a text string to be marked into a copy of the document. Each ASCII character of the marking strings corresponds to a 10 bit marking pattern, as described above. The marking process is performed on a character-by-character basis. As each character is read from the input file, a test is performed to determine whether the character read (CHAR) is the space character. If not, CHAR is stored in a temporary location PREV, CHAR is written to the output file, and a test is performed to see if the input file is at end of text. If not, the next character is read from the input file, and this process is repeated until a space character is encountered.

When a space is encountered, PREV is tested to see whether the previous character was a punctuation character. If it was, the space is ignored. If not, a test is performed to see if BIT COUNT has reached zero, indicating that the end of the 10 bit marking pattern has been reached. If not, the most significant bit of PATTERN is then tested. If it is set (=1) a large space is written to the output, if it is clear (=0) a short space is written to the output. The BIT COUNT is then decremented and the PATTERN shifted one bit to the left, in preparation for the next iteration. If, on encountering a space in the input file, the test BIT COUNT=0 is true, the next byte of the marking string file is read. 
If the marking string file is at the end of a record, the file is reset to the beginning of the same record, so that the marking string will be repeated throughout the pass of the input file. If it is not at the end of the record, the byte read from the marking string file is used to obtain a 10 bit marking pattern using a lookup table containing the data of Table 1, and BIT COUNT is reset to 10. When the test "I.F. AT E.O.T.?" returns true, the M.S.F. is set to the beginning of the next record, so that the next marking string will be used to mark the next file generated. A test is then performed to see if the M.S.F. is at E.O.T. A true result indicates that all required marked copies have been generated, in which case the process ends. If M.S.F. is not at E.O.T., another output is created, and the process repeats, generating an output file with the markings dictated by the next marking string of the M.S.F.

In this embodiment, the inter-word space modifying routine is arranged so that the overall line length is not changed by the marking function. This is particularly important in the case of right-justified text, where it is desirable to maintain a straight right margin. The system by which this is achieved will now be described with reference to FIG. 7, which is a program listing in the BASIC language. This routine is called once a line of text has been processed by the marking scheme described above, which has nominated long or short spaces for each of the inter-word spaces in the line and built an array (modarray) of space-size indicators, in this example "L" for long or "S" for short. The routine of FIG. 7 first builds an array of space sizes (spacearray) which contains the size of each space in the line of the input file (current_space) and calculates the total size of space in the line (total_space) by summing each element of spacearray. 
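The inter-word marking loop of FIG. 6, described above, can be sketched roughly as follows. This is not the flow diagram itself: it assumes fixed-size spaces, with a long space written as two space characters and a short space as one, it treats L as bit 1 and S as bit 0, and it carries only three Table 1 entries in its lookup table to stay short. The names `to_bits`, `PATTERNS`, `PUNCTUATION` and `mark_words`, and the particular set of punctuation characters, are illustrative assumptions.

```python
def to_bits(code):
    """Convert a Table 1 code such as 'SLSLSLSLSL' to a 10-bit integer (L=1, S=0)."""
    return int(code.replace("L", "1").replace("S", "0"), 2)


# Partial lookup table for illustration; a full version would hold all of Table 1.
PATTERNS = {
    "A": to_bits("SLSLSLSLSL"),
    "B": to_bits("SLSLSLSLLS"),
    "Q": to_bits("LSSLSLSLSL"),
}

PUNCTUATION = frozenset(".,;:!?")  # assumed set of punctuation characters


def mark_words(text, marking, patterns=PATTERNS):
    """Mark text by writing one or two spaces for each markable inter-word space."""
    out = []
    prev = ""        # PREV: last non-space character written
    pattern = 0      # PATTERN: current 10-bit marking code
    bit_count = 0    # BIT COUNT: bits of PATTERN still to be used
    next_char = 0    # position in the marking string
    for char in text:
        if char != " ":
            out.append(char)
            prev = char
            continue
        if prev in PUNCTUATION:
            out.append(" ")  # spaces after punctuation are left unmarked
            continue
        if bit_count == 0:
            # Fetch the next marking character, repeating the string at its end.
            pattern = patterns[marking[next_char % len(marking)]]
            next_char += 1
            bit_count = 10
        msb = (pattern >> 9) & 1
        out.append("  " if msb else " ")
        pattern = (pattern << 1) & 0x3FF  # shift left, keep 10 bits
        bit_count -= 1
    return "".join(out)
```

For example, marking the text `a b c d e f g h i j k` with the string `A` alternates short and long spaces across its ten gaps, following the pattern SLSLSLSLSL.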
Next the modulation of spaces is performed by increasing the size of spaces where the corresponding element of modarray is "L" and decreasing the size of spaces where the corresponding element of modarray is "S". The amount of change introduced is determined by the constant "factor", which in this embodiment can be selected by the user depending on the required degree of modulation of spaces. Values for "factor" can range from 0, which results in no modulation, to 1, which results in words touching each other where the space between them is "short". Using this embodiment, a factor of 0.3 has been found to yield good results. The process described so far will in most cases change the total line length, and it is necessary to further adjust the spaces to return the total amount of space, and hence the line length, to the original value. This is achieved by the last two steps of FIG. 7. First, the change of line length is calculated by subtracting the original total space size from the new total space size. This error is then divided by the number of spaces in the line and the result added to each space, so that the correction is distributed evenly throughout the spaces of the line. This space-modulating process is applicable to text processing systems which allow spaces between words to be finely controlled, for example using a page description language. In other cases, such as where only fixed-size spaces are available, the long space can be generated by using two consecutive space characters. When it is desired to mark the document by varying inter-line spacing, the scheme shown diagrammatically in FIG. 8 is used. The process described in FIG. 8 is similar to that described above in relation to FIG. 6, except that the detection of line feed characters is used to invoke the marking routines, instead of space characters. Also, it is not necessary in this case to test whether the character prior to the line feed was a punctuation character. 
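The line-length-preserving adjustment described for FIG. 7 can be sketched as below. This is not the BASIC listing of FIG. 7, which is not reproduced here. It assumes a long space is scaled by (1 + factor) and a short space by (1 - factor), which is consistent with the stated range of "factor" but is not spelled out in the text, and it applies the correction with the sign needed to bring the total back to its original value. The function name `modulate_spaces` is hypothetical; `spacearray`, `modarray`, `factor` and `total_space` follow the names used in the text.

```python
def modulate_spaces(spacearray, modarray, factor=0.3):
    """Lengthen 'L' spaces, shorten 'S' spaces, then restore the original total.

    spacearray: sizes of the inter-word spaces in one line
    modarray:   'L' or 'S' for each space, as nominated by the marking scheme
    factor:     0 gives no modulation; 1 makes short spaces vanish (assumed scaling)
    """
    if not spacearray:
        return []
    total_space = sum(spacearray)
    modulated = [
        space * (1 + factor) if mod == "L" else space * (1 - factor)
        for space, mod in zip(spacearray, modarray)
    ]
    # Spread the difference evenly so the line length is unchanged.
    correction = (total_space - sum(modulated)) / len(modulated)
    return [space + correction for space in modulated]
```

With four spaces of size 10, markers L, L, L, S and a factor of 0.3, the modulated sizes 13, 13, 13, 7 sum to 46, so each space is reduced by 1.5 to give 11.5, 11.5, 11.5, 5.5, which again sums to 40.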
Modulation that does not affect the overall page length can be achieved in a method similar to that described above in relation to inter-word space modulation.

When it is desired to mark the document by varying inter-character spacing, the scheme shown diagrammatically in FIG. 9 is used. The process described in FIG. 9 is similar to that described above in relation to FIG. 6, except that space modulation is performed after every alpha-numeric character (A-Z, 0-9). Modulation that does not affect the overall page length can be achieved in a method similar to that described above in relation to inter-word space modulation.

In some embodiments of the invention, certain additional features are provided for the purpose of streamlining the production of multiple marked documents. One such feature is additional software which provides the ability to command the computer to automatically use a recipient list (marking string file of the above-described embodiment) as a source of names to be inserted into a prescribed merge field of a document. This is useful for automatically annotating each marked copy with an identifying message. For example, it may be desired to print the message "THIS DOCUMENT IS MARKED AND UNIQUE TO J. SMITH" at the head of each version. In this case, the name (J. SMITH) would take the form of a merge field in the original document, the name being automatically inserted in the output text as part of the marking process of the invention.

Another useful extension of the invention is its application to word-processing systems in which a plurality of users may have access to a document. One example of such an application is a multi-terminal word-processing system incorporating electronic mail facilities whereby a document can be circulated to a number of people in electronic form, that is, without printing on paper. Another example is a computer to which a number of users have access. 
In these and other cases, the invention can be used to deter those with access to the document from printing a copy and disclosing it to unauthorized persons. To achieve this, the invention can be adapted so that text files used by the word-processing system carry within them an indicator that the document which they represent is to be marked when printed, this indicator being accessible only to the author of the document. The printing software of the data processing system is arranged so that if any person instructs that the document be printed, the document will be marked, for example with the name of the person requesting the print. The name can, for convenience, be automatically retrieved from a file containing a correspondence between the password of a user of the system and that user's name. Alternatively, the marking can be determined by a marking string designated by the original author of the document at the time it is electronically mailed to each person, the string and the indicator that the document is to be marked when printed being linked to the file when mailed. Although the effectiveness of the present invention is generally not diminished by the ease with which anyone can detect and decode the marking of a document it has generated, it is desirable in some cases to prevent unauthorized persons decoding a marking. In some cases it may also be desirable to prevent unauthorized persons from making a marked document with a marking identical to that of another. For example, it is possible that a person possessing the present invention could generate a document marked with someone else's name and then use that copy improperly, with the result that the person whose name was used would be blamed. To overcome these potential abuses, the invention can be further extended to encrypt the marking using an encryption key known only to the authorized user. This modification is shown diagrammatically in FIG. 10. 
This flow diagram shows the basic document marking scheme, as described in relation to FIG. 5 above, and includes the further step of encrypting each record of the marking string file prior to applying the marking to the document. The encryption system can be any of the schemes well known in the art. The encryption key is known only by the authorized user, and must be applied to the decoded spacing variation data to recover the correct marking information from a marked document. It will be understood that this scheme of encrypting is only one of many that will achieve the desired result. For example the encryption could also be applied with good results to the "look up pattern" step of FIG. 6. In another embodiment of the invention, the basic marking scheme of the invention is used to encode a sub-text of arbitrary length within the spaces between words of a document. Such an embodiment is shown schematically in FIG. 11. Referring to FIG. 11, a marking function 23, as described in the context of the embodiment above, is applied to a text file 21 to produce a marked document 24, except that in this case the marking corresponds to the characters of sub-text file 22. Sub-text file 22 contains a message of arbitrary length which is encoded into the marked document, instead of individual short strings used in the marking string file 6 of FIG. 1. Using this or similar embodiments, the invention can be used to convey information within the spaces between words, at a density of approximately one character per ten words. Although the invention is very useful when realized as a word-processing system, it is also envisaged that the inventive concept can be adapted for use with other document processing or document reproducing devices. For example, in many cases it is desirable to provide a system for marking documents which have already been printed as hard copy. Another useful embodiment of the invention which achieves this object will now be described with reference to FIG. 
12, which shows in schematic form a photocopier adapted to perform document marking according to the invention.

Referring now to FIG. 12, copier 31 is a document copying machine of the type now commonly referred to as a "digital copier". Digital copiers commonly comprise electro-optical scanning means coupled to an electronically controlled print engine, using a scanning laser, light-emitting diode array or liquid crystal array to form an image on a drum which is then transferred to paper. In this embodiment of the present invention, as well as comprising scanner 32 and print engine 36, copier 31 is equipped with image processor 33 which in turn comprises text element detector 34 and marking processor 35. When a marked copy of a document is to be made, the image data output by scanner 32, which usually feeds print engine 36, is routed instead through image processor 33.

Within image processor 33, the image data is first acted on by text element detector 34, which is arranged to group objects on the scanned page into elements such as characters, words and lines. There are many techniques for achieving this, well known to the optical character recognition art. For best results, a scheme which works irrespective of orientation of the document relative to the scanner is used. Once the elements of interest have been identified, a marking process is performed by marking processor 35, according to the methods of the present invention. For example, if inter-word spacing modulation has been selected as the marking method for a particular document, text element detector 34 is arranged to group elements into words and marking processor 35 then moves the words slightly relative to each other in the plane of the lines of text to achieve the lengthened and shortened spaces required. 
The choice of class of text elements to which space variation will be applied can be effected manually by an operator, or software which determines the most suitable form of marking according to the format of the document being copied can be provided within image processor 33. When marking is complete, the image data is transmitted from image processor 33 to print engine 36, and the image is printed on paper. To aid in identifying a particular copy, the marking text can also be printed in a convenient position on the document, for example, at the foot of each page.

The marking string used by this embodiment of the invention can be input in a number of ways. For example, the operator can key in a text string or serial number using a keyboard, or load in a list of recipients, to be used as marking strings, from a floppy disk. Another useful variant is to provide further processing means within the copier of this embodiment so that a list of recipients can be entered for use by marking processor 35 by scanning a printed list on the copier. In this case optical character recognition software within image processor 33 reads the recipient list off the document in bitmapped image form, converts the image to text, and uses this text as the marking strings.

One adaptation of this embodiment of the invention is directed at the problem of preventing breach of copyright in places such as libraries, where people have access to books. Although it is common to provide photocopier facilities for uses which constitute fair use, it is desirable to curtail use of the photocopier in ways which would breach copyright. For such applications, the present invention is provided with an electronic locking system that allows the copier to operate only after a code number or word is entered by the user. The information entered is then used as the marking string, according to the scheme described above. 
One way of entering the code is to use a coded card, such as a magnetic-stripe credit card, which carries identification unique to the cardholder. Alternatively, the locking system can be adapted so that codes must conform with certain requirements to be accepted as valid, such valid codes being issued by an authorized person at the library and keyed in by the user or remotely entered by the authorized person. For example, the generation of valid codes can be accomplished by an algorithm which combines the user's name with an encryption key and produces a large but finite set of valid outputs, so that the copier's controller can detect entered codes which have not been legitimately generated.

In another useful variant of the embodiment of FIG. 12, image processor 33 is further adapted to apply pseudo-random variations in spacing between elements of text so that each copy produced is distinctively marked. The pseudo-random algorithm is arranged so that aesthetic constraints are not violated, for example, total line lengths are maintained and tables or columns are not unduly disturbed. This embodiment has an advantage in circumstances where it is not desirable or practical to enter a recipient list into the device. In this case, to enable documents to be traced to their source, the system can be arranged to generate two identical copies of each pseudo-randomly marked copy. When the documents are distributed, the name of each recipient is written on the duplicate and the duplicates are filed in case it is later necessary to match them up to a particular marked version. Alternatively, the invention can be adapted to mark copies with a unique serial number, the number being generated automatically and output so that a record of to whom a copy with a given serial number was given can be kept.

As well as application to photocopiers, the present invention is of great value when applied to other document processing or reproducing systems, including facsimile machines. 
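One way to picture the code-validation variant described above, in which valid codes combine the user's name with a key, is the sketch below. The text names no particular algorithm. This example swaps in a keyed hash (HMAC-SHA256 from the Python standard library) and a "NAME-TAG" code layout, both of which are assumptions chosen for illustration. The function names `issue_code` and `is_valid` and the tag length are hypothetical.

```python
import hashlib
import hmac


def issue_code(user_name, key, tag_len=6):
    """Build a code of the form NAME-TAG, where TAG is a truncated keyed hash.

    The NAME-TAG layout and the use of HMAC-SHA256 are illustrative assumptions.
    """
    name = user_name.upper()
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{name}-{digest[:tag_len].upper()}"


def is_valid(code, key, tag_len=6):
    """Check that the tag in an entered code matches the name it carries."""
    name, separator, _tag = code.rpartition("-")
    if not separator or not name:
        return False
    expected = issue_code(name, key, tag_len)
    # Constant-time comparison of the expected and entered codes.
    return hmac.compare_digest(expected.encode("utf-8"), code.encode("utf-8"))
```

A code issued this way uses only upper-case letters, digits, spaces and hyphens when the name does, so it could also serve directly as the marking string under Table 1.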
In many cases, the benefits of the marking scheme extend beyond the ability to monitor distribution of confidential or copyright documents. For example, when used with facsimile transmissions, the marking can be used to authenticate documents. In this case, encrypted marking can be used for extra security. Irrespective of the embodiment of the present invention used to mark the document, there are a number of schemes which can be used to decode the marking of a particular document. In the simplest case, the marking can be decoded manually, by observing the pattern of long and short spaces in the document and looking up the corresponding character codes as per Table 1. The process can be partially automated by providing a means for decoding the marking information from spacing information determined visually and input into a processing device programmed to perform the inverse of the marking process. Alternatively, the source of a marked document can be identified by optically comparing the document with a set of copies made prior to circulating the document and identifying the one which matches. For identification, the copies should be labelled with the name of the recipient. For better security, it may be desired not to keep copies of the circulated documents, but to generate a new marked set for comparison purposes should the need for identification arise. Visual comparison can be aided by producing a transparency of the document which can be used to overlay the copy to be compared. If desired, the decoding process can be automated, using a document scanner to input the document to be decoded to a computer which can then decode the marking, by ascertaining the spacing between words, characters or lines as appropriate. The embodiment of FIG. 12 can readily be adapted for decoding markings, thereby providing a photocopier which outputs a page printed with the text of the decoded marking when presented with a page of a marked document. 
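The inverse of the marking process mentioned above can be sketched as follows, assuming the long and short spaces have already been measured and written down as a string of "L" and "S" symbols. The table is rebuilt from the regularity observable in Table 1 (an observation, not a rule stated in the text), with the SPACE code copied as printed. The sketch locates a character boundary from the first run of three or more identical symbols and then decodes in groups of ten. If no such run is found it assumes the string starts on a boundary. The function names are illustrative.

```python
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ*+,-./"


def pairs(index):
    """Observed rule for Table 1: each bit of index becomes 'SL' (0) or 'LS' (1)."""
    return "".join("LS" if bit == "1" else "SL" for bit in format(index, "05b"))


def build_table():
    table = {ch: pairs(i) for i, ch in enumerate(LETTERS)}
    table.update({ch: "SSS" + pairs(i)[3:] for i, ch in enumerate("01234567")})
    table.update({ch: "LLL" + pairs(8 + i)[3:] for i, ch in enumerate("89#")})
    table[" "] = "LLLSLSLSLS"  # SPACE, copied as printed in Table 1
    return table


def find_sync(symbols):
    """Index where a code beginning with three equal symbols starts, or None."""
    run_start = 0
    for i in range(1, len(symbols) + 1):
        if i == len(symbols) or symbols[i] != symbols[run_start]:
            if i - run_start >= 3:
                # A run may be one symbol longer when the preceding code ends
                # with the same symbol, so count back from the end of the run.
                return i - 3
            run_start = i
    return None


def decode(symbols, table):
    """Decode a string of 'L'/'S' symbols; unknown groups become '?'."""
    reverse = {code: ch for ch, code in table.items()}
    sync = find_sync(symbols)
    start = 0 if sync is None else sync % 10
    return "".join(
        reverse.get(symbols[i:i + 10], "?")
        for i in range(start, len(symbols) - 9, 10)
    )
```

Symbols that precede the first complete character in a fragment are simply skipped, so a fragment that begins part-way through a character still decodes from the next boundary onward.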
RAMIFICATIONS AND SCOPE This invention provides a useful and novel system for curtailing unauthorized distribution of documents, identifying individual ones of multiple copies of a document, authenticating documents, deterring breach of copyright of documents, and many other applications. The system is simple to use and can be conveniently and inexpensively combined with other document production equipment such as word-processors, data-processing systems and photocopiers. It can also be used to convey a secondary message within a potentially unrelated text. Further advantages are the ability to directly convey, within the document, the name of a person to whom a document has been entrusted, the name of a person making a photocopy, or the name of a person instructing a computer to produce hard-copy of a document. The marking scheme is very versatile and robust. Being distributed throughout the document it is practically impossible to remove the marking without re-keying the entire text. Unlike prior-art marking schemes which rely on changes of character formation or other subtle idiosyncrasies, the marking provided by this invention is conveyed intact in spite of blurring, reductions, facsimile transmission, or other forms of image degradation. Using prior-art marking schemes, it has generally been necessary to impart obvious markings on the subject document, lest the marking be lost by such degradation. The invention achieves all the above objectives with minimal disturbance to the appearance of the document. While the invention has been described with reference to particular embodiments thereof, it will be understood by those skilled in the art that changes in the form and detail may be made without departing from the scope and spirit of the invention. 
The marking process of the invention is independent of the nature of the document originating means and document printing means and it is anticipated that the invention can be realized in many ways other than those specifically mentioned herein. In particular, the invention can be realized as an integral part of a word-processing system, by adding suitable software to the word-processing software, or it can be realized as a stand-alone device interposed between a source of text data, such as a word processor, and a printer, or it can be realized as a printer adapted to carry out marking according to the invention. It will also be understood that the scheme for relating a given marking code to a given sequence of space variations utilised by the embodiments described above is exemplary only and many other schemes, obvious to those skilled in the art, can be used without departing from the scope of the invention.

It is also envisioned that in cases where it is desired to make it readily apparent that a document has been marked, one or more printed characters can be used instead of or as well as variations of spacing between words. For example, in the case of the embodiment described above in which two consecutive spaces are used to generate a long space, a space followed by an asterisk can be used, yielding a marked document with asterisks distributed in a distinctive pattern throughout. Although this technique significantly affects the appearance of the text, it is nevertheless useful in cases where it is desirable that the marking be highly apparent, easy to decode, and unlikely to be obscured through tampering, blurring or otherwise distorting the image. 
It is further envisioned that whereas the embodiments described above utilize marking information provided by an operator, other adaptations of the invention can be provided whereby the marking codes can be automatically generated by the invention, for example, by forming an ascending number sequence, or a sequence of random numbers, thereby assuring that each copy of a document is distinctively marked, without requiring the operator to provide specific marking information. In such cases it is desirable to maintain a set of duplicates of the documents before circulation for identification purposes. Other embodiments are possible in which the marking information is taken from a data field already serving another purpose within the memory of the data processing apparatus. For example the invention can be made to use the time and date information commonly resident in memory as the marking information, with the result that each copy of a marked document generated will be marked with the time and date at which the document was generated. Other fields such as the name of the author of the document, operator's password, or addressee's name can also be used in like manner. It will also be understood that whereas the exemplary embodiments described herein refer to the marking process taking place immediately prior to printing a document, the invention can also be beneficially applied for marking documents in electronic form, that is, documents in the form of files of data which may or may not be printed to form hard-copy at a later time. Whereas the invention is described herein in relation to marking documents comprising characters, words and other text objects, the invention can equally well be used to mark documents comprising any indicia, including icons, special symbols, pictures, glyphs and the like.
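As a small illustration of automatically generated marking information, the sketch below produces an ascending serial-number sequence and a time-and-date string, both restricted to characters that appear in Table 1. The string layouts, the leading "#", the trailing space (included so each string contains a character whose code starts with three equal symbols) and the function names are all assumptions made for this example.

```python
from datetime import datetime
from itertools import count

# Characters available in Table 1.
TABLE_1_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789#*+,-./ ")


def serial_markings(start=1, width=6):
    """Yield an endless ascending sequence of marking strings such as '#000001 '."""
    for number in count(start):
        yield f"#{number:0{width}d} "


def timestamp_marking(moment):
    """Format a datetime using only Table 1 characters, e.g. '1993-05-17 09.30 '."""
    return moment.strftime("%Y-%m-%d %H.%M ")


def uses_table_1_only(marking):
    """True if every character of the marking string has a Table 1 code."""
    return set(marking) <= TABLE_1_ALPHABET
```

Each generated string would be recorded alongside the recipient of the corresponding copy, since the string itself carries no name.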
