Digital reference hyphenation matrix apparatus for automatically forming hyphenated words4028677Abstract Apparatus for automatic hyphenation of input words from a word processing system. The apparatus includes a digital reference matrix memory containing a vector representation of all legal hyphenations for each dictionary word in the form of a calculated magnitude and associated unique vector angles. The vector magnitude constitutes the address data for accessing the memory. When an input word is received for hyphenation, a hyphen is added to the word and its magnitude is calculated. The memory is accessed for an address which equals the calculated magnitude. If the address is not found, a signal is generated indicating that the word cannot be legally hyphenated. If the address is found, then the corresponding angles, representing legal hyphenations of the input word, are compared with test words generated by sequentially inserting hyphens in the input word. All equal compares are flagged and the corresponding hyphenated input words are gated onto the output line. Claims I claim: Description BACKGROUND OF THE INVENTION
Table 1.
__________________________________________________________________________
Numeric Extraction of Alpha Field
__________________________________________________________________________
A = 1, B = 2, C = 3, D = 4, E = 5, F = 6,
G = 7, H = 8, I = 9, J = 10, . . . , Z = 26
Step 1
##STR1##
Step 2
##STR2##
Angle Magnitude = Function of Characters
in Word
##STR3##
Angle = Function of Character Position
##STR4##
__________________________________________________________________________
where R is the reference vector for each word length (M) comprising an "N"-tuple of linearly independent terms corresponding to the position of each letter in the word e.g., .sqroot.2, .sqroot.3, .sqroot.5, .sqroot.6, . . . .sqroot.J or log 3, log 5, log 7, log 11, . . ., log K, where .sqroot.J is irrational and K is a prime number, and with .vertline.R.vertline.= .sqroot.(.sqroot.2).sup.2 +(.sqroot.3).sup.2 +(.sqroot.5).sup.2 + . . . (.sqroot.J).sup.2, etc. Since the hyphen is considered as a legal character with a numeric value assigned, no matter where the hyphen resides in a word, the word's magnitude value remains unchanged. This follows from the magnitude being just a sum of integers, and hence, the order of summation is immaterial. The angle, however, changes uniquely based on the location of the hyphen within a word. Hence, all hyphenation possibilities for a given word are stored in one row of the memory 38 in FIG. 2 by using the magnitude as an address and storing all the corresponding angles at the address. DESCRIPTION OF APPARATUS The digital reference hyphenation matrix apparatus is shown in FIG. 1. A word to be hyphenated enters candidate word buffer 4 over input line 2. An end of word detector 3 connected to input line 2 detects the end of the input word and activates gate 8 to pass the candidate word plus a hyphen from hyphen adder 6 into the conversion memory 10. The end of word detector 3 determines the end of word code from the input means connected to input line 2. For example, if the input means were a keyboard, then a space, tab, or carrier return would indicate the end of a word. End of word detectors are known in the prior art. An example of a suitable end of word detector which could be used in the present invention is disclosed in FIG. 5C and column 11 of U.S. Pat. No. 3,537,076, issued Oct. 27, 1970, to F. J. Damerau, assigned to the same assignee as the present invention and hereby expressly incorporated herein by reference. The hypen adder 6 contains a permanent representation of the hypen which it adds to the end of the input word from the canidate word buffer 4. The conversion memory 10 is a read only memory in the preferred embodiment which contains the alpha numeric equivalency scheme which relates the alphabetic characters and the hyphen with weighted numeric values as determined by the technique discussed under "Theory" and more fully disclosed in W. S. Rosenbaum referenced above. The numerical weighting value for a character "N" is designated L.sub.N. The conversion memory 10 outputs the value L.sub.N on the data bus 11. The accessing means 60 for addressing magnitude listings in the memory 38 comprises the multiplier 12, the adder 14, the register 16, and the magnitude register 17. The value of L.sub.N on the data bus 11 is squared in the multiplier 12 and added to the sum of previous squared values of L.sub.N in the alpha word under analysis by the adder 14 and register 16. The process of calculating the value of the sum of L.sub.N.sup.2 continues until all characters in the candidate word have been converted. At this time the final value of the sum of L.sub.N.sup.2 is loaded into magnitude register 17 as the address for the magnitude of the word in memory 38, based upon the values of L.sub.N assigned to the characters of which the input alpha word is composed. Magnitude register 17 is connected to memory 38 and accesses memory 38 for an address equal to the calculated magnitude for the hyphenated candidate word. If no address in memory 38 is found which equals the magnitude in magnitude register 17, gate 38 sets flip-flop 42 to indicate on output line 44 that the input word cannot be validly hyphenated. If the contents of magnitude register 17 do match an address in memory 38, the angles stored at the address in memory 38 are passed by gate 39 into angle storage buffer 45. A read only memory is preferred for memory 38 but a random access memory can also be used. The hyphenation test word generator 61 comprises vowel detector 25, counter 21, gate 23, vowel detector 5, counter 13, hyphen inserter 7, gate 9, and hyphenation test word buffer 15. Vowel detector 25 is connected to input line 2 and detects the presence of the first vowel in the input candidate word. Vowel detector 25 activates counter 21 which counts the number of characters in the input word after the first vowel. Counter 21 activates gate 23 to pass the input word from candidate word buffer 4 to hyphenation test word buffer 15 in accordance with the count stored in counter 21. Vowel detector 5 is connected to the output of gate 23 and detects the presence of the first vowel in the input word during the first pass of the input word from candidate word buffer 4 to hyphenation test word buffer 15. Both vowel detector 5 and vowel detector 25 compare the letters on their input streams to the known vowels and produce an output when an equal compare occurs. For example, the vowel detectors may comprise a read only storage means and a comparator. An example of a suitable vowel detector, which is expressly incorporated herein by reference, is disclosed in Cols. 21-22 and FIG. 5b of Damerau. Vowel detector 5 activates gate 9 to insert a hyphen from hyphen inserter 7 into the input word following the first vowel. The hyphen inserter 7 contains a permanent representation of the hyphen which is present at its output at all times and can be accessed by operating gate 9. An example of a hyphen inserter in the prior art is shown in FIG. 5L and functionally described in column 20 of Damerau incorporated herein by reference above. Vowel detector 5 also activates counter 13 to store the character count up to the first vowel. On subsequent passes of the input word from candidate word buffer 4 to hyphenation test word buffer 15, counter 13 activates gate 9 to insert a hyphen from hyphen inserter 7 into the test word one character later than on the previous pass. Hyphenation test word buffer 15 in FIG. 2 shows an example of the contents of the hyphenation test word buffer after generation of test hyphenations for the word "corporation." The angle calculation means 62 for the hyphenation test words comprises the counter 18, character position decode 19, multiplier 20, adder 22, register 24, multiplier 26, divider 28, arcsecant table 29, multiplier 30, adder 32, register 34, square root calculator 27, and square root calculator 36. The hyphenation test words are passed serially from hyphenation test word buffer 15 to counter 18. Counter 18 counts the position of the characters in the test word and causes character position decode 19 to output the present value of R.sub.N to the multiplier 20. The character position decode 19 decodes the position of the letter and the word to produce the predetermined value of R which corresponds to that position as discussed under Theory of Operation above. The counter 18 is reset to zero after each test word by the interword space code stored in the hyphenation test word buffer 15. Character position decodes are known in the prior art. For example, Damerau, incorporated by reference above, discloses such a character position decode in FIG. 5c with a functional description in column 24 thereof. The value of R.sub.N is a linearly independent number for each character position in the word. The value of L.sub.N on the data bus 11 is input to the multiplier 20 and multiplied times the present value of R.sub.N and the product is input to adder 22. Adder 22 and register 24 maintain the running sum of the products of L.sub.N times the R.sub.N for the test word under analysis. At the end of each test word, register 24 outputs the final sum of L.sub.N times R.sub.N to the divider 28. The present character position value R.sub.N is output from the character position decoder 19 to the multiplier 30 generating the value R.sub.N.sup.2 which is output to the adder 32. Adder 32 and register 34 maintain a running sum of the squares of R.sub.N and at the end of the test word the final sum of R.sub.N.sup.2 is output to the square root calculator 36. The square root calculator 36 takes the square root of the sum of the R.sub.N squares yielding the value .vertline.R.vertline. which is input to the multiplier 26. Square root calculator 27 calculates the square root of the sum of L.sub.N.sup.2 to yield .vertline.Y.vertline.. In the preferred embodiment, the square root calculators 27 and 36 are implemented as hard wired arithmetic devices. However, it is understood that programmed computers may be substituted for the hard wired devices without altering the scope of the invention. For example, the IBM Systems Reference Library manual number GC28-6596-4 entitled "IBM System/360 Fortran IV Library Subprograms," copyrighted in 1966, contains programs for a general purpose computer which can be used to calculate the square root. The square root subprogram is called by inserting SQRT(X), where X is the term whose square root is to be taken. An example of a hard wired square root calculator of the type used in the present invention in the prior art is disclosed in column 64 and shown in FIG. 5N of Damerau incorporated herein by reference above. Multiplier 26 multiplies the value of the magnitude of Y times the magnitude of R from the square root calculator 36 and outputs the product as the numerator to the divider 28. The value of the sum of the L.sub.N times R.sub.N which is input from the register 24 to the divider 28 serves as the denominator and the calculated quotient is secant of the angle for the test word which is output to the arcsecant calculator 29. In the preferred embodiment, arcsecant table 29 is a memory containing a reproduction of the secant-angle values from a standard trigonometry table and is used to look up the angle value corresponding to the secant value output by divider 28. The angle value output from the arcsecant table 29 is loaded into the hyphenation test word angle buffer 43. This calculation is repeated for each hyphenation test word in the hyphenation test word buffer 15. After all the hyphenation test word angles have been calculated, the angles in hyphenation test word angle buffer 43 are compared to the angles in angle storage buffer 45, which contains the legal hyphenation angles for the word under test from memory 38, in angle compare register 41. Angle compare register 41 generates a hyphenation flag for each equal compare and outputs the hyphenation flags to angle flag register 46. A flag decoder 48 is connected to the output of hyphenation flag register 46 and detects which flags are set in the hyphenation flag register. The flags may be represented by binary bits. Flag decoder 48 decodes the hyphenation flags in hyphenation flag register 46 and activates gate 50 to pass the hyphenation test words which match the flags from hyphenation test word buffer 15 onto output line 52. Flag decoders of the type used herein are well known in the prior art. For example, see FIG. 5F and column 16 of Damerau incorporated herein by reference above. The hyphenated words on output line 52 are the legal hyphenations for the input word. OPERATION: The operation of the automatic hyphenation insertion apparatus will be described with reference to FIGS. 1 and 2. A word to be hyphenated is received by the candidate word buffer 4 over input bus 2. Hyphen adder 6 adds a hyphen to the word and end of word detector 3 actuates gate 8 to pass the word plus the added hyphen to conversion memory 10. Conversion memory 10 serially outputs a numerical representation L.sub.N for each character in the word on bus 11. The value of L.sub.N on the data bus 11 is squared in multiplier 12 and added to the sum of previous squared values of L.sub.N by the adder 14 and register 16. The calculation of the value of L.sub.N.sup.2 continues until all characters in the candidate word have been converted. The final value of the sum of the L.sub.N.sup.2 is loaded into magnitude register 17 as the address for the magnitude of the candidate word in memory 38. Magnitude register 17 accesses memory 38 for an address equal to the calculated magnitude for the hyphenated candidate word. If no corresponding address is found in memory 38, gate 39 sets flip flop 42 to indicate on output bus 44 that the input word cannot be validly hyphenated. If the contents of the magnitude register 17 do correspond to an address in memory 38, the angles stored at that address in memory 38 are passed by gate 39 to angle storage buffer 45. Additionally, if the memory 38 contains angles corresponding to the calculated magnitude for the candidate word indicating that the candidate may be validly hyphenated, a set of hyphenation test words is generated and their angles calculated for comparison with the angles stored in memory 38 in order to determine the valid hyphenation points of the candidate word. Vowel detector 25 detects the presence of the first vowel in the input candidate word and activates counter 21 to count the number of characters in the input word after the first vowel. Counter 21 activates gate 23 to pass the input word from candidate word buffer 4 to hyphenation test word buffer 15 a number of times equal to the number of letters in the word after the occurrence of the first vowel. Vowel detector 5 detects the presence of the first vowel in the input word during the first pass of the input word from candidate word buffer to hyphenation test word buffer 15 and activates gate 9 to insert a hyphen from hyphen inserter 7 into the input word following the first vowel. Vowel detector 5 also activates counter 13 to store the character count up to the first vowel. On subsequent passes of the input word from candidate word buffer 4 to hyphenation test word buffer 15, counter 13 activates gate 9 to insert a hyphen from hyphen inserter 7 into the test word one character later than on the previous pass. Hence, a set of test words is generated similar to the set of hyphenated test words for "corporation" in hyphenation test word buffer 15 of FIG. 2. After all the possible hyphenation test words have been generated in hyphenation test word buffer 15, the hyphenation test words are passed serially from hyphenation test buffer 15 to counter 18. Counter 18 counts the position of the characters in the test word and causes character position decode 19 to output the character position value R.sub.N to multiplier 20. The output R.sub.N of the character position decode 19 for each character position in the input at the word is received by multiplier 20 and multiplied times the numerical representation for the corresponding character L.sub.N from conversion memory 10 and added by adder 22 to the contents of register 24 to provide a running sum of L.sub.N times R.sub.N. Also, the letter position value, R.sub.N, is input to multiplier 30 where it is multiplied times itself and added by adder 32 to the contents of register 34 to provide a running sum of the squares of R.sub.N. For each of the test words in the hyphenation test word buffer 15, the final sum of R.sub.n.sup.2 is output to the square root calculator 36, the contents of the magnitude register 17 are output to square root calculator 27, and the contents of register 24 are output to divider 28. The square root calculator 36 calculates the square root of the sum of the R.sub.N squares yielding the magnitude of the vector R and the square root calculator 27 calculates the square root of the magnitude, which is the sum of the L.sub.N squares, yielding the magnitude of the vector Y. Multiplier 26 multiplies the magnitude of the vector R times the magnitude of the vector Y and outputs the product as the numerator to divider 28. The value of the sum of the L.sub.N times R.sub.N which has been input from register 24 to the divider 28 serves as the denominator to divider 28 which produces the secant of the angle for the word. The quotient is output to the arcsecant table 29 where the corresponding angle is looked up. For each of the hyphenation test words, the arcsecant table 29 contains an angle value for the secant from divider 28 and outputs the angle value to the hyphenation test word angle buffer 43. After angle values have been determined and stored in hyphenation test word angle buffer 43 for all the hyphenation test words stored in hyphenation test word buffer 15, angle compare register 41 compares the value hyphenation angles stored in angle storage buffer 45 to the test word hyphenation angles stored in hyphenation test word buffer 43. A flag is generated for each equal compare and stored in hyphenation flag register 46. The contents of hyphenation flag register 46 are decoded by flag decoder 48 and activate gate 50 to pass the corresponding hyphenation test words from hyphenation test word buffer onto output bus 52. The hyphenated words on output bus 52 are the valid hyphenations for the output candidate word. While in the preferred embodiment of the invention the complete set of hyphenated test word angles is generated and compared in parallel to the stored set of legal hyphenation angles, it is well understood that the hyphenated word angles could just as easily be compared serially with the stored set of legal hyphenation angles. While the invention has been particularly shown and described with reference to the preferred embodiment thereof, it will be understood by those skilled in the art that the foregoing and other changes in form and detail may be made therein without departing from the spirit and scope of the invention.
|
Same subclass Same class Consider this |
||||||||||
