Abbreviating and compacting text to cope with display space constraint in computer software
6279018
Abstract
This invention relates to text abbreviation methods for coping with display or print space constraints in computer software. In particular, it discloses the abbreviation of text into predetermined field widths (with single or multiple rows), utilizing an operating system (121), an application program (122), and an abbreviation control data program (123), along with combinations of prioritized shortening methods in preference to or in addition to glossaries of acronyms and word abbreviations, using an abbreviation function (127). Also disclosed are the special handling of segments of input contained within pairs of pre-defined characters, the omission of spaces, and the conversion of enumeration words or word sequences to numbers, utilizing an abbreviation data file (124), a parameters sets file (125), and a parameters list (126). The omission of spaces and phonetically less significant characters compacts word sequences, which saves display space and enables use of larger type sizes.
Claims
We claim:
1. A method for abbreviating text to cope with display or print space constraint in computer software such that loss of word recognizability is minimized, wherein said text includes a plurality of words, said space constraint is defined in terms of a predetermined abbreviated text length limit and said method comprises the steps of:
a) selecting one or more words from the text as being abbreviatable words;
b) shortening only those abbreviatable words whose length exceeds a predetermined minimum word length limit while the length of the text is in excess of the predetermined abbreviated text length limit, said shortening comprising at least one of:
(i) replacing a sequence of alphabets in any abbreviatable word with a shorter sequence, wherein said sequence of alphabets does not include the initial of the abbreviatable word; and
(ii) deleting one or more alphabets from any abbreviatable word, but excluding from deletion the initial of the abbreviatable word; and
c) truncating only those abbreviatable words whose length exceeds a predetermined minimum truncated word length limit while the length of the text is in excess of the predetermined abbreviated text length limit.
2. The method of claim 1 further comprising at least one of:
replacing in the text a sequence of words, being a phrase, with its corresponding commonly used acronym, if an entry containing the phrase and the acronym is found in a predetermined list; and
replacing in the text an abbreviatable word with its corresponding commonly used word abbreviation, if an entry containing the abbreviatable word and the word abbreviation is found in a predetermined list.
3. The method of claim 1 further comprising at least one of:
replacing in the text a sequence of words, being a phrase, with its corresponding acronym, if an entry containing the phrase and the acronym categorized as less commonly used is found in a predetermined list and the length of the text is greater than the predetermined abbreviated text length limit even after shortening the abbreviatable words in the text; and
replacing in the text an abbreviatable word with its corresponding word abbreviation, if an entry containing said abbreviatable word and said word abbreviation categorized as less commonly used is found in a predetermined list, such replacement yields greater reduction than the reduction that is obtained by shortening said abbreviatable word and the length of the text is greater than the predetermined abbreviated text length limit even after shortening the abbreviatable words in the text.
4. The method of claim 1 further comprising converting a continuous sequence of at least two enumeration words in the text, of which at least one of the enumeration words is a bundle word, into a shorter sequence using a predetermined list of enumeration words and corresponding abbreviations, wherein said bundle word is a single enumeration word which connotes a value greater than hundred and said shorter sequence contains at least four numeric characters.
5. The method of claim 4 further comprising step (a) and at least one of steps (b), (c), and (d):
a) replacing bundle word abbreviations with predetermined corresponding figures, each said corresponding figure comprising either a punctuation character or a sequence of one or more punctuation characters and numeric zero(s);
b) inserting into the sequence of converted figures a numeric one;
c) locating and deleting occurrences of superfluous numeric zero(s), if any, from the sequence of converted figures; and
d) inserting into the sequence of converted enumeration words one or more numeric zero(s).
6. The method of claim 1 further comprising truncating the abbreviated text finally, starting from the right end, until the text is reduced to the predetermined abbreviated text length limit or the entire text is dealt with, but excluding from truncation the initial alphabet of any word and at least one of:
a) any numeric character or decimal point;
b) any character contained in a predetermined set of non-deletable symbols; and
c) predetermined protected segments.
7. The method of claim 1 wherein the abbreviatable words on which the abbreviating steps are carried out include partially abbreviated words.
8. The method of claim 1 wherein the length of any word is the number of characters in the word.
9. The method of claim 1 wherein the selecting step includes:
locating sequences of one or more contiguous alphabets preceded by a space, punctuation or beginning of text and followed by a space, punctuation or end of text and recognizing such sequences as words; and
locating words containing at least two alphabets and no upper case alphabets other than the first alphabet and classifying such words as abbreviatable words.
10. The method of claim 1 wherein the replacing step comprises at least one of:
replacing a contiguous sequence of alphabets in any abbreviatable word with a shorter sequence of at least one alphabet, if an entry containing said contiguous sequence of alphabets and its corresponding shorter sequence is found in a predetermined list; and
replacing a sequence comprising a contiguously repeating consonant in any abbreviatable word with a shorter sequence of only one such consonant.
11. An abbreviated text generated by employing the method in claim 10.
12. The method of claim 10 wherein the replaced shorter sequence is identified so that said shorter sequence is not further shortened using the shortening step subsequently.
13. The method of claim 1 wherein the deleting step comprises deleting a contiguous sequence of one or more vowels from any abbreviatable word, provided said contiguous sequence is deleted entirely and the length of said abbreviatable word after deleting said contiguous sequence would not become less than the predetermined minimum word length limit.
14. An abbreviated text generated by employing the method in claim 13.
15. The method of claim 1 wherein the truncating step includes at least one of:
truncating only the truncatable part of every abbreviatable word in an approximately equal proportion such that the text is reduced to the predetermined abbreviated text length limit, said truncatable part comprising that part of every such word which is in excess of the predetermined minimum truncated word length limit; and
truncating abbreviatable words to the predetermined minimum truncated word length limit, starting from the right end of the text, while the length of the text is in excess of the predetermined abbreviated text length limit.
16. An abbreviated text generated by employing the method in claim 15.
17. The method of claim 1 wherein the selecting step includes at least one of:
classifying a word as a non-abbreviatable word, if said word is found in a predetermined list of words barred from abbreviation; and
classifying a word as a non-abbreviatable word, if said word is an acronym or a word abbreviation appearing in a predetermined list.
18. The method of claim 1 further comprising dealing with predetermined delimited segments in an exceptional manner, where said dealing includes at least one of:
a) abbreviating only the delimited segment containing an abstract after deleting the rest of the text;
b) protecting the delimited segment from abbreviation;
c) prioritizing deletion of the delimited segment before abbreviating the rest of the text; and
d) prioritizing truncation of the delimited segment before truncating the rest of the text.
19. An abbreviated text generated by employing the method in claim 18.
20. The method of claim 1 wherein the unit of measure for the predetermined abbreviated text length limit and for the length of the text is either a monospaced character or a unit of measure suitable for measuring proportionally spaced text.
21. A computer-readable medium embodying the method in one of claims 1-3, 4-6, 7, 8-12, 13, 15-18, 20.
22. An abbreviated text generated by employing the method in claim 1.
23. The method of claim 1 wherein the shortening step (b) is executed irrespective of the predetermined abbreviated text length limit, the truncating step (c) is not executed and the abbreviated text is split into two or more lines each not exceeding the predetermined abbreviated text length limit.
24. A computer system for abbreviating text to cope with display or print space constraint such that loss of word recognizability is minimized, wherein said text includes a plurality of words and said system comprises:
a) means for selecting one or more words from the text as being abbreviatable words;
b) means for shortening only those abbreviatable words whose length exceeds a predetermined minimum word length limit while the length of the text is in excess of a predetermined abbreviated text length limit, said shortening means comprising at least one of:
(i) means for replacing a sequence of alphabets in any abbreviatable word with a shorter sequence, wherein said sequence of alphabets does not include the initial of the abbreviatable word; and
(ii) means for deleting one or more alphabets from any abbreviatable word such that the initial of the abbreviatable word is excluded from deletion;
c) means for truncating only those abbreviatable words whose length exceeds a predetermined minimum truncated word length limit while the length of the text is in excess of the predetermined abbreviated text length limit; and
d) means for controlling abbreviation of the text, said means comprising one or more predetermined abbreviation data lists, abbreviation options and abbreviation control parameters.
25. The system of claim 24 further comprising at least one of:
means for replacing in the text a sequence of words, being a phrase, with its corresponding commonly used acronym, if an entry containing the phrase and the acronym is found in a predetermined list; and
means for replacing in the text an abbreviatable word with its corresponding commonly used word abbreviation, if an entry containing the abbreviatable word and the word abbreviation is found in a predetermined list.
26. The system of claim 24 further comprising at least one of:
means for replacing in the text a sequence of words, being a phrase, with its corresponding acronym, if an entry containing the phrase and the acronym categorized as less commonly used is found in a predetermined list and the length of the text is greater than the predetermined abbreviated text length limit even after shortening the abbreviatable words in the text; and
means for replacing in the text an abbreviatable word with its corresponding word abbreviation, if an entry containing said abbreviatable word and said word abbreviation categorized as less commonly used is found in a predetermined list, such replacement yields greater reduction than the reduction that is obtained by shortening said abbreviatable word and the length of the text is greater than the predetermined abbreviated text length limit even after shortening the abbreviatable words in the text.
27. The system of claim 24 further comprising means for converting a continuous sequence of at least two enumeration words in the text, of which at least one of the enumeration words is a bundle word, into a shorter sequence using a predetermined list of enumeration words and corresponding abbreviations, wherein said bundle word is a single enumeration word which connotes a value greater than hundred and said shorter sequence contains at least four numeric characters.
28. The system of claim 27 further comprising means as in means (a) and at least one of means (b), (c), and (d):
a) means for replacing bundle word abbreviations with predetermined corresponding figures, each said corresponding figure comprising either a punctuation character or a sequence of one or more punctuation characters and numeric zero(s);
b) means for inserting into the sequence of converted figures a numeric one;
c) means for locating and deleting occurrences of superfluous numeric zero(s), if any, from the sequence of converted figures; and
d) means for inserting into the sequence of converted enumeration words one or more numeric zero(s).
29. The system of claim 24 wherein the controlling means includes means for dealing with predetermined delimited segments in an exceptional manner, where said dealing means includes at least one of:
a) means for abbreviating only the delimited segment containing an abstract after deleting the rest of the text;
b) means for protecting the delimited segment from abbreviation;
c) means for prioritizing deletion of the delimited segment before abbreviating the rest of the text; and
d) means for prioritizing truncation of the delimited segment before truncating the rest of the text.
30. The system of claim 24 wherein the controlling means includes means for determining the points of separation while abbreviating text into predetermined plural number of rows of predetermined row widths.
31. The system of claim 30 wherein the determining means includes means for ensuring that said points of separation are determined such that at least one of the following conditions is satisfied:
a) each separated portion of the text when abbreviated results in uniform reduction, with the length reduction within each row after separation bearing approximately the same proportion to the sum of the excess length of every abbreviatable word over a predetermined minimum word length limit;
b) unutilized blank spaces are minimized in each of the abbreviated separate rows;
c) splitting of words between rows is minimized; and
d) words or sequences of characters which are too long to be accommodated at the end of any row and which will cause unutilized space in the row if accommodated in the next row, are split between the rows such that each split portion has at least two characters.
32. The system of claim 24 wherein the controlling means includes a file which facilitates abbreviation of the text by holding words or sequences obtained from the text along with an indication for every word that is abbreviatable.
33. The system of claim 24 wherein the abbreviation data list means includes at least one of:
a) a list of at least one entry containing a word barred from abbreviation;
b) a list of at least one entry containing an enumeration word and its abbreviation;
c) a list of at least one entry containing a phrase and its commonly used acronym;
d) a list of at least one entry containing a word and its commonly used word abbreviation;
e) a list of at least one entry containing a phrase and its less commonly used acronym; and
f) a list of at least one entry containing a sequence of alphabets and its shorter sequence for replacement in a word.
34. The system of claim 24 wherein the abbreviation option means includes at least one of:
a) an option for prioritized deletion or truncation of a delimited segment in the text;
b) an option for protection of a delimited segment from abbreviation in the text;
c) an option for barring predetermined words from abbreviation in the text;
d) an option for compulsorily replacing a phrase with its commonly used acronym in the text;
e) an option for compulsorily replacing a word with its commonly used abbreviation in the text;
f) an option for abbreviating an enumeration word sequence into a sequence containing at least one numeric character in the text;
g) an option for replacing an ending sequence of alphabets in a word with a shorter sequence;
h) an option for replacing an intervening sequence of alphabets in a word with a shorter sequence;
i) an option for replacing a sequence of a contiguously repeating consonant in a word with one such consonant;
j) an option for deleting a less significant alphabet in a word;
k) an option for need based replacement of a phrase with its less commonly used acronym in the text;
l) an option for truncating a word in the text; and
m) an option for final truncation of the text.
35. The system of claim 24 wherein the abbreviation control parameter means includes at least one of:
a) a group of one or more punctuations for deletion in the text;
b) a group of one or more less significant alphabets for deletion in a word;
c) a group of one or more non-deletable symbols;
d) a minimum word length limit;
e) a minimum truncated word length limit;
f) an abbreviated text length limit;
g) a separated row output width value; and
h) a number of separated output rows value.
36. A method for abbreviating text to fit into a display or print space constraint in computer software such that loss of word recognizability is minimized, wherein said text includes a plurality of words, said display or print space constraint comprises a predetermined plural number of rows of predetermined row widths and said method comprises the steps of:
a) selecting one or more words from the text as being abbreviatable words;
b) replacing in the text a sequence of words comprising a phrase with its corresponding acronym, if an entry containing the phrase and its corresponding acronym is found in a predetermined list;
c) after replacing phrases with corresponding acronyms as described in step (b), separating the text into at least two row strings such that the number of said row strings does not exceed the predetermined plural number of rows and each said row string is associated with its corresponding predetermined row width;
d) in any row string, shortening only those abbreviatable words whose length exceeds a predetermined minimum word length limit while the length of said row string is in excess of its corresponding predetermined row width, said shortening comprising at least one of:
(i) replacing a sequence of alphabets in any abbreviatable word with a shorter sequence, wherein said sequence of alphabets does not include the initial of the abbreviatable word; and
(ii) deleting one or more alphabets from any abbreviatable word, but excluding from deletion the initial of the abbreviatable word; and
e) in any row string, truncating only those abbreviatable words whose length exceeds a predetermined minimum truncated word length limit while the length of said row string is in excess of its corresponding predetermined row width.
37. The method of claim 36 further comprising converting a continuous sequence of at least two enumeration words in the text, of which at least one of the enumeration words is a bundle word, into a shorter sequence using a predetermined list of enumeration words and corresponding abbreviations, wherein said bundle word comprises a single enumeration word which connotes a value greater than hundred and said shorter sequence contains at least four numeric characters.
38. The method of claim 37 further comprising step (a) and at least one of steps (b), (c), and (d):
a) replacing bundle word abbreviations with predetermined corresponding figures, each said corresponding figure comprising either a punctuation character or a sequence of one or more punctuation characters and numeric zero(s);
b) inserting into the sequence of converted figures a numeric one;
c) locating and deleting occurrences of superfluous numeric zero(s), if any, from the sequence of converted figures; and
d) inserting into the sequence of converted enumeration words one or more numeric zero(s).
39. The method of claim 36 further comprising truncating any row string finally, starting from the right end, until said row string is reduced to its corresponding predetermined row width or the entire row string is dealt with, but excluding from truncation the initial alphabet of any word and at least one of:
a) any numeric character or decimal point;
b) any character contained in a predetermined set of non-deletable symbols; and
c) predetermined protected segments.
40. The method of claim 36 wherein the abbreviatable words on which the abbreviating steps are carried out include partially abbreviated words.
41. The method of claim 36 wherein the length of any word is the number of characters in the word.
42. The method of claim 36 wherein the selecting step includes:
locating sequences of one or more contiguous alphabets preceded by a space, punctuation or beginning of text and followed by a space, punctuation or end of text and recognizing such sequences as words; and
locating words containing at least two alphabets and no upper case alphabets other than the first alphabet and classifying such words as abbreviatable words.
43. The method of claim 36 wherein the replacing step comprises at least one of:
replacing a contiguous sequence of alphabets in any abbreviatable word with a shorter sequence of at least one alphabet, if an entry containing said contiguous sequence of alphabets and its corresponding shorter sequence is found in a predetermined list; and
replacing a sequence comprising a contiguously repeating consonant in any abbreviatable word with a shorter sequence of only one such consonant.
44. The method of claim 43 wherein the replaced shorter sequence is identified so that said shorter sequence is not further shortened using the shortening step subsequently.
45. The method of claim 36 wherein the deleting step comprises deleting a contiguous sequence of one or more vowels from any abbreviatable word, provided said contiguous sequence is deleted entirely and the length of said abbreviatable word after deleting said contiguous sequence would not become less than the predetermined minimum word length limit.
46. The method of claim 36 wherein the truncating step includes at least one of:
truncating only the truncatable part of every abbreviatable word in an approximately equal proportion such that the row string is reduced to its corresponding predetermined row width, said truncatable part comprising that part of every such word which is in excess of the predetermined minimum truncated word length limit; and
truncating abbreviatable words to the predetermined minimum truncated word length limit, starting from the right end of the row string, until the row string is reduced to its corresponding predetermined row width.
47. The method of claim 36 wherein the selecting step includes at least one of:
classifying a word as a non-abbreviatable word, if said word is found in a predetermined list of words barred from abbreviation; and
classifying a word as a non-abbreviatable word, if said word is an acronym or a word abbreviation appearing in a predetermined list.
48. The method of claim 36 further comprising dealing with predetermined delimited segments in an exceptional manner, where said dealing includes at least one of:
a) abbreviating only the delimited segment containing an abstract after deleting the rest of the text;
b) protecting the delimited segment from abbreviation;
c) prioritizing deletion of the delimited segment before abbreviating the rest of the text; and
d) prioritizing truncation of the delimited segment.
49. The method of claim 36 wherein the separating step (c) further comprises:
ca) in the text which has to be separated into row strings, selecting a word for splitting into two split portions;
cb) shortening the selected word using the shortening step (d) in said claim 36, if the selected word is an abbreviatable word;
cc) splitting the selected word such that each split portion has at least two characters;
cd) separating the text into at least two row strings such that one of the row strings ends with the first split portion and the next row string begins with the second split portion; and
ce) identifying each split portion to prevent further shortening.
50. The method of claim 36 wherein the unit of measure for the predetermined row widths and for the length of the row strings is either a monospaced character or a unit of measure suitable for measuring proportionally spaced text.
51. An abbreviated text generated by employing the method in claim 36.
Description
FIELD OF THE INVENTION
This invention relates to a method and system for abbreviating text to a predetermined or undefined length through user-controlled and selective methods, such as deletion of alphabets in words, replacement of sequences of alphabets in words with representative shorter sequences, replacement of phrases and words with acronyms and word abbreviations respectively, and truncation of words and text, to make up for the spatial limitations of the display screen or the printed page within any computer software. The methods are effective for language scripts which use capital letters in addition to lower case letters and which have separate alphabets for consonants and vowels.
BACKGROUND OF THE INVENTION
Human beings have devised words in script form for representing the contents of their vocal communication and intellectual pursuits. The computer with its binary code can hold, process and reproduce information in audio-visual form avoiding the written word. Audio-visual form may be more convenient than the written word for many purposes. However, the written word may not yet be avoidable altogether and may well be more practical in many situations.
Newspapers and periodicals continue to be popular, though with crowding of information, type sizes tend to be reduced. Reading fine print strains the eyes, especially within transportation systems which are not vibration free, or with advancing age.
In the world of commerce, industry, business, management and other professions there is an increasing tendency to tabulate and present information in sets of predesigned forms. A tabulated display of text (including numeric values) on screen, unlike a serial replay of voice file or audio-video recording, allows the user to skim across the display screen at his or her own pace to spot, read and comprehend portions in isolation or to read related portions back and forth recurrently for overall comprehension without changing the display. Forms essentially entail demarcation of columns or rows to predetermined sizes--e.g., in spreadsheets, database files or other application packages. Accommodation of text strings of varying length into predetermined columns or rows of fixed length is problematic. Some solutions offered in computer software are:
a) manual editing for abbreviation,
b) change or adjustment of column width or row height and
c) synonym search and replacement with any shorter synonym.
These solutions require discretionary user intervention, and the results may not be uniform across occurrences of the same problem.
There is increasing use of computers for word processing and for a variety of other applications with precise and consistent fonts. The miniaturization of computers is leading to hand-held personal computers packed with tremendous inbuilt or accessible computing power and a variety of software applications with stored data apart from direct and instant access to the information highway. However, the display unit cannot be subjected to unlimited miniaturization due to the physical limitations of the human eye in reading text or graphics. The display space is proving to be a serious constraint; and methods apart from miniaturization need to be found to overcome the display unit constraint.
Conventional methods and prior art which are being used to accommodate more text in display or print include:
a) use of glossaries for replacement of words or phrases with word abbreviations or acronyms,
b) deletion of blank spaces separating words (in excess of one),
c) deletion of all blank spaces separating any two words in a line, after capitalizing the initial of the second word,
d) deletion of blank space(s) around punctuation characters,
e) deletion of all vowels from a word,
f) deletion of all vowels from a word, excluding the first character,
g) truncation of a word or text string,
h) reduction of space between lines of text,
i) finer crafting of fonts, using proportional spacing,
j) compression, size reduction or congesting of characters and
k) vertical or horizontal scrolling of text interactively (in display).
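By way of illustration only, conventional methods (b), (c) and (f) listed above may be sketched in Python as follows. The function names are hypothetical and are not taken from any cited reference. The sketch treats only a, e, i, o and u as vowels, which is an assumption.

```python
import re

VOWELS = set("aeiouAEIOU")  # assumption: only these letters count as vowels


def collapse_spaces(text: str) -> str:
    """Method (b): reduce runs of blank spaces between words to a single blank."""
    return re.sub(r" {2,}", " ", text)


def join_capitalized(text: str) -> str:
    """Method (c): capitalize the initial of each following word, then drop all blanks."""
    words = text.split()
    if not words:
        return ""
    return words[0] + "".join(w[:1].upper() + w[1:] for w in words[1:])


def drop_vowels_keep_initial(word: str) -> str:
    """Method (f): delete every vowel except one in the first character position."""
    return word[:1] + "".join(ch for ch in word[1:] if ch not in VOWELS)
```

For example, `drop_vowels_keep_initial("information")` gives `"infrmtn"`, which remains readable, whereas `drop_vowels_keep_initial("area")` gives `"ar"`, which does not. Method (f) applied without any minimum length control can therefore reduce short, vowel-rich words to fragments.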
U.S. Pat. No. 5,691,708 includes an abbreviation command, controlled by five parameters, used prior to placement of a text message in a buffer for abstraction. The first parameter allows use of word abbreviations or acronyms from an abbreviation text file, which is a common practice. There are no control features to prioritize acronym replacement over word abbreviation replacement, to prioritize commonly used acronyms or word abbreviations over those which are less commonly used, or to use less commonly used acronyms or word abbreviations only if other methods do not yield the desired reduction. The second and the third parameters allow deletion of all vowels from words, excluding or including the first characters. There are no control features to ensure that deletion of vowels from words does not render them unrecognizable, nor to allow the user to be selective as to which vowels or other less significant alphabets are open for deletion.
U.S. Pat. No. 4,486,857 is a "Display System For The Suppression And Regeneration Of Characters In A Series Of Fields In A Stored Record". "Suppression" comprises the methods of vowel deletion and truncation. There are no control features to ensure that the use of these methods does not render the contents of the fields unrecognizable.
Certain rules for development of abbreviations as speedy inputs to computers to obtain the full text are contained in U.S. Pat. Nos. 5,623,406, 5,305,205, 4,969,097 and 4,760,528. But these abbreviation rules are mechanistic and suitable only for computer processing and not for easy recognition by the users.
The method of expansion and resizing of data fields in forms as contained in U.S. Pat. No. 5,450,538 may not always be practicable or convenient.
U.S. Pat. No. 5,231,579 covers methods of compression, size reduction or congesting of characters; and these methods strain the eyes.
Modern word processors with finely crafted fonts and using proportional spacing have fairly exhausted further scope for compacting of screen fonts and printer fonts.
Unlike the optical faculty, which cannot be stretched beyond a point, the intellectual faculty to associate symbols or words with concepts, to interpret occurrences of words according to context and to recognize words in abbreviated forms can be cultivated almost without bounds. Such cultivation, training or practice through conscious and deliberate effort builds subconscious (and hence effortless) competencies.
Word abbreviations are recognized by common usage and repetitive association with the original words. A reader or writer is capable of associating printed or written symbols with spoken sounds. A listener is capable of associating spoken sounds with the objects, processes and concepts they represent. A silent reader is capable of directly associating printed or written symbols with the objects, processes or concepts they represent.
The spoken word is often a combination of several sounds. In many written languages each alphabet represents a single basic sound, though in English some alphabets (e.g., c, g, h, n, r and the vowels) are pronounced differently or are silent depending on their context. Phonetically, not all sounds are equally significant, and it is possible to classify each alphabet based on its usual phonetic significance. This would provide a criterion for prioritizing deletion of less significant alphabets from within words for progressive abbreviation with minimal loss of phonetic content. Such a criterion, together with other complementary criteria, can provide automated phonetic abbreviation as an alternative to the commonly used word or phrase abbreviations, which are not necessarily phonetic.
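By way of illustration only, the criterion of deleting less significant alphabets first may be sketched in Python as follows. The ranking in `DELETION_PRIORITY`, the function name `shorten_word` and its parameters are assumptions made for this sketch. The text above does not prescribe a particular ranking. The sketch never deletes the initial of the word and stops at a minimum word length so that the word remains recognizable.

```python
# Hypothetical ranking: groups listed earlier are treated as phonetically less
# significant and are deleted first. This ordering is an assumption.
DELETION_PRIORITY = ["aeiou", "hwy"]


def shorten_word(word: str, target_len: int, min_len: int = 3) -> str:
    """Delete low-priority letters until the word is no longer than target_len.

    The word is never shortened below min_len, and the initial (index 0)
    is never deleted.
    """
    floor = max(target_len, min_len)
    chars = list(word)
    for group in DELETION_PRIORITY:
        # Scan from the right so the beginning of the word survives longest.
        i = len(chars) - 1
        while i >= 1 and len(chars) > floor:
            if chars[i].lower() in group:
                del chars[i]
            i -= 1
    return "".join(chars)
```

With these assumed settings, `shorten_word("information", 9)` returns `"informatn"` and `shorten_word("information", 6)` returns `"infrmtn"`, because no further letters in the assumed groups remain to be deleted. A short word such as `"area"` is only reduced to `"are"`, since the minimum length of three characters is enforced.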
Phonetic abbreviations would be quite convenient to users, when commonly used acronyms or word abbreviations are not well established or are not known to the users. By and large, only a few of all the words in any language have commonly used abbreviations; and it is necessary to devise alternate methods of word abbreviation for wider application.
Consequently, there is a clear and urgent need:
a) to devise phonetic abbreviation criteria, rules and methods to be used in preference to or in addition to the conventional or known abbreviation methods,
b) to devise fine controls for abbreviation methods including for conventional or known abbreviation methods, and
c) to allow the end user to make intelligent and optimal use of these methods and controls in accordance with personal or knowledge domain specific preferences, without requiring any programming skills. The preferences may be as regards predefinition of abbreviation database, choice of abbreviation options and control parameters and delimitation of segments for special handling. Each individual user should be able to instantly abbreviate text from any source entirely in accordance with his or her own personal preferences.
SUMMARY OF THE INVENTION
It is therefore an object of the invention to provide a comprehensive set of fully automated methods for abbreviation of text in any computer software.
Another object of the invention is to provide fine controls for the methods of the invention through user editable means. For example, the user editable means include abbreviation data lists, abbreviation options and abbreviation control parameters. Sets of the user editable means may be stored in data files so that appropriate sets may be recurrently and readily used in a variety of software applications in accordance with the context--viz. the language and/or subject of the text, structure or length of text and space constraints within which the abbreviated text is to be placed.
An applications design related object of the invention is to provide for a versatile abbreviation function which can be used for instant abbreviation of any addressable text (entered through keyboard, voice recognition input device or other input device) for placement within the space constraints of any single or multiple row field with minimal loss of phonetic content and without splitting words between rows except to minimize word truncation.
An overall object of the invention is to provide maximum optical facility by abbreviating text and enabling use of larger types or precluding the use of smaller types, in display and print.
Another overall object of the invention is to accommodate more text by abbreviation in the available display space, thus overcoming the display unit constraint in computers and hand-held devices.
The abbreviation methods in this invention include the following steps:
1. selecting one or more abbreviatable words from the text,
2. prioritizing replacement of commonly used acronyms and word abbreviations over less commonly used acronyms and word abbreviations,
3. using the less commonly used acronyms and word abbreviations only if the other abbreviation methods do not yield the required reduction,
4. converting sequences of enumeration words in the text into sequences comprising numeric characters and punctuations,
5. replacing a sequence of alphabets in any abbreviatable word with a corresponding shorter sequence,
6. deleting one or more alphabets from any abbreviatable word,
7. checking length of abbreviatable words to ensure that abbreviatable words with length greater than a predetermined minimum word length limit are subject to abbreviation,
8. truncating abbreviatable words and text, if necessary,
9. dealing with pre-defined delimited segments in an exceptional manner, for example:
a) abbreviating only the delimited segment containing an abstract after deleting the rest of the text,
b) protecting the delimited segment from abbreviation,
c) prioritizing deletion of the delimited segment before abbreviating the rest of the text, and
d) prioritizing truncation of the delimited segment before truncating the rest of the text,
10. determining the points of separation while abbreviating text into predetermined number of rows of predetermined row width, if the points of separation have not been pre-defined before abbreviating text,
11. controlling abbreviation of text in accordance with abbreviation control parameters.
The abbreviation means used in this invention include the following:
1. abbreviation data list means:
a) a list of words barred from abbreviation,
b) a list of enumeration words and their abbreviations,
c) a list of phrases and their commonly used acronyms,
d) a list of words and their commonly used word abbreviations,
e) a list of phrases and their less commonly used acronyms, and
f) a list of alphabet sequences and their shorter sequence for replacement in words,
2. abbreviation option means:
a) an option for prioritized deletion or truncation of delimited segments in the text,
b) an option for protection of delimited segments from abbreviation in the text,
c) an option for barring pre-defined words from abbreviation in the text,
d) an option for compulsorily replacing phrases with their commonly used acronyms in the text,
e) an option for compulsorily replacing words with their commonly used abbreviations in the text,
f) an option for abbreviating enumeration word sequences into sequences comprising numeric characters and punctuations in the text,
g) an option for replacing ending sequences of alphabets in words with shorter sequences,
h) an option for replacing intervening sequences of alphabets in words with shorter sequences,
i) an option for replacing sequences of a contiguously repeating consonant in words with one such consonant,
j) an option for deleting less significant alphabets in words,
k) an option for need based replacement of phrases with their less commonly used acronyms in the text,
l) an option for truncating words in the text, and
m) an option for final truncation of the text,
3. abbreviation control parameter means:
a) a group of punctuations for deletion in the text,
b) a group of less significant alphabets for deletion in words,
c) a group of non-deletable symbols,
d) a minimum word length limit,
e) a minimum truncated word length limit,
f) an abbreviated text length limit,
g) a separated row output width value, and
h) a number of separated output rows value.
4. enumeration words conversion means:
A system for converting any continuous sequence of enumeration words in any text into a sequence containing numeric characters comprising:
a) means for replacing enumeration words with their corresponding abbreviations,
b) means for handling variations in style of expressing enumeration words sequences,
c) means for obtaining valid converted sequence suitable for arithmetic manipulation,
d) means for inserting into the converted sequence punctuation characters if required,
e) means for inserting into the converted sequence one or more numeric characters representing zero if required,
f) means for inserting into the converted sequence numeric character representing one if required, and
g) means for deleting occurrences of connecting word abbreviation such as "and" from the converted sequence, if superfluous.
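The enumeration words conversion means listed above may be illustrated with a minimal sketch in Python. It is not the routine of the preferred embodiment: the function name `enumeration_to_number`, the word table and the handling of hyphens are assumptions made for illustration, and only English words up to millions are covered. The sketch shows insertion of the numeric character one where a scale word stands alone, insertion of zeros through place value, insertion of punctuation as thousands separators and deletion of the connecting word "and".

```python
# Hypothetical word table; a real abbreviation data list would be user editable.
SMALL = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
SMALL.update({w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)})
SCALES = {"thousand": 1_000, "million": 1_000_000}


def enumeration_to_number(phrase: str) -> str:
    """Convert a sequence of enumeration words into digits with separators."""
    total = 0    # value of the completed thousand/million groups
    current = 0  # value of the group being built (0 to 999)
    for token in phrase.lower().replace("-", " ").split():
        if token == "and":
            continue  # superfluous connecting word is dropped
        if token in SMALL:
            current += SMALL[token]
        elif token == "hundred":
            current = (current or 1) * 100  # bare "hundred" means one hundred
        elif token in SCALES:
            total += (current or 1) * SCALES[token]
            current = 0
        else:
            raise ValueError(f"not an enumeration word: {token!r}")
    return f"{total + current:,}"  # thousands separators inserted here
```

For example, `enumeration_to_number("One Thousand")` yields `1,000`, and the result with the commas removed is suitable for arithmetic manipulation.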
How the Objects are Achieved
Phonetic abbreviation is achieved by selective deletion of blank spaces (after capitalizing the initials of words), deletion of pre-defined insignificant non-alphabet characters, replacement of sequences of alphabets within words with representative shorter sequences, deletion of pre-defined alphabets considered to be less significant for word recognition--i.e., phonetically or optically. Phonetic abbreviation results in minimal loss of phonetic content, saves display space and enables use of larger type sizes maximizing optical facility. The result is: maximum optical facility added concised script (abbreviated as Mofacs).
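The deletion of blank spaces after capitalizing the initials of words may be illustrated with a minimal sketch in Python. The function name `compact_words` and its `deletable_punctuation` parameter are assumptions made for illustration and do not appear in the preferred embodiment.

```python
def compact_words(text: str, deletable_punctuation: str = "") -> str:
    """Capitalize each word's initial, drop listed punctuation, drop spaces."""
    kept = "".join(ch for ch in text if ch not in deletable_punctuation)
    # Upper-case only the first character so capitals inside a word survive.
    return "".join(word[0].upper() + word[1:] for word in kept.split())
```

With this sketch, `compact_words("annual sales report")` gives `AnnualSalesReport`: the capitalized initials mark the word boundaries that the deleted spaces used to mark.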
This invention is suitable for abbreviation of:
a) Text, comprising a string, into abbreviated text string of predetermined or undefined abbreviated text length.
b) Text, comprising a string delimited into several portions with user supplied row separator(s) (i.e., a unique delimitation character such as a vertical bar), into an abbreviated text string comprising several rows of predetermined equal length.
c) Text, comprising a string without any user supplied row separator(s), into an abbreviated string comprising predetermined number of rows of predetermined equal length, without splitting of words between rows except to minimize word truncation.
d) Text, comprising long multiple line textual matter (e.g., newspaper reports, essays, speeches and the like), into abbreviated text of predetermined width.
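Case c) above, in which the system determines the points of separation, may be illustrated with a minimal sketch in Python. The greedy placement used here is an assumption and is not the Separate subroutine of the preferred embodiment; the function name and its parameters are hypothetical. The sketch places whole words into rows and reports failure instead of splitting a word, which would be the signal that further abbreviation is needed.

```python
from typing import List, Optional


def separate_into_rows(text: str, row_width: int,
                       row_count: int) -> Optional[List[str]]:
    """Greedily place whole words into at most row_count rows of row_width."""
    rows = [""]
    for word in text.split():
        candidate = word if not rows[-1] else rows[-1] + " " + word
        if len(candidate) <= row_width:
            rows[-1] = candidate              # word fits on the current row
        elif len(word) <= row_width and len(rows) < row_count:
            rows.append(word)                 # start a new row with this word
        else:
            return None                       # does not fit without splitting
    return rows
```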
In the preferred embodiment of the invention described hereinafter, multiple line text is read from an ASCII text file and abbreviated output text is written to an ASCII text file. However the users of this invention may read multiple line text from any other type of file or from a memo field and abbreviated output text may be written to any other type of file or to a memo field.
In the preferred embodiment, abbreviation methods are always carried out on text strings. If a long multiple line text is input for abbreviation, smaller strings are picked up from the long text in sequence and abbreviated one at a time. Hereinafter, the text which is to be abbreviated is referred to as either "text" or "text string".
The methods of acronym and word abbreviation replacement, deletion of alphabets (generally vowels) and truncation are known methods as outlined in the Background Of The Invention. In this invention, these methods are improved as explained below:
1. Improved acronym and word abbreviation replacement method:
This invention has two types of acronyms and word abbreviations namely, commonly used acronyms and word abbreviations and less commonly used acronyms and word abbreviations.
Replacement of commonly used acronyms and word abbreviation is compulsory and is prioritized before replacement of less commonly used acronyms and word abbreviations.
Replacement of less commonly used acronyms and word abbreviation is done only if necessary and if the other abbreviation methods yield lesser reduction.
2. Improved alphabets deletion method:
In this invention deletion of alphabets from words is subject to a minimum word length limit. Alphabets are not deleted from words if the word length does not exceed the minimum word length limit. Because of this control feature there is minimal loss of word recognition facility.
3. Improved truncation methods:
In this invention truncating methods are executed in stages in a controlled manner to minimize loss of word recognition facility. In the earlier stage words are truncated only if the word length exceeds a minimum truncated word length limit. In the later stage of truncation of text from the right end, the initials of words, numeric characters, decimal point, pre-defined non-deletable symbols and pre-defined protected segments are not truncated.
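The minimum word length control on deletion of alphabets (item 2 above) may be illustrated with a minimal sketch in Python. It is not the routine of the preferred embodiment: the function name, the default group of deletable alphabets, the right-to-left scanning order and the choice to stop deleting once the word has shrunk to the limit are all assumptions made for illustration.

```python
def delete_less_significant(word: str, deletable: str = "aeiou",
                            min_word_len: int = 4) -> str:
    """Delete less significant letters, never the initial, down to a limit."""
    if len(word) <= min_word_len:
        return word  # word does not exceed the limit, so it is left intact
    chars = list(word)
    i = len(chars) - 1
    # Scan from the right end; index 0 (the initial) is never examined.
    while i > 0 and len(chars) > min_word_len:
        if chars[i] in deletable:
            del chars[i]
        i -= 1
    return "".join(chars)
```

Under these assumptions `management` becomes `mngmnt`, while a four-letter word such as `data` is returned unchanged because it does not exceed the limit of 4.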
The prioritized and selective methods of the invention pertain to five broad groups:
1. Delimitation of segments with unique characters for special handling--namely:
a) Identifying a segment of a text string as being an intellectual abstract of the rest of the text string, so that the abstract may be abbreviated if the text string in itself or excluding the delimited segment cannot be abbreviated to the desired output length limit without resorting to word truncation.
Hereinafter, the desired output length limit is also referred to as the abbreviated text length limit.
b) Prioritizing segment(s) in a text string for deletion or truncation before abbreviation of the rest of the text string.
c) Protecting segment(s) in a text string from abbreviation and truncation until the final truncation of the text string.
2. Phonetic abbreviation methods:
a) Deleting blank spaces and pre-defined non-alphabet characters having no phonetic content.
b) Shortening of words with minimal loss of phonetic content by:
i) replacement of frequently occurring sequences of lower-case alphabets with representative shorter sequences,
ii) replacing occurrences of contiguously repeating consonants with one such consonant--repeating consonants being largely redundant phonetically,
iii) deletion of less significant alphabets in accordance with item 10 of the section entitled, "Logical criteria for abbreviation of text string or text file", presented later herein,
subject to a predetermined minimum word length limit.
3. Enumeration words conversion methods:
Converting enumeration words sequence to a sequence, comprising numeric digits and punctuations, without loss of phonetic content, the numeric sequence being a phonetic equivalent (e.g., `One Thousand` and `1,000` are both pronounced identically).
4. Abbreviation replacement methods:
after searching separate glossaries for:
a) commonly used phrases and corresponding acronyms,
b) commonly used words and corresponding word abbreviations,
c) less commonly used phrases and corresponding acronyms, and
d) less commonly used words and corresponding word abbreviations,
Generally, the following rules are observed:
a) Acronym replacement is prioritized over word abbreviation replacement.
b) Commonly used phrases and words are replaced compulsorily before phonetic shortening methods.
c) Less commonly used phrases and words are replaced before resorting to truncation methods, but after exhausting phonetic shortening methods.
d) While commonly used acronyms and word abbreviations, if opted for, will compulsorily replace corresponding phrases and words, the less commonly used acronyms and word abbreviations, if opted for, will replace corresponding phrases and words only if needed--i.e., only if the text cannot be reduced to desired output length limit without use of these acronyms and word abbreviations.
5. Truncation methods:
optionally deleting characters from word, abbreviated word or any sequence of characters.
The options include:
a) In personal name text string:
i) optional deletion of title word,
ii) truncation of all abbreviatable words (ignoring undeleted title word, if any), except the first word, to bare initials and
iii) truncation of the first word (ignoring undeleted title word, if any) from the right end to a predetermined minimum length.
b) In segment(s) of text string prioritized for deletion or truncation using pre-defined delimitation characters:
i) deletion of each prioritized segment, starting from the right end until the desired output length limit is reached or all the segments are dealt with,
ii) truncation of abbreviatable or shortened abbreviatable word to bare initials, starting from the right end until the desired output length limit is reached or all the words are dealt with or
iii) truncation of abbreviatable or shortened abbreviatable word to a predetermined minimum truncated word length limit, starting from the right end until the desired output length limit is reached or all the words are dealt with.
c) In text string (excluding prioritized segments):
i) truncation of abbreviatable or shortened abbreviatable word, such that a uniform proportion of the length of word which is in excess of the predetermined minimum truncated word length limit is deleted starting from the right end until the desired output length limit is reached or all the words are dealt with or
ii) truncation of abbreviatable or shortened abbreviatable word to a predetermined minimum truncated word length limit, starting from the right end until the desired output length limit is reached or all the words are dealt with.
d) In text string for final truncation: truncation of the text string, starting from the right end, but excluding:
i) bare initial of each word,
ii) pre-defined non-deletable symbols,
iii) numeric digit,
iv) decimal point and
v) segment protected from abbreviation and truncation by delimitation with any pair of pre-defined unique characters (if so opted),
until the desired output length limit is reached or all the words (or basic elements) are dealt with.
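The final truncation of option d) above may be illustrated with a minimal sketch in Python. The function name, the default group of non-deletable symbols and the rule used to recognize an initial (an upper-case alphabet, or an alphabet not preceded by another alphabet) are assumptions made for illustration. The sketch treats every period as a decimal point and omits protected segments for brevity.

```python
def final_truncate(text: str, limit: int,
                   non_deletable: str = "@#$%+-") -> str:
    """Delete characters from the right end, skipping protected characters."""
    chars = list(text)
    i = len(chars) - 1
    while i >= 0 and len(chars) > limit:
        ch = chars[i]
        # Assumed rule: capitals and letters that start a word are initials.
        is_initial = ch.isalpha() and (
            ch.isupper() or i == 0 or not chars[i - 1].isalpha())
        protected = (is_initial or ch.isdigit() or ch == "."
                     or ch in non_deletable)
        if not protected:
            del chars[i]
        i -= 1
    return "".join(chars)
```

With a limit of 10, `NetSalesGrowth 12.5%` is reduced to `NetSG12.5%`: the numeric value, the percent sign and the initials survive. If only protected characters remain, the result may still exceed the limit.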
For abbreviation of text comprising the input text string, all the opted methods are used in sequence, but the phonetic shortening and need based abbreviation replacement methods are stopped as soon as the predetermined desired output length limit is reached. The truncation methods are used last, if and to the extent required.
As a special option, a single row text string may be processed using all the opted methods, barring truncation and without any length limit if the desired output length limit is passed as zero (i.e., undefined). Thus, phonetic shortening options and need based abbreviation replacement options are fully exhausted but truncation methods are avoided altogether.
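The ordering described in the two preceding paragraphs may be illustrated with a minimal sketch in Python. The driver `run_pipeline` and its parameters are hypothetical; the individual steps are passed in as plain functions and stand in for the compulsory replacement, phonetic shortening, need based replacement and truncation methods.

```python
from typing import Callable, Sequence

Step = Callable[[str], str]


def run_pipeline(text: str, limit: int, compulsory: Sequence[Step],
                 need_based: Sequence[Step],
                 truncate: Callable[[str, int], str]) -> str:
    """Apply steps in order; a limit of zero means 'undefined'."""
    for step in compulsory:
        text = step(text)                 # always applied when opted for
    for step in need_based:
        if limit > 0 and len(text) <= limit:
            break                         # stop as soon as the limit is met
        text = step(text)
    if limit > 0 and len(text) > limit:
        text = truncate(text, limit)      # truncation is the last resort
    return text
```

When the limit is passed as zero the loop never stops early and the truncation branch is never taken, so the shortening steps are exhausted while truncation is avoided altogether.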
For abbreviation of multiple line text, compulsory abbreviation replacement methods are used first, if opted, followed by enumeration words conversion and all the opted word shortening methods. This is followed by need based abbreviation replacement methods, if opted and if these provide greater shortening. Punctuation deletion methods and truncation methods are generally not used while abbreviating multiple line text.
Generally, the punctuation deletion parameter is validated with reference to a system fixed comprehensive group of punctuations--e.g., ! ; ' \ , _ : " ?. The less significant alphabet deletion parameter consists of lower-case alphabets for deletion and is validated with reference to a system fixed comprehensive group of lower-case alphabets, as appropriate to each input language. The truncation methods are controlled with a non-deletable symbols parameter. Generally, the non-deletable symbols parameter is validated with reference to a system fixed comprehensive group of symbols--e.g., @ # $ % + - \. This comprehensive group of symbols and the comprehensive group of punctuations, mentioned hereinbefore, are generally mutually exclusive. The other parameters of the function also control the abbreviation methods in several ways, as can be seen from the detailed description of the methods of the preferred embodiment hereinafter.
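Validation of a parameter against a system fixed group may be illustrated with a minimal sketch in Python. The text does not state whether an invalid character is rejected or silently dropped; the sketch assumes it is dropped. The function name and the two group constants are hypothetical, and the groups shown are illustrative subsets of the examples given above.

```python
import string

# Illustrative subsets of the system fixed groups (assumed, not exhaustive).
SYSTEM_PUNCTUATION = set("!;',_:\"?")
SYSTEM_SYMBOLS = set("@#$%+-")
SYSTEM_LOWER_CASE = set(string.ascii_lowercase)


def validate_parameter(value: str, allowed: set) -> str:
    """Keep only characters in the allowed group, in order, without repeats."""
    kept = []
    for ch in value:
        if ch in allowed and ch not in kept:
            kept.append(ch)
    return "".join(kept)
```

The two illustrative groups share no character, in line with the statement that the groups are generally mutually exclusive.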
The delimited segments special handling, phonetic shortening, abbreviation replacement, enumeration words conversion and truncation methods of this invention are fully automated. The text for abbreviation may be accessed from addressable fields in databases, spreadsheets or other applications or from text or other files (or memo fields). The text may also be obtained by keyboard inputs or through special devices such as a voice recognition (to written or printed word) system. If the methods are used as a function, the abbreviated text string is returned for placement within any desired field in database, spreadsheet or other applications or the abbreviated text is appended to a text or other file. Generally, the names of the source and output files are defined and included in the parameter list.
In the preferred embodiment, repetitive use of the Abbreviate function is facilitated by predefining sets of choices of:
a) user created abbreviation data file version,
b) control options and
c) other control parameters
preferably into a data file.
Consistently abbreviated results will be obtained if the same pre-defined set of choices is used. However, appropriate sets of choices may have to be carefully pre-defined and chosen to optimize the results of the methods, in tune with the language of text abbreviated, personal preferences and specific knowledge domain. With its several optional features the abbreviate function as it applies to text files can be a useful component of any word processing application.
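Predefining sets of choices in a data file may be illustrated with a minimal sketch in Python. The JSON layout, the class `ParameterSet` and all of its field names are assumptions made for illustration; the format of the parameters sets file of the preferred embodiment is not reproduced here. The fields mirror the abbreviation control parameters listed in the Summary.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class ParameterSet:
    data_file_version: str          # which abbreviation data file to use
    options: str                    # opted methods, one letter per option
    deletable_punctuation: str
    deletable_letters: str
    non_deletable_symbols: str
    min_word_length: int
    min_truncated_word_length: int
    text_length_limit: int          # zero means undefined
    row_width: int
    row_count: int


def dump_sets(sets: Dict[str, ParameterSet]) -> str:
    """Serialize named parameter sets to a JSON string."""
    return json.dumps({name: asdict(ps) for name, ps in sets.items()},
                      indent=2)


def load_sets(payload: str) -> Dict[str, ParameterSet]:
    """Rebuild named parameter sets from a JSON string."""
    return {name: ParameterSet(**fields)
            for name, fields in json.loads(payload).items()}
```

Reloading a stored set and passing it unchanged to the abbreviation function is what makes repeated runs give the same abbreviated result.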
Uses of the Invention
The abbreviated text obtained using the methods and means of this invention may be used to overcome display space constraint or for greater optical facility (with use of larger types) in computers.
Busy officials and business executives may develop a preference for internal reports in Mofacs (maximum optical facility added concised script) for fast personal reading.
After the abbreviation data file and parameters list have been defined or determined and the abbreviation function (i.e., this invention) is called, the text is abbreviated in a fully automated manner without user intervention.
This invention can be used for many language scripts apart from English.
Some of the uses of the invention are as follows:
1. In computer screens:
In computer applications--e.g., spreadsheet package, database package, database management system (DBMS) or any other standard or customized application--screen form layout entails demarcation of columns and rows to predetermined sizes. Certain columns or rows in the layout contain fixed information (i.e., names or titles of items of information, but not the information itself) with which users develop familiarity. The invention can help to reduce the area allocated for such fixed information, thus saving space for variable information, which in fact is the subject matter for careful, selective and focused reading. The variable information also can be automatically abbreviated to predetermined field widths.
The methods of the invention may be used in menu bars, pull-down menus, windows for displaying text and dialog boxes to cope with display space constraint.
The several methods of the invention yield a wide range of reduction up to about 70%, if required, without any manual intervention. Horizontal or vertical scrolling and compressed printing of oversized forms are avoided. The abbreviated text can fit into varying field widths in different forms, though the original information elements are sourced every time from a commonly used unabbreviated data file.
The function format of the invention, with a comprehensive parameters list supported with a database of abbreviation rules with pertinent data and the provision for choice of pre-defined parameter sets ensures consistent results. The control panel showing combination of parameters list used offers total control to the user with complete transparency. The user can fine-tune his or her choices with experience and preserve the preferred parameter sets for future use on textual information obtained or downloaded from any source.
The methods of the invention in general and the conversion of enumeration words sequence to number in particular are quite suited to voice recognition input methods in database, spreadsheet, word processing or other application programs, if and when such input methods are generally accepted as practical. Inputs to numeric and other data fields through keyboard would normally involve the use of numeric digit keys. However, with voice recognition input systems the input capture may be in words form and such enumeration words sequences can be instantly converted to numeric characters using the methods of this invention.
2. In Web sites:
The methods of the invention can be used in Web sites, so that visitors are able to read the textual information either in original form or in abbreviated versions.
3. In newspaper columns:
Senior citizens (and perhaps the readership at large) may develop a preference for abbreviated text in newspaper columns, provided the abbreviation database version, control options and control parameters are carefully fine-tuned and consistently used. Sections in newspapers which are specially devoted to such readers can be produced with abbreviated text in larger type size, within the space constraints.
In columns reporting market quotations, the names of companies or items quoted can be abbreviated. The abbreviation options, pertinent database and control parameters, if adopted uniformly and consistently in reporting business performances, would facilitate focused reading by busy investors and executives.
Classified advertisements with abbreviated text within newspaper columns may be more economical and yet readable.
4. In pagers:
A pager, being a tiny portable device, has an even smaller panel for message display. Though the pager may not have the computing facility to abbreviate messages on-line, the methods of the invention remain feasible for pagers. The messages can be abbreviated at a central computing facility before transmission to any pager. Use of appropriate versions of abbreviation rules and pertinent data in a central database will ensure consistency of abbreviated text. Task relevant acronyms and word abbreviations may be adopted for common use and uniform communication.
5. In control panels:
Control panels are an essential requirement within aircraft, vehicles, manufacturing and household equipment, control rooms and computer applications. Modern control panels include context specific messages for obtaining response to faults or errors. Abbreviated text may help to make the most of the limited space on the panel.
6. In Television screens:
Often films are telecast with subtitles in a different language. The reading convenience to the viewer is inversely proportional to the speed of the character train or the number of display changes in a given time. The speed or number of display changes can be reduced in direct proportion to the reduction obtained with the abbreviated text.
7. In billboards:
Electronic billboards are installed at prominent places to be visible from large distances. The display includes character trains and flashes of advertisement text. Abbreviated text of the invention offers the same optical facility as in television screens.
8. In teleprompters:
Abbreviated text can be instantly produced from plain text for teleprompters, using abbreviation options, pertinent data and control parameters personally selected and fine-tuned by each speaker or reader. The facility for editing and storage of several versions of the database and parameter sets, with complete control and transparency to the user, is especially suited for this user segment.
9. In publication of books:
Use of abbreviated text in books printed with proportionally spaced types, with reduction potential of about 20-25%, may prove economical and may even be preferred by fast readers. The abbreviation options, pertinent database and control parameters can be selected and fine-tuned by the authors and publishers. Thereafter the production of the abbreviated text version of any book can be fully automated.
10. In electronic data bank, database, encyclopedia, dictionary, glossary etc:
Users may find a sort order of words, phrases or captions ignoring phonetically insignificant characters (such as vowels--except the initial of each word, contiguously repeating consonant, apostrophe, hyphen and intervening space(s) between words) more convenient for two reasons. Firstly, spelling errors in words entered for search are minimized. Secondly, the number of keystrokes for search is reduced.
This sort order implies that the producer of the data bank, database, encyclopedia, dictionary, glossary or such other data source has to provide for an additional sort key for the words, phrases or captions (ignoring phonetically insignificant characters); and the end user has to search for the word, phrase or captions after entering these (in full or preceding part) with the insignificant characters excluded. Additionally, (during word processing), a user may spell-check for the normal word or search for its meaning. In case no match is found the system may develop the abbreviated word, phrase or captions, search for it and if a match is found: show it in full form with the meaning.
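The additional sort key described above may be illustrated with a minimal sketch in Python. The function name `phonetic_sort_key` is hypothetical, and the choice of a, e, i, o and u as the insignificant characters is an assumption for English text. Repeating consonants are collapsed before vowels are removed, so that only consonants which were contiguous in the original word are merged.

```python
import re

# Runs of the same lower-case consonant, e.g. "cc" or "mm".
_REPEATED_CONSONANT = re.compile(r"([b-df-hj-np-tv-z])\1+")


def phonetic_sort_key(entry: str) -> str:
    """Build a key that ignores phonetically insignificant characters."""
    key = []
    # Apostrophes are dropped; spaces and hyphens only separate words.
    for word in re.split(r"[\s-]+", entry.lower().replace("'", "")):
        if not word:
            continue
        word = _REPEATED_CONSONANT.sub(r"\1", word)
        # The initial is always kept, even when it is a vowel.
        key.append(word[0] + re.sub(r"[aeiou]", "", word[1:]))
    return "".join(key)
```

Under these assumptions both `accommodate` and the misspelling `accomodate` map to the key `acmdt`, which illustrates how such a key tolerates common spelling errors in a search entry.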
11. In search engines:
Developers of search engines for information on the Internet may provide for search routines using the abbreviated text sort order.
BRIEF DESCRIPTION OF THE DRAWINGS
A more complete understanding of the invention may be obtained by reading the following description in conjunction with the appended drawings in which like elements are labelled similarly and in which:
FIG. 1 is a block schematic diagram of a typical computer system including storage 012;
FIG. 2 is a block schematic diagram of the interconnections between the programs (including the abbreviation control data program 123 and the abbreviation function 127) and the data structures within the storage 012;
FIG. 3 is a block schematic diagram of the structure of the abbreviation control data program 123;
FIG. 4 is a block schematic diagram of the subroutines and the data structures comprising the abbreviation function 127;
FIG. 5 is a flow chart representation of the Main subroutine 1271 within abbreviation function 127 as it relates to abbreviation of text string or word;
FIG. 6 is a continuation of the flow chart representation from FIG. 5;
FIG. 7 is a flow chart representation of the Main subroutine 1271 within abbreviation function 127 as it relates to abbreviation of (multiple line) text, branching off from FIG. 5;
FIG. 8 is a continuation of the flow chart representation from FIG. 7;
FIG. 9 is a flow chart representation of the Shorten subroutine;
FIG. 10 is a continuation of the flow chart representation from FIG. 9;
FIGS. 11 to 16 are blow-by-blow listings of progressive abbreviation results from input text to abbreviated text line-by-line, with the corresponding control panel at the top of the listing. Each of these lines is assembled from the relevant memory variables or other data structures located in storage 012 to illustrate the status of abbreviation at each step. The input line is prefixed with `00` and the subsequent lines are prefixed with the corresponding Method-step numbers. The Method-steps are described in detail in the detailed description of the preferred embodiment hereinafter;
FIG. 11 is a blow-by-blow listing of progressive abbreviation for a single line text of undefined output length;
FIG. 12 is a blow-by-blow listing illustrating abbreviation of a single line text to a predetermined desired output length limit. Apart from other abbreviation methods FIG. 12 illustrates the use of less commonly used acronym and word abbreviation. These are used because they provide greater reduction than other abbreviation methods and because the length of the partially abbreviated text exceeds the predetermined abbreviated text length limit of 30;
FIG. 13 is a blow-by-blow listing illustrating abbreviation of a single line text to a predetermined desired output length limit including an abstract segment;
FIG. 14 is a blow-by-blow listing illustrating abbreviation of a single line text with undefined desired output length including an enumeration words sequence;
FIG. 15 is a blow-by-blow listing illustrating abbreviation of a string containing pre-defined row separators into multiple rows of predetermined equal width. The string also contains a protected segment and a prioritized segment;
FIG. 16 is a blow-by-blow listing illustrating abbreviation of a string containing no row separators into multiple rows of predetermined equal width. Row separators are placed in the string by the system using Separate subroutine;
FIG. 17 illustrates how the use of this invention leads to better utilization of available space on a display or while printing. In the upper part of the figure the row titles in the table are unabbreviated and therefore take up a lot of space. In the lower part of the figure the row titles in the table have been abbreviated and hence a lot of space is saved. This saved space is used to display more useful information. The control panel in the middle shows the various abbreviation options and parameters used;
FIG. 18 is an illustration of the upper table of FIG. 17 with the row titles transformed into abbreviated multiple row columnar titles, suitable for a database listing format, with the corresponding control panel placed at the top. The database may have (i) name of corporation, (ii) year and (iii) rank (Rk) number as sort keys, so that listings can be taken for each corporation for desired sequence of years or each year for desired corporations/ranks. The data in each database listing may be millions of dollars, growth percentage or proportion percentage for the listed corporations or years;
FIG. 19 is an illustration of a typical unabbreviated monospace text file, followed by a typical control panel for abbreviation and the corresponding text after abbreviation. The illustration consists entirely of phonetic shortening methods in preference to abbreviation replacement methods and does not include the delimited segment methods. The truncation methods are totally avoided in multiple line text abbreviation;
FIG. 20 is an illustration of the typical unabbreviated text file of FIG. 19 converted to proportionally spaced type (maintaining the line length as in FIG. 19), followed by the abbreviated text of FIG. 19 converted to proportionally spaced type and further followed by the abbreviated text of FIG. 19 converted to proportionally spaced larger type for maximized optical facility within the space constraints of the unabbreviated version at the top of FIG. 20.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENT OF THE INVENTION
The finely controlled methods of the invention are enabled with a scheme of discriminating between and recognizing the several basic elements of the text. Clear definition of the several basic elements or segments of the text is therefore a prerequisite for the detailed description.
Usually, a text file comprises several sentences grouped into paragraphs with or without title lines, indentation or blank line separators. Generally, a text may be an input text string or a text string picked up from a multiple line text file and comprises one or more of the following basic elements or segments:
a) Word: a character sequence comprising entirely of alphabets (or a single alphabet), with or without apostrophe, separated at both ends with space, punctuation, start of text string or end of text string.
b) Abbreviatable word: a word with at least two alphabets, the initial in upper case or lower case and all the other alphabets in lower case.
c) Non-abbreviatable word: a word which is,
i) a single alphabet or
ii) a word with at least one alphabet other than the initial in upper case or
iii) a word abbreviation or
iv) an acronym.
d) Phrase: comprising an abbreviatable group of words forming a conceptual unit or name of person or entity.
e) Non-alphabet sequence: any contiguous sequence of non-alphabet characters (or a single non-alphabet character).
f) String: a line comprising one or more word, hyphenated word, phrase, numeric digit, punctuation character, bracket character, other symbolic character or intervening blank space.
g) Bundle Word: Any single non-trivial numeric word which connotes a bundle greater than hundred--e.g., thousand, lakh, lac, million, crore, billion, trillion, quadrillion and the like.
h) Enumeration Word: Any single word which,
i) corresponds to any one or pair of numeric characters barring the word `zero`,
ii) connotes a bundle--e.g., thousand, lakh, lac, million, crore, billion etc.,
iii) connects other enumeration words--e.g., `and`, or
iv) is a derivative of any other enumeration word--e.g., first, second, . . . thousandth, millionth etc.
i) Abstract segment: any sequence of characters within a text string, parenthesized with a pair of pre-defined unique characters--e.g., curly brackets--containing an intellectually concised abstraction (comprising one or more words--preferably abbreviatable) of the rest of the string and suffixed to or included in that string.
j) Prioritized segment: any sequence of characters (or a single character) within text string or text file, parenthesized with a pair of pre-defined unique characters--e.g., round brackets--to signal that the sequence is prioritized for deletion or truncation.
k) Protected segment: any sequence of characters (or a single character) within text string or text file, parenthesized with a pair of pre-defined unique characters--e.g., square brackets--to signal that the sequence is protected from abbreviation.
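The element definitions above can be illustrated with a minimal Python sketch. The function names and regular expressions are hypothetical and not part of the disclosure; the bracket characters follow the "e.g." examples in definitions i) to k), and the glossary-dependent tests of definition c) (word abbreviation, acronym) are not attempted.

```python
import re

# Delimiter pairs follow the examples in the text (curly, round, square brackets).
# The text only requires "pre-defined unique characters", so these are assumptions.
SEGMENT_PATTERNS = {
    "abstract": re.compile(r"\{([^{}]*)\}"),
    "prioritized": re.compile(r"\(([^()]*)\)"),
    "protected": re.compile(r"\[([^\[\]]*)\]"),
}

def is_abbreviatable(word: str) -> bool:
    """Definition b): at least two alphabets, all but the initial in lower case."""
    letters = word.replace("'", "")  # a word may carry an apostrophe
    if len(letters) < 2 or not letters.isalpha():
        return False
    return letters[1:].islower()

def find_segments(text: str) -> dict:
    """Return the contents of each delimited segment found in the text."""
    return {name: pat.findall(text) for name, pat in SEGMENT_PATTERNS.items()}
```

Under these assumptions `is_abbreviatable("Company")` is true, while `dBASE` and `MoU` are rejected because an alphabet other than the initial is in upper case.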
Logical Criteria for Abbreviation of Text String or Text File
A description of the inventors' understanding of a set of sequenced criteria on which the invention is broadly based follows:
1. Abbreviation of text is essentially a logical, prioritized and selective process of deletion of non-text matter such as blank spaces, deletion of less significant non-alphabet characters, replacement of words or word combinations with abbreviations, replacement of alphabet sequences from within words with representative shorter sequences, deletion of less significant alphabets from within words and truncation of long words from the right end.
2. Leading spaces used as word separators may be deleted after converting the initials of words to upper case, provided the preceding word does not end with a capital letter.
3. Special handling of parenthesized segments may include
a) abbreviation of an intellectually abstracted version parenthesized and suffixed to the text string,
b) deletion at the outset,
c) prioritized abbreviation or
d) protection from abbreviation.
Parenthesis delimitation characters may be deleted after using these for pre-defined special handling of each uniquely parenthesized segment of text.
4. Other non-alphabet characters may be pre-defined as insignificant--e.g., selected punctuation--for prioritized deletion or as highly significant--e.g., symbols: % $ # @--for protection from deletion.
5. Abbreviation or truncation of acronyms or abbreviated words and of words having upper case alphabets apart from initials--e.g., MoU or dBASE, numeric digits and other pre-defined highly significant non-alphabet characters results in loss or distortion of meaning and hence such abbreviations, words or characters are not abbreviatable.
6. Use of acronym comprising initials of any word combination such as phrase, personal name or institutional name, as abbreviation--e.g., PAT for `Profits after tax`--is common knowledge and results in substantial reduction. Though replacement of such combination with acronym results in total loss of recognition of each original word singly, the acronym may well be easily recognized as such by common usage. Replacement of original word combination with acronym is quite desirable in such cases and may preferably be prioritized. As word abbreviation replacement of any component word from within a phrase (which has a corresponding acronym) renders the phrase no more replaceable with the acronym and as acronyms yield greater reduction than abbreviation of component word(s), it is desirable to prioritize acronym replacement before word abbreviation replacement.
7. Replacement of long word with its abbreviation is also common knowledge--e.g., `Coy` for `Company`. Use of such word abbreviation may be prioritized next, provided:
a) such word abbreviation is commonly used,
b) the word is not a component of a phrase or institutional name which has a corresponding acronym and
c) the replacement with word abbreviation results in greater reduction compared to word reduction based on criterion 10.
8. Conversion of enumeration words to a sequence of numerics results in substantial reduction without any loss of phonetic content and word recognition facility. It is desirable to prioritize such replacement immediately following replacement with commonly used acronyms and word abbreviations.
9. Shortening of words by replacement of portions within words with corresponding shorter sequence of alphabets or by selective deletion of less significant alphabets from within words other than those which are not abbreviatable in accordance with criterion 5 may be considered if the abbreviation methods, based on the preceding criteria, are inadequate. Pre-defined contiguous sequences of alphabets (comprising portions within words), excluding the initials, may be replaced with pre-defined shorter sequences (obtained by deletion of less significant alphabets therein). It is desirable to prioritize replacement of such characters sequence, immediately following conversion of enumeration words sequences.
10. Significance of any alphabet depends on loss of word recognition facility with deletion of the alphabet from within words. Such loss depends on a combination of factors:
a) Position of character--initial being the most important for word recognition and the ending character(s) being the least important.
b) Redundancy of consonant--in certain contiguous consonant occurrences one of the consonants is redundant and insignificant--e.g., `c` in `ck`, `n` in `mn` at the end of word.
c) Repetition of characters--in contiguously repeating sequence of any consonant, the occurrences in excess of one are less significant.
d) Relative phonetic significance of the character--words with the consonant(s) deleted cannot be pronounced at all; but attempts to pronounce words with the intervening vowel(s) deleted and supplying vague intervening vowel sounds approximate the complete word pronunciation--e.g., `significant` cannot be pronounced from `iiia`; but it can be pronounced from `sgnfcnt`. In this sense the intervening vowel(s) are phonetically less significant.
e) Length of the word--very short words with all the vowels deleted may become difficult to recognize--e.g., `car`, `care`, `core`, `cur`, `cure`, `curia`, `curie` and `curio`. Hence vowels deletion from within very short words results in complete loss of word recognition.
f) Contiguity of vowels--in any occurrence of intervening contiguous sequence of vowels, each of the vowels is less significant and selective deletion of only one or a few of such vowels may distort the vowel sound representation--e.g., `beautiful`. Deletion of all intervening contiguous vowels, in each such occurrence, may be preferable.
g) Relative optical prominence of characters--lower-case vowels and some consonants--e.g., c, n, r, s, v, x and z--are optically least prominent since these are without height (k), depth (g) or width (m with proportional spacing).
h) Certain occurrences of a few consonants are silent or least pronounced--e.g., `r` in `figure`.
11. Replacement of phrase or word with acronym or word abbreviation which is less commonly used may be resorted to only if:
a) word abbreviation methods, based on the preceding criteria, do not produce the desired extent of reduction and
b) if such replacement is supported with a glossary of such equivalents.
12. If a text string, fully abbreviated adopting the preceding criteria, cannot be accommodated within the space constraints, truncation of words of the text string may become necessary. It is preferable that such truncation is subject to predetermined minimum truncated word length limit and is started from the right end of the text string.
13. Truncation results in accelerated loss of word recognition facility, which can be avoided by executing abbreviation methods afresh, starting with an intellectual abstract of the input text string comprising abbreviatable word(s), if such abstract is enclosed in pre-defined pairs of unique characters and suffixed to the input text string by the user--e.g., while it is impossible to abbreviate the text string: "Total assets (excluding carried forward losses) net of total liabilities" to six characters without drastic truncation of words, its intellectual abstract: "Net worth" can be so abbreviated without any truncation.
14. Though word truncation may result in loss of word recognition facility, truncated word combinations within column or row titles in pre-defined forms may be quite recognizable due to the context, repetitive use and familiarity. Though `Nt` cannot be recognized by itself as `Net`, it may be recognized as such within a truncated word combination--e.g., `NtCrAs`--more so by those familiar with the relevant information domain--e.g., investors, business managers, finance professionals and accountants.
15. In case drastic truncation is inevitable, it is preferable to prioritize truncation of words from the right end of the text string to bare initials until the desired output length limit is reached.
16. In tabulated forms and database fields, an input text string may be required to be abbreviated and placed as a row or column title of predetermined width in one or more rows, with minimal loss of word recognition facility and without breaking words, except to minimize word truncation.
17. In personal names it is customary to abbreviate all but one word (normally the surname or family name) to bare initials. A personal name may be recognized if beginning with the occurrence of a pre-defined unique word indicating title, gender, status, etc. After discrimination of personal name the title word may be optionally deleted.
18. An individual end user may have personal, editorial or knowledge domain specific preferences in accepting the aforesaid criteria for abbreviation in general and as regards predefinition of abbreviation database, choice of abbreviation options and control parameters of abbreviation methods in particular. It would be desirable to allow the user to make intelligent and optimal use of these methods without requiring any programming skills.
19. It would be desirable to devise the abbreviation procedure as a function with a comprehensive parameters list so that the procedure can be called and executed from within any spreadsheet application, database application, database management system (DBMS), any other standard or customized application or word processing application.
20. It would be desirable to provide for a pre-defined or predefinable abbreviation data file in several versions and another file of function parameter sets accessible from memory, disk drive or file server in local or wide area networks. With these provisions an individual user may conveniently and recurrently choose from several pre-defined sets of parameters (including reference to abbreviation data file version). The choice can be appropriate to the application from within which the function is called, the language, the structure or length of input text and space constraints within which the abbreviated text output is to be placed.
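Criteria 2 and 10 c) to e) above can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the function names are hypothetical, `y` is treated as a consonant, every vowel after the initial is deleted, and the default length limit of 4 is an arbitrary choice for the example.

```python
import re

def collapse_repeats(word: str) -> str:
    """Criterion 10 c): keep one of any contiguously repeating lower-case consonant."""
    return re.sub(r"([b-df-hj-np-tv-z])\1+", r"\1", word)

def drop_vowels(word: str, min_len: int = 4) -> str:
    """Criteria 10 d) and e): delete vowels after the initial,
    leaving very short words (length <= min_len, an assumed limit) untouched."""
    if len(word) <= min_len:
        return word
    return word[0] + re.sub(r"[aeiou]", "", word[1:])

def compact(words: list) -> str:
    """Criterion 2: capitalise the initial and drop the separating space,
    unless the preceding word ends with a capital letter."""
    out = ""
    for w in words:
        if not w:
            continue
        if out and out[-1].isupper():
            out += " " + w              # space retained; word left as it is
        else:
            out += w[0].upper() + w[1:]
    return out
```

For example, `drop_vowels("significant")` gives `sgnfcnt`, matching the illustration in criterion 10 d), and applying all three functions to "net current assets" gives `NetCrntAsts`.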
FIG. 1 is a block schematic diagram of a typical computer system, required to implement the preferred embodiment of this invention, consisting of a central processing unit (CPU) 011. Peripheral equipment includes storage 012, input devices such as keyboard with or without mouse 013, video display unit (VDU) 014 and printer 015. All the aforesaid equipment conforms to popular standards and is well known to one of ordinary skill in the art. In future, input devices may include voice input capture devices for conversion of voice to written or printed word. The computer system or any part of it, other than the input devices and the VDU, may be shared within local area networks, wide area networks, the Internet or any other system of linked computer networks. The computer's storage 012 may consist of primary storage, such as RAM, and secondary storage, such as disk drives, CD ROMs, DVDs, solid state drives and the like. The specifics as regards what data is read from or written to primary storage and/or secondary storage at each stage of processing impact the efficiency of processing and safety of data and would be known to one of ordinary skill in the art. Therefore, this detailed description does not differentiate between the different types of storage.
FIG. 2 broadly presents the typical structures of data and programs within the storage 012 which are required to implement the preferred embodiment of this invention. These include an operating system 121 (MS-DOS, OS/2 and Windows being popular standards), a sundry standard or customized application or utility program 122, an abbreviation control data program 123, an abbreviation data file 124, a parameters set file 125, a parameters list 126, and an abbreviation function 127. Data files and data structures are represented as double lined blocks. The abbreviation control data program 123, presented in greater detail in FIG. 3, and the abbreviation function 127, presented in greater detail in FIG. 4, may be called from within any sundry application program 122. The structure of the abbreviation data file 124 is presented as TABLE 1.
TABLE 1
Structure Of Abbreviation Data File:
File Name: AbData (or the DOS file name of the current version of the file used, passed in parameter: AbrDtaFN)
Fields:
Name Data Type Description
AbAR Integer Abbreviation Rule numbers
AbPhWd String(55) Unabbreviated phrase, word or characters
AbAbrv String(10) Acronym or word abbreviation string
The structure of the parameters list 126 of the abbreviation function, in the preferred embodiment, is presented as TABLE 2. The parameters may be initiated as memory variables or included as the parameter list of an abbreviate function of the format:
ABBREVIATE(parameter list)
The Abbreviate function, in the preferred embodiment, would return:
a) In case of text string input: the abbreviated string (in single or multiple rows) and the reduction percentage of the input (Rdctn%).
b) In case of text file input: number of output lines (OutptLns) and the percentage of reduction of the input (Rdctn%).
c) Appropriate error messages, if any.
The user may pass each parameter afresh every time the function is called. As a convenient alternative, the user may choose any parameter set comprising parameters (numbered P#=1 to 28 in TABLE 2) from a parameters set file 125 and pass only the unique parameters (numbered P#=29 to 34 in TABLE 2) directly every time the function is called.
It may be possible to design the sundry application, from within which the abbreviation function of this invention is called, itself to develop:
a) the parameters: OtptL and StrRws by checking the space constraints of the location to which the output is to be supplied, while abbreviating text string and
b) the parameters: InptL and OtptL by checking the input text file named in parameter:InputFN and the output text file named in parameter:OutptFN, while abbreviating text file.
TABLE 2
Structure Of Parameters List Of Abbreviate Function:
Fields:
P# | PrmtrNm | Short Description | Data Type | Valid For FnctnSb
1 | FnctnSb | Function sub code: `s` for abbreviating text string, `t` for abbreviating text file to text file | String(1) | s/t
2 | OptnACc | All capitals convert to lower case | String(1) | s/t
3 | OptnAbs | Intellectual abstraction in curly brackets usage | String(1) | s
4 | OptnPri | Prioritized deletion/truncation of round bracket contents | String(1) | s/t
5 | OptnPro | Protection of square bracket contents from abbreviation | String(1) | s/t
6 | OptnAbB | Pre-defined words barred from abbreviation | String(1) | s/t
7 | OptnTWd | Title word (preceding personal name) deletion | String(1) | s
8 | OptnCAc | Compulsory acronym replacement for phrase | String(1) | s/t
9 | OptnCAb | Compulsory abbreviation replacement for word | String(1) | s/t
10 | OptnEWN | Enumeration words to numerics conversion | String(1) | s/t
11 | OptnESq | Ending sequence replacement | String(1) | s/t
12 | OptnISq | Intervening sequence replacement | String(1) | s/t
13 | OptnRCd | Repeating consonant deletion (i.e., replacement of a sequence of a contiguously repeating consonant with one such consonant) | String(1) | s/t
14 | OptnLAd | LAdStrng based deletion | String(1) | s/t
15 | OptnNAc | Need based acronym replacement for phrase | String(1) | s/t
16 | OptnNAb | Need based abbreviation replacement for word | String(1) | s/t
17 | OptnTrn | Words truncation | String(1) | s
18 | OptnFnl | Text string final truncation | String(1) | s
19 | OptnISd | Intervening space deletion | String(1) | t
20 | OptnLBj | Line breaks joining | String(1) | t
21 | AbrDtaFN | Abbreviation data file name - version specific | String(8) | s/t
22 | PndStrng | String containing punctuations for deletion | String(8) | s
23 | LAdStrng | String containing less significant alphabets for deletion | String(8) | s/t
24 | NDSStrng | String containing non-deletable symbols | String(8) | s
25 | MnWdL | Minimum word length limit | Integer | s/t
26 | MnTrL | Minimum truncated word length limit | Integer | s
27 | MxPNWds | Maximum personal name words limit | Integer | s
28 | PNFWdL | Personal name first word length limit | Integer | s
29 | InputFN | Input text file name | String(8) | t
30 | OutptFN | Output text file name | String(8) | t
31 | OtptL | Desired output length or row width or output record length | Integer | s/t
32 | InptL | Input record length | Integer | t
33 | StrRws | String output rows number | Integer | s
34 | InputStr | Input text string | String(120) | s
Generally, the following rules are observed:
1) The option value--i.e., for parameters P#=2 to 20--is set to `Y`, to exercise the option, or else it is left blank, except
for OptnAbs:
a) If OptnAbs=`X`: the entire text string, including the abstract segment in curly brackets, is abbreviated. If the desired output length limit is not reached, without using truncation options, the OptnAbs is set to `Y` and abbreviation of the string is tried afresh.
b) If OptnAbs=`Y`: the text string, excluding the abstract segment, is abbreviated. If the desired output length limit is not reached, without using truncation options, the OptnAbs is set to `Z` and abbreviation of the string is tried afresh.
c) If OptnAbs=`Z`: only the abstract segment is retained and abbreviated.
for OptnPri:
a) If OptnPri=`D`: prioritized segments in round brackets are deleted starting from the right end of the text string until the desired output length limit is reached.
b) If OptnPri=`I`: each word from the prioritized segments is truncated to bare initials from the right end, starting from the end of file:Shrtn, until the desired output length limit is reached.
c) If OptnPri=`T`: each word in the prioritized segments is truncated up to a predetermined minimum truncated word length limit (MnTrL) from the right end, starting from the end of file:Shrtn, until the desired output length limit is reached.
for OptnEWN:
a) If OptnEWN=`X`: any bundle word greater than thousand (Th) at the end of numeric abbreviation is retained.
b) If OptnEWN=`Y`: any bundle word at the end of numeric abbreviation is retained.
c) If OptnEWN=`Z`: the enumeration words sequence in the text string is fully converted to numerics without retaining any bundle word at the end of numeric abbreviation.
for OptnLAd:
a) If OptnLAd=`X`: less significant alphabets are deleted from the right end up to a predetermined minimum word length limit (MnWdL), excluding the last alphabet within each word from deletion.
b) If OptnLAd=`Y`: less significant alphabets are deleted from the right end up to a predetermined minimum word length limit, including the last alphabet within each word.
for OptnTrn:
a) If OptnTrn=`P`: all shortened words are truncated from the right end, such that the length of each word which is in excess of the predetermined minimum truncated word length limit is deleted in required uniform proportion, until the desired output length limit is reached.
b) If OptnTrn=`R`: shortened words are truncated from the right end, up to a predetermined minimum truncated word length limit and until the desired output length limit is reached.
for OptnFnl:
a) If OptnFnl=`Y`: each word (or basic element) from the text string is truncated from the right end, excluding bare initials of each word, pre-defined non-deletable symbols, numeric digit, decimal point and protected segment, until the desired output length limit is reached or all the words are dealt with.
b) If OptnFnl=`Z`: each word (or basic element) from the text string is truncated as in the preceding option, except that the protected segment is not excluded from truncation.
2) Valid value of MnTrL is any integer greater than 1.
3) Valid value of MnWdL is any integer greater than 1 and not less than MnTrL.
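The OptnLAd rule and the validity rules 2) and 3) above can be sketched in Python as follows. This is an illustrative reading rather than the disclosed implementation: the function names are hypothetical, and it is assumed that the word initial is never deleted and that deletion stops as soon as the word length equals MnWdL.

```python
def validate_limits(mn_wd_l: int, mn_tr_l: int) -> None:
    """Rules 2) and 3): MnTrL > 1, MnWdL > 1 and MnWdL not less than MnTrL."""
    if mn_tr_l <= 1:
        raise ValueError("MnTrL must be an integer greater than 1")
    if mn_wd_l <= 1 or mn_wd_l < mn_tr_l:
        raise ValueError("MnWdL must be greater than 1 and not less than MnTrL")

def delete_less_significant(word: str, lad_strng: str, mn_wd_l: int, optn_lad: str) -> str:
    """OptnLAd `X`/`Y`: scan from the right end, deleting characters listed in
    LAdStrng until the word is MnWdL long. The initial is never deleted (assumption);
    `X` also spares the last alphabet, `Y` does not."""
    if optn_lad not in ("X", "Y"):
        return word                      # option not exercised
    chars = list(word)
    last = len(chars) - 1
    i = last
    while i > 0 and len(chars) > mn_wd_l:
        spared = optn_lad == "X" and i == last
        if chars[i] in lad_strng and not spared:
            del chars[i]
        i -= 1
    return "".join(chars)
```

With LAdStrng set to the vowels, `delete_less_significant("significant", "aeiou", 4, "X")` yields `sgnfcnt`; raising MnWdL to 9 stops the deletion early at `signifcnt`.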
The structure of the parameters set file 125 is presented as TABLE 3.
TABLE 3
Structure Of Parameters Set File:
File Name: PSet
Fields:
Name Data Type Short Description
PrmSetId String(3) Parameter Set Identification code: 1st character = FnctnSb; 2nd character = any lower case alphabet (group indicator); 3rd character = any numeric digit (set indicator, read with 1st and 2nd characters)
PrmComnt String(55) Parameter set comment
OptnACc String(1) )
OptnAbs String(1) )
OptnPri String(1) )
OptnPro String(1) )
OptnAbB String(1) )
OptnTWd String(1) )
OptnCAc String(1) )
OptnCAb String(1) )
OptnEWN String(1) )
OptnESq String(1) )
OptnISq String(1) )
OptnRCd String(1) )
OptnLAd String(1) > As in TABLE 2
OptnNAc String(1) )
OptnNAb String(1) )
OptnTrn String(1) )
OptnFnl String(1) )
OptnISd String(1) )
OptnLBj String(1) )
AbrDtaFN String(8) )
PndStrng String(8) )
LAdStrng String(8) )
NDSStrng String(8) )
MnWdL Integer )
MnTrL Integer )
MxPNWds Integer )
PNFWdL Integer )
The abbreviation data file 124, the parameters set file 125 and the parameters list 126 are authored using the abbreviation control data program 123. Any record from the parameters set file 125 may be used as a subset (i.e., parameters numbered P# =1 to 28 in TABLE 2) of the parameters list 126.
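As an illustration of how a record of the parameters set file might be combined with the per-call parameters, a minimal Python sketch follows. The dictionary representation and both function names are assumptions made for the example; only the field names and the P# grouping come from TABLES 2 and 3.

```python
def parse_prm_set_id(prm_set_id: str) -> dict:
    """Split a PrmSetId such as 'sa1' into its three parts (TABLE 3)."""
    if len(prm_set_id) != 3:
        raise ValueError("PrmSetId must be exactly 3 characters")
    fnctn_sb, group, digit = prm_set_id
    valid = (fnctn_sb in ("s", "t")
             and group.isalpha() and group.islower()
             and "0" <= digit <= "9")
    if not valid:
        raise ValueError("invalid PrmSetId: " + prm_set_id)
    return {"FnctnSb": fnctn_sb, "group": group, "set": int(digit)}

def build_parameters_list(pset_record: dict, call_params: dict) -> dict:
    """Combine a stored set (P# 1 to 28) with the per-call parameters (P# 29 to 34)."""
    unique = {"InputFN", "OutptFN", "OtptL", "InptL", "StrRws", "InputStr"}
    unknown = set(call_params) - unique
    if unknown:
        raise ValueError("not per-call parameters: " + ", ".join(sorted(unknown)))
    params = {k: v for k, v in pset_record.items()
              if k not in ("PrmSetId", "PrmComnt")}
    # FnctnSb (P# 1) is carried in the first character of PrmSetId.
    params["FnctnSb"] = parse_prm_set_id(pset_record["PrmSetId"])["FnctnSb"]
    params.update(call_params)
    return params
```

A caller would then pass only items such as InputStr, OtptL and StrRws on each call, while the option flags and limits come from the stored set.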
FIG. 3 presents the abbreviation control data program 123 in greater detail. It consists of a menu 1230 allowing choice of abbreviation data file authoring program 1231, parameter sets file authoring program 1232 and parameters list authoring program 1233. The abbreviation data file authoring program 1231 uses a data capture form structure for on-screen display to create and update several versions of the abbreviation data file 124. Relevant details pertaining to abbreviation rules (number and description) and data validation rules for each field are presented as TABLE 4. A typical abridged version of the abbreviation data file is presented as TABLE 5. A complete and system fixed list of enumeration words with abbreviations, being a subset of Abbreviation Data File with AbAR=2 is presented as TABLE 6. The program is not described further, being a routine matter well known to one of ordinary skill in the art.
TABLE 4
Abbreviation Rule Numbers, Description And Field Data Validation Rules for Abbreviation Data File:
Fields:
AbAR | Abbreviation Rule Description | AbPhWd Validation Rules | AbAbrv Validation Rules
1 | Words barred from abbreviation | Any word comprising entirely of an upper case 1st alphabet followed by lower-case alphabet(s) |
2 | Conversion of single words representing numbers up to hundred and bundle words (such as thousand, million etc.) to abbreviations | Any word comprising entirely of alphabets representing numbers, bundles and derivatives such as First from one or Tenth from ten (system defined) | Any numeric digit, alphabet characters or combination (system defined)
3 | Deletion of title word representing status and/or gender in personal name | Any word or abbreviation popularly used in personal name |
4 | Compulsory replacement of phrase with acronym | Any phrase or series of words with initials capitalised, intervening spaces deleted and length <=55 | Any appropriate system formed or user edited acronym
5 | Compulsory replacement of word with abbreviation | Any word of 3 to 25 characters other than `And`, `Point` and `Zero` | Any appropriate abbreviation to yield at least 25% reduction for words containing <= 4 characters and at least 40% reduction for longer words
6 | Need based replacement of phrase with acronym | Any phrase or series of words with initials capitalised, intervening spaces deleted and length <=55 | Any appropriate system formed or user edited acronym
7 | Need based replacement of word with abbreviation | Any word of 3 to 25 characters other than `And`, `Point` and `Zero` | Any appropriate abbreviation to yield at least 25% reduction for words containing <= 4 characters and at least 40% reduction for longer words
8 | Need based replacement of ending sequence of characters in word with a shorter sequence | Any sequence of lower-case characters | Blank OR any appropriate lower-case shorter sequence to yield at least a 50% reduction
9 | Need based replacement of intervening sequence of characters in word with a shorter sequence | Any sequence of lower-case characters | Blank OR any appropriate lower-case shorter sequence to yield at least a 50% reduction
Suggested cautions:
a) Do not permit user editing of system defined records with AbAR = 2.
b) Do not permit duplication of field: AbPhWd entries between records with
AbAR = 4 and 6; and between records with AbAR = 2,5 and 7.
c) Provide for change of AbAR = 4 to AbAR = 6, AbAR = 5 to AbAR = 7 and
vice versa.
d) Provide for creation and editing of several versions of file: AbData
appropriate to each usage domain or individual preference, with unique DOS
file names passed as parameter: AbrDtaFN.
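The reduction thresholds of TABLE 4 lend themselves to a simple check at data entry time. The following Python sketch is a hypothetical validator, not the authoring program itself; it covers only the percentage and length rules for AbAR = 5, 7, 8 and 9 and ignores the other validation rules.

```python
def reduction_percent(original: str, abbreviation: str) -> float:
    """Percentage of characters saved by replacing original with abbreviation."""
    return 100.0 * (len(original) - len(abbreviation)) / len(original)

def abbreviation_is_valid(ab_ar: int, ab_ph_wd: str, ab_abrv: str) -> bool:
    """Check AbAbrv against the reduction thresholds of TABLE 4."""
    if not ab_ph_wd:
        return False
    pct = reduction_percent(ab_ph_wd, ab_abrv)
    if ab_ar in (5, 7):
        if not 3 <= len(ab_ph_wd) <= 25 or ab_ph_wd in ("And", "Point", "Zero"):
            return False
        # 25% for words of up to 4 characters, 40% for longer words
        return pct >= (25.0 if len(ab_ph_wd) <= 4 else 40.0)
    if ab_ar in (8, 9):
        return pct >= 50.0               # a blank AbAbrv counts as 100% reduction
    raise ValueError("no reduction threshold is defined for this rule number")
```

For instance, `Coy` for `Company` saves 4 of 7 characters (about 57%) and passes the 40% rule, whereas a hypothetical `Governmt` for `Government` saves only 20% and would be rejected.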
TABLE 5
Abridged Version Of Typical Abbreviation Data File:
Fields:
AbAR AbAbrv AbPhWd
1 Abraham
1 Lincoln
2 Bn Billion
2 Cr Crore
2 8 Eight
2 18 Eighteen
2 80th Eightieth
2 80 Eighty
3 Mr
3 Mr.
3 Mrs
3 Dr
4 BS BalanceSheet
4 Fed FederalReserve
4 P&L ProfitAndLoss
4 PAT ProfitAfterTax
5 # Number
5 Tue Tuesday
5 Coy Company
5 Corp Corporation
5 $ Dollar
5 Govt Government
5 Inc Incorporated
5 % Percent
6 ADN AnyDayNow
6 BTW ByTheWay
6 FYI ForYourInformation
7 Doc Document
7 Spdt Superintendant
8 k ck
8 g ing
8 mt ment
8 nt nent
8 nt nant
9 m mn
9 k ck
9 g ing
9 mt ment
9 nt nent
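To make the use of such records concrete, the following Python sketch applies a few rows of TABLE 5 to single words. It is only an illustration: the in-memory list, the function names, the exact-case matching and the choice to try the longest ending first are all assumptions, not details taken from the disclosure.

```python
# (AbAR, AbAbrv, AbPhWd) rows copied from TABLE 5
AB_DATA = [
    (5, "Coy", "Company"),
    (5, "Govt", "Government"),
    (8, "k", "ck"),
    (8, "g", "ing"),
    (8, "mt", "ment"),
    (8, "nt", "nent"),
    (8, "nt", "nant"),
]

def replace_word(word: str) -> str:
    """AbAR 5: swap a whole word for its commonly used abbreviation, if listed."""
    for ab_ar, abrv, ph_wd in AB_DATA:
        if ab_ar == 5 and ph_wd == word:
            return abrv
    return word

def replace_ending(word: str) -> str:
    """AbAR 8: swap a listed ending sequence for its shorter sequence.
    The longest listed ending is tried first (an assumption)."""
    endings = sorted((r for r in AB_DATA if r[0] == 8), key=lambda r: -len(r[2]))
    for _, abrv, ph_wd in endings:
        # the ending must not swallow the whole word, so the initial survives
        if word.endswith(ph_wd) and len(word) > len(ph_wd):
            return word[: -len(ph_wd)] + abrv
    return word
```

Here `replace_word("Company")` returns `Coy`, and `replace_ending("statement")` returns `statemt`.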
TABLE 6
Complete And System Fixed List Of Enumeration Words
(with abbreviations, being a subset of Abbreviation
Data File with AbAR = 2)
Fields:
AbPhWd AbAbrv
Billion Bn
Crore Cr
Eight 8
Eighteen 18
Eighteenth 18th
Eighth 8th
Eightieth 80th
Eighty 80
Eleven 11
Eleventh 11th
Fifteen 15
Fifteenth 15th
Fifth 5th
Fiftieth 50th
Fifty 50
First 1st
Five 5
Fortieth 40th
Forty 40
Four 4
Fourteen 14
Fourteenth 14th
Fourth 4th
Hundred 00
Hundredth 00th
Lac Lc
Lakh Lk
Million Mn
Nil 0
Nine 9
Nineteen 19
Nineteenth 19th
Nineth 9th
Ninetieth 90th
Ninety 90
One 1
Quadrillion Qd
Second 2nd
Seven 7
Seventeen 17
Seventeenth 17th
Seventh 7th
Seventieth 70th
Seventy 70
Six 6
Sixteen 16
Sixteenth 16th
Sixth 6th
Sixtieth 60th
Sixty 60
Ten 10
Tenth 10th
Third 3rd
Thirteen 13
Thirteenth 13th
Thirtieth 30th
Thirty 30
Thousand Th
Three 3
Trillion Tr
Twelfth 12th
Twelve 12
Twentieth 20th
Twenty 20
Two 2
Note:
In some Asian countries One Hundred Thousand is reckoned as a Lakh (Lac),
One Hundred Lakh is reckoned as a Crore and hence One Hundred Crore is
equivalent to One Billion.
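The bundle relationships in the note can be checked with a few lines of Python, shown here together with a single-word lookup against a handful of TABLE 6 rows. The dictionaries and the function name are illustrative assumptions; composing a multi-word enumeration sequence (for example "Twenty Five") into one number is not attempted in this sketch.

```python
# A few rows copied from TABLE 6 (AbPhWd -> AbAbrv)
ENUMERATION = {
    "Billion": "Bn", "Crore": "Cr", "Eighty": "80", "First": "1st",
    "Hundred": "00", "Lakh": "Lk", "Million": "Mn", "Thousand": "Th", "Two": "2",
}

# Numeric value of each bundle abbreviation, consistent with the note above
BUNDLE_VALUE = {"Th": 10**3, "Lk": 10**5, "Lc": 10**5,
                "Mn": 10**6, "Cr": 10**7, "Bn": 10**9}

def abbreviate_enumeration_word(word: str) -> str:
    """Return the TABLE 6 abbreviation of a single enumeration word,
    or the word unchanged if it is not listed here."""
    return ENUMERATION.get(word.capitalize(), word)
```

The note's arithmetic then reads: 100 x 10^3 = 10^5 (one Lakh), 100 x 10^5 = 10^7 (one Crore) and 100 x 10^7 = 10^9 (one Billion).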
The parameters set file authoring program 1232 uses a data capture form structure for on-screen display to create and update the parameters set file 125 covering parameters as presented in TABLE 3. The program is not described further being a routine matter well known to one of ordinary skill in the art. The parameters list authoring program 1233 uses a data capture form structure for on-screen display to create the parameters list 126. The program is not described further being a routine matter well known to one of ordinary skill in the art.
FIG. 4 presents in greater detail the abbreviation function 127, which is the crux of this invention. This presentation shows all subroutines, memory variables set, shortening file and reduction scope file, which files are used by the subroutines. Data files and data structures are represented as double lined blocks. Execution of the abbreviation function starts with the Main subroutine 1271 which in turn calls the Shorten subroutine 1272. The Shorten subroutine may recursively call the Move subroutine 1273, Match subroutine 1274, Replace subroutine 1275 and Separate subroutine 1276. Apart from the abbreviation data file 124 and the parameters list 126, which are accessed by the abbreviation function 127, the subroutines also access the memory variables set 1280 and the shortening file 1281. The reduction scope file 1282 is accessed only from the Replace subroutine 1275. The input file 1283, the processed input file 1284 and the output file 1285 are used from within the Main subroutine while abbreviating text file inputs only and not while abbreviating text string or word input. The memory variables set comprises individual variables described within each method hereinafter and other variables that may be required to control the execution of conditional, sequenced and/or recursive steps of the methods of the abbreviation function depending on the programming details well known to one of ordinary skill in the art. The structures of the shortening file 1281, the reduction scope file 1282, the input file 1283, the processed input file 1284 and the output file 1285 are presented in TABLES 7 to 11 hereinafter.
The shortening file 1281 has fields which are structured to:
a) hold each word (or basic element), separated from the word separation and processing string, in separate records in original sequence (i.e., in sequence of field:ShSq),
b) hold, along with the first word of phrase, less commonly used matched acronym for need based replacement; or hold, along with the word, less commonly used word abbreviation for need based replacement,
c) hold indication if each word (or basic element) has reduction scope (i.e., open for reduction, by acronym or word abbreviation replacement or phonetic shortening),
d) hold indication if each word (or basic element) is covered by any of the abbreviation methods, rule numbers or category numbers--number greater than zero indicating that the word is not open for abbreviation or truncation except in the last step of text string truncation, if required,
e) hold indication if each word originally had the initial in capital letter and if it had any of its other alphabets capitalized by the system to control the phonetic shortening methods and
f) in general facilitate execution of the methods of this invention within shortening file until the desired output length limit is reached or each word (or basic element) is dealt with.
The Shorten subroutine 1272 calls the Match subroutine 1274 wherein the abbreviation data file 124 is searched for acronyms or word abbreviations corresponding to phrases or words contained in the input text string. The commonly used acronyms or word abbreviations, if found, are replaced compulsorily and other acronyms or word abbreviations, if found, are held in corresponding records of shortening file 1281 for need based replacement at a later stage. The Shorten subroutine 1272 calls the Replace subroutine 1275 for need based replacement of phrases or words with acronyms or word abbreviations using the reduction scope file 1282 to keep track of reduction scope length of the acronyms or word abbreviations found and held for need based replacement. The records in reduction scope file are sequenced in the descending order of reduction scope length, the objective being to achieve the required reduction with the least number of need based replacements in the records of the shortening file as referenced from the first few records of the reduction scope file.
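The need based replacement described above can be illustrated with a minimal Python sketch. It is not the patented Replace subroutine: it assumes single-word abbreviations only, takes the reduction scope length to be the number of characters saved by a replacement, and measures length as the sum of word lengths. The function name `need_based_replace` and its arguments are hypothetical.

```python
from typing import Dict, List, Tuple


def need_based_replace(words: List[str], held: Dict[int, str], out_limit: int) -> List[str]:
    """Replace words with held abbreviations, largest saving first, until the
    summed word length fits out_limit or no held abbreviation remains.

    words     -- words in original sequence (stand-in for field:ShSWrd)
    held      -- index -> held abbreviation (stand-in for field:ShAbrv)
    out_limit -- desired output length limit (assumed: sum of word lengths)
    """
    result = list(words)
    # Stand-in for the reduction scope file: (saving, index) pairs kept in
    # descending order of saving, so the fewest replacements are needed.
    scope: List[Tuple[int, int]] = sorted(
        ((len(words[i]) - len(abbr), i)
         for i, abbr in held.items()
         if len(abbr) < len(words[i])),
        reverse=True,
    )
    total = sum(len(w) for w in result)
    for saving, i in scope:
        if total <= out_limit:
            break  # required reduction already achieved
        result[i] = held[i]
        total -= saving
    return result
```

With the words `International`, `Business`, `Machines`, `Corporation` and held abbreviations `Intl` and `Corp`, a limit of 32 triggers only the larger saving (`Intl`), while a limit of 25 triggers both.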
If the abbreviated output is required to be placed in multiple row column widths, the Shorten subroutine 1272 calls the Separate subroutine 1276 to separate the input text string into required number of portions without splitting words (except to minimize word truncation) before shortening the portions. Thereafter each portion is duly abbreviated to the desired output length limit.
Generally, abbreviation of multiple line text requires additional files--namely an input file 1283 from which the input records are first read, a processed input file 1284 into which the input records are copied with control data to keep track of line breaks, paragraph breaks, blank lines and indentation, and an output file 1285 to which the abbreviated records are written. The pattern of line breaks, blank lines and indentation is reproduced in the output file 1285, if required. Optionally, line breaks and blank lines are joined to save display space, indicating the joints with one or two `@` character(s).
If a single line text string is being abbreviated, the function returns the output string. If a multiple line text in a file is being abbreviated, each abbreviated output string is accumulated up to the predetermined output record length, reckoned in terms of monospace or proportional spacing, and each accumulated record is added to the pre-defined output text file 1285.
TABLE 7
Structure Of Shortening File:
File Name:Shrtn
Fields:
Name     Data Type     Description
ShSq     Integer       Record sequence number.
ShSWrd   String (26)   Word (or basic element) for abbreviation or commonly
                       used acronym or word abbreviation replacement (after
                       OptnCAc & OptnCAb are used).
ShAbrv   String (10)   Less commonly used acronym or word abbreviation held
                       for need based replacement later.
ShRS     Integer       Reduction scope indicator (only 0, 1, 6 or 7 being
                       valid). ShRS = 0 indicates that field:ShSWrd is not
                       open for reduction. ShRS is set to 6 or 7, if less
                       commonly used acronym or abbreviation is held in
                       ShAbrv.
ShAR     Integer       Abbreviation rule indicator - default value being
                       zero. AbAR numbers 1-5, are copied directly from
                       file:AbData, as applicable and numbers 6 & 7 are
                       copied from ShRS after need based replacement of less
                       commonly used acronym or word abbreviation. All
                       non-abbreviatable words (or basic elements) are
                       numbered 20. Protected segment is numbered 22.
                       ShAR > 0 indicates that the word is not open for word
                       truncation, except in the last stage of text string
                       truncation, if required.
ShCap    Integer       Indicating original case status of initial of word or
                       capitalization of other alphabets of word to control
                       phonetic shortening methods (only 0, 1, 10 or 11
                       being valid).
In the preferred embodiment, the following integer variables, derived from
the field value(s) in this file, are used:
a) SWrdLen = Number of characters contained (excluding trailing space(s))
in field:ShSWrd of each record in file:Shrtn.
b) AbrvLen = Number of characters contained (excluding trailing space(s))
in field:ShAbrv of each record in file:Shrtn.
c) TotLen = Sum of SWrdLen of all records in file:Shrtn.
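As an illustration only, one record of file:Shrtn and the three derived lengths might be modelled as follows. The field names follow TABLE 7; the class name `ShrtnRecord`, the helper `tot_len` and the use of an in-memory list instead of a file are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShrtnRecord:
    """One record of file:Shrtn (field names as in TABLE 7)."""
    ShSq: int          # record sequence number
    ShSWrd: str        # word (or basic element), String (26) in the table
    ShAbrv: str = ""   # less commonly used acronym/abbreviation held for later
    ShRS: int = 0      # reduction scope indicator (0, 1, 6 or 7)
    ShAR: int = 0      # abbreviation rule indicator (default zero)
    ShCap: int = 0     # case status indicator (0, 1, 10 or 11)

    @property
    def SWrdLen(self) -> int:
        # Characters in ShSWrd, excluding trailing space(s).
        return len(self.ShSWrd.rstrip(" "))

    @property
    def AbrvLen(self) -> int:
        # Characters in ShAbrv, excluding trailing space(s).
        return len(self.ShAbrv.rstrip(" "))


def tot_len(records: List[ShrtnRecord]) -> int:
    """TotLen: sum of SWrdLen over all records."""
    return sum(r.SWrdLen for r in records)
```

The sketch does not enforce the fixed field widths or the valid value sets listed in the table; a fuller model would validate them on construction.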
TABLE 8
Structure Of Reduction Scope File:
File Name:Scope
Fields:
Name       Data Type   Description
ScAcAbRS   Integer     Need based acronym or word abbreviation replacement
                       reduction scope length
ScSq       Integer     Sequence number copied from ShSq of file:Shrtn
TABLE 11
Structure Of Output File:
File Name: The DOS file name of the current file used is passed from parameter:OutptFN.
Fields:
Name      Data Type      Description
OtRcord   String (120)   Output text file record
The subroutines are presented in greater detail in flow chart format in FIGS. 5 to 8 and 9 to 10.
Separate control panel formats for text string abbreviation and text file abbreviation are presented in TABLES 12 and 13. The panels may be used for capturing user's choice of data file version, options or other control parameters or to display or print these choices, if required.
TABLE 12
Text String Output Control Panel:
String Output: PrmSetId = sal, AbrDtaFN = abdata2
ACc  Abs  Pri  Pro  AbB  TWd  CAc  CAb  EWN  ESq  ISq  RCd  LAd  NAc  NAb  Trn  Fnl
Y    XYZ  DIT  Y    Y    Y    Y    Y    XYZ  Y    Y    Y    XY   Y    Y    PR   YZ
PndStrng  LAdStrng  NDSStrng  MnWdl  MnTrL  MxPNWds  PNFWdL  OtptL  StrRws
,;:       aeiour    #$%+-@/   03     02     03       08      25     01
The basic embodiment of the invention with some variations is designed to abbreviate text string or word and with other variations to abbreviate text file. These variations are explained with reference to each method or step of the invention hereinafter. Each method of the invention is numbered 1 to 33 and in case any method comprises a plurality of steps, each such step is designated with a unique lower-case alphabet suffix to the method number. The one or two digit numeric designating the method or the numeric-alphabet designating a method-step is used as the reference character in the flow charts (i.e., in FIGS. 5 to 8 and 9 to 10). These designating reference characters are placed at the end of relevant method description or method-step statement, as the case may be.
The methods that are used in the preferred embodiment, numbered 1 to 33, are as follows:
Method 1: Creation of Abbreviation Data File
The system provides for creation and editing of abbreviation data file 124, by the user or developer using the file structure described in TABLE 1 hereinbefore with valid inputs as specified in the format presented in TABLE 4. The method of creation and editing is not described further, being a routine matter known to one of ordinary skill in the art.
Method 2: Creation of Parameters Set File
The system requires creation and editing of parameters set file 125, by the user, with valid inputs into the file structure described in TABLE 3 hereinbefore. Several parameter sets may be pre-defined and stored in the file for selective use as and when the function is called. The method of creation and editing is not described further, being a routine matter known to one of ordinary skill in the art.
Method 3: Generation of Complete Parameter List
A user may be allowed to develop the parameter list by passing each parameter afresh each time the function is called. For convenience and consistency of results the user may be enabled to define instantly the unique parameters--i.e., input text DOS file name (InputFN), output text DOS file name (OutptFN), output length (OtptL), input length (InptL), number of string output rows (StrRws) and input text string (InputStr)--and to choose any parameter set from the parameter sets file 125 to complete the parameters list 126 every time the function is called. With this the function is ready for execution.
Method 4: Start of Main Program
This method is illustrated in FIG. 5.
The abbreviation methods of the invention use related entries in records of file:AbData and the pre-defined characters passed in the parameters:PndStrng, LAdStrng and NDSStrng. The parameter list also includes certain parameters which indicate to the system what methods or control features the user has opted for. However, even if the user chooses certain options, in the absence of related entries in file:AbData and parameters:PndStrng, LAdStrng and NDSStrng, the options are not effective. To prevent wasteful processing, the system checks for and blanks the `empty` options at the outset. A backup of the control parameters is made so that later the original values can be restored, if required 4.
For abbreviation of text file (i.e., if parameter:FnctnSb=`t`): the system skips to Method 10.
Method 5: String Initial Steps, Separation and Movement to WrdStrng
This method is illustrated in FIGS. 5, 15 and 16.
An input text string may be abbreviated into a single row string or a string comprising multiple rows. Abbreviation of input text string into multiple row string is covered by two options:
i) Manual separation--i.e., delimitation of input text string into several portions by the user using a unique row separator character (i.e., vertical bar `|`), before calling the abbreviate function.
ii) Automated system separation of input text string into predetermined number of rows (StrRws).
In the former option each separated portion is processed separately to the desired row width (OtptL). Each separate abbreviated output is then accumulated into a string (CumStrng) separated with system supplied row separator(s).
In the latter option the unseparated input text string is first processed as a whole up to and including Method 16, before system placement of row separators (using Methods 17 and 31). The number of row separators is one less than the desired number of several rows (StrRws).
For the control of processing of input text string as a whole before system placement of row separators, it is necessary to set desired output length limit, OutL=OtptL*StrRws. After placement of row separators, input text string is recycled back to Method 5 and each portion of the string is processed separately to the desired row width (OtptL) as in the former option. In this latter option also, each separate abbreviated output is accumulated into CumStrng, separated with system supplied row separators.
In the initial steps, the contents of input text string parameter (InputStr) are moved to a separate input processing string (InpStrng) in which the initial steps of abbreviation are executed. If option for conversion of all capital letters input to lower case is chosen (i.e., OptnACc=`Y`) and if all alphabet characters are capital letters, all capital letters are converted to lower case. All apostrophes are deleted from the InpStrng.
The initial steps are concluded by moving the whole or each separate portion of InpStrng to word (or basic element) separation and processing string (WrdStrng), left justified. The WrdStrng is also copied to WrdStrngC so that in case the processing has to be aborted and tried afresh, the WrdStrng is available in original form.
The specific steps of the method, designated (a) to (f), are:
a) Blanking CumStrng;
Copying InputStr to InpStrng and setting OutL=OtptL, if StrRws=1; Else OutL=OtptL*StrRws
If OptnAbs is not blank: Setting OptnAbsC=OptnAbs;
Note: This is done to hold a copy of the parameter value intact while OptnAbs parameter value may be changed from `X` to `Y` or `Y` to `Z`. The need for change in OptnAbs parameter value arises, if OptnAbs=`X` or `Y` and the InpStrng cannot be abbreviated to the OutL limit without resorting to word truncation.
Similarly, setting OptnPriC=OptnPri and OptnProC=OptnPro to have backups in case the parameter values are changed in process 5a.
b) If OptnACc=`Y` and InpStrng has all alphabets in capitals: converting InpStrng to lower case 5b.
Note: The abbreviation methods are ineffective in any input text string which consists of all capital letters, unless OptnACc=`Y`.
c) Deleting apostrophe from InpStrng 5c.
d) Locating within InpStrng row separator--i.e., vertical bar character `|`--and, if found, setting StrRws=1 & OutL=OtptL; and moving each separate portion of InpStrng to WrdStrng 5d.
e) If row separator is not found, then moving the whole of InpStrng to WrdStrng, left justified 5e.
f) Setting WrdStrngC=WrdStrng 5f.
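Steps (a) to (e) can be sketched in Python as below. This is a simplified illustration, not the disclosed implementation: the function name `string_initial_steps` is hypothetical, the option backups of step (a) and the WrdStrngC copy of step (f) are omitted, and "left justified" is taken to mean stripping leading spaces.

```python
from typing import List, Tuple

ROW_SEPARATOR = "|"  # the vertical bar row separator


def string_initial_steps(input_str: str, otpt_l: int, str_rws: int,
                         optn_acc: str = "Y") -> Tuple[List[str], int, int]:
    """Return (portions destined for WrdStrng, OutL, StrRws)."""
    # Step a: copy InputStr to InpStrng and set OutL.
    inp_strng = input_str
    out_l = otpt_l if str_rws == 1 else otpt_l * str_rws
    # Step b: lower-case the string only if every alphabetic character is a capital.
    letters = [c for c in inp_strng if c.isalpha()]
    if optn_acc == "Y" and letters and all(c.isupper() for c in letters):
        inp_strng = inp_strng.lower()
    # Step c: delete apostrophes.
    inp_strng = inp_strng.replace("'", "")
    if ROW_SEPARATOR in inp_strng:
        # Step d: manual separation; each portion is processed to OtptL.
        str_rws, out_l = 1, otpt_l
        portions = [p.lstrip() for p in inp_strng.split(ROW_SEPARATOR)]
    else:
        # Step e: the whole string is moved, left justified.
        portions = [inp_strng.lstrip()]
    return portions, out_l, str_rws
```

For example, an all-capitals input with `OtptL=25` and `StrRws=2` is lower-cased and given `OutL=50`, whereas an input that already contains the row separator is split into portions and processed with `StrRws=1` and `OutL=OtptL`.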
Method 6: String Brackets Handling
This method is illustrated in FIGS. 5, 13 and 15.
For abbreviating text string, delimited segment options include:
i) OptnAbs (abstract segment):
A matched pair of unique delimitation characters--i.e., curly brackets--not nested within any matched pair of unique characters, containing a substantially and intellectually condensed abstraction (comprising one or more words--preferably abbreviatable) of the text string and suffixed to that string for:
a) If OptnAbs=`X`: abbreviating the entire string, including the portions contained within curly brackets. If the desired output length limit (OutL) is not reached without resorting to word truncation, OptnAbs is set to `Y` and abbreviation of the string is tried afresh.
b) If OptnAbs=`Y`: retaining and abbreviating the string, excluding the curly brackets and their contents. If OutL limit is not reached without resorting to word truncation, OptnAbs is set to `Z` and abbreviation of the string is tried afresh.
c) If OptnAbs=`Z`: retaining and abbreviating only the contents of the curly brackets.
ii) OptnPro (protected segment):
A matched pair of unique delimitation characters--i.e., square brackets--not nested within any other matched pair of square or round brackets, delimiting the segment(s) of string to be protected from abbreviation until final truncation (i.e., Method 27).
iii) OptnPri (prioritized segment):
A matched pair of unique delimitation characters--i.e., round brackets--not nested within any other matched pair of square or round brackets, delimiting the segments of string to be prioritized for:
a) If OptnPri=`D`: deleting completely.
b) If OptnPri=`I`: truncating words to bare initial, excluding pre-defined non-deletable characters.
c) If OptnPri=`T`: truncating words up to a predetermined minimum truncated word length limit (MnTrL), excluding pre-defined non-deletable characters.
The specific bracket handling steps for text string abbreviation, designated (a) to (c), are:
a) Blanking the bracket character if corresponding delimited segment option is not chosen--i.e., curly brackets for OptnAbs, square brackets for OptnPro and round brackets for OptnPri; and blanking all occurrences of unmatched brackets; and blanking any bracket character found within a pair of matched curly brackets; and blanking any bracket character found within a pair of outer most (round or square) matched brackets 6a.
Note: This is done to give precedence to the outer pair of brackets.
b) Retaining entire WrdStrng or portion for abbreviation, if containing pair of curly brackets as follows 6b:
If OptnAbs=`X`: entire WrdStrng, left justified.
If OptnAbs=`Y`: after deleting the curly brackets and contents from WrdStrng, left justifying the remaining portion(s).
If OptnAbs=`Z`: after deleting all but the contents of curly brackets from WrdStrng, left justifying the remaining portions and setting OptnAbs=blank.
c) If OptnPri=`D` and if matching pair(s) of round brackets found: deleting the round bracket pair(s) and contents (one pair and contents at a time) from the right end, until OutL limit is reached 6c.
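Steps (b) and (c) can be illustrated with the following Python sketch. It assumes that step (a) has already removed unmatched and nested brackets, that OutL is compared against the plain character count of the string, and that surplus spaces left by a deletion are collapsed. The function names are hypothetical.

```python
import re

CURLY = re.compile(r"\{([^{}]*)\}")   # abstract segment
ROUND = re.compile(r"\([^()]*\)")     # prioritized segment


def apply_abstract_option(wrd_strng: str, optn_abs: str) -> str:
    """Step 6b: select the portion of WrdStrng retained for abbreviation."""
    if not CURLY.search(wrd_strng):
        return wrd_strng.strip()
    if optn_abs == "Y":
        # Delete the curly brackets and their contents.
        kept = CURLY.sub("", wrd_strng)
    elif optn_abs == "Z":
        # Keep only the contents of the curly brackets.
        kept = " ".join(CURLY.findall(wrd_strng))
    else:
        # 'X': keep the entire string, brackets included.
        kept = wrd_strng
    return kept.strip()


def delete_prioritized(wrd_strng: str, out_l: int) -> str:
    """Step 6c (OptnPri='D'): delete round bracket pairs and their contents,
    one pair at a time from the right end, until the string fits out_l."""
    while len(wrd_strng) > out_l:
        matches = list(ROUND.finditer(wrd_strng))
        if not matches:
            break  # nothing left to delete; later methods must shorten further
        last = matches[-1]
        remainder = wrd_strng[:last.start()] + wrd_strng[last.end():]
        wrd_strng = " ".join(remainder.split())  # collapse surplus spaces
    return wrd_strng
```

Deleting from the right end one pair at a time means a prioritized segment is removed only when the string still exceeds the limit, so earlier segments survive whenever the later ones yield enough space.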
Method 7: Deletion of Punctuation
This method is illustrated in FIGS. 6, 13, 15 and 16.
Occurrence of punctua |