Abbreviating and compacting text to cope with display space constraint in computer software

6279018

Abstract

This invention relates to text abbreviation methods to cope with display or print space constraints in computer software. In particular, abbreviation of text into predetermined field widths (with single or multiple rows), utilizing an operating system (121), an application program (122), and an abbreviation control data program (123), along with combinations of prioritized shortening methods in preference to or in addition to glossaries of acronyms and word abbreviations using an abbreviation function (127), is disclosed. The special handling of segments of input contained within pairs of pre-defined characters, as well as omission of spaces and conversion of enumeration words or word sequences to numbers utilizing an abbreviation data file (124), a parameters sets file (125), and a parameters list (126), are also disclosed. The omission of spaces and phonetically less significant characters compacts word sequences, which saves display space and enables use of larger type sizes.


Claims

We claim:

1. A method for abbreviating text to cope with display or print space constraint in computer software such that loss of word recognizability is minimized, wherein said text includes a plurality of words, said space constraint is defined in terms of a predetermined abbreviated text length limit and said method comprises the steps of:

a) selecting one or more words from the text as being abbreviatable words;

b) shortening only those abbreviatable words whose length exceeds a predetermined minimum word length limit while the length of the text is in excess of the predetermined abbreviated text length limit, said shortening comprising at least one of:

(i) replacing a sequence of alphabets in any abbreviatable word with a shorter sequence, wherein said sequence of alphabets does not include the initial of the abbreviatable word; and

(ii) deleting one or more alphabets from any abbreviatable word, but excluding from deletion the initial of the abbreviatable word; and

c) truncating only those abbreviatable words whose length exceeds a predetermined minimum truncated word length limit while the length of the text is in excess of the predetermined abbreviated text length limit.

2. The method of claim 1 further comprising at least one of:

replacing in the text a sequence of words, being a phrase, with its corresponding commonly used acronym, if an entry containing the phrase and the acronym is found in a predetermined list; and

replacing in the text an abbreviatable word with its corresponding commonly used word abbreviation, if an entry containing the abbreviatable word and the word abbreviation is found in a predetermined list.

3. The method of claim 1 further comprising at least one of:

replacing in the text a sequence of words, being a phrase, with its corresponding acronym, if an entry containing the phrase and the acronym categorized as less commonly used is found in a predetermined list and the length of the text is greater than the predetermined abbreviated text length limit even after shortening the abbreviatable words in the text; and

replacing in the text an abbreviatable word with its corresponding word abbreviation, if an entry containing said abbreviatable word and said word abbreviation categorized as less commonly used is found in a predetermined list, such replacement yields greater reduction than the reduction that is obtained by shortening said abbreviatable word and the length of the text is greater than the predetermined abbreviated text length limit even after shortening the abbreviatable words in the text.

4. The method of claim 1 further comprising converting a continuous sequence of at least two enumeration words in the text, of which at least one of the enumeration words is a bundle word, into a shorter sequence using a predetermined list of enumeration words and corresponding abbreviations, wherein said bundle word is a single enumeration word which connotes a value greater than hundred and said shorter sequence contains at least four numeric characters.

5. The method of claim 4 further comprising step (a) and at least one of steps (b), (c), and (d):

a) replacing bundle word abbreviations with predetermined corresponding figures, each said corresponding figure comprising either a punctuation character or a sequence of one or more punctuation characters and numeric zero(s);

b) inserting into the sequence of converted figures a numeric one;

c) locating and deleting occurrences of superfluous numeric zero(s), if any, from the sequence of converted figures; and

d) inserting into the sequence of converted enumeration words one or more numeric zero(s).

6. The method of claim 1 further comprising truncating the abbreviated text finally, starting from the right end, until the text is reduced to the predetermined abbreviated text length limit or the entire text is dealt with, but excluding from truncation the initial alphabet of any word and at least one of:

a) any numeric character or decimal point;

b) any character contained in a predetermined set of non-deletable symbols; and

c) predetermined protected segments.

7. The method of claim 1 wherein the abbreviatable words on which the abbreviating steps are carried out include partially abbreviated words.

8. The method of claim 1 wherein the length of any word is the number of characters in the word.

9. The method of claim 1 wherein the selecting step includes:

locating sequences of one or more contiguous alphabets preceded by a space, punctuation or beginning of text and followed by a space, punctuation or end of text and recognizing such sequences as words; and

locating words containing at least two alphabets and no upper case alphabets other than the first alphabet and classifying such words as abbreviatable words.

10. The method of claim 1 wherein the replacing step comprises at least one of:

replacing a contiguous sequence of alphabets in any abbreviatable word with a shorter sequence of at least one alphabet, if an entry containing said contiguous sequence of alphabets and its corresponding shorter sequence is found in a predetermined list; and

replacing a sequence comprising a contiguously repeating consonant in any abbreviatable word with a shorter sequence of only one such consonant.

11. An abbreviated text generated by employing the method in claim 10.

12. The method of claim 10 wherein the replaced shorter sequence is identified so that said shorter sequence is not further shortened using the shortening step subsequently.

13. The method of claim 1 wherein the deleting step comprises deleting a contiguous sequence of one or more vowels from any abbreviatable word, provided said contiguous sequence is deleted entirely and the length of said abbreviatable word after deleting said contiguous sequence would not become less than the predetermined minimum word length limit.

14. An abbreviated text generated by employing the method in claim 13.

15. The method of claim 1 wherein the truncating step includes at least one of:

truncating only the truncatable part of every abbreviatable word in an approximately equal proportion such that the text is reduced to the predetermined abbreviated text length limit, said truncatable part comprising that part of every such word which is in excess of the predetermined minimum truncated word length limit; and

truncating abbreviatable words to the predetermined minimum truncated word length limit, starting from the right end of the text, while the length of the text is in excess of the predetermined abbreviated text length limit.

16. An abbreviated text generated by employing the method in claim 15.

17. The method of claim 1 wherein the selecting step includes at least one of:

classifying a word as a non-abbreviatable word, if said word is found in a predetermined list of words barred from abbreviation; and

classifying a word as a non-abbreviatable word, if said word is an acronym or a word abbreviation appearing in a predetermined list.

18. The method of claim 1 further comprising dealing with predetermined delimited segments in an exceptional manner, where said dealing includes at least one of:

a) abbreviating only the delimited segment containing an abstract after deleting the rest of the text;

b) protecting the delimited segment from abbreviation;

c) prioritizing deletion of the delimited segment before abbreviating the rest of the text; and

d) prioritizing truncation of the delimited segment before truncating the rest of the text.

19. An abbreviated text generated by employing the method in claim 18.

20. The method of claim 1 wherein the unit of measure for the predetermined abbreviated text length limit and for the length of the text is either a monospaced character or a unit of measure suitable for measuring proportionally spaced text.

21. A computer-readable medium embodying the method in one of claims 1-3, 4-6, 7, 8-12, 13, 15-18, 20.

22. An abbreviated text generated by employing the method in claim 1.

23. The method of claim 1 wherein the shortening step (b) is executed irrespective of the predetermined abbreviated text length limit, the truncating step (c) is not executed and the abbreviated text is split into two or more lines each not exceeding the predetermined abbreviated text length limit.

24. A computer system for abbreviating text to cope with display or print space constraint such that loss of word recognizability is minimized, wherein said text includes a plurality of words and said system comprises:

a) means for selecting one or more words from the text as being abbreviatable words;

b) means for shortening only those abbreviatable words whose length exceeds a predetermined minimum word length limit while the length of the text is in excess of a predetermined abbreviated text length limit, said shortening means comprising at least one of:

(i) means for replacing a sequence of alphabets in any abbreviatable word with a shorter sequence, wherein said sequence of alphabets does not include the initial of the abbreviatable word; and

(ii) means for deleting one or more alphabets from any abbreviatable word such that the initial of the abbreviatable word is excluded from deletion;

c) means for truncating only those abbreviatable words whose length exceeds a predetermined minimum truncated word length limit while the length of the text is in excess of the predetermined abbreviated text length limit; and

d) means for controlling abbreviation of the text, said means comprising one or more predetermined abbreviation data lists, abbreviation options and abbreviation control parameters.

25. The system of claim 24 further comprising at least one of:

means for replacing in the text a sequence of words, being a phrase, with its corresponding commonly used acronym, if an entry containing the phrase and the acronym is found in a predetermined list; and

means for replacing in the text an abbreviatable word with its corresponding commonly used word abbreviation, if an entry containing the abbreviatable word and the word abbreviation is found in a predetermined list.

26. The system of claim 24 further comprising at least one of:

means for replacing in the text a sequence of words, being a phrase, with its corresponding acronym, if an entry containing the phrase and the acronym categorized as less commonly used is found in a predetermined list and the length of the text is greater than the predetermined abbreviated text length limit even after shortening the abbreviatable words in the text; and

means for replacing in the text an abbreviatable word with its corresponding word abbreviation, if an entry containing said abbreviatable word and said word abbreviation categorized as less commonly used is found in a predetermined list, such replacement yields greater reduction than the reduction that is obtained by shortening said abbreviatable word and the length of the text is greater than the predetermined abbreviated text length limit even after shortening the abbreviatable words in the text.

27. The system of claim 24 further comprising means for converting a continuous sequence of at least two enumeration words in the text, of which at least one of the enumeration words is a bundle word, into a shorter sequence using a predetermined list of enumeration words and corresponding abbreviations, wherein said bundle word is a single enumeration word which connotes a value greater than hundred and said shorter sequence contains at least four numeric characters.

28. The system of claim 27 further comprising means as in means (a) and at least one of means (b), (c), and (d):

a) means for replacing bundle word abbreviations with predetermined corresponding figures, each said corresponding figure comprising either a punctuation character or a sequence of one or more punctuation characters and numeric zero(s);

b) means for inserting into the sequence of converted figures a numeric one;

c) means for locating and deleting occurrences of superfluous numeric zero(s), if any, from the sequence of converted figures; and

d) means for inserting into the sequence of converted enumeration words one or more numeric zero(s).

29. The system of claim 24 wherein the controlling means includes means for dealing with predetermined delimited segments in an exceptional manner, where said dealing means includes at least one of:

a) means for abbreviating only the delimited segment containing an abstract after deleting the rest of the text;

b) means for protecting the delimited segment from abbreviation;

c) means for prioritizing deletion of the delimited segment before abbreviating the rest of the text; and

d) means for prioritizing truncation of the delimited segment before truncating the rest of the text.

30. The system of claim 24 wherein the controlling means includes means for determining the points of separation while abbreviating text into predetermined plural number of rows of predetermined row widths.

31. The system of claim 30 wherein the determining means includes means for ensuring that said points of separation are determined such that at least one of the following conditions are satisfied:

a) each separated portion of the text when abbreviated results in uniform reduction, with the length reduction within each row after separation bearing approximately the same proportion to the sum of the excess length of every abbreviatable word over a predetermined minimum word length limit;

b) unutilized blank spaces are minimized in each of the abbreviated separate rows;

c) splitting of words between rows is minimized; and

d) words or sequences of characters which are too long to be accommodated at the end of any row and which will cause unutilized space in the row if accommodated in the next row, are split between the rows such that each split portion has at least two characters.

32. The system of claim 24 wherein the controlling means includes a file which facilitates abbreviation of the text by holding words or sequences obtained from the text along with an indication for every word that is abbreviatable.

33. The system of claim 24 wherein the abbreviation data list means includes at least one of:

a) a list of at least one entry containing a word barred from abbreviation;

b) a list of at least one entry containing an enumeration word and its abbreviation;

c) a list of at least one entry containing a phrase and its commonly used acronym;

d) a list of at least one entry containing a word and its commonly used word abbreviation;

e) a list of at least one entry containing a phrase and its less commonly used acronym; and

f) a list of at least one entry containing a sequence of alphabets and its shorter sequence for replacement in a word.

34. The system of claim 24 wherein the abbreviation option means includes at least one of:

a) an option for prioritized deletion or truncation of a delimited segment in the text;

b) an option for protection of a delimited segment from abbreviation in the text;

c) an option for barring predetermined words from abbreviation in the text;

d) an option for compulsorily replacing a phrase with its commonly used acronym in the text;

e) an option for compulsorily replacing a word with its commonly used abbreviation in the text;

f) an option for abbreviating an enumeration word sequence into a sequence containing at least one numeric character in the text;

g) an option for replacing an ending sequence of alphabets in a word with a shorter sequence;

h) an option for replacing an intervening sequence of alphabets in a word with a shorter sequence;

i) an option for replacing a sequence of a contiguously repeating consonant in a word with one such consonant;

j) an option for deleting a less significant alphabet in a word;

k) an option for need based replacement of a phrase with its less commonly used acronym in the text;

l) an option for truncating a word in the text; and

m) an option for final truncation of the text.

35. The system of claim 24 wherein the abbreviation control parameter means includes at least one of:

a) a group of one or more punctuations for deletion in the text;

b) a group of one or more less significant alphabets for deletion in a word;

c) a group of one or more non-deletable symbols;

d) a minimum word length limit;

e) a minimum truncated word length limit;

f) an abbreviated text length limit;

g) a separated row output width value; and

h) a number of separated output rows value.

36. A method for abbreviating text to fit into a display or print space constraint in computer software such that loss of word recognizability is minimized, wherein said text includes a plurality of words, said display or print space constraint comprises a predetermined plural number of rows of predetermined row widths and said method comprises the steps of:

a) selecting one or more words from the text as being abbreviatable words;

b) replacing in the text a sequence of words comprising a phrase with its corresponding acronym, if an entry containing the phrase and its corresponding acronym is found in a predetermined list;

c) after replacing phrases with corresponding acronyms as described in step (b), separating the text into at least two row strings such that the number of said row strings does not exceed the predetermined plural number of rows and each said row string is associated with its corresponding predetermined row width;

d) in any row string, shortening only those abbreviatable words whose length exceeds a predetermined minimum word length limit while the length of said row string is in excess of its corresponding predetermined row width, said shortening comprising at least one of:

(i) replacing a sequence of alphabets in any abbreviatable word with a shorter sequence, wherein said sequence of alphabets does not include the initial of the abbreviatable word; and

(ii) deleting one or more alphabets from any abbreviatable word, but excluding from deletion the initial of the abbreviatable word; and

e) in any row string, truncating only those abbreviatable words whose length exceeds a predetermined minimum truncated word length limit while the length of said row string is in excess of its corresponding predetermined row width.

37. The method of claim 36 further comprising converting a continuous sequence of at least two enumeration words in the text, of which at least one of the enumeration words is a bundle word, into a shorter sequence using a predetermined list of enumeration words and corresponding abbreviations, wherein said bundle word comprises a single enumeration word which connotes a value greater than hundred and said shorter sequence contains at least four numeric characters.

38. The method of claim 37 further comprising step (a) and at least one of steps (b), (c), and (d):

a) replacing bundle word abbreviations with predetermined corresponding figures, each said corresponding figure comprising either a punctuation character or a sequence of one or more punctuation characters and numeric zero(s);

b) inserting into the sequence of converted figures a numeric one;

c) locating and deleting occurrences of superfluous numeric zero(s), if any, from the sequence of converted figures; and

d) inserting into the sequence of converted enumeration words one or more numeric zero(s).

39. The method of claim 36 further comprising truncating any row string finally, starting from the right end, until said row string is reduced to its corresponding predetermined row width or the entire row string is dealt with, but excluding from truncation the initial alphabet of any word and at least one of:

a) any numeric character or decimal point;

b) any character contained in a predetermined set of non-deletable symbols; and

c) predetermined protected segments.

40. The method of claim 36 wherein the abbreviatable words on which the abbreviating steps are carried out include partially abbreviated words.

41. The method of claim 36 wherein the length of any word is the number of characters in the word.

42. The method of claim 36 wherein the selecting step includes:

locating sequences of one or more contiguous alphabets preceded by a space, punctuation or beginning of text and followed by a space, punctuation or end of text and recognizing such sequences as words; and

locating words containing at least two alphabets and no upper case alphabets other than the first alphabet and classifying such words as abbreviatable words.

43. The method of claim 36 wherein the replacing step comprises at least one of:

replacing a contiguous sequence of alphabets in any abbreviatable word with a shorter sequence of at least one alphabet, if an entry containing said contiguous sequence of alphabets and its corresponding shorter sequence is found in a predetermined list; and

replacing a sequence comprising a contiguously repeating consonant in any abbreviatable word with a shorter sequence of only one such consonant.

44. The method of claim 43 wherein the replaced shorter sequence is identified so that said shorter sequence is not further shortened using the shortening step subsequently.

45. The method of claim 36 wherein the deleting step comprises deleting a contiguous sequence of one or more vowels from any abbreviatable word, provided said contiguous sequence is deleted entirely and the length of said abbreviatable word after deleting said contiguous sequence would not become less than the predetermined minimum word length limit.

46. The method of claim 36 wherein the truncating step includes at least one of:

truncating only the truncatable part of every abbreviatable word in an approximately equal proportion such that the row string is reduced to its corresponding predetermined row width, said truncatable part comprising that part of every such word which is in excess of the predetermined minimum truncated word length limit; and

truncating abbreviatable words to the predetermined minimum truncated word length limit, starting from the right end of the row string, until the row string is reduced to its corresponding predetermined row width.

47. The method of claim 36 wherein the selecting step includes at least one of:

classifying a word as a non-abbreviatable word, if said word is found in a predetermined list of words barred from abbreviation; and

classifying a word as a non-abbreviatable word, if said word is an acronym or a word abbreviation appearing in a predetermined list.

48. The method of claim 36 further comprising dealing with predetermined delimited segments in an exceptional manner, where said dealing includes at least one of:

a) abbreviating only the delimited segment containing an abstract after deleting the rest of the text;

b) protecting the delimited segment from abbreviation;

c) prioritizing deletion of the delimited segment before abbreviating the rest of the text; and

d) prioritizing truncation of the delimited segment.

49. The method of claim 36 wherein the separating step (c) further comprises:

ca) in the text which has to be separated into row strings, selecting a word for splitting into two split portions;

cb) shortening the selected word using the shortening step (d) in said claim 36, if the selected word is an abbreviatable word;

cc) splitting the selected word such that each split portion has at least two characters;

cd) separating the text into at least two row strings such that one of the row strings ends with the first split portion and the next row string begins with the second split portion; and

ce) identifying each split portion to prevent further shortening.

50. The method of claim 36 wherein the unit of measure for the predetermined row widths and for the length of the row strings is either a monospaced character or a unit of measure suitable for measuring proportionally spaced text.

51. An abbreviated text generated by employing the method in claim 36.


Description

FIELD OF THE INVENTION

This invention relates to a method and system for abbreviating text to a predetermined or undefined length through user-controlled and selective methods, such as deletion of alphabets in words, replacement of sequences of alphabets in words with representative shorter sequences, replacement of phrases and words with acronyms and word abbreviations respectively, and truncation of words and text, to make up for the spatial limitations of the display screen or the printed page within any computer software. The methods are effective for language scripts which use capital letters apart from lower case and separate alphabets for consonants and vowels.

BACKGROUND OF THE INVENTION

Human beings have devised words in script form for representing the contents of their vocal communication and intellectual pursuits. The computer with its binary code can hold, process and reproduce information in audio-visual form avoiding the written word. Audio-visual form may be more convenient than the written word for many purposes. However, the written word may not yet be avoidable altogether and may well be more practical in many situations.

Newspapers and periodicals continue to be popular, though type sizes tend to be reduced as information is crowded in. Reading fine print strains the eyes, especially with advancing age or within transportation systems which are not vibration free.

In the world of commerce, industry, business, management and other professions there is an increasing tendency to tabulate and present information in sets of predesigned forms. A tabulated display of text (including numeric values) on screen, unlike a serial replay of a voice file or audio-video recording, allows the user to skim across the display screen at his or her own pace to spot, read and comprehend portions in isolation or to read related portions back and forth recurrently for overall comprehension without changing the display. Forms essentially entail demarcation of columns or rows to predetermined sizes--e.g., in spreadsheets, database files or other application packages. Accommodating text strings of varying length in predetermined columns or rows of fixed length is problematic. Some solutions offered in computer software are:

a) manual editing for abbreviation,

b) change or adjustment of column width or row height and

c) synonym search and replacement with any shorter synonym.

These solutions require the user to intervene and exercise discretion, and the results may not be uniform across occurrences of the same problem.

There is increasing use of computers for word processing and for a variety of other applications with precise and consistent fonts. The miniaturization of computers is leading to hand-held personal computers packed with tremendous inbuilt or accessible computing power and a variety of software applications with stored data apart from direct and instant access to the information highway. However, the display unit cannot be subjected to unlimited miniaturization due to the physical limitations of the human eye in reading text or graphics. The display space is proving to be a serious constraint; and methods apart from miniaturization need to be found to overcome the display unit constraint.

Conventional methods and prior art which are being used to accommodate more text in display or print include:

a) use of glossaries for replacement of words or phrases with word abbreviations or acronyms,

b) deletion of blank spaces separating words (in excess of one),

c) deletion of all blank spaces separating any two words in a line, after capitalizing the initial of the second word,

d) deletion of blank space(s) around punctuation characters,

e) deletion of all vowels from a word,

f) deletion of all vowels from a word, excluding the first character,

g) truncation of a word or text string,

h) reduction of space between lines of text,

i) finer crafting of fonts, using proportional spacing,

j) compression, size reduction or congesting of characters and

k) vertical or horizontal scrolling of text interactively (in display).
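A few of the conventional methods listed above, namely (b), (c), (e) and (f), can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, and the vowel set is assumed to be the English letters a, e, i, o and u.

```python
import re

# Assumption: the vowels are the English letters a, e, i, o, u.
VOWELS = set("aeiouAEIOU")

def collapse_spaces(text: str) -> str:
    """Method (b): reduce runs of blank spaces between words to a single space."""
    return re.sub(r" {2,}", " ", text)

def join_with_capitals(text: str) -> str:
    """Method (c): delete all spaces after capitalizing the initial of each following word."""
    words = text.split()
    if not words:
        return ""
    return words[0] + "".join(w[:1].upper() + w[1:] for w in words[1:])

def drop_vowels(word: str, keep_first: bool = False) -> str:
    """Methods (e) and (f): delete every vowel, optionally sparing the first character."""
    head, tail = (word[:1], word[1:]) if keep_first else ("", word)
    return head + "".join(ch for ch in tail if ch not in VOWELS)
```

For instance, `drop_vowels("idea")` leaves only `"d"`, which illustrates how indiscriminate vowel deletion can render a word unrecognizable.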

U.S. Pat. No. 5,691,708 includes an abbreviation command, controlled by five parameters, used prior to placement of a text message in a buffer for abstraction. The first parameter allows use of word abbreviations or acronyms from an abbreviation text file, which is a common practice. There are no control features to prioritize acronym replacement over word abbreviation replacement, to prioritize commonly used acronyms or word abbreviations over those which are less commonly used, or to use less commonly used acronyms or word abbreviations only if other methods do not yield the desired reduction. The second and the third parameters allow deletion of all vowels from words, excluding or including the first characters. There are no control features to ensure that deletion of vowels from words does not render them unrecognizable, nor to allow the user to be selective as to which vowels or other less significant alphabets are open for deletion.
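The kind of priority control noted as missing above might be sketched as follows: commonly used acronyms are applied first, and less commonly used ones are applied only if the text is still too long after other shortening. The glossary entries, the function names and the `shorten` callback are hypothetical and chosen only for illustration.

```python
# Hypothetical glossaries, for illustration only.
COMMON = {"as soon as possible": "ASAP"}
LESS_COMMON = {"for your information": "FYI"}

def replace_phrases(text, glossary):
    """Replace each glossary phrase found in the text with its acronym."""
    for phrase, acronym in glossary.items():
        text = text.replace(phrase, acronym)
    return text

def abbreviate(text, limit, shorten):
    """Apply common acronyms, then word shortening, then less common acronyms.

    `shorten` is any callable taking (text, limit) and returning shortened text.
    Less common acronyms are used only if the text still exceeds `limit`.
    """
    text = replace_phrases(text, COMMON)
    if len(text) > limit:
        text = shorten(text, limit)
    if len(text) > limit:
        text = replace_phrases(text, LESS_COMMON)
    return text
```

Passing the shortening step in as a callable keeps the sketch independent of any particular word-shortening method.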

U.S. Pat. No. 4,486,857 is a "Display System For The Suppression And Regeneration Of Characters In A Series Of Fields In A Stored Record". "Suppression" comprises the methods of vowel deletion and truncation. There are no control features to ensure that the use of these methods does not render the contents of the fields unrecognizable.

Certain rules for development of abbreviations as speedy inputs to computers to obtain the full text are contained in U.S. Pat. Nos. 5,623,406, 5,305,205, 4,969,097 and 4,760,528. But these abbreviation rules are mechanistic and suitable only for computer processing and not for easy recognition by the users.

The method of expansion and resizing of data fields in forms as contained in U.S. Pat. No. 5,450,538 may not always be practicable or convenient.

U.S. Pat. No. 5,231,579 covers methods of compression, size reduction or congesting of characters; and these methods strain the eyes.

Modern word processors with finely crafted fonts and using proportional spacing have fairly exhausted further scope for compacting of screen fonts and printer fonts.

Unlike the optical faculty, which cannot be stretched beyond a point, the intellectual faculty to associate symbols or words with concepts, to interpret occurrences of words according to context and to recognize words in abbreviated forms can be cultivated almost without bounds. Such cultivation, training or practice through conscious and deliberate effort builds subconscious (and hence effortless) competencies.

Word abbreviations are recognized by common usage and repetitive association with the original words. A reader or writer is capable of associating printed or written symbols with spoken sounds. A listener is capable of associating spoken sounds with the objects, processes and concepts they represent. A silent reader is capable of directly associating printed or written symbols with the objects, processes or concepts they represent.

The spoken word is often a combination of several sounds. In many written languages each alphabet represents a single basic sound--though in English some alphabets--e.g., c, g, h, n, r and the vowels--are pronounced differently or are silent depending on their context. Phonetically all sounds are not equally significant and it is possible to classify each alphabet based on its usual phonetic significance. This would provide a criterion for prioritizing deletion of less significant alphabets from within words for progressive abbreviation with minimal loss of phonetic content. Such a criterion together with other complementary criteria can provide an alternative of automated phonetic abbreviation to the commonly used word or phrase abbreviation which may not necessarily be phonetic abbreviation.

Phonetic abbreviations would be quite convenient to users, when commonly used acronyms or word abbreviations are not well established or are not known to the users. By and large, only a few of all the words in any language have commonly used abbreviations; and it is necessary to devise alternate methods of word abbreviation for wider application.

Consequently, there is a clear and urgent need:

a) to devise phonetic abbreviation criteria, rules and methods to be used in preference to or in addition to the conventional or known abbreviation methods,

b) to devise fine controls for abbreviation methods including for conventional or known abbreviation methods, and

c) to allow the end user to make intelligent and optimal use of these methods and controls in accordance with personal or knowledge domain specific preferences, without requiring any programming skills. The preferences may be as regards predefinition of abbreviation database, choice of abbreviation options and control parameters and delimitation of segments for special handling. Each individual user should be able to instantly abbreviate text from any source entirely in accordance with his or her own personal preferences.

SUMMARY OF THE INVENTION

It is therefore an object of the invention to provide a comprehensive set of fully automated methods for abbreviation of text in any computer software.

Another object of the invention is to provide fine controls for the methods of the invention through user editable means. For example, the user editable means include abbreviation data lists, abbreviation options and abbreviation control parameters. Sets of the user editable means may be stored in data files so that appropriate sets may be recurrently and readily used in a variety of software applications in accordance with the context--viz. the language and/or subject of the text, structure or length of text and space constraints within which the abbreviated text is to be placed.

An applications design related object of the invention is to provide for a versatile abbreviation function which can be used for instant abbreviation of any addressable text (entered through keyboard, voice recognition input device or other input device) for placement within the space constraints of any single or multiple row field with minimal loss of phonetic content and without splitting words between rows except to minimize word truncation.

An overall object of the invention is to provide maximum optical facility by abbreviating text and enabling use of larger types or precluding the use of smaller types, in display and print.

Another overall object of the invention is to accommodate more text by abbreviation in the available display space, thus overcoming the display unit constraint in computers and hand-held devices.

The abbreviation methods in this invention include the following steps:

1. selecting one or more abbreviatable words from the text,

2. prioritizing replacement of commonly used acronyms and word abbreviations over less commonly used acronyms and word abbreviations,

3. using the less commonly used acronyms and word abbreviations only if the other abbreviation methods do not yield the required reduction,

4. converting sequences of enumeration words in the text into sequences comprising numeric characters and punctuations,

5. replacing a sequence of alphabets in any abbreviatable word with a corresponding shorter sequence,

6. deleting one or more alphabets from any abbreviatable word,

7. checking length of abbreviatable words to ensure that abbreviatable words with length greater than a predetermined minimum word length limit are subject to abbreviation,

8. truncating abbreviatable words and text, if necessary,

9. dealing with pre-defined delimited segments in an exceptional manner, for example:

a) abbreviating only the delimited segment containing an abstract after deleting the rest of the text,

b) protecting the delimited segment from abbreviation,

c) prioritizing deletion of the delimited segment before abbreviating the rest of the text, and

d) prioritizing truncation of the delimited segment before truncating the rest of the text,

10. determining the points of separation while abbreviating text into predetermined number of rows of predetermined row width, if the points of separation have not been pre-defined before abbreviating text,

11. controlling abbreviation of text in accordance with abbreviation control parameters.

The abbreviation means used in this invention include the following:

1. abbreviation data list means:

a) a list of words barred from abbreviation,

b) a list of enumeration words and their abbreviations,

c) a list of phrases and their commonly used acronyms,

d) a list of words and their commonly used word abbreviations,

e) a list of phrases and their less commonly used acronyms, and

f) a list of alphabet sequences and their shorter sequence for replacement in words,

2. abbreviation option means:

a) an option for prioritized deletion or truncation of delimited segments in the text,

b) an option for protection of delimited segments from abbreviation in the text,

c) an option for barring pre-defined words from abbreviation in the text,

d) an option for compulsorily replacing phrases with their commonly used acronyms in the text,

e) an option for compulsorily replacing words with their commonly used abbreviations in the text,

f) an option for abbreviating enumeration word sequences into sequences comprising numeric characters and punctuations in the text,

g) an option for replacing ending sequences of alphabets in words with shorter sequences,

h) an option for replacing intervening sequences of alphabets in words with shorter sequences,

i) an option for replacing sequences of a contiguously repeating consonant in words with one such consonant,

j) an option for deleting less significant alphabets in words,

k) an option for need based replacement of phrases with their less commonly used acronyms in the text,

l) an option for truncating words in the text, and

m) an option for final truncation of the text,

3. abbreviation control parameter means:

a) a group of punctuations for deletion in the text,

b) a group of less significant alphabets for deletion in words,

c) a group of non-deletable symbols,

d) a minimum word length limit,

e) a minimum truncated word length limit,

f) an abbreviated text length limit,

g) a separated row output width value, and

h) a number of separated output rows value.

4. enumeration words conversion means:

A system for converting any continuous sequence of enumeration words in any text into a sequence containing numeric characters comprising:

a) means for replacing enumeration words with their corresponding abbreviations,

b) means for handling variations in style of expressing enumeration words sequences,

c) means for obtaining valid converted sequence suitable for arithmetic manipulation,

d) means for inserting into the converted sequence punctuation characters if required,

e) means for inserting into the converted sequence one or more numeric characters representing zero if required,

f) means for inserting into the converted sequence numeric character representing one if required, and

g) means for deleting occurrences of connecting word abbreviation such as "and" from the converted sequence, if superfluous.

How the Objects are Achieved

Phonetic abbreviation is achieved by selective deletion of blank spaces (after capitalizing the initials of words), deletion of pre-defined insignificant non-alphabet characters, replacement of sequences of alphabets within words with representative shorter sequences, and deletion of pre-defined alphabets considered to be less significant for word recognition--i.e., phonetically or optically. Phonetic abbreviation results in minimal loss of phonetic content, saves display space and enables use of larger type sizes, maximizing optical facility. The result is: maximum optical facility added concised script (abbreviated as Mofacs).
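As a minimal sketch of the space-compaction step described above, the following Python fragment capitalizes word initials, removes blank spaces and drops a group of pre-defined punctuation characters. The function name `compact_spaces` and the default punctuation group are assumptions made for illustration only. The preferred embodiment deletes spaces selectively, which this simplified sketch does not attempt.

```python
def compact_spaces(text, deletable_punct="!;',_:\"?"):
    """Capitalize each word's initial, then drop spaces and listed punctuation.

    Hypothetical helper: the name and the default punctuation group are
    illustrative assumptions, not taken from the specification.
    """
    # Capitalizing initials first keeps word boundaries visible once the
    # separating spaces are gone.
    capitalized = [word[:1].upper() + word[1:] for word in text.split()]
    joined = "".join(capitalized)
    # Remove the punctuation characters that carry no phonetic content.
    return "".join(ch for ch in joined if ch not in deletable_punct)


print(compact_spaces("net sales, this year"))  # NetSalesThisYear
```

The capitalized initials take over the role of the deleted spaces, which is why the later truncation stage can still identify and preserve the initial of each word.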

This invention is suitable for abbreviation of:

a) Text, comprising a string, into abbreviated text string of predetermined or undefined abbreviated text length.

b) Text, comprising a string delimited into several portions with user supplied row separator(s) (i.e., a unique delimitation character such as a vertical bar), into an abbreviated text string comprising several rows of predetermined equal length.

c) Text, comprising a string without any user supplied row separator(s), into an abbreviated string comprising predetermined number of rows of predetermined equal length, without splitting of words between rows except to minimize word truncation.

d) Text, comprising long multiple line textual matter (e.g., newspaper reports, essays, speeches and the like), into abbreviated text of predetermined width.
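Case (c) above, in which the system itself determines the points of separation, may be sketched as a greedy fill of whole words into a fixed number of rows. The function name `separate_rows` and the greedy strategy are assumptions for illustration; the Separate subroutine of the preferred embodiment may choose its separation points differently.

```python
def separate_rows(text, rows, width):
    """Place whole words into `rows` rows of at most `width` characters.

    Illustrative assumption: words are never split here. Any overflow is
    left in the last row so that later shortening or truncation steps can
    deal with it.
    """
    out = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        on_last_row = len(out) == rows - 1
        if not current or len(candidate) <= width or on_last_row:
            current = candidate
        else:
            out.append(current)  # close this row at a word boundary
            current = word
    out.append(current)
    while len(out) < rows:       # pad with empty rows if text was short
        out.append("")
    return out


print(separate_rows("Net sales growth over previous year", 3, 12))
# ['Net sales', 'growth over', 'previous year']
```

In this example the last row is 13 characters wide, one more than the row width of 12, so it would still need shortening by the other methods.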

In the preferred embodiment of the invention described hereinafter, multiple line text is read from an ASCII text file and abbreviated output text is written to an ASCII text file. However the users of this invention may read multiple line text from any other type of file or from a memo field and abbreviated output text may be written to any other type of file or to a memo field.

In the preferred embodiment, abbreviation methods are always carried out on text strings. If a long multiple line text is input for abbreviation, smaller strings are picked up from the long text in sequence and abbreviated one at a time. Hereinafter, the text which is to be abbreviated is referred to as either "text" or "text string".

The methods of acronym and word abbreviation replacement, deletion of alphabets (generally vowels) and truncation are known methods as outlined in the Background Of The Invention. In this invention, these methods are improved as explained below:

1. Improved acronym and word abbreviation replacement method:

This invention has two types of acronyms and word abbreviations namely, commonly used acronyms and word abbreviations and less commonly used acronyms and word abbreviations.

Replacement of commonly used acronyms and word abbreviations is compulsory and is prioritized before replacement of less commonly used acronyms and word abbreviations.

Replacement of less commonly used acronyms and word abbreviations is done only if necessary and if the other abbreviation methods yield lesser reduction.

2. Improved alphabets deletion method:

In this invention deletion of alphabets from words is subject to a minimum word length limit. Alphabets are not deleted from words if the word length does not exceed the minimum word length limit. Because of this control feature there is minimal loss of word recognition facility.

3. Improved truncation methods:

In this invention truncating methods are executed in stages in a controlled manner to minimize loss of word recognition facility. In the earlier stage words are truncated only if the word length exceeds a minimum truncated word length limit. In the later stage of truncation of text from the right end, the initials of words, numeric characters, decimal point, pre-defined non-deletable symbols and pre-defined protected segments are not truncated.
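The earlier truncation stage may be sketched as follows, assuming the words are processed from the right end and each over-length word is cut to the minimum truncated word length limit until the text fits. The function name `truncate_words` and the convention of counting one space between words are illustrative assumptions.

```python
def truncate_words(words, limit, min_trunc_len=3):
    """Cut words from the right end of the list until the text fits `limit`.

    Sketch only: a word is cut to `min_trunc_len` characters, and words
    that do not exceed that length are left untouched.
    """
    result = list(words)

    def joined_length():
        # Length of the words joined with single spaces.
        return sum(len(w) for w in result) + max(len(result) - 1, 0)

    for i in range(len(result) - 1, -1, -1):
        if joined_length() <= limit:
            break  # stop as soon as the length limit is met
        if len(result[i]) > min_trunc_len:
            result[i] = result[i][:min_trunc_len]
    return result


print(truncate_words(["International", "Business", "Machines"], 26))
# ['International', 'Business', 'Mac']
```

Stopping as soon as the limit is met leaves the words nearer the left end intact, so no more recognizability is lost than the space constraint requires.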

The prioritized and selective methods of the invention pertain to five broad groups:

1. Delimitation of segments with unique characters for special handling--namely:

a) Identifying a segment of a text string as being an intellectual abstract of the rest of the text string, so that the abstract may be abbreviated if the text string in itself or excluding the delimited segment cannot be abbreviated to the desired output length limit without resorting to word truncation.

Hereinafter, the desired output length limit is also referred to as the abbreviated text length limit.

b) Prioritizing segment(s) in a text string for deletion or truncation before abbreviation of the rest of the text string.

c) Protecting segment(s) in a text string from abbreviation and truncation until the final truncation of the text string.
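Protection of a delimited segment, as in item (c), may be sketched as below. The delimiter characters `{` and `}`, the function name `abbreviate_outside_protected` and the choice to drop the delimiters from the output are all assumptions for illustration; the specification only requires a pair of pre-defined unique characters.

```python
def abbreviate_outside_protected(text, shorten, open_ch="{", close_ch="}"):
    """Apply `shorten` to everything except segments between delimiters.

    Hypothetical sketch: protected segments are copied verbatim and the
    delimiter characters themselves are removed from the result.
    """
    out = []
    pos = 0
    while True:
        start = text.find(open_ch, pos)
        if start == -1:
            break
        end = text.find(close_ch, start + 1)
        if end == -1:
            break  # unbalanced delimiter: treat the rest as unprotected
        out.append(shorten(text[pos:start]))   # unprotected part
        out.append(text[start + 1:end])        # protected part, unchanged
        pos = end + 1
    out.append(shorten(text[pos:]))
    return "".join(out)


drop_e = lambda s: s.replace("e", "")  # stand-in for a real shortening method
print(abbreviate_outside_protected("Revenue {IBM Inc} report", drop_e))
# Rvnu IBM Inc rport
```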

2. Phonetic abbreviation methods:

a) Deleting blank spaces and pre-defined non-alphabet characters having no phonetic content.

b) Shortening of words with minimal loss of phonetic content by:

i) replacement of frequently occurring sequences of lower-case alphabets with representative shorter sequences,

ii) replacing occurrences of contiguously repeating consonants with one such consonant--repeating consonants being largely redundant phonetically,

iii) deletion of less significant alphabets in accordance with item 10 of the section entitled, "Logical criteria for abbreviation of text string or text file", presented later herein,

subject to a predetermined minimum word length limit.
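The three word-shortening steps above can be sketched in Python as follows. The function name `shorten_word`, the example sequence map and the right-to-left deletion order are assumptions made for illustration; the actual deletion priorities are governed by the logical criteria presented later in the specification.

```python
import re

# A run of one repeated lower-case consonant, e.g. "bb" or "ttt".
_DOUBLED = re.compile(r"([b-df-hj-np-tv-z])\1+")


def shorten_word(word, min_len=4, seq_map=None, deletable="aeiou"):
    """Shorten one word while always keeping its initial.

    Sketch under stated assumptions: `seq_map` maps letter sequences to
    shorter ones, and less significant letters are deleted from the
    right until the word is down to `min_len` characters.
    """
    if len(word) <= min_len:
        return word                      # short words are not abbreviated
    head, tail = word[0], word[1:]       # the initial is never altered
    for long_seq, short_seq in (seq_map or {}).items():
        tail = tail.replace(long_seq, short_seq)   # step (i)
    tail = _DOUBLED.sub(r"\1", tail)               # step (ii)
    chars = list(tail)
    i = len(chars) - 1
    while i >= 0 and 1 + len(chars) > min_len:     # step (iii)
        if chars[i] in deletable:
            del chars[i]
        i -= 1
    return head + "".join(chars)


print(shorten_word("abbreviation", seq_map={"tion": "n"}))  # abrvn
print(shorten_word("Account", min_len=5))                   # Acont
```

In the second call, deletion stops once the word reaches five characters, which illustrates how the minimum word length limit restrains the loss of recognizability.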

3. Enumeration words conversion methods:

Converting enumeration words sequence to a sequence, comprising numeric digits and punctuations, without loss of phonetic content, the numeric sequence being a phonetic equivalent (e.g., `One Thousand` and `1,000` are both pronounced identically).
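For illustration, a simplified conversion of an English enumeration words sequence can be sketched as below. This sketch computes the value arithmetically and then formats it, which is a swapped-in technique rather than the replacement-based conversion means listed in the Summary; the function name `enumeration_to_number` and the small vocabulary are assumptions.

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * (i + 2) for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split())}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def enumeration_to_number(phrase):
    """Convert e.g. 'One Thousand' to '1,000' (illustrative sketch only)."""
    total = 0      # value of completed thousand/million groups
    current = 0    # value of the group being read
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue                          # superfluous connecting word
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100   # bare "hundred" means 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not an enumeration word: {word!r}")
    return f"{total + current:,}"             # insert comma separators


print(enumeration_to_number("One Thousand"))               # 1,000
print(enumeration_to_number("two hundred and fifty-six"))  # 256
```

Treating a bare "hundred" as 100 and skipping "and" correspond to the means for inserting the numeric character one and for deleting superfluous connecting words.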

4. Abbreviation replacement methods:

after searching separate glossaries for:

a) commonly used phrases and corresponding acronyms,

b) commonly used words and corresponding word abbreviations,

c) less commonly used phrases and corresponding acronyms, and

d) less commonly used words and corresponding word abbreviations,

Generally, the following rules are observed:

a) Acronym replacement is prioritized over word abbreviation replacement.

b) Commonly used phrases and words are replaced compulsorily before phonetic shortening methods.

c) Less commonly used phrases and words are replaced before resorting to truncation methods, but after exhausting phonetic shortening methods.

d) While commonly used acronyms and word abbreviations, if opted for, will compulsorily replace corresponding phrases and words, the less commonly used acronyms and word abbreviations, if opted for, will replace corresponding phrases and words only if needed--i.e., only if the text cannot be reduced to desired output length limit without use of these acronyms and word abbreviations.
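Rules (a) to (d) can be sketched as a prioritized sequence of glossary look-ups. The function name `replace_with_priority`, the use of plain dictionaries as glossaries and the optional `shorten` callback standing in for the phonetic shortening methods are assumptions for illustration.

```python
import re


def replace_with_priority(text, limit, common_acronyms, common_abbrevs,
                          rare_acronyms, rare_abbrevs, shorten=None):
    """Apply glossaries in priority order (hypothetical sketch).

    Commonly used entries are applied compulsorily; less commonly used
    entries are applied only while the text still exceeds `limit`.
    """
    def apply(current, glossary):
        for full, short in glossary.items():
            pattern = r"\b" + re.escape(full) + r"\b"   # whole words only
            current = re.sub(pattern, lambda m: short, current)
        return current

    text = apply(text, common_acronyms)   # rule (a): acronyms come first
    text = apply(text, common_abbrevs)    # rule (b): compulsory
    if shorten is not None and len(text) > limit:
        text = shorten(text)              # phonetic shortening, if supplied
    if len(text) > limit:
        text = apply(text, rare_acronyms)  # rules (c), (d): need based
    if len(text) > limit:
        text = apply(text, rare_abbrevs)
    return text


print(replace_with_priority(
    "Chief Executive Officer of the Corporation reports quarterly", 30,
    {"Chief Executive Officer": "CEO"}, {"Corporation": "Corp"},
    {}, {"quarterly": "qtly"}))
# CEO of the Corp reports qtly
```

With a limit of 40 the same call would return "CEO of the Corp reports quarterly", because the less commonly used abbreviation is not needed.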

5. Truncation methods:

optionally deleting characters from word, abbreviated word or any sequence of characters.

The options include:

a) In personal name text string:

i) optional deletion of title word,

ii) truncation of all abbreviatable words (ignoring undeleted title word, if any), except the first word, to bare initials and

iii) truncation of the first word (ignoring undeleted title word, if any) from the right end to a predetermined minimum length.

b) In segment(s) of text string prioritized for deletion or truncation using pre-defined delimitation characters:

i) deletion of each prioritized segment, starting from the right end until the desired output length limit is reached or all the segments are dealt with,

ii) truncation of abbreviatable or shortened abbreviatable word to bare initials, starting from the right end until the desired output length limit is reached or all the words are dealt with or

iii) truncation of abbreviatable or shortened abbreviatable word to a predetermined minimum truncated word length limit, starting from the right end until the desired output length limit is reached or all the words are dealt with.

c) In text string (excluding prioritized segments):

i) truncation of abbreviatable or shortened abbreviatable word, such that a uniform proportion of the length of word which is in excess of the predetermined minimum truncated word length limit is deleted starting from the right end until the desired output length limit is reached or all the words are dealt with or

ii) truncation of abbreviatable or shortened abbreviatable word to a predetermined minimum truncated word length limit, starting from the right end until the desired output length limit is reached or all the words are dealt with.

d) In text string for final truncation: truncation of the text string, starting from the right end, but excluding:

i) bare initial of each word,

ii) pre-defined non-deletable symbols,

iii) numeric digit,

iv) decimal point and

v) segment protected from abbreviation and truncation by delimitation with any pair of pre-defined unique characters (if so opted),

until the desired output length limit is reached or all the words (or basic elements) are dealt with.
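The final truncation of item (d) may be sketched as below. The function name `final_truncate`, the default non-deletable symbol group, and the rule that any upper-case letter counts as a word initial (because spaces may already have been deleted after capitalizing initials) are assumptions for illustration. Protected delimited segments are omitted from this sketch.

```python
def final_truncate(text, limit, non_deletable="@#$%+-"):
    """Delete characters from the right end, sparing protected ones.

    Sketch only: initials, digits, decimal points and the listed symbols
    are kept, so the result may stay longer than `limit`.
    """
    chars = list(text)
    protected = []
    for i, ch in enumerate(chars):
        at_word_start = i == 0 or chars[i - 1].isspace()
        is_initial = ch.isalpha() and (ch.isupper() or at_word_start)
        is_decimal_point = (ch == "." and i + 1 < len(chars)
                            and chars[i + 1].isdigit())
        protected.append(is_initial or ch.isdigit()
                         or is_decimal_point or ch in non_deletable)
    i = len(chars) - 1
    while i >= 0 and len(chars) > limit:
        if not protected[i]:
            del chars[i]        # drop this deletable character
            del protected[i]
        i -= 1
    return "".join(chars)


print(final_truncate("NetSales 12.5%", 8))  # NeS12.5%
print(final_truncate("NetSales", 2))        # NS
```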

For abbreviation of text comprising the input text string, all the opted methods are used in sequence, but the phonetic shortening and need based abbreviation replacement methods are stopped as soon as the predetermined desired output length limit is reached. The truncation methods are used last, if and to the extent required.

As a special option, a single row text string may be processed using all the opted methods, barring truncation and without any length limit if the desired output length limit is passed as zero (i.e., undefined). Thus, phonetic shortening options and need based abbreviation replacement options are fully exhausted but truncation methods are avoided altogether.

For abbreviation of multiple line text, compulsory abbreviation replacement methods are used first, if opted, followed by enumeration words conversion and all the opted word shortening methods. This is followed by need based abbreviation replacement methods, if opted and if these provide greater shortening. Punctuation deletion methods and truncation methods are generally not used while abbreviating multiple line text.

Generally, the punctuation deletion parameter is validated with reference to a system fixed comprehensive group of punctuations--e.g., ! ; ' \ , _ : " ?. The less significant alphabet deletion parameter consists of lower-case alphabets for deletion and is validated with reference to a system fixed comprehensive group of lower-case alphabets, as appropriate to each input language. The truncation methods are controlled with a non-deletable symbols parameter. Generally, the non-deletable symbols parameter is validated with reference to a system fixed comprehensive group of symbols--e.g., @ # $ % + - \. This comprehensive group of symbols and the comprehensive group of punctuations, mentioned hereinbefore, are generally mutually exclusive. The other parameters of the function also control the abbreviation methods in several ways, as can be seen from the detailed description of the methods of the preferred embodiment hereinafter.

The delimited segments special handling, phonetic shortening, abbreviation replacement, enumeration words conversion and truncation methods of this invention are fully automated. The text for abbreviation may be accessed from addressable fields in databases, spreadsheets or other applications or from text or other files (or memo fields). The text may also be obtained by keyboard inputs or through special devices such as a voice recognition (to written or printed word) system. If the methods are used as a function, the abbreviated text string is returned for placement within any desired field in database, spreadsheet or other applications or the abbreviated text is appended to a text or other file. Generally, the names of the source and output files are defined and included in the parameter list.

In the preferred embodiment, repetitive use of the Abbreviate function is facilitated by predefining sets of choices of:

a) user created abbreviation data file version,

b) control options and

c) other control parameters

preferably into a data file.

Consistently abbreviated results will be obtained if the same pre-defined set of choices is used. However, appropriate sets of choices may have to be carefully pre-defined and chosen to optimize the results of the methods, in tune with the language of text abbreviated, personal preferences and specific knowledge domain. With its several optional features the abbreviate function as it applies to text files can be a useful component of any word processing application.
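A pre-defined set of choices could, for instance, be held in a small record and serialized for storage in a data file. The sketch below uses JSON and a Python dataclass; the class name `ParameterSet`, its field names and the JSON layout are assumptions for illustration and do not describe the parameters sets file 125 of the preferred embodiment.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ParameterSet:
    """One named set of choices (hypothetical field names)."""
    data_file_version: str          # which abbreviation data file to use
    options: str                    # letters of the enabled options
    deletable_punctuations: str
    deletable_letters: str
    non_deletable_symbols: str
    min_word_length: int
    min_truncated_word_length: int
    text_length_limit: int          # 0 stands for "undefined"
    row_width: int
    row_count: int


def sets_to_json(sets):
    """Serialize a {name: ParameterSet} mapping to JSON text."""
    return json.dumps({name: asdict(s) for name, s in sets.items()}, indent=2)


def sets_from_json(text):
    """Rebuild the {name: ParameterSet} mapping from JSON text."""
    return {name: ParameterSet(**fields)
            for name, fields in json.loads(text).items()}


titles = ParameterSet("v1", "abcdef", "!;,", "aeiou", "@#$%", 4, 3, 30, 12, 1)
restored = sets_from_json(sets_to_json({"titles": titles}))
print(restored["titles"].text_length_limit)  # 30
```

Reusing the same stored set on every call is what gives the consistently abbreviated results described above.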

Uses of the Invention

The abbreviated text obtained using the methods and means of this invention may be used to overcome display space constraint or for greater optical facility (with use of larger types) in computers.

Busy officials and business executives may develop a preference for internal reports in Mofacs (maximum optical facility added concised script) for fast personal reading.

After the abbreviation data file and parameters list have been defined or determined and the abbreviation function (i.e., this invention) is called, the text is abbreviated in a fully automated manner without user intervention.

This invention can be used for many language scripts apart from English.

Some of the uses of the invention are as follows:

1. In computer screens:

In computer applications--e.g., spreadsheet package, database package, database management system (DBMS) or any other standard or customized application--screen form layout entails demarcation of columns and rows to predetermined sizes. Certain columns or rows in the layout contain fixed information (i.e., names or titles of items of information, but not the information itself) with which users develop familiarity. The invention can help to reduce the area allocated for such fixed information, thus saving space for variable information, which in fact is the subject matter for careful, selective and focused reading. The variable information also can be automatically abbreviated to predetermined field widths.

The methods of the invention may be used in menu bars, pull-down menus, windows for displaying text and dialog boxes to cope with display space constraint.

The several methods of the invention yield a wide range of reduction up to about 70%, if required, without any manual intervention. Horizontal or vertical scrolling and compressed printing of oversized forms is avoided. The abbreviated text can fit into varying field widths in different forms, though the original information elements are sourced every time from a commonly used unabbreviated data file.

The function format of the invention, with a comprehensive parameters list supported with a database of abbreviation rules with pertinent data and the provision for choice of pre-defined parameter sets ensures consistent results. The control panel showing combination of parameters list used offers total control to the user with complete transparency. The user can fine-tune his or her choices with experience and preserve the preferred parameter sets for future use on textual information obtained or downloaded from any source.

The methods of the invention in general and the conversion of enumeration words sequence to number in particular are quite suited to voice recognition input methods in database, spreadsheet, word processing or other application programs, if and when such input methods are generally accepted as practical. Inputs to numeric and other data fields through keyboard would normally involve the use of numeric digit keys. However, with voice recognition input systems the input capture may be in words form and such enumeration words sequences can be instantly converted to numeric characters using the methods of this invention.

2. In Web sites:

The methods of the invention can be used in Web sites, so that visitors are able to read the textual information either in original form or in abbreviated versions.

3. In newspaper columns:

Senior citizens (and perhaps the readership at large) may develop a preference for abbreviated text in newspaper columns, provided the abbreviation database version, control options and control parameters are carefully fine-tuned and consistently used. Sections in newspapers which are specially devoted to such readers can be produced with abbreviated text in larger type size, within the space constraints.

In columns reporting market quotations, the names of companies or items quoted can be abbreviated. The abbreviation options, pertinent database and control parameters, if adopted uniformly and consistently in reporting business performances, would facilitate focused reading by busy investors and executives.

Classified advertisements with abbreviated text within newspaper columns may be more economical and yet readable.

4. In pagers:

A pager, being a tiny portable device, has an even smaller panel for message display. Though the pager may not have the computing facility to abbreviate messages on-line, the methods of the invention may be quite feasible for pagers also. The messages can be abbreviated at a central computing facility before transmission to any pager. Use of appropriate versions of abbreviation rules and pertinent data in a central database will ensure consistency of abbreviated text. Task relevant acronyms and word abbreviations may be adopted for common use and uniform communication.

5. In control panels:

Control panels are an essential requirement within aircraft, vehicles, manufacturing and household equipment, control rooms and computer applications. Modern control panels include context specific messages for obtaining response to faults or errors. Abbreviated text may help to make the most of the space constraints on the panel.

6. In Television screens:

Often films are telecast with subtitles in a different language. The reading convenience to the viewer is inversely proportional to the speed of the character train or the number of display changes in a given time. The speed or number of display changes can be reduced in direct proportion to the reduction obtained with the abbreviated text.

7. In billboards:

Electronic billboards are installed at prominent places to be visible from large distances. The display includes character trains and flashes of advertisement text. Abbreviated text of the invention offers the same optical facility as in television screens.

8. In teleprompters:

Abbreviated text can be instantly produced from plain text for teleprompters, using abbreviation options, pertinent data and control parameters personally selected and fine-tuned by each speaker or reader. The facility for editing and storage of several versions of the database and parameter sets, with complete control and transparency to the user, is specially suited for this user segment.

9. In publication of books:

Use of abbreviated text in books printed with proportionally spaced types, with reduction potential of about 20-25%, may prove economical and may even be preferred by fast readers. The abbreviation options, pertinent database and control parameters can be selected and fine-tuned by the authors and publishers. Thereafter the production of the abbreviated text version of any book can be fully automated.

10. In electronic data bank, database, encyclopedia, dictionary, glossary etc:

Users may find a sort order of words, phrases or captions ignoring phonetically insignificant characters (such as vowels--except the initial of each word, contiguously repeating consonant, apostrophe, hyphen and intervening space(s) between words) more convenient for two reasons. Firstly, spelling errors in words entered for search are minimized. Secondly, the number of keystrokes for search is reduced.

This sort order implies that the producer of the data bank, database, encyclopedia, dictionary, glossary or such other data source has to provide for an additional sort key for the words, phrases or captions (ignoring phonetically insignificant characters); and the end user has to search for the word, phrase or captions after entering these (in full or preceding part) with the insignificant characters excluded. Additionally, (during word processing), a user may spell-check for the normal word or search for its meaning. In case no match is found the system may develop the abbreviated word, phrase or captions, search for it and if a match is found: show it in full form with the meaning.
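The additional sort key described above may be sketched as follows. The function name `phonetic_sort_key` is an assumption, as is the choice to fold everything to lower case and to treat only a, e, i, o and u as vowels.

```python
import re

# A run of one repeated lower-case consonant, e.g. "bb".
_DOUBLED = re.compile(r"([b-df-hj-np-tv-z])\1+")


def phonetic_sort_key(caption):
    """Build a key that ignores phonetically insignificant characters.

    Sketch only: drops apostrophes, hyphens and spaces, collapses
    repeated consonants and removes vowels other than word initials.
    """
    parts = []
    for word in caption.lower().split():
        word = word.replace("'", "").replace("-", "")
        if not word:
            continue
        word = _DOUBLED.sub(r"\1", word)   # "bb" becomes "b"
        # Keep the initial, drop the remaining vowels.
        parts.append(word[0] + re.sub(r"[aeiou]", "", word[1:]))
    return "".join(parts)


print(phonetic_sort_key("Abbreviation"))   # abrvtn
print(phonetic_sort_key("Abreviation"))    # abrvtn
print(phonetic_sort_key("Balance Sheet"))  # blncsht
```

The first two calls show how a common misspelling yields the same key as the correct word, which is the first of the two benefits noted above. A list of captions can be ordered with `sorted(captions, key=phonetic_sort_key)`.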

11. In search engines:

Developers of search engines for information on the Internet, may provide for search routines using the abbreviated text sort order.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete understanding of the invention may be obtained by reading the following description in conjunction with the appended drawings in which like elements are labelled similarly and in which:

FIG. 1 is a block schematic diagram of a typical computer system including storage 012;

FIG. 2 is a block schematic diagram of the interconnections between the programs (including the abbreviation control data program 123 and the abbreviation function 127) and the data structures within the storage 012;

FIG. 3 is a block schematic diagram of the structure of the abbreviation control data program 123;

FIG. 4 is a block schematic diagram of the subroutines and the data structures comprising the abbreviation function 127;

FIG. 5 is a flow chart representation of the Main subroutine 1271 within abbreviation function 127 as it relates to abbreviation of text string or word;

FIG. 6 is a continuation of the flow chart representation from FIG. 5;

FIG. 7 is a flow chart representation of the Main subroutine 1271 within abbreviation function 127 as it relates to abbreviation of (multiple line) text, branching off from FIG. 5;

FIG. 8 is a continuation of the flow chart representation from FIG. 7;

FIG. 9 is a flow chart representation of the Shorten subroutine;

FIG. 10 is a continuation of the flow chart representation from FIG. 9;

FIGS. 11 to 16 are blow-by-blow listings of progressive abbreviation results from input text to abbreviated text line-by-line, with the corresponding control panel at the top of the listing. Each of these lines is assembled from the relevant memory variables or other data structures located in storage 012 to illustrate the status of abbreviation at each step. The input line is prefixed with `00` and the subsequent lines are prefixed with the corresponding Method-step numbers. The Method-steps are described in detail in the detailed description of the preferred embodiment hereinafter;

FIG. 11 is a blow-by-blow listing of progressive abbreviation for a single line text of undefined output length;

FIG. 12 is a blow-by-blow listing illustrating abbreviation of a single line text to a predetermined desired output length limit. Apart from other abbreviation methods FIG. 12 illustrates the use of less commonly used acronym and word abbreviation. These are used because they provide greater reduction than other abbreviation methods and because the length of the partially abbreviated text exceeds the predetermined abbreviated text length limit of 30;

FIG. 13 is a blow-by-blow listing illustrating abbreviation of a single line text to a predetermined desired output length limit including an abstract segment;

FIG. 14 is a blow-by-blow listing illustrating abbreviation of a single line text with undefined desired output length including an enumeration words sequence;

FIG. 15 is a blow-by-blow listing illustrating abbreviation of a string containing pre-defined row separators into multiple rows of predetermined equal width. The string also contains a protected segment and a prioritized segment;

FIG. 16 is a blow-by-blow listing illustrating abbreviation of a string containing no row separators into multiple rows of predetermined equal width. Row separators are placed in the string by the system using Separate subroutine;

FIG. 17 illustrates how the use of this invention leads to better utilization of available space on a display or while printing. In the upper part of the figure the row titles in the table are unabbreviated and therefore take up a lot of space. In the lower part of the figure the row titles in the table have been abbreviated and hence a lot of space is saved. This saved space is used to display more useful information. The control panel in the middle shows the various abbreviation options and parameters used;

FIG. 18 is an illustration of the upper table of FIG. 17 with the row titles transformed into abbreviated multiple row columnar titles, suitable for a database listing format, with the corresponding control panel placed at the top. The database may have (i) name of corporation, (ii) year and (iii) rank (Rk) number as sort keys, so that listings can be taken for each corporation for desired sequence of years or each year for desired corporations/ranks. The data in each database listing may be millions of dollars, growth percentage or proportion percentage for the listed corporations or years;

FIG. 19 is an illustration of a typical unabbreviated monospace text file, followed by a typical control panel for abbreviation and the corresponding text after abbreviation. The illustration consists entirely of phonetic shortening methods in preference to abbreviation replacement methods and does not include the delimited segment methods. The truncation methods are totally avoided in multiple line text abbreviation;

FIG. 20 is an illustration of the typical unabbreviated text file of FIG. 19 converted to proportionally spaced type (maintaining the line length as in FIG. 19), followed by the abbreviated text of FIG. 19 converted to proportionally spaced type and further followed by the abbreviated text of FIG. 19 converted to proportionally spaced larger type for maximized optical facility within the space constraints of the unabbreviated version at the top of FIG. 20.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENT OF THE INVENTION

The finely controlled methods of the invention are enabled with a scheme of discriminating between and recognizing the several basic elements of the text. Clear definition of the several basic elements or segments of the text is therefore a prerequisite for the detailed description.

Usually, a text file comprises several sentences grouped into paragraphs, with or without title lines, indentation or blank line separators. Generally, a text may be an input text string or a text string picked up from a multiple line text file, and comprises one or more of the following basic elements or segments:

a) Word: a character sequence consisting entirely of alphabets (or a single alphabet), with or without apostrophe, separated at both ends with space, punctuation, start of text string or end of text string.

b) Abbreviatable word: a word with at least two alphabets, the initial in upper case or lower case and all the other alphabets in lower case.

c) Non-abbreviatable word: a word which is,

i) a single alphabet or

ii) a word with at least one alphabet other than the initial in upper case or

iii) a word abbreviation or

iv) an acronym.

d) Phrase: comprising an abbreviatable group of words forming a conceptual unit or name of person or entity.

e) Non-alphabet sequence: any contiguous sequence of non-alphabet characters (or a single non-alphabet character).

f) String: a line comprising one or more word, hyphenated word, phrase, numeric digit, punctuation character, bracket character, other symbolic character or intervening blank space.

g) Bundle Word: Any single non-trivial numeric word which connotes a bundle greater than hundred--e.g., thousand, lakh, lac, million, crore, billion, trillion, quadrillion and the like.

h) Enumeration Word: Any single word which,

i) corresponds to any one or pair of numeric characters barring the word `zero`,

ii) connotes a bundle--e.g., thousand, lakh, lac, million, crore, billion etc.,

iii) connects other enumeration words--e.g., `and`, or

iv) is a derivative of any other enumeration word--e.g., first, second, . . . thousandth, millionth etc.

i) Abstract segment: any sequence of characters within a text string, parenthesized with a pair of pre-defined unique characters--e.g., curly brackets--containing an intellectually concise abstraction (comprising one or more words--preferably abbreviatable) of the rest of the string and suffixed to or included in that string.

j) Prioritized segment: any sequence of characters (or a single character) within text string or text file, parenthesized with a pair of pre-defined unique characters--e.g., round brackets--to signal that the sequence is prioritized for deletion or truncation.

k) Protected segment: any sequence of characters (or a single character) within text string or text file, parenthesized with a pair of pre-defined unique characters--e.g., square brackets--to signal that the sequence is protected from abbreviation.
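By way of illustration only, the definitions above may be sketched as follows. This is a minimal sketch and not the disclosed implementation: the function names, the `glossary_abbreviations` argument (standing in for a glossary of known acronyms and word abbreviations) and the fixed choice of curly, round and square brackets as delimiters are assumptions made for the example.

```python
import re

# Delimiter pairs follow the examples in definitions (i) to (k);
# treating them as fixed is an assumption of this sketch.
SEGMENT_DELIMITERS = {
    "abstract": ("{", "}"),
    "prioritized": ("(", ")"),
    "protected": ("[", "]"),
}


def is_word(token):
    """Definition (a): alphabets only, with or without apostrophe."""
    return bool(re.fullmatch(r"[A-Za-z']+", token)) and any(c.isalpha() for c in token)


def is_abbreviatable(token, glossary_abbreviations=frozenset()):
    """Definition (b): at least two alphabets, all but the initial in lower case.

    Known acronyms and word abbreviations (definition (c), items iii and iv)
    are excluded through the hypothetical glossary set.
    """
    if not is_word(token) or token in glossary_abbreviations:
        return False
    letters = [c for c in token if c.isalpha()]
    return len(letters) >= 2 and all(c.islower() for c in letters[1:])


def find_segments(text):
    """Return the contents of each delimited segment type found in the text."""
    found = {}
    for name, (open_ch, close_ch) in SEGMENT_DELIMITERS.items():
        pattern = re.escape(open_ch) + "([^" + re.escape(close_ch) + "]*)" + re.escape(close_ch)
        found[name] = re.findall(pattern, text)
    return found
```

Under these assumptions `Company` is classified as abbreviatable, while `MoU`, `dBASE` and a single alphabet are not.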

Logical Criteria for Abbreviation of Text String or Text File

A description of the inventors' understanding of a set of sequenced criteria on which the invention is broadly based follows:

1. Abbreviation of text is essentially a logical, prioritized and selective process of deletion of non-text matter such as blank spaces, deletion of less significant non-alphabet characters, replacement of words or word combinations with abbreviations, replacement of alphabet sequences from within words with representative shorter sequences, deletion of less significant alphabets from within words and truncation of long words from the right end.

2. Leading spaces used as word separators may be deleted after converting the initials of words to upper case, provided the preceding word does not end with a capital letter.

3. Special handling of parenthesized segments may include

a) abbreviation of an intellectually abstracted version parenthesized and suffixed to the text string,

b) deletion at the outset,

c) prioritized abbreviation or

d) protection from abbreviation.

Parenthesis delimitation characters may be deleted after using these for pre-defined special handling of each uniquely parenthesized segment of text.

4. Other non-alphabet characters may be pre-defined as insignificant--e.g., selected punctuation--for prioritized deletion or as highly significant--e.g., symbols: % $ # @--for protection from deletion.

5. Abbreviation or truncation of acronyms or abbreviated words and of words having upper case alphabets apart from initials--e.g., MoU or dBASE, numeric digits and other pre-defined highly significant non-alphabet characters results in loss or distortion of meaning and hence such abbreviations, words or characters are not abbreviatable.

6. Use of acronym comprising initials of any word combination such as phrase, personal name or institutional name, as abbreviation--e.g., PAT for `Profits after tax`--is common knowledge and results in substantial reduction. Though replacement of such combination with acronym results in total loss of recognition of each original word singly, the acronym may well be easily recognized as such by common usage. Replacement of original word combination with acronym is quite desirable in such cases and may preferably be prioritized. As word abbreviation replacement of any component word from within a phrase (which has a corresponding acronym) renders the phrase no more replaceable with the acronym and as acronyms yield greater reduction than abbreviation of component word(s), it is desirable to prioritize acronym replacement before word abbreviation replacement.

7. Replacement of long word with its abbreviation is also common knowledge--e.g., `Coy` for `Company`. Use of such word abbreviation may be prioritized next, provided:

a) such word abbreviation is commonly used,

b) the word is not a component of a phrase or institutional name which has a corresponding acronym and

c) if the replacement with word abbreviation results in greater reduction compared to word reduction based on criterion: 10.

8. Conversion of enumeration words to a sequence of numerics results in substantial reduction without any loss of phonetic content and word recognition facility. It is desirable to prioritize such replacement immediately following replacement with commonly used acronyms and word abbreviations.

9. Shortening of words by replacement of portions within words with corresponding shorter sequence of alphabets or by selective deletion of less significant alphabets from within words other than those which are not abbreviatable in accordance with criterion 5 may be considered if the abbreviation methods, based on the preceding criteria, are inadequate. Pre-defined contiguous sequences of alphabets (comprising portions within words), excluding the initials, may be replaced with pre-defined shorter sequences (obtained by deletion of less significant alphabets therein). It is desirable to prioritize replacement of such characters sequence, immediately following conversion of enumeration words sequences.

10. Significance of any alphabet depends on loss of word recognition facility with deletion of the alphabet from within words. Such loss depends on a combination of factors:

a) Position of character--initial being the most important for word recognition and the ending character(s) being the least important.

b) Redundancy of consonant--in certain contiguous consonant occurrences one of the consonants is redundant and insignificant--e.g., `c` in `ck`, `n` in `mn` at the end of word.

c) Repetition of characters--in contiguously repeating sequence of any consonant, the occurrences in excess of one are less significant.

d) Relative phonetic significance of the character--words with the consonant(s) deleted cannot be pronounced at all; but attempts to pronounce words with the intervening vowel(s) deleted and supplying vague intervening vowel sounds approximate the complete word pronunciation--e.g., `significant` cannot be pronounced from `iiia`; but it can be pronounced from `sgnfcnt`. In this sense the intervening vowel(s) are phonetically less significant.

e) Length of the word--very short words with all the vowels deleted may become difficult to recognize--e.g., `car`, `care`, `core`, `cur`, `cure`, `curia`, `curie` and `curio`. Hence vowels deletion from within very short words results in complete loss of word recognition.

f) Contiguity of vowels--in any occurrence of intervening contiguous sequence of vowels, each of the vowels is less significant and selective deletion of only one or a few of such vowels may distort the vowel sound representation--e.g., `beautiful`. Deletion of all intervening contiguous vowels, in each such occurrence, may be preferable.

g) Relative optical prominence of characters--lower-case vowels and some consonants--e.g., c, n, r, s, v, x and z--are optically least prominent since these are without height (k), depth (g) or width (m with proportional spacing).

h) Certain occurrences of a few consonants are silent or least pronounced--e.g., `r` in `figure`.

11. Replacement of phrase or word with acronym or word abbreviation which is less commonly used may be resorted to only if:

a) word abbreviation methods, based on the preceding criteria, do not produce the desired extent of reduction and

b) if such replacement is supported with a glossary of such equivalents.

12. If a text string, fully abbreviated adopting the preceding criteria, cannot be accommodated within the space constraints, truncation of words of the text string may become necessary. It is preferable that such truncation is subject to predetermined minimum truncated word length limit and is started from the right end of the text string.

13. Truncation results in accelerated loss of word recognition facility, which can be avoided by executing abbreviation methods afresh, starting with an intellectual abstract of the input text string comprising abbreviatable word(s), if such abstract is enclosed in pre-defined pairs of unique characters and suffixed to the input text string, by the user--e.g., while it is impossible to abbreviate the text string: "Total assets (excluding carried forward losses) net of total liabilities" to six characters without drastic truncation of words, its intellectual abstract: "Net worth" can be so abbreviated without any truncation.

14. Though word truncation may result in loss of word recognition facility, truncated word combinations within column or row titles in pre-defined forms may be quite recognizable due to the context, repetitive use and familiarity. Though `Nt` cannot be recognized by itself as `Net`, it may be recognized as such within a truncated word combination--e.g., `NtCrAs`--more so by those familiar with the relevant information domain--e.g., investors, business managers, finance professionals and accountants.

15. In case drastic truncation is inevitable, it is preferable to prioritize truncation of words from the right end of the text string to bare initials until the desired output length limit is reached.

16. In tabulated forms and database fields, an input text string may be required to be abbreviated and placed as a row or column title of predetermined width in one or more rows, with minimal loss of word recognition facility and without breaking words, except to minimize word truncation.

17. In personal names it is customary to abbreviate all but one word (normally the surname or family name) to bare initials. A personal name may be recognized if beginning with the occurrence of a pre-defined unique word indicating title, gender, status, etc. After discrimination of personal name the title word may be optionally deleted.

18. An individual end user may have personal, editorial or knowledge domain specific preferences in accepting the aforesaid criteria for abbreviation in general and as regards predefinition of abbreviation database, choice of abbreviation options and control parameters of abbreviation methods in particular. It would be desirable to allow the user to make intelligent and optimal use of these methods without requiring any programming skills.

19. It would be desirable to devise the abbreviation procedure as a function with a comprehensive parameters list so that the procedure can be called and executed from within any spreadsheet application, database application, database management system (DBMS), any other standard or customized application or word processing application.

20. It would be desirable to provide for a pre-defined or predefinable abbreviation data file in several versions and another file of function parameter sets accessible from memory, disk drive or file server in local or wide area networks. With these provisions an individual user may conveniently and recurrently choose from several pre-defined sets of parameters (including reference to abbreviation data file version). The choice can be appropriate to the application from within which the function is called, the language, the structure or length of input text and space constraints within which the abbreviated text output is to be placed.
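Two of the criteria above lend themselves to a short illustration: the deletion of intervening vowels (criterion 10, items d to f) and the deletion of word-separating spaces after capitalizing initials (criterion 2). The following is a minimal sketch under stated assumptions and not the disclosed Method-steps: the function names are hypothetical, `min_word_length` merely stands in for a minimum word length limit, and only the five lower-case vowels are treated as less significant.

```python
VOWELS = set("aeiou")


def drop_intervening_vowels(word, min_word_length=4):
    """Delete lower-case vowels between the initial and the last alphabet.

    Words no longer than min_word_length are left alone, since very short
    words become unrecognizable without their vowels (criterion 10e).
    """
    if len(word) <= min_word_length:
        return word
    middle = "".join(c for c in word[1:-1] if c not in VOWELS)
    return word[0] + middle + word[-1]


def join_words(words):
    """Capitalize initials and drop separating spaces (criterion 2).

    A space is kept where the preceding word ends with a capital letter,
    and words carrying upper case beyond the initial are left untouched.
    """
    out = ""
    for w in words:
        if w[1:] == w[1:].lower():
            w = w[:1].upper() + w[1:]
        if out and out[-1].isupper():
            out += " "
        out += w
    return out
```

With these assumptions `significant` shortens to `sgnfcnt`, as in the example of criterion 10d, and the words `net`, `current`, `assets` join to `NetCurrentAssets`.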

FIG. 1 is a block schematic diagram of a typical computer system, required to implement the preferred embodiment of this invention, consisting of a central processing unit (CPU) 011. Peripheral equipment includes storage 012, input devices such as keyboard with or without mouse 013, video display unit (VDU) 014 and printer 015. All the aforesaid equipment conforms to popular standards and is well known to one of ordinary skill in the art. In future, input devices may include voice input capture devices for conversion of voice to written or printed word. The computer system or any part of it, other than the input devices and the VDU, may be shared within local area networks, wide area networks, the Internet or any other system of linked computer networks. The computer's storage 012 may consist of primary storage, such as RAM and secondary storage, such as disk drives, CD ROMs, DVDs, solid state drives and the like. The specifics as regards what data is read from or written to primary storage and/or secondary storage at each stage of processing impact the efficiency of processing and safety of data and would be known to one of ordinary skill in the art. Therefore, this detailed description does not differentiate between the different types of storage.

FIG. 2 broadly presents the typical structures of data and programs within the storage 012 which are required to implement the preferred embodiment of this invention. These include an operating system 121 such as MS-DOS, OS/2 or Windows being popular standards, sundry standard or customized application or utility program 122, an abbreviation control data program 123, an abbreviation data file 124, a parameters set file 125, a parameters list 126, and an abbreviation function 127. Data files and data structures are represented as double lined blocks. The abbreviation control data program 123 presented in greater detail in FIG. 3 and the abbreviation function presented in greater detail in FIG. 4 may be called from within any sundry application program 122. The structure of the abbreviation data file 124 is presented as TABLE 1.

                             TABLE 1
               Structure Of Abbreviation Data File:
    File Name: AbData (or the DOS file name of the current version
       of the file used is passed from parameter: AbrDtaFN)
    Fields:
    Name        Data Type   Description
    AbAR        Integer     Abbreviation Rule numbers
    AbPhWd      String(55)  Unabbreviated phrase, word or characters
    AbAbrv      String(10)  Acronym or word abbreviation string
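
A record of the abbreviation data file may be pictured, purely as an illustrative sketch, as the following data structure. The class name and the Python field names are assumptions; only the three fields and their lengths are taken from TABLE 1, and the rule number used in the usage note is merely an example value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbDataRecord:
    """One record of file AbData, mirroring the fields of TABLE 1."""

    ab_ar: int     # AbAR: abbreviation rule number
    ab_ph_wd: str  # AbPhWd: unabbreviated phrase, word or characters, String(55)
    ab_abrv: str   # AbAbrv: acronym or word abbreviation string, String(10)

    def __post_init__(self):
        # Enforce the field widths stated in TABLE 1.
        if len(self.ab_ph_wd) > 55:
            raise ValueError("AbPhWd exceeds 55 characters")
        if len(self.ab_abrv) > 10:
            raise ValueError("AbAbrv exceeds 10 characters")
```

For instance, `AbDataRecord(5, "Company", "Coy")` would hold the word abbreviation `Coy` for `Company`.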


The structure of the parameters list 126 of the abbreviate function, in the preferred embodiment, is presented as TABLE 2. The parameters may be initiated as memory variables or included as the parameter list of an abbreviate function of the format:

ABBREVIATE(parameter list)

The Abbreviate function, in the preferred embodiment, would return:

a) In case of text string input: the abbreviated string (in single or multiple rows) and the reduction percentage of the input (Rdctn%).

b) In case of text file input: number of output lines (OutptLns) and the percentage of reduction of the input (Rdctn%).

c) Appropriate error messages, if any.
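
The manner of computing the reduction percentage (Rdctn%) is not set out here. One plausible reading, assumed in the sketch below, is the proportion of input characters removed by abbreviation; the function name is hypothetical.

```python
def reduction_percent(input_text, output_text):
    """Assumed definition of Rdctn%: share of input characters removed."""
    if not input_text:
        return 0.0
    return 100.0 * (len(input_text) - len(output_text)) / len(input_text)
```

Under this assumption, replacing `Profits after tax` (17 characters) with `PAT` (3 characters) gives a reduction of about 82 percent.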

The user may pass each parameter afresh every time the function is called. As a convenient alternative the user may choose any parameter set comprising parameters (numbered P#=1 to 28 in TABLE 2) from a parameters set file 125 and the user may pass only the unique parameters (numbered P#=29 to 34 in TABLE 2) directly every time the function is called.

It may be possible to design the sundry application, from within which the abbreviation function of this invention is called, itself to develop:

a) the parameters: OtptL and StrRws by checking the space constraints of the location to which the output is to be supplied, while abbreviating text string and

b) the parameters: InptL and OtptL by checking the input text file named in parameter:InputFN and the output text file named in parameter:OutptFN, while abbreviating text file.

                             TABLE 2
       Structure Of Parameters List Of Abbreviate Function:
    Fields:
    P#  PrmtrNm   Short Description                                           Data Type    Valid For FnctnSb
     1  FnctnSb   Function sub code: `s` for abbreviating text string,        String(1)    s/t
                  `t` for abbreviating text file to text file
     2  OptnACc   All capitals convert to lower case                          String(1)    s/t
     3  OptnAbs   Intellectual abstraction in curly brackets usage            String(1)    s
     4  OptnPri   Prioritized deletion/truncation of round bracket contents   String(1)    s/t
     5  OptnPro   Protection of square bracket contents from abbreviation     String(1)    s/t
     6  OptnAbB   Pre-defined words barred from abbreviation                  String(1)    s/t
     7  OptnTWd   Title word (preceding personal name) deletion               String(1)    s
     8  OptnCAc   Compulsory acronym replacement for phrase                   String(1)    s/t
     9  OptnCAb   Compulsory abbreviation replacement for word                String(1)    s/t
    10  OptnEWN   Enumeration words to numerics conversion                    String(1)    s/t
    11  OptnESq   Ending sequence replacement                                 String(1)    s/t
    12  OptnISq   Intervening sequence replacement                            String(1)    s/t
    13  OptnRCd   Repeating consonant deletion (i.e., replacement of a        String(1)    s/t
                  sequence of a contiguously repeating consonant with one
                  such consonant)
    14  OptnLAd   LAdStrng based deletion                                     String(1)    s/t
    15  OptnNAc   Need based acronym replacement for phrase                   String(1)    s/t
    16  OptnNAb   Need based abbreviation replacement for word                String(1)    s/t
    17  OptnTrn   Words truncation                                            String(1)    s
    18  OptnFnl   Text string final truncation                                String(1)    s
    19  OptnISd   Intervening space deletion                                  String(1)    t
    20  OptnLBj   Line breaks joining                                         String(1)    t
    21  AbrDtaFN  Abbreviation data file name - version specific              String(8)    s/t
    22  PndStrng  String containing punctuations for deletion                 String(8)    s
    23  LAdStrng  String containing less significant alphabets for deletion   String(8)    s/t
    24  NDSStrng  String containing non-deletable symbols                     String(8)    s
    25  MnWdL     Minimum word length limit                                   Integer      s/t
    26  MnTrL     Minimum truncated word length limit                         Integer      s
    27  MxPNWds   Maximum personal name words limit                           Integer      s
    28  PNFWdL    Personal name first word length limit                       Integer      s
    29  InputFN   Input text file name                                        String(8)    t
    30  OutptFN   Output text file name                                       String(8)    t
    31  OtptL     Desired output length or row width or output record length  Integer      s/t
    32  InptL     Input record length                                         Integer      t
    33  StrRws    String output rows number                                   Integer      s
    34  InputStr  Input text string                                           String(120)  s


Generally, the following rules are observed:

1) The option value--i.e., for parameters P#=2 to 20--is set to `Y`, to exercise the option, or else it is left blank, except

for OptnAbs:

a) If OptnAbs=`X`: the entire text string, including the abstract segment in curly brackets, is abbreviated. If the desired output length limit is not reached, without using truncation options, the OptnAbs is set to `Y` and abbreviation of the string is tried afresh.

b) If OptnAbs=`Y`: the text string, excluding the abstract segment, is abbreviated. If the desired output length limit is not reached, without using truncation options, the OptnAbs is set to `Z` and abbreviation of the string is tried afresh.

c) If OptnAbs=`Z`: only the abstract segment is retained and abbreviated.

for OptnPri:

a) If OptnPri=`D`: prioritized segments in round brackets are deleted starting from the right end of the text string until the desired output length limit is reached.

b) If OptnPri=`I`: each word from the prioritized segments is truncated to bare initials from the right end, starting from the end of file:Shrtn, until the desired output length limit is reached.

c) If OptnPri=`T`: each word in the prioritized segments is truncated up to a predetermined minimum truncated word length limit (MnTrL) from the right end, starting from the end of file:Shrtn, until the desired output length limit is reached.

for OptnEWN:

a) If OptnEWN=`X`: any bundle word greater than thousand (Th) at the end of numeric abbreviation is retained.

b) If OptnEWN=`Y`: any bundle word at the end of numeric abbreviation is retained.

c) If OptnEWN=`Z`: the enumeration words sequence in the text string is fully converted to numerics without retaining any bundle word at the end of numeric abbreviation.

for OptnLAd:

a) If OptnLAd=`X`: less significant alphabets are deleted from the right end up to a predetermined minimum word length limit (MnWdL), excluding the last alphabet within each word from deletion.

b) If OptnLAd=`Y`: less significant alphabets are deleted from the right end up to a predetermined minimum word length limit, including the last alphabet within each word.

for OptnTrn:

a) If OptnTrn=`P`: all shortened words are truncated from the right end, such that the length of each word which is in excess of the predetermined minimum truncated word length limit is deleted in required uniform proportion, until the desired output length limit is reached.

b) If OptnTrn=`R`: shortened words are truncated from the right end, up to a predetermined minimum truncated word length limit and until the desired output length limit is reached.

for OptnFnl:

a) If OptnFnl=`Y`: each word (or basic element) from the text string is truncated from the right end, excluding bare initials of each word, pre-defined non-deletable symbols, numeric digit, decimal point and protected segment, until the desired output length limit is reached or all the words are dealt with.

b) If OptnFnl=`Z`: each word (or basic element) from the text string is truncated as in the preceding option, except that the protected segment is not excluded from truncation.

2) Valid value of MnTrL is any integer greater than 1.

3) Valid value of MnWdL is any integer greater than 1 and not less than MnTrL.
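
The rules above may be checked before the function is executed. The following is a minimal sketch of such a check, not part of the disclosed embodiment: the function name and the dictionary representation of the parameters are assumptions, as is the reading that a blank value remains valid for the six options having special codes.

```python
# Codes listed in rule 1 for the options with special values; a blank
# (option not exercised) is assumed to remain valid for each of them.
ALLOWED_OPTION_VALUES = {
    "OptnAbs": {"", "X", "Y", "Z"},
    "OptnPri": {"", "D", "I", "T"},
    "OptnEWN": {"", "X", "Y", "Z"},
    "OptnLAd": {"", "X", "Y"},
    "OptnTrn": {"", "P", "R"},
    "OptnFnl": {"", "Y", "Z"},
}
DEFAULT_ALLOWED = {"", "Y"}  # every other option is `Y` or blank


def validate_parameters(params):
    """Return a list of error messages; an empty list means the rules hold."""
    errors = []
    for name, value in params.items():
        if name.startswith("Optn"):
            allowed = ALLOWED_OPTION_VALUES.get(name, DEFAULT_ALLOWED)
            if value.strip() not in allowed:
                errors.append(f"{name}: invalid value {value!r}")
    mn_trl = params.get("MnTrL")
    mn_wdl = params.get("MnWdL")
    if mn_trl is not None and mn_trl <= 1:
        errors.append("MnTrL must be greater than 1")  # rule 2
    if mn_wdl is not None:
        if mn_wdl <= 1:
            errors.append("MnWdL must be greater than 1")  # rule 3
        if mn_trl is not None and mn_wdl < mn_trl:
            errors.append("MnWdL must not be less than MnTrL")  # rule 3
    return errors
```

A call such as `validate_parameters({"OptnAbs": "X", "MnTrL": 2, "MnWdL": 4})` returns an empty list, whereas an unknown code or a MnWdL smaller than MnTrL is reported.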

The structure of the parameters set file 125 is presented as TABLE 3.

                             TABLE 3
                Structure Of Parameters Set File:
    File Name: PSet
    Fields:
    Name          Data Type   Short Description
    PrmSetId      String(3)   Parameter Set Identification code:
                              1st character = FnctnSb
                              2nd character = any lower case alphabet
                              group indicator
                              3rd character = any numeric digit set
                              indicator read with 1st and 2nd
                              characters
    PrmComnt      String(55)  Parameter set comment
    OptnACc       String(1)   )
    OptnAbs       String(1)   )
    OptnPri       String(1)   )
    OptnPro       String(1)   )
    OptnAbB       String(1)   )
    OptnTWd       String(1)   )
    OptnCAc       String(1)   )
    OptnCAb       String(1)   )
    OptnEWN       String(1)   )
    OptnESq       String(1)   )
    OptnISq       String(1)   )
    OptnRCd       String(1)   )
    OptnLAd       String(1)   > As in TABLE 2
    OptnNAc       String(1)   )
    OptnNAb       String(1)   )
    OptnTrn       String(1)   )
    OptnFnl       String(1)   )
    OptnISd       String(1)   )
    OptnLBj       String(1)   )
    AbrDtaFN      String(8)   )
    PndStrng      String(8)   )
    LAdStrng      String(8)   )
    NDSStrng      String(8)   )
    MnWdL         Integer     )
    MnTrL         Integer     )
    MxPNWds       Integer     )
    PNFWdL        Integer     )
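
The three-character identification code PrmSetId and the combination of a stored parameter set with the directly passed parameters (P#=29 to 34) may be sketched as follows. This is an illustrative sketch only: the function names and the use of dictionaries for a PSet record are assumptions, not the disclosed file handling.

```python
import re


def parse_prm_set_id(prm_set_id):
    """Split PrmSetId into FnctnSb, group indicator and set indicator."""
    match = re.fullmatch(r"([st])([a-z])([0-9])", prm_set_id)
    if match is None:
        raise ValueError(f"invalid PrmSetId: {prm_set_id!r}")
    return {
        "FnctnSb": match.group(1),   # 1st character: `s` or `t`
        "group": match.group(2),     # 2nd character: lower case alphabet
        "set": int(match.group(3)),  # 3rd character: numeric digit
    }


def build_parameter_list(set_record, direct):
    """Combine a PSet record with the parameters passed on each call."""
    params = {k: v for k, v in set_record.items()
              if k not in ("PrmSetId", "PrmComnt")}
    params["FnctnSb"] = parse_prm_set_id(set_record["PrmSetId"])["FnctnSb"]
    params.update(direct)  # e.g. OtptL, StrRws, InputStr
    return params
```

For example, a record identified as `sa1` yields FnctnSb `s`, so that only OtptL, StrRws and InputStr need to be supplied when a text string is abbreviated.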


The abbreviation data file 124, the parameters set file 125 and the parameters list 126 are authored using the abbreviation control data program 123. Any record from the parameters set file 125 may be used as a subset (i.e., parameters numbered P# =1 to 28 in TABLE 2) of the parameters list 126.

FIG. 3 presents the abbreviation control data program 123 in greater detail. It consists of a menu 1230 allowing choice of abbreviation data file authoring program 1231, parameter sets file authoring program 1232 and parameters list authoring program 1233. The abbreviation data file authoring program 1231 uses a data capture form structure for on-screen display to create and update several versions of the abbreviation data file 124. Relevant details pertaining to abbreviation rules (number and description) and data validation rules for each field are presented as TABLE 4. A typical abridged version of the abbreviation data file is presented as TABLE 5. A complete and system fixed list of enumeration words with abbreviations, being a subset of Abbreviation Data File with AbAR=2 is presented as TABLE 6. The program is not described further, being a routine matter well known to one of ordinary skill in the art.

                                            TABLE 4
                     Abbreviation Rule Numbers, Description And Field Data
                          Validation Rules for Abbreviation Data File:
    Fields:
    Ab  Abbreviation               AbPhWd                         AbAbrv
    AR  Rule Description           Validation Rules               Validation
     Rules
    1   Words barred               Any word comprising entirely of an
        from abbreviation          upper case 1st alphabet followed
                                   by lower-case alphabet(s)
    2   Conversion of single words Any word comprising entirely of Any numeric
     digit, alphabet
        representing numbers up to alphabets representing numbers, characters
     or combination
        hundred and bundle words   bundles and derivatives such as (system
     defined)
        (such as thousand, million etc.) First from one or Tenth
        to abbreviations           from ten (system defined)
    3   Deletion of title word     Any word or abbreviation
        representing status and/or popularly used in personal name
        gender in personal name
    4   Compulsory replacement of  Any phrase or series of words   Any appropriate system formed
        phrase with acronym        with initials capitalised,      or user edited acronym
                                   intervening spaces deleted
                                   and length <=55
    5   Compulsory replacement of  Any word of 3 to 25 characters  Any appropriate abbreviation
        word with abbreviation     other than `And`, `Point`       to yield at least 25% reduction
                                   and `Zero`                      for words containing <= 4
                                                                   characters and at least 40%
                                                                   reduction for longer words
    6   Need based replacement     Any phrase or series of words   Any appropriate system formed
        of phrase with acronym     with initials capitalised,      or user edited acronym
                                   intervening spaces deleted
                                   and length <=55
    7   Need based replacement of  Any word of 3 to 25 characters  Any appropriate abbreviation
        word with abbreviation     other than `And`, `Point`       to yield at least 25% reduction
                                   and `Zero`                      for words containing <= 4
                                                                   characters and at least 40%
                                                                   reduction for longer words
    8   Need based replacement of  Any sequence of                 Blank OR any appropriate
        ending sequence of         lower-case characters           lower-case shorter sequence
        characters in word with a                                  to yield at least a 50%
        shorter sequence                                           reduction
    9   Need based replacement of  Any sequence of                 Blank OR any appropriate
        intervening sequence of    lower-case characters           lower-case shorter sequence
        characters in word with a                                  to yield at least a 50%
        shorter sequence                                           reduction
    Suggested cautions:
    a) Do not permit user editing of system defined records with AbAR = 2.
    b) Do not permit duplication of field: AbPhWd entries between records with
     AbAR = 4 and 6; and between records with AbAR = 2,5 and 7.
    c) Provide for change of AbAR = 4 to AbAR = 6, AbAR = 5 to AbAR = 7 and
     vice versa.
    d) Provide for creation and editing of several versions of file: AbData
     appropriate to each usage domain or individual preference, with unique DOS
     file names passed as parameter: AbrDtaFN.
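
The length and reduction limits stated for word abbreviation records (AbAR = 5 and 7) can be checked mechanically. The following is a minimal sketch under stated assumptions, not the authoring program of the invention: the function names are hypothetical, the comparison against the excluded words is assumed to be case-insensitive, and the reduction is assumed to be measured as a fraction of the original word length.

```python
# Hypothetical validator for word abbreviation records (AbAR = 5 or 7).
EXCLUDED_WORDS = {"and", "point", "zero"}  # words the table rules out


def reduction(word: str, abbreviation: str) -> float:
    """Fraction of the word's length saved by the abbreviation."""
    return (len(word) - len(abbreviation)) / len(word)


def is_valid_word_abbreviation(word: str, abbreviation: str) -> bool:
    # The word must be 3 to 25 characters long.
    if not 3 <= len(word) <= 25:
        return False
    # Case-insensitive exclusion is an assumption of this sketch.
    if word.lower() in EXCLUDED_WORDS:
        return False
    # At least 25% reduction for words of <= 4 characters, else at least 40%.
    required = 0.25 if len(word) <= 4 else 0.40
    return reduction(word, abbreviation) >= required
```

For example, `is_valid_word_abbreviation("Tuesday", "Tue")` passes, while an abbreviation that drops a single letter from a seven-letter word does not.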


                             TABLE 5
       Abridged Version Of Typical Abbreviation Data File:
          Fields: AbAR AbAbrv AbPhWd
    1 Abraham
    1 Lincoln
    2 Bn Billion
    2 Cr Crore
    2 8 Eight
    2 18 Eighteen
    2 80th Eightieth
    2 80 Eighty
    3 Mr
    3 Mr.
    3 Mrs
    3 Dr
    4 BS BalanceSheet
    4 Fed FederalReserve
    4 P&L ProfitAndLoss
    4 PAT ProfitAfterTax
    5 # Number
    5 Tue Tuesday
    5 Coy Company
    5 Corp Corporation
    5 $ Dollar
    5 Govt Government
    5 Inc Incorporated
    5 % Percent
    6 ADN AnyDayNow
    6 BTW ByTheway
    6 FYI ForYourInformation
    7 Doc Document
    7 Spdt Superintendant
    8 k ck
    8 g ing
    8 mt ment
    8 nt nent
    8 nt nant
    9 m mn
    9 k ck
    9 g ing
    9 mt ment
    9 nt nent

                             TABLE 6
    Complete And System Fixed List Of Enumeration Words (with abbreviations,
     being a subset of Abbreviation Data File with AbAR = 2)
          Fields:
    AbPhWd        AbAbrv
    Billion       Bn
    Crore         Cr
    Eight         8
    Eighteen      18
    Eighteenth    18th
    Eighth        8th
    Eightieth     80th
    Eighty        80
    Eleven        11
    Eleventh      11th
    Fifteen       15
    Fifteenth     15th
    Fifth         5th
    Fiftieth      50th
    Fifty         50
    First         1st
    Five          5
    Fortieth      40th
    Forty         40
    Four          4
    Fourteen      14
    Fourteenth    14th
    Fourth        4th
    Hundred       00
    Hundredth     00th
    Lac           Lc
    Lakh          Lk
    Million       Mn
    Nil           0
    Nine          9
    Nineteen      19
    Nineteenth    19th
    Nineth        9th
    Ninetieth     90th
    Ninety        90
    One           1
    Quadrillion   Qd
    Second        2nd
    Seven         7
    Seventeen     17
    Seventeenth   17th
    Seventh       7th
    Seventieth    70th
    Seventy       70
    Six           6
    Sixteen       16
    Sixteenth     16th
    Sixth         6th
    Sixtieth      60th
    Sixty         60
    Ten           10
    Tenth         10th
    Third         3rd
    Thirteen      13
    Thirteenth    13th
    Thirtieth     30th
    Thirty        30
    Thousand      Th
    Three         3
    Trillion      Tr
    Twelfth       12th
    Twelve        12
    Twentieth     20th
    Twenty        20
    Two           2
    Note: In some Asian countries One Hundred Thousand is reckoned as a Lakh
     (Lac), One Hundred Lakh is reckoned as a Crore and hence One Hundred Crore
     is equivalent to One Billion.
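
As an illustration of how the enumeration word list might be applied, the sketch below replaces each whole word found in a small subset of the list with its AbAbrv entry, and checks the arithmetic of the note. It is only a sketch under stated assumptions: the function name is hypothetical, matching is assumed to be exact and case-sensitive on whitespace-separated words, and the merging of word sequences into a single number is not modelled here.

```python
# Subset of the enumeration word list (AbPhWd -> AbAbrv); the full list is longer.
ENUMERATION_WORDS = {
    "Billion": "Bn", "Crore": "Cr", "Eight": "8", "Eighteen": "18",
    "Eighty": "80", "First": "1st", "Hundred": "00", "Million": "Mn",
    "Nil": "0", "Third": "3rd", "Twenty": "20", "Two": "2",
}


def replace_enumeration_words(text: str) -> str:
    """Replace each whole word found in the table; leave other words untouched."""
    return " ".join(ENUMERATION_WORDS.get(word, word) for word in text.split())


# Arithmetic behind the note on Lakh and Crore.
LAKH = 100_000         # One Hundred Thousand
CRORE = 100 * LAKH     # One Hundred Lakh
BILLION = 100 * CRORE  # One Hundred Crore
```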


The parameters set file authoring program 1232 uses a data capture form structure for on-screen display to create and update the parameters set file 125 covering parameters as presented in TABLE 3. The program is not described further being a routine matter well known to one of ordinary skill in the art. The parameters list authoring program 1233 uses a data capture form structure for on-screen display to create the parameters list 126. The program is not described further being a routine matter well known to one of ordinary skill in the art.

FIG. 4 presents in greater detail the abbreviation function 127, which is the crux of this invention. This presentation shows all subroutines, memory variables set, shortening file and reduction scope file, which files are used by the subroutines. Data files and data structures are represented as double lined blocks. Execution of the abbreviation function starts with the Main subroutine 1271 which in turn calls the Shorten subroutine 1272. The Shorten subroutine may recursively call the Move subroutine 1273, Match subroutine 1274, Replace subroutine 1275 and Separate subroutine 1276. Apart from the abbreviation data file 124 and the parameters list 126, which are accessed by the abbreviation function 127, the subroutines also access the memory variables set 1280 and the shortening file 1281. The reduction scope file 1282 is accessed only from the Replace subroutine 1275. The input file 1283, the processed input file 1284 and the output file 1285 are used from within the Main subroutine while abbreviating text file inputs only and not while abbreviating text string or word input. The memory variables set comprises individual variables described within each method hereinafter and other variables that may be required to control the execution of conditional, sequenced and/or recursive steps of the methods of the abbreviation function depending on the programming details well known to one of ordinary skill in the art. The structures of the shortening file 1281, the reduction scope file 1282, the input file 1283, the processed input file 1284 and the output file 1285 are presented in TABLEs 7 to 11 hereinafter.

The shortening file 1281 has fields which are structured to:

a) hold each word (or basic element), separated from the word separation and processing string, in separate records in original sequence (i.e., in sequence of field:ShSq),

b) hold, along with the first word of phrase, less commonly used matched acronym for need based replacement; or hold, along with the word, less commonly used word abbreviation for need based replacement,

c) hold indication if each word (or basic element) has reduction scope (i.e., open for reduction, by acronym or word abbreviation replacement or phonetic shortening),

d) hold indication if each word (or basic element) is covered by any of the abbreviation methods, rule numbers or category numbers--number greater than zero indicating that the word is not open for abbreviation or truncation except in the last step of text string truncation, if required,

e) hold indication if each word originally had the initial in capital letter and if it had any of its other alphabets capitalized by the system to control the phonetic shortening methods and

f) in general facilitate execution of the methods of this invention within shortening file until the desired output length limit is reached or each word (or basic element) is dealt with.

The Shorten subroutine 1272 calls the Match subroutine 1274 wherein the abbreviation data file 124 is searched for acronyms or word abbreviations corresponding to phrases or words contained in the input text string. The commonly used acronyms or word abbreviations, if found, are replaced compulsorily and other acronyms or word abbreviations, if found, are held in corresponding records of shortening file 1281 for need based replacement at a later stage. The Shorten subroutine 1272 calls the Replace subroutine 1275 for need based replacement of phrases or words with acronyms or word abbreviations using the reduction scope file 1282 to keep track of reduction scope length of the acronyms or word abbreviations found and held for need based replacement. The records in reduction scope file are sequenced in the descending order of reduction scope length, the objective being to achieve the required reduction with the least number of need based replacements in the records of the shortening file as referenced from the first few records of the reduction scope file.
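
The need based replacement described above amounts to applying the held abbreviations in descending order of reduction scope length until the text fits. The following is a minimal sketch, assuming a simplified in-memory stand-in for the shortening file and the reduction scope file; the class and function names are hypothetical, only single-word abbreviations are modelled, and separators between words are ignored when lengths are summed.

```python
from dataclasses import dataclass


@dataclass
class Rec:
    seq: int        # plays the role of ShSq
    word: str       # plays the role of ShSWrd
    held: str = ""  # plays the role of ShAbrv (held for need based replacement)


def need_based_replace(records, out_l):
    """Apply held abbreviations, largest saving first, until the text fits."""
    tot_len = sum(len(r.word) for r in records)
    # Stand-in for the reduction scope file: (saving, sequence number),
    # sorted in descending order of saving.
    scope = sorted(
        ((len(r.word) - len(r.held), r.seq) for r in records if r.held),
        reverse=True,
    )
    by_seq = {r.seq: r for r in records}
    for saving, seq in scope:
        if tot_len <= out_l:
            break  # required reduction already achieved
        rec = by_seq[seq]
        rec.word, rec.held = rec.held, ""
        tot_len -= saving
    return tot_len
```

With the records `Document` (held `Doc`), `Superintendant` (held `Spdt`) and `Report`, a limit of 20 is met by replacing only `Superintendant`, which is the largest single saving.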

If the abbreviated output is required to be placed in multiple row column widths, the Shorten subroutine 1272 calls the Separate subroutine 1276 to separate the input text string into required number of portions without splitting words (except to minimize word truncation) before shortening the portions. Thereafter each portion is duly abbreviated to the desired output length limit.

Generally, abbreviation of multiple line text requires additional files--namely an input file 1283 from which the input records are first read, a processed input file 1284 into which the input records are copied with control data to keep track of line breaks, paragraph breaks, blank lines and indentation and an output file 1285 to which the abbreviated records are written. The pattern of line breaks, blank lines and indentation are reproduced in the output file 1285, if required. Optionally line breaks and blank lines are joined to save display space, indicating the joints with one or two `@` character(s).

If a single line text string is being abbreviated, the function returns the output string. If a multiple line text in a file is being abbreviated, each abbreviated output string is accumulated up to the predetermined output record length, reckoned in terms of monospace or proportional spacing, and each accumulated record is added to the pre-defined output text file 1285.

                             TABLE 7
                  Structure Of Shortening File:
                         File Name:Shrtn
          Fields:
    Name        Data Type   Description
    ShSq        Integer     Record sequence number.
    ShSWrd      String (26) Word (or basic element) for abbreviation
                            or commonly used acronym or word
                            abbreviation replacement (after OptnCAc
                            & OptnCAb are used).
    ShAbrv      String (10) Less commonly used acronym or word
                            abbreviation held for need based
                            replacement later.
    ShRS        Integer     Reduction scope indicator (only 0, 1, 6 or
                            7 being valid). ShRS = 0 indicates that
                            field:ShSWrd is not open for reduction.
                            ShRS is set to 6 or 7, if less commonly
                            used acronym or abbreviation is held in
                            ShAbrv.
    ShAR        Integer     Abbreviation rule indicator - default
                            value being zero. AbAR numbers 1-5, are
                            copied directly from file:AbData, as
                            applicable and numbers 6 & 7 are copied
                            from ShRS after need based replacement
                            of less commonly used acronym or word
                            abbreviation. All non-abbreviatable
                            words (or basic elements) are numbered
                            20. Protected segment is numbered 22.
                            ShAR > 0 indicates that the word is not
                            open for word truncation, except in the
                            last stage of text string truncation,
                            if required.
    ShCap       Integer     Indicating original case status of
                            initial of word or capitalization of
                            other alphabets of word to control
                            phonetic shortening methods (only 0, 1,
                            10 or 11 being valid).
    In the preferred embodiment, the following integer variables, derived from
     the field value(s) in this file, are used:
    a) SWrdLen = Number of characters contained (excluding trailing space(s))
     in field:ShSWrd of each record in file:Shrtn.
    b) AbrvLen = Number of characters contained (excluding trailing space(s))
     in field:ShAbrv of each record in file:Shrtn.
    c) TotLen = Sum of SWrdLen of all records in file:Shrtn.
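
For readers who prefer code to a field list, one record of file:Shrtn and the three derived variables can be sketched as follows. This is an illustrative in-memory model only, assuming fixed-width string fields padded with trailing spaces; the class name and the validation in `__post_init__` are assumptions of the sketch rather than part of the disclosed structure.

```python
from dataclasses import dataclass

VALID_SHRS = {0, 1, 6, 7}     # valid reduction scope indicators
VALID_SHCAP = {0, 1, 10, 11}  # valid capitalization indicators


@dataclass
class ShrtnRecord:
    ShSq: int         # record sequence number
    ShSWrd: str       # word (or basic element), String (26)
    ShAbrv: str = ""  # held acronym or abbreviation, String (10)
    ShRS: int = 0     # reduction scope indicator
    ShAR: int = 0     # abbreviation rule indicator
    ShCap: int = 0    # capitalization status

    def __post_init__(self):
        if self.ShRS not in VALID_SHRS:
            raise ValueError(f"invalid ShRS: {self.ShRS}")
        if self.ShCap not in VALID_SHCAP:
            raise ValueError(f"invalid ShCap: {self.ShCap}")

    @property
    def SWrdLen(self) -> int:
        # Characters in ShSWrd, excluding trailing spaces.
        return len(self.ShSWrd.rstrip(" "))

    @property
    def AbrvLen(self) -> int:
        # Characters in ShAbrv, excluding trailing spaces.
        return len(self.ShAbrv.rstrip(" "))


def tot_len(records) -> int:
    """TotLen: sum of SWrdLen over all records."""
    return sum(r.SWrdLen for r in records)
```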


                             TABLE 8
                Structure Of Reduction Scope File:
                         File Name:Scope
          Fields:
    Name        Data Type   Description
    ScAcAbRS    Integer     Need based acronym or word abbreviation
                            replacement reduction scope length
    ScSq        Integer     Sequence number copied from ShSq of
                            file:Shrtn


                             TABLE 11
                   Structure Of Output File:
    File Name: The DOS file name of the current file used is passed from
     parameter:OutptFN.
          Fields:
    Name        Data Type     Description
    OtRcord     String (120)  Output text file record


The subroutines are presented in greater detail in flow chart format in FIGS. 5 to 8 and 9 to 10.

Separate control panel formats for text string abbreviation and text file abbreviation are presented in TABLES 12 and 13. The panels may be used for capturing user's choice of data file version, options or other control parameters or to display or print these choices, if required.

                             TABLE 12
                Text String Output Control Panel:
        String Output: PrmSetId = sal, AbrDtaFN = abdata2
    ACc   Abs   Pri   Pro   AbB   TWd   CAc   CAb   EWN   ESq   ISq   RCd   LAd   NAc   NAb   Trn   Fnl
    Y     XYZ   DIT   Y     Y     Y     Y     Y     XYZ   Y     Y     Y     XY    Y     Y     PR    YZ
    PndStrng  LAdStrng  NDSStrng  MnWdl   MnTrL   MxPNWds     PNFWdL    OtptL   StrRws
    ,;:       aeiour    #$%+-@/   03      02      03          08        25      01




The basic embodiment of the invention with some variations is designed to abbreviate text string or word and with other variations to abbreviate text file. These variations are explained with reference to each method or step of the invention hereinafter. Each method of the invention is numbered 1 to 33 and in case any method comprises a plurality of steps, each such step is designated with a unique lower-case alphabet suffix to the method number. The one or two digit numeric designating the method or the numeric-alphabet designating a method-step is used as the reference character in the flow charts (i.e., in FIGS. 5 to 8 and 9 to 10). These designating reference characters are placed at the end of relevant method description or method-step statement, as the case may be.

The methods that are used in the preferred embodiment, numbered 1 to 33, are as follows:

Method 1: Creation of Abbreviation Data File

The system provides for creation and editing of abbreviation data file 124, by the user or developer using the file structure described in TABLE 1 hereinbefore with valid inputs as specified in the format presented in TABLE 4. The method of creation and editing is not described further, being a routine matter known to one of ordinary skill in the art.

Method 2: Creation of Parameters Set File

The system requires creation and editing of parameters set file 125, by the user, with valid inputs into the file structure described in TABLE 3 hereinbefore. Several parameter sets may be pre-defined and stored in the file for selective use as and when the function is called. The method of creation and editing is not described further, being a routine matter known to one of ordinary skill in the art.

Method 3: Generation of Complete Parameter List

A user may be allowed to develop the parameter list by passing each parameter afresh each time the function is called. For convenience and consistency of results the user may be enabled to define instantly the unique parameters--i.e., input text DOS file name (InputFN), output text DOS file name (OutptFN), output length (OtptL), input length (InptL), number of string output rows (StrRws) and input text string (InputStr)--and to choose any parameter set from the parameter sets file 125 to complete the parameters list 126 every time the function is called. With this the function is ready for execution.

Method 4: Start of Main Program

This method is illustrated in FIG. 5.

The abbreviation methods of the invention use related entries in records of file:AbData and the pre-defined characters passed in the parameters:PndStrng, LAdStrng and NDSStrng. The parameter list also includes certain parameters which indicate to the system what methods or control features the user has opted for. However, even if the user chooses certain options, in the absence of related entries in file:AbData and parameters:PndStrng, LAdStrng and NDSStrng, the options are not effective. To prevent wasteful processing, the system checks for and blanks the `empty` options at the outset. A backup of the control parameters is made so that later the original values can be restored, if required 4.

For abbreviation of text file (i.e., if parameter:FnctnSb=`t`): the system skips to Method 10.

Method 5: String Initial Steps, Separation and Movement to WrdStrng

This method is illustrated in FIGS. 5, 15 and 16.

An input text string may be abbreviated into a single row string or a string comprising multiple rows. Abbreviation of input text string into multiple row string is covered by two options:

i) Manual separation--i.e., delimitation of input text string into several portions by the user using a unique row separator character (i.e., vertical bar `|`), before calling the abbreviate function.

ii) Automated system separation of input text string into predetermined number of rows (StrRws).

In the former option each separated portion is processed separately to the desired row width (OtptL). Each separate abbreviated output is then accumulated into a string (CumStrng) separated with system supplied row separator(s).

In the latter option the unseparated input text string is first processed as a whole up to and including Method 16, before system placement of row separators (using Methods 17 and 31). The number of row separators is one less than the desired number of rows (StrRws).

For the control of processing of input text string as a whole before system placement of row separators, it is necessary to set desired output length limit, OutL=OtptL*StrRws. After placement of row separators, input text string is recycled back to Method 5 and each portion of the string is processed separately to the desired row width (OtptL) as in the former option. In this latter option also, each separate abbreviated output is accumulated into CumStrng, separated with system supplied row separators.

In the initial steps, the contents of input text string parameter (InputStr) are moved to a separate input processing string (InpStrng) in which the initial steps of abbreviation are executed. If the option for conversion of all capital letters input to lower case is chosen (i.e., OptnACc=`Y`) and if all alphabet characters are capital letters, all capital letters are converted to lower case. All apostrophes are deleted from the InpStrng.

The initial steps are concluded by moving the whole or each separate portion of InpStrng to word (or basic element) separation and processing string (WrdStrng), left justified. The WrdStrng is also copied to WrdStrngC so that in case the processing has to be aborted and tried afresh, the WrdStrng is available in original form.

The specific steps of the method, designated (a) to (f), are:

a) Blanking CumStrng;

Copying InputStr to InpStrng and setting OutL=OtptL, if StrRws=1; else OutL=OtptL*StrRws.

If OptnAbs is not blank: Setting OptnAbsC=OptnAbs;

Note: This is done to hold a copy of the parameter value intact while OptnAbs parameter value may be changed from `X` to `Y` or `Y` to `Z`. The need for change in OptnAbs parameter value arises, if OptnAbs=`X` or `Y` and the InpStrng cannot be abbreviated to the OutL limit without resorting to word truncation.

Similarly, setting OptnPriC=OptnPri and OptnProC=OptnPro to have backups in case the parameter values are changed in process 5a.

b) If OptnACc=`Y` and InpStrng has all alphabets in capitals: converting InpStrng to lower case 5b.

Note: The abbreviation methods are ineffective in any input text string which consists of all capital letters, unless OptnACc=`Y`.

c) Deleting apostrophe from InpStrng 5c.

d) Locating within InpStrng row separator--i.e., vertical bar character `|`--and, if found, setting StrRws=1 & OutL=OtptL; and moving each separate portion of InpStrng to WrdStrng 5d.

e) If row separator is not found, then moving the whole of InpStrng to WrdStrng, left justified 5e.

f) Setting WrdStrngC=WrdStrng 5f.
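
Steps (a) to (e) can be sketched compactly as below. This is a simplified illustration under stated assumptions and not the Main subroutine itself: the function name and its return value are hypothetical, `left justified` is taken to mean removal of leading spaces, and the backups of OptnAbs, OptnPri and OptnPro and the copy to WrdStrngC are omitted.

```python
def string_initial_steps(input_str, otpt_l, str_rws, optn_acc="Y"):
    """Return (portions for WrdStrng, OutL, StrRws) for one input text string."""
    inp_strng = input_str
    # (a) Output length limit for the string as a whole.
    out_l = otpt_l if str_rws == 1 else otpt_l * str_rws
    # (b) Convert to lower case only if every alphabet character is a capital.
    letters = [c for c in inp_strng if c.isalpha()]
    if optn_acc == "Y" and letters and all(c.isupper() for c in letters):
        inp_strng = inp_strng.lower()
    # (c) Delete apostrophes.
    inp_strng = inp_strng.replace("'", "")
    # (d) Manual row separators: each portion is processed to the row width.
    if "|" in inp_strng:
        str_rws, out_l = 1, otpt_l
        portions = [p.lstrip() for p in inp_strng.split("|")]
    else:
        # (e) No separator: the whole string is moved, left justified.
        portions = [inp_strng.lstrip()]
    return portions, out_l, str_rws
```

For an all-capitals input with StrRws=2 and OtptL=10, the sketch yields one lower-case portion and OutL=20; for an input already delimited with a vertical bar it yields one portion per row and resets OutL to OtptL.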

Method 6: String Brackets Handling

This method is illustrated in FIGS. 5, 13 and 15.

For abbreviating text string, delimited segment options include:

i) OptnAbs (abstract segment):

A matched pair of unique delimitation characters--i.e., curly brackets--not nested within any matched pair of unique characters, containing a substantially and intellectually concised abstraction (comprising one or more words--preferably abbreviatable) of the text string and suffixed to that string for:

a) If OptnAbs=`X`: abbreviating the entire string, including the portions contained within curly brackets. If the desired output length limit (OutL) is not reached without resorting to word truncation, OptnAbs is set to `Y` and abbreviation of the string is tried afresh.

b) If OptnAbs=`Y`: retaining and abbreviating the string, excluding the curly brackets and their contents. If OutL limit is not reached without resorting to word truncation, OptnAbs is set to `Z` and abbreviation of the string is tried afresh.

c) If OptnAbs=`Z`: retaining and abbreviating only the contents of the curly brackets.

ii) OptnPro (protected segment):

A matched pair of unique delimitation characters--i.e., square brackets--not nested within any other matched pair of square or round brackets, delimiting the segment(s) of string to be protected from abbreviation until final truncation (i.e., Method 27).

iii) OptnPri (prioritized segment):

A matched pair of unique delimitation characters--i.e., round brackets--not nested within any other matched pair of square or round brackets, delimiting the segments of string to be prioritized for:

a) If OptnPri=`D`: deleting completely.

b) If OptnPri=`I`: truncating words to bare initial, excluding pre-defined non-deletable characters.

c) If OptnPri=`T`: truncating words up to a predetermined minimum truncated word length limit (MnTrL), excluding pre-defined non-deletable characters.

The specific bracket handling steps for text string abbreviation, designated (a) to (c), are:

a) Blanking the bracket character if corresponding delimited segment option is not chosen--i.e., curly brackets for OptnAbs, square brackets for OptnPro and round brackets for OptnPri; and blanking all occurrences of unmatched brackets; and blanking any bracket character found within a pair of matched curly brackets; and blanking any bracket character found within a pair of outermost (round or square) matched brackets 6a.

Note: This is done to give precedence to the outer pair of brackets.

b) Retaining entire WrdStrng or portion for abbreviation, if containing pair of curly brackets as follows 6b:

If OptnAbs=`X`: entire WrdStrng, left justified.

If OptnAbs=`Y`: after deleting the curly brackets and contents from WrdStrng, left justifying the remaining portion(s).

If OptnAbs=`Z`: after deleting all but the contents of curly brackets from WrdStrng, left justifying the remaining portions and setting OptnAbs=blank.

c) If OptnPri=`D` and if matching pair(s) of round brackets found: deleting the round bracket pair(s) and contents (one pair and contents at a time) from the right end, until OutL limit is reached 6c.
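
Steps (b) and (c) can be illustrated with a short sketch. It assumes that step (a) has already removed unmatched and nested brackets, so that simple non-nested patterns suffice; the function names are hypothetical, and the collapsing of repeated spaces after a deletion is an assumption of the sketch that goes slightly beyond left justification.

```python
import re

CURLY = re.compile(r"\{([^{}]*)\}")  # abstract segment
ROUND = re.compile(r"\([^()]*\)")    # prioritized segment


def apply_optn_abs(wrd_strng: str, optn_abs: str) -> str:
    """Step (b): choose what is retained for abbreviation."""
    if optn_abs == "Y":
        kept = CURLY.sub("", wrd_strng)              # drop brackets and contents
    elif optn_abs == "Z":
        kept = " ".join(CURLY.findall(wrd_strng))    # keep only the contents
    else:
        return wrd_strng.lstrip()                    # `X`: entire string
    return " ".join(kept.split())                    # tidy spacing (assumption)


def delete_prioritized(wrd_strng: str, out_l: int) -> str:
    """Step (c), OptnPri=`D`: delete round-bracket segments from the right end."""
    while len(wrd_strng) > out_l:
        matches = list(ROUND.finditer(wrd_strng))
        if not matches:
            break                                    # nothing left to delete
        last = matches[-1]                           # rightmost pair first
        wrd_strng = wrd_strng[:last.start()] + wrd_strng[last.end():]
        wrd_strng = " ".join(wrd_strng.split())
    return wrd_strng
```

Deleting one pair at a time means a segment nearer the start of the string survives whenever removing the later segments is enough to reach the OutL limit.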

Method 7: Deletion of Punctuation

This method is illustrated in FIGS. 6, 13, 15 and 16.

Occurrence of punctua