Information processing analysis system for sorting and scoring text
5371673
Abstract
A method and system for text analysis provides that text messages perceived by a population can be scored to determine the extent to which the messages favor one or more specified positions on a specified issue. A method and system for predicting public opinion based on message scores provides that the extent to which messages favor one or more specified positions can be used to determine the effect on the opinions of a specified population and to determine changes in the percentages of subpopulations within said specified population which favor said one or more specified positions.
Claims
I claim:
Description
TECHNICAL FIELD OF THE INVENTION
TABLE 1
______________________________________
Concept array for text analysis dictionary
Concept mnemonic Concept symbol
______________________________________
AmericaWord A
DefenseWord d
SpendingWord s
AmerDefense b
AmerDefSpend +
NonPrefix n
______________________________________
The dictionary array of the text analysis dictionary is comprised of a number of specified words. Each word is associated with one of the concepts in the concept array. Therefore, each word has a corresponding concept mnemonic and concept symbol. Returning to the example of the text filtration for "American defense spending," the dictionary array might include the entries in Table 2. Here, only the concept symbol is given since the concept mnemonic can be read from Table 1.
TABLE 2
______________________________________
Dictionary array for text analysis dictionary
Dictionary word Concept symbol
______________________________________
America A
U.S. A
defense d
funding s
budget s
Marine b
non n
______________________________________
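As a minimal sketch, the concept array of Table 1 and the dictionary array of Table 2 can be held as two Python mappings. The names `CONCEPTS`, `DICTIONARY`, `concept_for` and `unworded` are illustrative assumptions and do not come from the described system.

```python
# Concept array (Table 1): concept symbol -> concept mnemonic.
CONCEPTS = {
    "A": "AmericaWord",
    "d": "DefenseWord",
    "s": "SpendingWord",
    "b": "AmerDefense",
    "+": "AmerDefSpend",
    "n": "NonPrefix",
}

# Dictionary array (Table 2): dictionary word -> concept symbol.
DICTIONARY = {
    "America": "A",
    "U.S.": "A",
    "defense": "d",
    "funding": "s",
    "budget": "s",
    "Marine": "b",
    "non": "n",
}


def concept_for(word):
    """Return (concept symbol, concept mnemonic) for a word, or None if absent."""
    symbol = DICTIONARY.get(word)
    return None if symbol is None else (symbol, CONCEPTS[symbol])


# Concept symbols that no dictionary word maps to (hypothetical helper value).
unworded = sorted(set(CONCEPTS) - set(DICTIONARY.values()))
```

Keeping the two arrays separate mirrors the tables: several words can share one concept symbol, and a concept symbol can exist without any word of its own.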
In this example, the concept of "American" would have concept symbol "A" and concept mnemonic "AmericaWord." The words "America" and "U.S." could both belong to this concept. However, not every concept need have corresponding words in the dictionary. For example, the concept of "American defense spending" could have concept symbol "+" together with concept mnemonic "AmerDefSpend" and yet not be represented by any single word.
(II-A3) Besides loading the dictionary, the computer loads the text transformation rules from disk into memory. These rules are comprised of:
(a) Initiation and termination signals for "blocks of text." These signals are strings of characters. The computer scans for the first occurrence of an initiation signal after the decimal date. The computer marks this as the beginning of a block of text. The computer marks the first following termination signal as the end of that block of text. The block of text itself is the text between the two signals. For the sample edited text of step I-3 above, it is possible to discard irrelevant paragraphs, in which case the appropriate blocks of text would be paragraphs. Therefore, the "!" would be the only permissible initiation signal since all paragraphs in the edited text begin with this symbol. However, there would be two possible termination signals, the character "!" marking the beginning of a following paragraph and the end of story marker described in step II-A5 below. Therefore, the "!" can mark both the end of one block of text and the beginning of the next. It is also possible to use other initiation and termination signals. For example, the "!" signal marking the beginnings of paragraphs could still be used as the initiation signal for discarding entire stories. In this case, the end of story marker would be the only permissible termination signal. Intervening appearances of "!" would not cause the block of text to end.
(b) Specified carryover symbols.
These symbols are a subset of all the concept symbols of step II-A2 above (see step II-A18 below for their use). (c) Text equivalent transformation rules. These rules are a set of individual transformation rules (see step II-A15 below for their use). (II-A4) Besides the text analysis dictionary and text transformation rules, the computer also loads the text filtration rules from disk into memory. These rules are comprised of: (a) a set of "text discard symbols" with each element comprised of a concept symbol which would lead to discarding a block of text, and (b) a set of "text retention symbols" with each element comprised of a concept symbol which would lead to retaining a block of text. (II-A5) After loading the dictionary and corresponding text transformation rules, the computer loads the text of a specified story into memory, placing an end of story mark in the memory unit following the end of the story. In a disk file containing more than one story, the "*" symbol just before the decimal date of the next story (see example of step I-3) would delimit the end of the previous story. Alternatively, the end of the disk file would signal the end of a story. (II-A6) The computer reads the "*" and decimal date at the beginning of a story and writes this information to two separate disk files, a "full text output file" and a "filtered text output file." The computer then scans the text until it encounters the first initiation signal and places a "current begin block of text" marker at the beginning of this signal. (II-A7) The computer also places a "current position" marker at the next character. In the example in step I-3 above, the mark is on the quote sign immediately after the first "!". (II-A8) The computer marks the first word in the text analysis dictionary as the "current dictionary word." 
(II-A9) The computer prepares a "candidate word" comprising a string of characters in the text of the story beginning with the current position mark and extending into the text until the length of the current dictionary word is reached or until a termination signal is encountered. The computer compares the candidate word with the current dictionary word. In making the comparisons, all letters are reduced to lower case. Carriage returns and line feeds are considered to be the equivalent of a space.
(II-A10) If there is no match in step II-A9 above, the computer moves the current dictionary word marker to the next word in the dictionary and repeats step II-A9. This repetition is continued until either a match is found or the end of the dictionary is reached.
(II-A11) If there is a match in either step II-A9 or step II-A10 above, the computer inserts into the text the following three characters just before the position of the match: (a) the control character "<" indicating the arrival of a concept symbol, (b) the concept symbol of the matching dictionary word, and (c) the control character ">" indicating the end of the concept symbol.
(II-A12) After step II-A11, the computer advances the current position marker to the next character in the text and repeats steps II-A8 to II-A11. The computer continues this repetition until a termination signal is reached. The text between the current begin block of text marker and this termination signal, including the character additions of step II-A11, is considered to be the "current block of text." At this point, the current block of text from the example of step I-3 above is:
!"<s>Funding for the <b>Marines and <n>non-<d>defense items should not be cut," he said.
The computer writes the current block of text to the display monitor to permit the system operator to follow the progress of the text analysis.
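The scan of steps II-A8 to II-A12 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the system's actual code: the function name `tag_block` is hypothetical, the dictionary is the Table 2 fragment, and the first dictionary word that matches at a position is taken as the match.

```python
# Dictionary array of Table 2: dictionary word -> concept symbol.
DICTIONARY = {
    "America": "A",
    "U.S.": "A",
    "defense": "d",
    "funding": "s",
    "budget": "s",
    "Marine": "b",
    "non": "n",
}


def tag_block(text, dictionary):
    """Insert "<symbol>" just before every dictionary word found in text."""
    out = []
    for pos, char in enumerate(text):
        for word, symbol in dictionary.items():
            # Candidate word: as many characters as the dictionary word has,
            # with CR/LF treated as spaces and letters reduced to lower case.
            candidate = text[pos:pos + len(word)]
            candidate = candidate.replace("\r", " ").replace("\n", " ").lower()
            if candidate == word.lower():
                out.append("<" + symbol + ">")
                break  # assumption: stop at the first matching dictionary word
        out.append(char)
    return "".join(out)


block = '!"Funding for the Marines and non-defense items should not be cut," he said.'
print(tag_block(block, DICTIONARY))
```

Because matching starts at every character position, "Marine" is found inside "Marines" and "non" inside "non-defense", as in the example block above.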
(II-A13) The computer constructs a "text equivalent" for the current block of text comprised of an alternating series of elements, a "diagnostic symbol" and a "diagnostic distance":
(a) The "diagnostic symbol" is comprised of either the symbol "*" denoting the beginning of the block of text or a concept symbol inserted in step II-A11 above.
(b) The "diagnostic distance" is an integer equivalent to the number of characters between two concept symbols in the block of text, between a concept symbol and the beginning or end of the block of text, or between the beginning and end of the block of text if no concept symbols were placed in the block of text.
The computer constructs the text equivalent, stores it in memory and displays it on the monitor as follows:
(a) The computer writes to the monitor the diagnostic symbol "*" indicating the beginning of the block of text. The computer then writes a space. At this point, the computer starts at the beginning of the block of text, after the initiation signal, and counts characters up to but not including the first "<" character or the termination signal for the block of text, whichever is reached first. This number of characters is the first diagnostic distance. The computer writes the diagnostic distance followed by a space.
(b) If a "<" has been reached, the computer then writes the next character, the concept symbol of the following word, as the next diagnostic symbol. The computer again writes a space, skips the following ">" and counts characters to the next "<" or termination signal. The computer then enters that number as the next diagnostic distance.
(c) Step b immediately above is then repeated until the termination signal is reached.
(d) The end of the text equivalent is delimited by a carriage return followed by a line feed.
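The construction in steps (a) to (d) can be sketched in Python as below. The names `text_equivalent` and `format_equivalent` are hypothetical, the text equivalent is represented as a list of (diagnostic symbol, diagnostic distance) pairs, and the sketch assumes that the block passed in ends just before its termination signal and that every "<x>" group in it is a concept symbol marker.

```python
def text_equivalent(block, initiation="!"):
    """Return [(diagnostic symbol, diagnostic distance), ...] for a tagged block."""
    # Counting starts after the initiation signal.
    body = block[len(initiation):] if block.startswith(initiation) else block
    pairs = []
    symbol, count, i = "*", 0, 0
    while i < len(body):
        # A three-character "<x>" group marks the arrival of concept symbol x.
        if body[i] == "<" and i + 2 < len(body) and body[i + 2] == ">":
            pairs.append((symbol, count))
            symbol, count = body[i + 1], 0
            i += 3
        else:
            count += 1
            i += 1
    pairs.append((symbol, count))
    return pairs


def format_equivalent(pairs):
    """Write symbols and distances separated by single spaces."""
    return " ".join(f"{symbol} {distance}" for symbol, distance in pairs)


block = '!"<s>Funding for the <b>Marines and <n>non-<d>defense items should not be cut," he said.'
print(format_equivalent(text_equivalent(block)))
```

A block with no concept symbols yields a single pair, the "*" symbol followed by the full length of the block.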
The text equivalent for the block of text in step II-A12 is:
* 1 s 16 b 12 n 4 d 42
(II-A14) The computer transforms the text equivalent based on the text equivalent of the previous block of text in a story. If there is no prior block, this step is skipped. If there is a previous block, the computer consults the list of matching carryover symbols generated in step II-A16 below. The computer adds any matching carryover symbols, each followed by the diagnostic distance of zero, to the end of the current text equivalent. This step permits the implied meaning in a previous block of text to be transferred to the current block.
(II-A15) After any additions to the text equivalent in step II-A14, the computer makes transformations on the resulting text equivalent following the text equivalent transformation rules of step II-A3. Each rule is comprised of the following elements: (a) a "specified operator symbol" which can be any one of the concept symbols in the text analysis dictionary, (b) a "specified target symbol" which can also be any one of the concept symbols in the text analysis dictionary or the reserved symbol "$," (c) a "specified direction" indicated by one of the three letters "A, B, E" (A for ahead, B for behind, and E for either), (d) a "specified distance" indicated by an integer, (e) a "specified decision symbol" which can be one of the concept symbols in the text analysis dictionary or the reserved symbol "%," and (f) a Boolean "specified operator retention" variable. The computer applies individual rules in their order of entry into memory as follows:
(a) The computer scans the text equivalent until a diagnostic symbol matches the specified operator symbol. This matching diagnostic symbol is marked as the "matching operator symbol."
(b) If a matching operator symbol is found, the computer marks diagnostic symbols in the text equivalent as "matching target symbols" based on the specified target symbol, distance and direction as follows:
(i) If the specified target symbol is "$," the computer marks the matching operator symbol as a matching target symbol.
(ii) If the specified target symbol is not "$," the specified distance and specified direction are used as follows:
(1) If the specified distance is zero, then the computer compares the matching operator entry with the specified target symbol. If a match is found, the computer marks the matching operator symbol as a matching target symbol.
(2) If the specified distance is less than zero and the specified direction is "A," then the computer marks, as matching target symbols, all diagnostic symbols matching the specified target symbol if these symbols are ahead of the matching operator symbol in the text equivalent.
(3) If the specified distance is less than zero and the specified direction is "B," then the computer marks, as matching target symbols, all diagnostic symbols matching the specified target symbol if these symbols are behind the matching operator symbol in the text equivalent.
(4) If the specified distance is less than zero and the specified direction is "E," then the computer marks, as matching target symbols, all diagnostic symbols in the text equivalent matching the specified target symbol.
(5) If the specified distance is greater than zero and the specified direction is "A," then the computer marks, as the matching target symbol, the diagnostic symbol immediately ahead of the matching operator symbol in the text equivalent if the following symbol matches the specified target symbol and if the intervening diagnostic distance is less than the specified distance.
(6) If the specified distance is greater than zero and the specified direction is "B," then the computer marks, as the matching target symbol, the diagnostic symbol immediately behind the matching operator symbol in the text equivalent if the preceding symbol matches the specified target symbol and if the intervening diagnostic distance is less than the specified distance.
(7) If the specified distance is greater than zero and the specified direction is "E," then the computer marks, as the matching target symbol, the diagnostic symbol just ahead of the matching operator symbol in the text equivalent if the following symbol matches the specified target symbol and if the intervening diagnostic distance is less than the specified distance. If this following symbol is not marked, then the computer marks the diagnostic symbol in the text equivalent just behind the matching operator symbol as the matching target symbol if the prior symbol matches the specified target symbol and if the intervening diagnostic distance is less than the specified distance.
(c) Transformations are performed on the text equivalent based on matching target symbols as follows:
(i) If the specified target symbol is "$," then:
(1) If the specified direction is "B," the computer inserts the specified decision symbol and then the specified distance into the text equivalent just prior to the matching target symbol.
(2) If the specified direction is "A," the computer inserts the specified distance and then the specified decision symbol into the text equivalent just after the matching target symbol.
(ii) If the specified target symbol is not "$," then:
(1) If the specified decision symbol is "%," then the matching target symbol and its following diagnostic distance are both deleted from the text equivalent.
(2) If the specified decision symbol is not "%," then the matching target symbol is replaced by the specified decision symbol.
(iii) If the specified operator retention variable is FALSE, then the matching operator symbol and its following diagnostic distance are both deleted from the text equivalent.
These transformations are now illustrated using the text equivalent of step II-A13 and three sample text equivalent transformation rules.
(a) Rule 1:
Specified operator symbol=n
Specified target symbol=d
Specified direction=A
Specified distance=10
Specified decision symbol=%
Specified operator retention variable=FALSE
This rule has the function of removing, from the text equivalent, references to non-defense matters. Application of this rule to the text equivalent of step II-A13, repeated here,
* 1 s 16 b 12 n 4 d 42
yields
* 1 s 16 b 12
In applying rule 1, the computer marks the "n" in the text equivalent as the matching operator symbol. Since the specified direction is "A," the computer examines the diagnostic symbol after the "n" and finds a match with the specified target symbol of "d." The intervening diagnostic distance of 4 is also less than the specified distance of 10. Since these two criteria are both met, the computer marks the diagnostic symbol "d" as the matching target symbol. Since the decision symbol is "%," the matching target symbol and its following diagnostic distance of 42 are both deleted. Then, because the specified operator retention variable is FALSE, the computer also deletes the matching operator symbol "n" and its following diagnostic distance of 4.
(b) Rule 2:
Specified operator symbol=b
Specified target symbol=s
Specified direction=B
Specified distance=-1
Specified decision symbol=+
Specified operator retention variable=TRUE
Application of this rule to the previously transformed text equivalent
* 1 s 16 b 12
yields
* 1+16 b 12
In this transformation, the computer finds and marks the matching operator symbol "b" in the text equivalent.
Since the specified direction is "B" with the meaning of behind and since the specified distance is less than zero, the computer searches for all occurrences of the target symbol behind the matching operator symbol in the text equivalent. The computer finds the matching target symbol "s." Since the specified decision symbol is not the reserved symbol "%," the computer replaces the matching target symbol "s" with the specified decision symbol "+." Since the specified operator retention variable is TRUE, the matching operator symbol is retained in the text equivalent. In this way, it has been possible to combine the meanings of "American defense" represented by the "b" and "spending" represented by the "s" to give the meaning of "American defense spending" represented by the "+."
(c) Rule 3:
Specified operator symbol=+
Specified target symbol=$
Specified direction=B
Specified distance=0
Specified decision symbol=A
Specified operator retention variable=TRUE
Application of this rule to the previously transformed text equivalent
* 1+16 b 12
yields
* 1 A 0+16 b 12
In this transformation, the computer finds the matching operator symbol "+." Since the target symbol is the reserved symbol "$" and since the specified direction is "B," the computer marks the matching operator symbol as the matching target symbol and inserts the specified decision symbol "A" and the specified diagnostic distance of 0 just before the target entry. Rule 3 has permitted the extraction of individual meanings from concept symbols embodying several different concepts. In the example presented above, the concept of "America" denoted by the concept symbol "A" could be written explicitly in the text equivalent since "America" was already a part of the concept of "American defense spending" denoted by the concept symbol "+." In the dictionary of Table 2, the word "Marine" alone was interpreted to embody the meaning of "American defense."
Therefore, the concept of "America" was always implied although not explicitly stated as a separate word. Even though Marine can refer to concepts other than a subset of the American military, it is possible to let Marine have this connotation if the set of collected AP dispatches almost always uses this word in this context. Therefore, ambiguous words can have a precise meaning if text containing other meanings is eliminated. As a general rule, the text transformation rules lead to a series of transformations resulting in a final text equivalent containing concept symbols representing directly the concepts used for the decisions for a block of text.
(II-A16) After performing all the transformations specified by the text equivalent transformation rules, the computer makes a list of matching carryover symbols comprising all concept symbols which are in both the list of specified carryover symbols (entered at step II-A3) and in the final text equivalent. As an example, consider symbol "A" being in the list of carryover symbols. In this case, the computer would enter this symbol in the list of matching carryover symbols if the text equivalent is the final one in the previous step, namely, * 1 A 0+16 b 12. The use of "A" as a specified carryover symbol permits the connotation of "American" to be transferred from one block of text to the next. This designation is desirable when it is not certain that the next block of text would refer to "America" unless the previous block of text had this implication.
(II-A17) The computer checks the text equivalent for any diagnostic symbols which match the list of retention symbols. If any are found, the computer sets the Boolean block retention variable as TRUE. Otherwise the block retention variable is set as FALSE. The computer then checks the text equivalent for any diagnostic symbols matching the list of discard symbols. If any matches are found, the block retention variable is set as FALSE.
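The rule application of step II-A15, the carryover list of step II-A16 and the retention decision of step II-A17 can be sketched together in Python. This is a hedged illustration, not the system's code. The names `Rule`, `find_targets`, `apply_rule`, `matching_carryover` and `is_retained` are hypothetical, and the text equivalent is a list of (symbol, following distance) pairs. The sketch makes three assumptions that the text leaves open: a rule acts only on the first matching operator symbol, a rule that finds no matching target symbol changes nothing, and a "$" rule with direction "E" is treated like direction "A".

```python
from dataclasses import dataclass


@dataclass
class Rule:
    operator: str        # specified operator symbol
    target: str          # specified target symbol, or reserved "$"
    direction: str       # "A" ahead, "B" behind, "E" either
    distance: int        # specified distance
    decision: str        # specified decision symbol, or reserved "%"
    keep_operator: bool  # specified operator retention variable


def find_targets(te, op, rule):
    """Indices of matching target symbols for the operator at index op."""
    if rule.distance == 0:
        return [op] if te[op][0] == rule.target else []
    if rule.distance < 0:  # every matching symbol on the permitted side(s)
        side = {"A": lambda i: i > op, "B": lambda i: i < op,
                "E": lambda i: True}[rule.direction]
        return [i for i, (sym, _) in enumerate(te) if sym == rule.target and side(i)]
    hits = []  # positive distance: only the immediate neighbour qualifies
    if rule.direction in ("A", "E") and op + 1 < len(te):
        if te[op + 1][0] == rule.target and te[op][1] < rule.distance:
            hits.append(op + 1)
    if not hits and rule.direction in ("B", "E") and op > 0:
        if te[op - 1][0] == rule.target and te[op - 1][1] < rule.distance:
            hits.append(op - 1)
    return hits


def apply_rule(te, rule):
    """Apply one rule to the first matching operator symbol; return a new list."""
    te = list(te)
    op = next((i for i, (sym, _) in enumerate(te) if sym == rule.operator), None)
    if op is None:
        return te
    drop = set()
    if rule.target == "$":
        if rule.direction == "B":  # decision symbol, then distance, before operator
            te.insert(op, (rule.decision, rule.distance))
            op += 1
        else:                      # distance, then decision symbol, after operator
            sym, old = te[op]
            te[op] = (sym, rule.distance)
            te.insert(op + 1, (rule.decision, old))
    else:
        targets = find_targets(te, op, rule)
        if not targets:
            return te              # assumption: no target found, no change
        for i in targets:
            if rule.decision == "%":
                drop.add(i)        # delete target and its following distance
            else:
                te[i] = (rule.decision, te[i][1])
    if not rule.keep_operator:
        drop.add(op)
    return [pair for i, pair in enumerate(te) if i not in drop]


def matching_carryover(te, carryover):
    """Specified carryover symbols that appear in the final text equivalent."""
    present = {sym for sym, _ in te}
    return [sym for sym in carryover if sym in present]


def is_retained(te, retention, discard):
    """Retain if any retention symbol is present and no discard symbol is."""
    present = {sym for sym, _ in te}
    return bool(present & set(retention)) and not (present & set(discard))


RULES = [
    Rule("n", "d", "A", 10, "%", False),  # Rule 1
    Rule("b", "s", "B", -1, "+", True),   # Rule 2
    Rule("+", "$", "B", 0, "A", True),    # Rule 3
]
te = [("*", 1), ("s", 16), ("b", 12), ("n", 4), ("d", 42)]
for rule in RULES:
    te = apply_rule(te, rule)
print(te, matching_carryover(te, ["A"]), is_retained(te, {"+"}, set()))
```

In the usage lines, "A" as the only carryover symbol and "+" as the only retention symbol are choices made for illustration.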
The computer writes the block of text, including the <> marks and their included concept symbols, to the full text output file on disk. Immediately following this text, the computer writes the text equivalent both before any transformations have been made and also after each transformation step. If the block retention flag is TRUE, the computer then writes the words "PREVIOUS TEXT IS RETAINED." If the block retention flag is FALSE, the computer writes "PREVIOUS TEXT IS DISCARDED" instead. In addition, if the block retention flag is TRUE, the computer also writes to the filtered text output file the same items described above for the full text output file with the omission of the text equivalents after the transformations. The phrases about the previous text being retained or discarded are also not written since only retained text is stored in the filtered text output file. Continuing with the text of step I-3, it is possible to filter for the retention of only those paragraphs directly relevant to American defense spending. In this case, the "+" symbol could be used as the retention signal. It would be unnecessary to specify any concept symbols as discard symbols. In this case, the "+" in the last text equivalent of step II-A15 would cause the block retention variable to be set to TRUE. The output to the full text output file for the block of text in step II-A15 would be:
!"<s>Funding for the <b>Marines and <n>non-<d>defense items should not be cut," he said.
* 1 s 16 b 12 n 4 d 42
* 1 s 16 b 12
* 1+16 b 12
* 1 A 0+16 b 12
PREVIOUS TEXT IS RETAINED
(II-A18) The computer repeats steps II-A7 to II-A17 above after finding the next initiation signal in the text and moving the current begin block of text marker to this initiation signal. The repetition stops when the end of the story is reached. The second block of text from the edited text of step I-3 would be:
!He did ask for a reduction in health care <s>spending.
with text equivalent: * 42 s 9 The text equivalent transformation of step II-A14 would yield * 42 s 9 A 0 after addition of the matching carryover symbol of "A" (identified in step II-A16) and the diagnostic distance of 0. There is no change in this text equivalent following any of the Rules 1 to 3 of step II-A15. Since there is no "+" symbol in the final transformed text equivalent, the computer sets the block retention variable to FALSE and writes the block of text, the original text equivalent, the three text equivalents after application of the transformation rules, and the words "PREVIOUS TEXT IS DISCARDED" to the full text output file. Nothing from this second block of text is written to the filtered text output file. The result is a more homogeneous text containing only those paragraphs directly relevant to "American defense spending." After steps II-A1 through this step, the filtered story of step I-3 is: * 87.0027 !"<s>Funding for the <b>Marines and <n>non-<d>defense items should not be cut," he said. * 1 s 16 b 12 n 4 d 42 (II-A19) The text filtration steps II-A5 to II-A18 for individual stories are repeated for all retrieved stories. (II-A20) If the text after step II-A18 retains important amounts of irrelevant text, the computer loads a new dictionary and set of corresponding rules based on criteria different from those for steps II-A1 to II-A4 and repeats the text filtration steps II-A5 to II-A18. If necessary, this filtration to remove irrelevant text is repeated using a new alternative dictionary and corresponding set of rules each time until reasonably homogeneous text is obtained. The text used for these further filtrations is the filtered text output file of an earlier filtration. In loading the text in step II-A5, the computer removes all old diagnostic symbols by eliminating all character strings beginning with "<" and ending with ">." 
The computer also removes the old text equivalent by removing the string of characters between the "*" symbol marking the beginning of a text equivalent and the carriage return marking the end of a text equivalent as shown in step II-A17.
II-B. Text Scoring
The stories are each scored for the extent to which they support positions relevant to public opinion change for the issue under study. Continuing with the example from step II-A20 above, the task proceeds using a text analysis dictionary, a set of transformation rules and a set of scoring rules all designed for the computation of numerical "message scores" favoring the three positions of More, Same and Less defense spending. Every message is given an index number k. For this kth message, the scores are designated s.sub.ij"k where index i refers to the source of the information as deduced from the message itself, where index j" denotes the position the score favors, and where k is the message index. Index i is odd if the scored information directly supports a position so that the message is accessible to all members of the population. Index i is even if the information indirectly supports the position so that only those already aware of the issue are able to make the connection. For example, if the President of the United States is quoted in the kth message as saying that there should be More defense spending, then: (a) i=an odd number (e.g. i=3) identifying the source as the President (the number is odd for defense spending because the message directly advocates a position on this issue; indirect Presidential assertions requiring interpretation by the audience would have another i characteristic of the President but the index, e.g. i=4, would be an even number), (b) j"=1 if the position of More defense spending had this index, and (c) k=the index number identifying the AP story from which the quote came. This definition for s.sub.ij"k is the same as that in equation A.28 of Fan (1987).
Because index i can have several values, reflecting different message sources, it is possible for a message to have several scores with different indices i favoring the same position indexed by j". In the simplest scoring for the defense spending example, no distinctions would be made between messages according to source. In this degenerate case of no source dependence, i=1 for all sources, both direct and indirect. This is the assumption made for the defense spending scoring described below. The actual scoring involves determining s.sub.ij"k values for each block of text and then summing the values for all blocks of text. Therefore, the additional term s.sub.ij"kq is introduced where q is the index for individual blocks of text. Thus s.sub.ij"kq is the score for the ith source, supporting the j"th position, of the qth block of text within the kth story. The final s.sub.ij"k for a story is the sum of the s.sub.ij"kq over all the q for that story. The text scoring procedure itself is very similar to that for the text filtrations and is performed on text remaining after all the filtration steps described in step II-A.
(II-B1) The text scoring uses a text analysis dictionary and a set of text transformation rules with exactly the same formats as those in steps II-A. In addition, the task requires a set of "text scoring rules."
(II-B2) The computer reads and stores in memory the text analysis dictionary, the text transformation rules, and the text scoring rules. Since the text scoring rules include concept symbols from the text analysis dictionary, it is necessary to discuss the dictionary before considering the text scoring rules. Therefore, consider the example of the text from step II-A20. For this example, the dictionary could include the fragments of the concept and dictionary arrays shown in Table 3 (the complete dictionary would have more entries):
TABLE 3
______________________________________
Concept and dictionary arrays of text analysis
dictionary
______________________________________
Fragment of concept array
Concept mnemonic Concept symbol
______________________________________
AheadNegation /
LessWord L
SameWord S
MoreWord M
______________________________________
Fragment of dictionary array
Dictionary word Concept symbol
______________________________________
not /
cut L
______________________________________
In this dictionary, the words with the concept of AheadNegation would cause a reversal in the sense of words further into the text equivalent. A typical example would be the word "not." LessWord would be a word favoring less spending. The example in the dictionary is the word "cut." Since the text filtration illustrated in step II-A above had already required that the paragraph be about "American defense spending," the text scoring step might simply be based on word clusters implying support for More, Same or Less without regard to reference to America, defense or spending. In this case, the concepts of "less," "same," and "more" (LessWord, SameWord, MoreWord in the dictionary of Table 3) would be enough for the text scoring rules.
These rules are a two dimensional array with elements S.sub.ij" where i indexes the sources of thoughts in the collected AP stories and j" indexes the positions favored by the message scores. Each element S.sub.ij" has two components: (a) a "scoring mnemonic" comprised of a string of characters, sufficiently long to suggest the source of a thought and the position that thought is scored to support, and (b) a corresponding "scoring symbol" comprised of a concept symbol. The appearance of a scoring symbol in the qth text equivalent of the kth story would lead to a positive numerical score s.sub.ij"kq. Consider the task where all messages are scored independently of source so that i=1 for all scores. Furthermore, consider that MoreWord in Table 3 implies support of More defense spending and corresponds to index j"=1. Similarly, SameWord might imply support for Same spending with index j"=2, and LessWord might imply support for Less spending with index j"=3. Then, the scoring rules would have the form of Table 4.
TABLE 4
______________________________________
Text scoring rules
The prefix letter "s" is frequently used to indicate a
scoring mnemonic so that sMore, sSame, and sLess correspond to
scores supporting More, Same and Less defense spending. The
scoring symbols are from Table 3.
S.sub.ij" Scoring mnemonic Scoring symbol
______________________________________
S.sub.11 sMore M
S.sub.12 sSame S
S.sub.13 sLess L
______________________________________
(II-B3) The computer loads the filtered texts of the collected stories after the last filtration of step II-A20. As for the filtration of step II-A20, the computer removes from the text all strings beginning with "<" and ending with ">" and omits the text equivalents. The computer then performs steps II-A5 to II-A16 using the text analysis dictionary and the text transformation rules of step II-B2. If the initiation and termination signals for blocks of text are those for paragraphs as in step II-A3, and if the dictionary is that in Table 3, then the computer would construct this text and text equivalent from the text of step II-A20:
!"Funding for the Marines and non-defense items should </>not be <L>cut," he said.
* 54/8 L 14
Since the decision in the example of step II-B2 was simply to score word combinations favoring More, Same and Less, the following text equivalent transformation rule could be used with "not less" being considered to be equivalent to "same":
Rule 4:
Specified operator symbol=/
Specified target symbol=L
Specified direction=A
Specified distance=20
Specified decision symbol=S
Specified operator retention variable=FALSE
Computer application of this rule to the previous text equivalent * 54/8 L 14 yields * 54 S 14.
(II-B4) After constructing the summary text equivalents by performing the text equivalent transformations for the qth block of text of the kth message using the text transformation rules of step II-B3 above, the computer then uses the text scoring rules to calculate the s.sub.ij"kq values in the following manner. The computer compares the scoring symbols in the S.sub.ij" of the text scoring rules with the concept symbols in the summary text equivalent of the qth block of text of the kth message.
The computer calculates the s.sub.ij"kq for a particular i and j" by dividing the number of matches for the concept symbols in the corresponding S.sub.ij" by the total number of all matches for all concept symbols in all the S.sub.ij" regardless of i and j". By performing this division, the same total score is given for every AP paragraph. If matches are found for two concept symbols supporting different positions, the total paragraph score would be divided among the corresponding concepts. Using the text scoring rules of Table 4 for the transformed text equivalent of step II-B3, the computer obtains: s.sub.11k1 =0/1=0.0 AP paragraphs, s.sub.12k1 =1/1=1.0 AP paragraphs, and s.sub.13k1 =0/1=0.0 AP paragraphs. Here, the corresponding block of text is the first in the story so q=1. The units for s.sub.ij"kq are the types of blocks of text scored. Since the scores above are for a typical AP paragraph, the units are AP paragraphs. For calculating these s.sub.ij"kq scores, scoring symbol "S" in Table 4 appears once while scoring symbols "L" and "M" appear zero times in the text equivalent. With only one appearance of a scoring symbol, the total number of matches is 1+0+0=1. To permit the system operator to follow the progress of the analysis, the computer writes to the monitor a full text output file in a format very much like the full text output file of step II-A17. The only difference is that a line describing the block scores replaces the sentences announcing the retention or discarding of previous text. The block of text scores in the output (beginning with the word "SCORES:") includes the scoring mnemonic in the S.sub.ij" with the same i and j" as in the s.sub.ij"kq (see Table 4): !"Funding for the Marines and non-defense items should </>not be <L>cut," he said. * 54/8 L 14 * 54 S 14 SCORES: sMore=0.0, sSame=1.0, sLess=0.0 (II-B5) The computer repeats steps II-B3 and II-B4, for all blocks of text q in the kth story. 
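The block scoring of step II-B4 can be sketched in Python as follows. This is a minimal illustration and not the implementation used in Fan (1987): the function name `score_block`, the dictionary `SCORING_RULES`, and the treatment of a block with no scoring symbols are assumptions made for the example.

```python
from collections import Counter

# Scoring symbol -> scoring mnemonic, following the More/Same/Less example.
SCORING_RULES = {"M": "sMore", "S": "sSame", "L": "sLess"}


def score_block(text_equivalent, rules=SCORING_RULES):
    """Return normalized block scores for one summary text equivalent.

    Tokens that are not scoring symbols (the "*" marker and the distance
    numbers) are ignored. Each score is the match count for one scoring
    symbol divided by the total matches over all scoring symbols.
    """
    counts = Counter(tok for tok in text_equivalent.split() if tok in rules)
    total = sum(counts.values())
    if total == 0:
        # Assumption: a block without any scoring symbol scores zero everywhere.
        return {name: 0.0 for name in rules.values()}
    return {name: counts[sym] / total for sym, name in rules.items()}


scores = score_block("* 54 S 14")
print("SCORES: " + ", ".join(f"{name}={value}" for name, value in scores.items()))
# prints: SCORES: sMore=0.0, sSame=1.0, sLess=0.0
```

Because of the division, the scores of any block containing at least one scoring symbol sum to 1.0, which is how every AP paragraph receives the same total score.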
Upon scoring the last block of text, the computer sums the individual s.sub.ij"kq over all q to obtain the final s.sub.ij"k score. For the kth story, the computer writes the decimal date, denoted t.sub.k, and these final scores s.sub.ij"k to disk. Since there was only one block of text left in the example of step II-A20, the final message scores s.sub.1j"k are the same as the s.sub.1j"k1 scores in step II-B4 above.

(II-B6) The computer repeats steps II-B3 to II-B5 for the remaining text of all the stories after the filtration of step II-A20 and writes their s.sub.ij"k and their corresponding t.sub.k to disk.

The examples presented above were chosen to illustrate the major options in the text analysis dictionary, in the transformation rules, in the text filtration, and in the text scoring rules. The actual dictionaries, transformation rules, text filtration rules, and text scoring rules used for the analysis of American defense spending (Fan, 1987) were somewhat different. Fan (1987) also describes the application of the text analysis in this step II to five other topics. In all cases, the text analysis was found to give acceptable results (see Chapter 3 of Fan, 1987).

III. Computations of Public Opinion

(III-1) The computer reads from disk all the data calculated in step II-B6 and stores these data as a "scores array" with index k so that each array element contains the t.sub.k and all the corresponding s.sub.ij"k of the kth story. The computer then sorts the elements of the scores array by date, with the data from the earliest story being the first item in the array and the data from the latest story being the last. In all subsequent steps, index k refers to the array index after sorting. For example, if the story retrieved in step I-3 and scored in step II-B6 had index k, then its entry in the scores array would be: t.sub.k =87.0027, s.sub.11k =0.0 AP paragraphs, s.sub.12k =1.0 AP paragraphs, and s.sub.13k =0.0 AP paragraphs. 
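Steps II-B5 and III-1 can be sketched as below: block scores are summed into story scores, and the story records are sorted by decimal date. The names `StoryScore`, `sum_block_scores` and `build_scores_array` are hypothetical, and the second story in the usage lines is invented purely to show the sort.

```python
from dataclasses import dataclass


@dataclass
class StoryScore:
    decimal_date: float  # t_k
    scores: dict         # final story scores, keyed by scoring mnemonic


def sum_block_scores(block_scores):
    """Sum the per-block score dictionaries over all blocks q of one story."""
    totals = {}
    for block in block_scores:
        for name, value in block.items():
            totals[name] = totals.get(name, 0.0) + value
    return totals


def build_scores_array(stories):
    """Return the stories sorted by date, earliest first.

    After sorting, the list position plays the role of index k.
    """
    return sorted(stories, key=lambda story: story.decimal_date)


# The story of the running example has a single remaining block.
example = StoryScore(
    87.0027, sum_block_scores([{"sMore": 0.0, "sSame": 1.0, "sLess": 0.0}])
)
# Hypothetical earlier story with two scored blocks.
earlier = StoryScore(
    86.5000,
    sum_block_scores(
        [
            {"sMore": 1.0, "sSame": 0.0, "sLess": 0.0},
            {"sMore": 0.5, "sSame": 0.0, "sLess": 0.5},
        ]
    ),
)
scores_array = build_scores_array([example, earlier])
```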
(III-2) In order to compare predicted and measured public opinion, the system operator enters into the computer via the keyboard a series of results from measured public opinion polls. This series will typically include data from published opinion polls in which the same question or close variants are asked at a number of different times. Since people are only allowed to hold one position for any single polled issue, opinion polls assume that the total population can be divided into subpopulations, P.sub.j, each with members favoring the same polled position with index j. The fraction of the total population in subpopulation P.sub.j is defined as B.sub.j. The percentage in the No Opinion or Don't Know category can be considered to be a separate subpopulation, or the percentage for this group can be removed with the remaining percentages for defined positions being renormalized to 100%. This nomenclature follows that in Sections A.2 and A.4 of Fan (1987). Since public opinion can change, B.sub.j will vary with time t so that B.sub.j =B.sub.j (t). To indicate that opinion percentages are from poll data, the B.sub.j (t) from polls carry the extra subscript P and therefore have the form B.sub.Pj (t). Different polls in a series are identified by index numbers n so the nth poll result favoring position j is B.sub.Pnj (t). If t.sub.Pn is the time at which the nth poll was taken, with subscript P again indicating that the time is for a poll, then B.sub.Pnj (t)=B.sub.Pnj (t.sub.Pn). The time t.sub.Pn of a poll is computed by averaging the beginning and ending dates of the poll. A series of poll data appropriate for the example in step I is given in Table B.1 of Fan (1987). For the calculations in Fan (1987), the percentages of Not sures and Don't knows were subtracted, with the remaining 90% or so of the population favoring More, Same and Less defense spending being renormalized to 100%. 
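The renormalization and the poll time of step III-2 can be sketched as follows. The function names and the poll percentages are hypothetical illustrations; they are not values from Table B.1 of Fan (1987).

```python
def renormalize(poll, drop=("Not sure", "Don't know")):
    """Remove the no-opinion categories and rescale the rest to 100%."""
    kept = {position: pct for position, pct in poll.items() if position not in drop}
    total = sum(kept.values())
    return {position: 100.0 * pct / total for position, pct in kept.items()}


def poll_time(begin, end):
    """Poll time t_Pn: average of the beginning and ending decimal dates."""
    return (begin + end) / 2.0


# Hypothetical raw poll with 10% Not sure.
raw = {"More": 27.0, "Same": 45.0, "Less": 18.0, "Not sure": 10.0}
renormalized = renormalize(raw)
# The 90% holding a defined position is rescaled to 100%:
# More 30.0, Same 50.0, Less 20.0
```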
(III-3) Similar to the case with the message scores, the computer stores the poll data as an array indexed by poll number n, with each "opinion array" element having the t.sub.Pn of the poll and the corresponding B.sub.Pnj (t.sub.Pn). The computer sorts the array elements according to date t.sub.Pn with the poll at the earliest date being the first element in the array. In subsequent discussions, n will be the poll index after sorting. For example, the first element of the poll array for the first line of the data of Table B.1 of Fan (1987) (after removal of the Not sures and renormalization as mentioned above) would have n=1. The corresponding element in the poll array consists of: t.sub.P1 =77+(31+28+15)/365=77.2027, B.sub.P11 =25.7%, B.sub.P12 =49.5%, and B.sub.P13 =24.9%. The date before conversion to a decimal date was March, 1977. When only the month is supplied in the published poll, the poll date is assigned to the middle of the month, hence March 15 in the present example. The 31 and 28 are the numbers of days in January and February respectively.

(III-4) To predict public opinion, the computer loads a set of "refining weight" constants w.sub.ij'j" prescribing the method for generating "persuasive force functions" G".sub.j' describing the ability of information favoring a position to change the minds of persons holding different opinions. Equation A.29 of Fan (1987) relating these items to each other is reproduced below:

$$G''_{j'}(t) = \sum_{i,\,j'',\,k} w_{ij'j''}\; s_{ij''k}\; e^{-p\,(t - t_k)} \qquad (A.29)$$

where the summation is over all i and j" and over all k with t.sub.k <t. These indices are the ones entered for s.sub.ij"k and t.sub.k in step III-1 above. Constant p is the "persistence constant" characteristic of AP stories. Constants w.sub.ij'j" describe the contribution of each of the scores s.sub.ij"k to persuasive force function G".sub.j' (t). Each persuasive force function G".sub.j' (t) favors a position denoted by j'. 
These positions often coincide with the positions j" of scores s.sub.ij"k but need not do so, as discussed in Fan (1987). These w.sub.ij'j" permit different types of information favoring a position to have different weights. For instance, it is conceivable that information from the President of the United States might be more or less persuasive than messages from Congress for the issue of whether more should be spent for military defense. To take this possibility into account, scores s.sub.3j"k with index i=3 could refer to the quoted source being the President, and scores s.sub.5j"k with i=5 could refer to the quoted source being Congress. Both indices would be odd if the scores were due to direct quotes favoring the position indexed by j". Indices i=4,6 could be used if descriptions of Presidential and Congressional action indirectly supported a position on defense spending. A score attributed to the President (s.sub.31k) and one identified with Congress (s.sub.51k) could both favor the same position such as more defense spending (j"=1), and both scores could come from the same kth story. These scores would be differentiated by their indices i. If Presidential statements had greater persuasive force, then w.sub.3j'j" would be greater than w.sub.5j'j". In the simplest case, the persuasive forces would favor the same positions as the scores contributing to these forces. Then, w.sub.ij'j" would only have a positive, non-zero value when j"=j'. However, the w.sub.ij'j" can also reflect ambiguities in the message scoring. For example, it is conceivable that the average message scored as favoring Same defense spending actually has a portion favoring More spending as well. In this case, the s.sub.ij"k favoring Same defense spending could contribute both to G".sub.1 (t) for More defense spending and to G".sub.2 (t) for Same defense spending with different weights w.sub.ij'j" for the two contributions. 
If a position score s.sub.ij"k makes no contribution to persuasive force function G".sub.j' (t), the corresponding refining weight w.sub.ij'j" is zero. The scores of step III-1 were for three positions j"=1,2,3 corresponding to support of More, Same and Less defense spending with no attribution by source so that i=1 for all scores. It is also reasonable to postulate that the G".sub.j' functions causing opinion change for the issue of defense spending only received contributions from scores favoring these same three positions. In this case, the refining weights could be given by Table 5 where each s.sub.ij"k only contributed to the persuasive force function G".sub.j" favoring the same position indexed by j".
TABLE 5
______________________________________
Refining weights
In this Table, the non-zero w.sub.ij'j" are indicated by their
appropriate index numbers.
Index i for source = 1
Index j' for           Index j" for scores s.sub.ij"k favoring
persuasive force       More          Same          Less
G".sub.j' (t) favoring (j" = 1)      (j" = 2)      (j" = 3)
______________________________________
More (j' = 1)          w.sub.111     0.0           0.0
Same (j' = 2)          0.0           w.sub.122     0.0
Less (j' = 3)          0.0           0.0           w.sub.133
______________________________________
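Equation A.29 with the refining weights of Table 5 can be sketched as follows. This is a minimal illustration: the function name, the data layout, the equal weights of 1.0 and the one-day half-life used to pick p are assumptions for the example, not values taken from Fan (1987).

```python
import math


def persuasive_forces(t, stories, weights, p):
    """Evaluate the persuasive force functions of equation A.29 at time t.

    stories: list of (t_k, {(i, j2): s}) with j2 the score position j".
    weights: {(i, j1, j2): w} with j1 the force position j'.
    p:       persistence constant, in inverse units of t.
    Only stories with t_k < t contribute, as stated for the summation.
    """
    forces = {j1: 0.0 for (_, j1, _) in weights}
    for t_k, scores in stories:
        if not t_k < t:
            continue
        decay = math.exp(-p * (t - t_k))
        for (i, j1, j2), w in weights.items():
            forces[j1] += w * scores.get((i, j2), 0.0) * decay
    return forces


# Table 5: only w_111, w_122 and w_133 are non-zero (assumed equal to 1.0 here).
table5 = {(1, 1, 1): 1.0, (1, 2, 2): 1.0, (1, 3, 3): 1.0}
# The single story of the running example: 1.0 AP paragraph favoring Same.
stories = [(87.0027, {(1, 1): 0.0, (1, 2): 1.0, (1, 3): 0.0})]
# Assumed half-life of one day, with time measured in years.
p = math.log(2.0) * 365.0
forces = persuasive_forces(87.0027 + 1.0 / 365.0, stories, table5, p)
# One half-life after the story date, the force favoring Same is about 0.5.
```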
All values w.sub.111, w.sub.122, and w.sub.133 would further have the same value if all scores s.sub.ij"k were equally persuasive. Both the simple model of Table 5 and equality in the w.sub.ij'j" values functioned well for the defense spending analysis of Fan (1987, Chapter 4).

(III-5) For opinion predictions, the computer also loads a set of "population conversion rules." These rules are summarized as an array of constants k'.sub.2j'rj. These constants are used in equation A.26 of Fan (1987). This equation is reproduced here: ##EQU5## The G".sub.j' (t) terms are from equation A.29 (see above), R=the number of random messages collected in step I, T=the total number of messages identified as relevant in step I, .DELTA.t=the time interval used by the computer for iterative opinion calculations, B.sub.j (t)=the percentages of the population in subpopulations P.sub.j as discussed in step III-2, and "modified persuasibility constants" k'.sub.2j'rj (from Table 6) describe the ability of persuasive forces G".sub.j' to move persons from a "target subpopulation" P.sub.r to a "destination subpopulation" P.sub.j. The number of people persuaded to change their opinions from that of P.sub.r to that of P.sub.j is proportional to the size B.sub.r of the target subpopulation and the magnitude of the persuasive force G".sub.j'. The constant of proportionality is the modified persuasibility constant k'.sub.2j'rj. If a G".sub.j' cannot cause any conversion of people in P.sub.r to join P.sub.j, then k'.sub.2j'rj =0. For example, information favoring Less defense spending should not persuade those favoring Less spending to support More spending. The computer assumes that all k'.sub.2j'rj have the same constant value denoted by k'.sub.2 whenever a transition can be caused. In FIG. 4.2 of Fan (1987) it is assumed that the population conversions for defense spending are as follows: G".sub.1 converts members from P.sub.3 favoring Less to P.sub.2 favoring Same, G".sub.1 converts members from P.sub.2 favoring Same to P.sub.1 favoring More, G".sub.2 converts members from P.sub.3 favoring Less to P.sub.2 favoring Same, G".sub.2 converts members from P.sub.1 favoring More to P.sub.2 favoring Same, G".sub.3 converts members from P.sub.1 favoring More to P.sub.2 favoring Same, G".sub.3 converts members from P.sub.2 favoring Same to P.sub.3 favoring Less. These conversions would lead to the population conversion rules of Table 6:
TABLE 6
______________________________________
Population conversion rules
All non-zero k'.sub.2j'rj are entered with their appropriate
indices and have the constant value of k'.sub.2. In this Table:
the position of More corresponds to j = 1, j' = 1, and r = 1;
the position of Same corresponds to j = 2, j' = 2, and r = 2; and
the position of Less corresponds to j = 3, j' = 3, and r = 3.
Index j' for      Index r for             Index j for destination
persuasive        target                  subpopulation P.sub.j
force G".sub.j'   subpopulation P.sub.r   j = 1         j = 2         j = 3
______________________________________
j' = 1            r = 1                   0.0           0.0           0.0
                  r = 2                   k'.sub.2121   0.0           0.0
                  r = 3                   0.0           k'.sub.2132   0.0
j' = 2            r = 1                   0.0           k'.sub.2212   0.0
                  r = 2                   0.0           0.0           0.0
                  r = 3                   0.0           k'.sub.2232   0.0
j' = 3            r = 1                   0.0           k'.sub.2312   0.0
                  r = 2                   0.0           0.0           k'.sub.2323
                  r = 3                   0.0           0.0           0.0
______________________________________
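The population conversion rules of Table 6 can be generated from the six conversions listed for FIG. 4.2 of Fan (1987). The sketch below is only an illustration: the names `CONVERSIONS` and `conversion_rules` and the value chosen for k'.sub.2 are assumptions.

```python
# Each tuple is (j', r, j): the force G"_j' moves members of the target
# subpopulation P_r into the destination subpopulation P_j.
# Positions: 1 = More, 2 = Same, 3 = Less.
CONVERSIONS = [
    (1, 3, 2),  # G"_1: Less -> Same
    (1, 2, 1),  # G"_1: Same -> More
    (2, 3, 2),  # G"_2: Less -> Same
    (2, 1, 2),  # G"_2: More -> Same
    (3, 1, 2),  # G"_3: More -> Same
    (3, 2, 3),  # G"_3: Same -> Less
]


def conversion_rules(conversions, k2, positions=(1, 2, 3)):
    """Build the full array of constants, zero where no transition is allowed."""
    allowed = set(conversions)
    return {
        (jp, r, j): (k2 if (jp, r, j) in allowed else 0.0)
        for jp in positions
        for r in positions
        for j in positions
    }


rules = conversion_rules(CONVERSIONS, k2=1.0)  # k2 = 1.0 is a placeholder value
for jp in (1, 2, 3):
    for r in (1, 2, 3):
        print(f"j'={jp} r={r}", [rules[(jp, r, j)] for j in (1, 2, 3)])
```

Printing the array row by row reproduces the layout of Table 6, with one row per pair of j' and r and one column per destination j.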
(III-6) Using the loaded rules, the computer performs calculations of public opinion as a time trend using equations A.29 and A.26 as follows:

(a) The computer chooses as the starting time the time of the first poll, t.sub.P1.

(b) The computer calculates, for time t=t.sub.P1 +.DELTA.t, all values of G".sub.j' (t) using equation A.29, the s.sub.ij"k and t.sub.k loaded in step III-1 from step II, constant p and constants w.sub.ij'j" assigned in step III-4.

(c) The computer uses the poll percentages B.sub.P1j at t.sub.P1 as the first B.sub.j (t-.DELTA.t) in equation A.26. (The B.sub.r (t-.DELTA.t) values are the same.)

(d) The computer calculates B.sub.j (t) at t=(t.sub.P1 +.DELTA.t) from these B.sub.j (t-.DELTA.t), the R and T values calculated in the initial story retrieval step I-2, the G".sub.j' (t) calculated in step b immediately above, a specified .DELTA.t, and the k'.sub.2j'rj terms assigned in step III-5.

(e) The computer repeats the calculations of step b above after advancing time t by .DELTA.t to obtain values of G".sub.j' (t) one .DELTA.t later. The computer repeats step d using as B.sub.j (t-.DELTA.t) the B.sub.j (t) calculated in the previous step d.

(f) The computer repeats step e, advancing time t in increments of .DELTA.t, until t is greater than the time t.sub.Pn of the last poll. The result is a set of values for B.sub.j at times .DELTA.t apart. The computer writes all these results to disk, and displays the opinion time trend as a graph on the monitor and on the printer. An example of these data plotted as time trends is shown in FIG. 4.6 of Fan (1987). The measured B.sub.Pj (t) values from the published polls described in step III-3 above are also plotted as squares for comparison. The computer also writes the calculated values of G".sub.j' (t) to disk. The G".sub.j' (t) functions used for the computation of FIG. 4.6 of Fan (1987) are plotted as the time trends of FIG. 4.4 of Fan (1987). 
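The iteration of steps a to f can be sketched as below. Equation A.26 itself is not reproduced in this text, so the update rule is an assumption built only from the stated proportionality: the amount moved from P.sub.r to P.sub.j in one interval is taken as scale x k'.sub.2j'rj x G".sub.j' (t) x B.sub.r (t-.DELTA.t) x .DELTA.t. The single factor `scale` stands in for whatever role R and T play in equation A.26, and all names and numerical values are hypothetical.

```python
def step_opinion(B_prev, forces, rules, dt, scale=1.0):
    """Advance the opinion percentages by one interval dt (assumed update form)."""
    B_next = dict(B_prev)
    for (jp, r, j), k in rules.items():
        moved = scale * k * forces.get(jp, 0.0) * B_prev[r] * dt
        B_next[r] -= moved  # people leaving the target subpopulation P_r
        B_next[j] += moved  # the same people joining the destination P_j
    return B_next


def opinion_trend(B_first, t_first, t_last, dt, force_fn, rules, scale=1.0):
    """Iterate from the first poll until t is greater than the last poll time."""
    t, B = t_first, dict(B_first)
    trend = [(t, B)]
    while t <= t_last:
        t += dt
        B = step_opinion(B, force_fn(t), rules, dt, scale)
        trend.append((t, B))
    return trend


# Non-zero entries of Table 6 with a placeholder k'_2 of 1.0.
rules = {key: 1.0 for key in
         [(1, 3, 2), (1, 2, 1), (2, 3, 2), (2, 1, 2), (3, 1, 2), (3, 2, 3)]}
# Hypothetical constant force favoring More only.
trend = opinion_trend({1: 25.0, 2: 50.0, 3: 25.0}, 0.0, 0.25, 0.1,
                      lambda t: {1: 1.0, 2: 0.0, 3: 0.0}, rules)
# After the first step: More 30.0, Same 47.5, Less 22.5
```

Because every amount subtracted from one subpopulation is added to another, the percentages keep summing to 100% at every step under this assumed form.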
(g) During the calculations of step f, the computer calculates and sums a series of "squared deviations" between the calculated opinion and the published opinion values. These squared deviations are calculated whenever the time t.sub.Pn of a published opinion poll coincides with one of the times t in the calculations of step f or is between two of these times t. If time t.sub.Pn coincides with one of the times of step f, then the computer calculates, as the squared deviation, the square of the difference between the calculated opinion B.sub.j (t.sub.Pn) and the poll value favoring the same position B.sub.Pnj (t.sub.Pn). A separate squared deviation is calculated for each position j and all squared deviations are summed. If time t.sub.Pn is between two calculation times t-.DELTA.t and t, and if .DELTA.t is 24 hours or less, then the computer calculates the squared deviations between the poll value B.sub.Pnj and both B.sub.j (t-.DELTA.t) and B.sub.j (t). The smaller of the two squared deviations is used for the summation. This decision is based on the argument that there are uncertainties of at least 24 hours in the times of the polls and in the times of the AP messages, so it is not unreasonable to choose the lesser of the deviations for estimating the calculation errors. If .DELTA.t is longer than 24 hours, then the computer calculates the B.sub.j (t.sub.Pn) corresponding to the measured B.sub.Pnj by performing a linear interpolation between B.sub.j (t-.DELTA.t) and B.sub.j (t). After calculations at all of the times of step f, the computer will have calculated squared deviations for all poll points. The computer then computes a mean squared deviation by dividing the sum of the squared deviations by the total number of deviations computed. The computer writes the mean squared deviation to the display device and the printer. 
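The comparison of step g can be sketched as follows. The function name, the data layout and the decision to skip polls outside the calculated time range are assumptions made for the illustration; times are taken to be decimal dates in years, so 24 hours is 1/365.

```python
def mean_squared_deviation(trend, polls, dt, day=1.0 / 365.0):
    """Mean squared deviation between calculated and polled opinion.

    trend: list of (t, {j: B_j}) in time order, dt apart.
    polls: list of (t_Pn, {j: B_Pnj}).
    """
    squared = []
    for t_p, measured in polls:
        # Find the pair of calculation times that brackets the poll time.
        for (t0, b0), (t1, b1) in zip(trend, trend[1:]):
            if t0 <= t_p <= t1:
                break
        else:
            continue  # assumption: polls outside the calculated range are skipped
        for j, b_poll in measured.items():
            if t_p == t0:
                calc = b0[j]
            elif t_p == t1:
                calc = b1[j]
            elif dt <= day:
                # Use whichever neighbor gives the smaller squared deviation.
                calc = min((b0[j], b1[j]), key=lambda b: (b - b_poll) ** 2)
            else:
                # Linear interpolation between the two calculated values.
                frac = (t_p - t0) / (t1 - t0)
                calc = b0[j] + frac * (b1[j] - b0[j])
            squared.append((calc - b_poll) ** 2)
    if not squared:
        raise ValueError("no poll falls inside the calculated time range")
    return sum(squared) / len(squared)


# Hypothetical two-point trend and a single poll halfway between the points.
trend = [(0.0, {1: 40.0}), (1.0, {1: 50.0})]
polls = [(0.5, {1: 46.0})]
msd_interpolated = mean_squared_deviation(trend, polls, dt=1.0)
# Interpolated value 45.0 against poll value 46.0 gives 1.0
```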
(III-7) Optionally, the computer computes the mean squared deviation for a number of trial values of constant p, the k'.sub.2j'rj, and the w.sub.ij'j". The computer chooses as optimal those constants giving the minimum mean squared deviation.

Alternative Embodiments

Although a specific example of the preferred embodiment is given above, a number of modifications of this embodiment are possible within the scope of this invention:

I. Alternative Messages and Scoring

Broadly speaking, the determinations in the preferred embodiment occur in three defined steps: collecting messages, scoring the messages, and using the message scores to determine time trends of public opinion. Since these steps are largely independent, it is possible to vary both the messages and their scoring. The essential feature of the messages is that they must be representative of those available to the population and relevant to the issue for which opinion is computed. As discussed in Chapter 1 of Fan (1987), the messages can be from any source ranging from personal experiences to those in the mass media. Any of these messages can be used in the computations of this patent so long as three critical features can be assigned: (a) numerical scores for the extent to which different attributed sources in the message support different positions of the issue, (b) a time dependent function describing the availability of the message to the population, and (c) a numerical validity score for the reputation of the medium transmitting the message. Therefore, the following alternative methods can be used:

I-A. Alternative Messages

In addition to AP stories in the preferred embodiment, messages can be collected from any source relevant to the issue for which opinion calculations are made. These can include other mass media messages both in the written press and in the electronic press. 
The messages can be in any form ranging from personal experiences, through words and pictures on written pages to broadcasts via television and radio. Besides actual messages which the computer can retrieve and score (see following section), it is also possible to postulate specified numerical scores for items a and c in this alternative embodiment. Then a mathematical function can be postulated for item b. With these specifications, it is possible to include in opinion calculations messages which cannot be measured easily but which can be modeled mathematically. I-B. Message Scoring Every message must be scored for the extent to which it supports different positions within the issue being analyzed. These scores s.sub.ij"k in equation A.29 of the preferred embodiment were obtained by the computer assisted content analysis of step II of the preferred embodiment. This same procedure can be performed for any message comprised of text which can be transferred to characters readable by computer. If the text is only found as written words on paper, it is possible for the system operator to read and enter the text into the computer by use of the keyboard. Alternatively, it is possible to use an electronic device to read the text and convert it into computer readable form. I-C. Message Availability to the Population For AP stories in the preferred embodiment, it was assumed that information in an AP dispatch would have its maximum persuasive force on the date of the story. After that time, the effectiveness was postulated to decrease exponentially with time with a characteristic persistence constant. This time course is described in the e.sup.-p(t-t.sbsp.k.sup.) term of equation A.29. This same time course can also be used for other mass media messages from newspapers, television and radio. However, other mathematical functions may be more appropriate for other message sources. 
For example, information from a book may have a time course which increases over a substantial time period before finally decreasing. In principle, it is possible to postulate any arbitrary time dependent mathematical function to describe the availability of a message to the population. Such a function would replace the e.sup.-p(t-t.sbsp.k.sup.) in equation A.29. It is also possible to measure this availability directly. For a book, for instance, it is possible to approximate the availability by the measured pattern of sales over time. Since the e.sup.-p(t-t.sbsp.k.sup.) term is specific for each message with index k, this term could be replaced by a different measured or postulated function for each message. The discussion so far has only been for the time course of the message availability. More completely, the availability of the message also includes a scaling factor describing the number of people reached at a particular time, for instance when the message's persuasive force was the greatest. The larger the audience at this time, the greater will be the total effect of the message. If only AP messages are used, then this scaling factor will be constant for all messages, so the factor was absorbed into constant k'.sub.2 of equation A.26 (see Appendix A of Fan, 1987). Therefore, if messages from more than one medium are used, messages from different media will need different k'.sub.2.

I-D. Validity of the Medium

Like the scaling factor for message availability just discussed, the reputation of the medium is also absorbed into constant k'.sub.2 (see Fan, 1987) for the analysis of AP stories. Again, if other types of messages are used, this k'.sub.2 should differ according to the medium.

II. Replacements for G" Functions

In Fan (1987), it is proposed that the G" functions in equation A.26 can be replaced by equation A.13 reproduced below:

$$H_{jr}(t) = \frac{G_j(t)}{d_{jr}\,G_r(t) + d_{jj}\,G_j(t) + 1} \qquad (A.13)$$

where d.sub.jr and d.sub.jj are both constants. In the preferred embodiment, these constants were assumed to be sufficiently small that their products with the G functions in equation A.13 are much less than 1. Functions G" and G are related to each other by a constant factor as discussed in Fan (1987). To use this equation in opinion calculations, equation A.26 is replaced by equation A.15 of Fan (1987) reproduced below: dB.sub.j (t)/dt= (A.15) ##EQU6## This equation can be solved at intervals of .DELTA.t essentially as described in step III of the preferred embodiment once the H.sub.j'r (t) are calculated. All other terms in the equation have already been described in that step III.

III. Inclusion of Unawareness

So far in this patent, the assumption has been made that all people were aware of the issue being analyzed. When a significant fraction of the population is unaware, it is possible to use equations A.34-A.36 of Fan (1987) to determine expected public opinion (see Fan, 1987 for justifications). These equations are reproduced below: ##EQU7## for all j and j' where u is constant and ##EQU8## for all odd i, all j", and all k where t.sub.k <t. Finally, ##EQU9## for all j' and r. The computer calculates functions F'.sub.j' (t) using equation A.35, and the same scores s.sub.ij"k, w.sub.ij'j" values and constant p as appear in equation A.29. However, the summation this time is only over odd i instead of all i. The computer then reads values for constants u and the k'.sub.lj'j as well as an initial value, typically from measured opinion polls, for the fraction (1-A(t-.DELTA.t)) of the population at (t-.DELTA.t) who are unaware. 
Employing these data, the computer can calculate A(t) at increasing intervals of .DELTA.t using the A(t) of one calculation as the A(t-.DELTA.t) of the next calculation (see the analogous strategy for calculating B.sub.j (t) as described in step III-6 of the preferred embodiment). The calculated values of A(t) and the data discussed above can then be inserted into equation A.36 to compute the fraction A(t).B.sub.j (t) of the population holding the opinion with the corresponding index j.

IV. Alternate Concept and Diagnostic Symbols

In the preferred embodiment, the concept and diagnostic symbols each consisted of a single character. Alternatively, it is possible to use concept and diagnostic symbols containing any combination of data bits. For example, strings of characters can be used. It is only necessary that reserved codes (not necessarily the "<" and ">" of the preferred embodiment) be used to mark the beginnings and ends of concept symbols. With these control codes, the computer can identify the beginnings and ends of the concept and diagnostic symbols. With this identification, the computer can remove concept symbols and text equivalents from a previous text filtration step before any further text analysis steps. In the text of the output files from the preferred embodiment, the control codes for the beginnings and ends of the concept symbols were "<" and ">". In the text equivalents, on the other hand, the control codes marking the beginnings and ends of the concept symbols were spaces. This difference illustrates that it is unnecessary that the control codes be the same in the text and in the text equivalents in the output files.

V. Parsing of Text by Words Instead of Characters

In the preferred embodiment, words were defined as any arbitrary string of characters. With this definition, comparisons between words in the text and words in the dictionary were performed by permitting a word to start at any character in the text. 
Also, any leading and trailing letters were permitted. Alternatively, it is possible to require that the words begin and/or end with defined control codes such as spaces, carriage returns, etc. In this case, the dictionary searches could be for words beginning or ending with a control character. The dictionary could also have reserved control characters at the beginnings, in the middles, and/or at the ends of word entries to indicate that replacement characters are possible for the control characters.

VI. Opinion Determinations without Reference to Poll Measurements

Opinion determinations in the preferred embodiment began with the first B.sub.j (t-.DELTA.t) being taken from the results of a public opinion poll. However, it is also possible to take advantage of the results of FIG. 4.10 of Fan (1987). This figure shows that the opinion calculations will converge to a consensus value as time proceeds regardless of the first B.sub.j (t-.DELTA.t). Therefore, it is possible to perform two opinion calculations beginning with widely disparate values for an opinion B.sub.j (t-.DELTA.t), e.g. 0% and 100%. Opinion in the time period during which the two calculations converged could then be taken as a reasonable estimate of expected opinion.

VII. Alternative Determinations of Scores for Blocks of Text

In step II-B4 of the preferred embodiment, the block of text scores s.sub.ij"kq for a particular i and j" were obtained after division by the sum of all s.sub.ij"kq. Alternative divisors with appropriate weights could be used based on some combination of s.sub.ij"kq scores. As yet another alternative, there might only be the summing of the counts of concept symbols with no subsequent division step.

VIII. Alternate Use of the Specified Distance

When the specified distance is greater than zero in step II-A15 of the preferred embodiment, a matching target symbol can only be marked if it is separated from the matching operator symbol by no more than one diagnostic distance number. 
Alternatively, the separation can involve more than one diagnostic distance if the total of all intervening diagnostic distances between the matching operator and target symbols is less than the specified distance. The same specified direction rules would still be used as in the preferred embodiment.

IX. Alterations in the Contents and Text Equivalents of Blocks of Text

In step II-A15 of the preferred embodiment, the essential concepts of a block of text are summarized as a final transformed text equivalent. The concepts from one block of text can be transferred to the text equivalent of the following block of text by the use of carryover symbols which are added to the following text equivalents (see steps II-A14 and II-A16). This is done when the contents of one block of text imply the presence of certain concepts in other blocks of text from the same story. This procedure can be generalized in two ways. The first generalization is to alter not just the text equivalent of the following block of text but also text equivalents of other specified blocks of text. The modification can involve not only the addition but also the deletion or replacement of elements in these other text equivalents. The other generalization is the alteration of the text of other blocks of text. For example, the sample text equivalent of step II-A16 carried the connotation of "American." Besides using carryover symbols to insert the concept symbol for "American" into the text equivalent of the next block of text, it is possible to insert the word "American" into the text of another specified block of text. Words would be inserted in, altered in or deleted from the text of a block of text when it is desirable that the words should continue to appear in all subsequent filtration or scoring steps for a block of text. 
Conversely, if it is preferable that the connotation should disappear in subsequent manipulations, it would be more appropriate to change text equivalents since text equivalents are always erased before any following text analysis steps.

X. Hardware Based System

In this alternative embodiment, the microprocessor 10 and software (FIG. 1) are replaced by an alternative hardwired logic system. Referring now to FIGS. 2 and 3, a block diagram and flow diagram, respectively, of the opinion prediction operation of this hardware system are shown. The system includes text retrieval logic 50, text filtration logic 52, numerical scoring logic 53, prediction and output logic 54, and control logic 56. Output logic 54 is connected to drive a monitor and a printer, while control logic 56 includes a keyboard input. Text retrieval logic 50 is connected to a communication link 51 from which data may be acquired, for example, from a data bank. Digital storage unit 57, either random access memory or magnetic medium based, is connected to all logic modules 50, 52, 53 and 54. Control logic 56 includes hardwired logic for controlling the other logic modules to perform the functions outlined in the flow diagram of FIG. 3. Referring to FIG. 3, the start of operation of the system is represented by block 60. In block 61, logic 56 causes retrieval logic 50 to command an external data base via communication link 51 to retrieve text and to store it in storage unit 57, in the same manner as described above with respect to step I of the software embodiment. In block 62, retrieval logic 50 receives the text of retrieved messages from the data base via the link 51 and stores the text in storage unit 57. If a signal indicates that more text is to be retrieved in block 63, the retrieval logic 50 repeats blocks 61 and 62. When no more text is to be retrieved, the logic 50 will have finished the equivalent of step I of the preferred embodiment. 
At this point, control logic 56 causes filtration logic 52 to activate the first dictionary and its corresponding set of text filtration rules (block 64), which are predetermined and stored in storage unit 57. Logic 52 uses this dictionary and set of rules to remove text irrelevant to the prediction in the same manner as described above with respect to step II of the software embodiment. At the end of this text filtration, the filtered text is stored in storage unit 57 and logic 52 checks for the presence of a subsequent dictionary and its corresponding set of text filtration rules (block 65). If another pair is found, logic 52 repeats the filtration process using that dictionary and set of rules. When no more filtration dictionaries and rules are found, logic 52 signals control logic 56 to activate the scoring logic 53. Logic 53 operates pursuant to a preset text scoring dictionary and rules stored in unit 57 (block 66) and assigns numerical scores to the remaining text, in the same manner as described above with respect to step II of the software embodiment. After calculation, the scores are stored in storage unit 57. Before or during the operation of the system, the operator has the option of entering data from measured public opinion polls into storage unit 57 via the keyboard and logic 56 (block 67). In block 68, control logic 56 activates prediction logic 54, which uses specified parameters, refining weights $w_{ij'j''}$, and population conversion rules $k'_{2j'rj}$ stored in unit 57 to compute time trends of public opinion in the same manner as described above with respect to step III of the software embodiment. The results are plotted as time trends on the printer 12 (see FIG. 1). For comparison, the processor also plots the time trends from measured opinion polls.
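The ordering of blocks 61 through 68 of FIG. 3 can be summarized in a short Python sketch. It illustrates only the sequence of operations, not the hardwired logic itself. The function name `run_opinion_pipeline`, its callable parameters, and the toy data are hypothetical stand-ins for logic modules 50, 52, 53 and 54, and the per-message scoring and identity prediction are deliberately simplified assumptions.

```python
from typing import Callable, Iterable, List, Sequence

Message = str  # hypothetical: one retrieved text message


def run_opinion_pipeline(
    retrieve_batches: Iterable[List[Message]],
    filters: Sequence[Callable[[List[Message]], List[Message]]],
    score: Callable[[Message], float],
    predict: Callable[[List[float]], List[float]],
) -> List[float]:
    """Illustrative ordering of blocks 61-68 of FIG. 3 (all names are assumptions)."""
    storage: List[Message] = []  # stands in for storage unit 57

    # Blocks 61-63: keep retrieving text until no more is requested.
    for batch in retrieve_batches:
        storage.extend(batch)

    # Blocks 64-65: apply each dictionary/rule pair in turn,
    # storing the filtered text after every pass.
    for apply_filter in filters:
        storage = apply_filter(storage)

    # Block 66: assign a numerical score to each remaining message.
    scores = [score(message) for message in storage]

    # Block 68: compute the predicted opinion trend from the scores.
    return predict(scores)


if __name__ == "__main__":
    # Toy stand-ins: one keyword filter, a sign-only score, an identity "prediction".
    trend = run_opinion_pipeline(
        retrieve_batches=[["defense budget rises", "weather report"],
                          ["U.S. defense funding cut"]],
        filters=[lambda msgs: [m for m in msgs if "defense" in m]],
        score=lambda m: -1.0 if "cut" in m else 1.0,
        predict=lambda s: s,
    )
    print(trend)
```

Passing the modules in as callables mirrors the way control logic 56 activates each module in sequence while all of them share storage unit 57.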
Logic 54 may optionally also compute statistical comparisons based on the differences between predicted and measured opinion values, if such values are available.
XI. Predictions for Habits
For the purpose of clarity and to aid in the understanding of the invention, all embodiments presented above have focused on the ability of information to change public opinion. However, opinion is only one example of a social trait. As indicated above, this invention also encompasses other social traits including habits like smoking. The mathematical model in equations 1-7 of the Background of the Invention section above has already been extended to cover habits in "Ideodynamic predictions for the evolution of habits" by David P. Fan in Journal of Mathematical Sociology, Vol. 11, pp. 265-281, 1985. However, since that model also used equations 1-7, that extension of the theory was also inoperable. Therefore, an alternative embodiment could involve adding some of the concepts in that former extension to the embodiments described above to yield a functional system for predicting the fraction of the population with certain habits. By their very nature, habits are activities performed at frequent intervals. Therefore, members of the population can be assigned to various positions based on the frequency with which they repeat the activities. In the smoking example, both smoking and non-smoking would be defined as habits. If desired, it is also possible to divide the smokers according to the frequency with which they smoke. Each position as defined in the preferred embodiment would have both a corresponding habit and a corresponding opinion. However, the fraction of the population with a habit need not match the fraction of the population holding the opinion that the habit should be adopted. For example, there may be many smokers who would like to quit. In this embodiment for habits, opinion and habits are predicted independently of each other.
Nevertheless, the two predictions share the same three steps of: (I) gathering messages relevant to the habit, (II) scoring the messages numerically for their ability to influence specified target subpopulations, and (III) analyzing the ability of the messages, as represented by their scores, to change the habits of appropriate target populations.
XI-A. Messages in the Mass Media
All messages relevant to opinion change for a habit are also assumed to be pertinent to change in the practice of that habit. Thus anti-smoking messages would be assumed to be able to influence both opinion on the desirability of smoking and the chances that people will stop. These messages would be those described in the preferred embodiment and would lead to functions $G''_{j'}$ of equation A.29.
XI-B. Mathematical Functions Replacing Measured Messages
Some messages are difficult to measure directly. Since the ultimate impact of messages is described by persuasive force functions $G''_{j'}$ in the preferred embodiment, messages which cannot be measured can nevertheless be included in the analysis if they can be modeled by postulated mathematical $G''_{j'}$ functions. For habits, the paper by Fan cited earlier in this section includes two functions describing a "recidivism" effect and a "social pressure" effect. The recidivism effect in turn incorporates functions describing the "nostalgia" and "euphoria" phenomena. These functions are described in detail below:
XI-B1. Recidivism Functions
A person who has recently changed a habit has a "nostalgia" for the old habit and hence a better chance of reverting to it than someone who has never had the former habit. Again, in the smoking example, a smoker who has quit will be more likely to start again than someone who has never smoked. The system of this invention can account for this phenomenon by assigning mathematical "nostalgia" functions corresponding to personal experience infons favoring the start of smoking.
These functions would have high values shortly after a change of habit, with the values diminishing as time proceeds. Similarly, the system of this invention can also include "euphoria" functions describing the honeymoon feeling just after a change of habit, in which a person is very happy to have successfully made the habit change. As with the nostalgia function, the euphoria function will also decrease with time. The euphoria and nostalgia functions were merged into equation 16 of the paper by Fan in the Journal of Mathematical Sociology mentioned earlier in this section, which describes the combined "net recidivism" effect. The combined effects yielded a recidivism persuasive force function $G_{jR}(t',t)$ with the form:
$$G_{jR}(t',t) = k_R \, e^{-k_N (t-t')} \left(1 - e^{-k_U (t-t')}\right) \qquad (16)$$
This function uses the term G since it has the same purpose as the G functions of step III of the preferred embodiment, namely the persuasion of susceptible members of the population to undergo social change. In this function, subscript j indicates the position toward which the function is likely to draw recruits and subscript R refers to the function describing the recidivism effect. The function is measured at time $t$ and depends on an earlier time $t'$ at which members of the subpopulation left the habit characterized by subscript j. If $G_{jR}(t',t)$ is the function describing the recidivism to smoking, then $t'$ would be the time before the measurement time $t$ when the smoker had quit. Parameters $k_R$, $k_N$, and $k_U$ are constants. In habit computations, this function $G_{jR}$ is added to the $G''_{j'}$ functions of equation A.29 and entered in place of the $G''_{j'}$ in equation A.26.
XI-B2. Social Pressure Functions
Besides the recidivism infon impact functions, it may also be convenient to postulate other functions G reflecting informational forces for social change.
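As a numerical illustration of the recidivism function of equation 16 above, the following Python sketch evaluates it for a given measurement time and time of habit change. The function names and the parameter values mentioned in the note that follows are assumptions chosen only for illustration and do not come from the Fan paper. The peak time is not stated in the source; it is obtained here by setting the derivative of equation 16 with respect to the elapsed time to zero, and it assumes positive $k_N$ and $k_U$.

```python
import math


def recidivism_force(t: float, t_prime: float,
                     k_r: float, k_n: float, k_u: float) -> float:
    """Equation 16: k_R * exp(-k_N (t - t')) * (1 - exp(-k_U (t - t')))."""
    elapsed = t - t_prime
    if elapsed < 0:
        raise ValueError("t must not precede the time t' of the habit change")
    nostalgia = math.exp(-k_n * elapsed)  # decays as time passes
    euphoria = math.exp(-k_u * elapsed)   # read here as the euphoria term; also decays
    return k_r * nostalgia * (1.0 - euphoria)


def peak_elapsed_time(k_n: float, k_u: float) -> float:
    """Elapsed time t - t' at which equation 16 is largest.

    Derived from the formula (not given in the source); requires k_n > 0 and k_u > 0.
    """
    return math.log((k_n + k_u) / k_n) / k_u
```

With illustrative values such as k_r = 1.0, k_n = 0.1 and k_u = 1.0 (time in arbitrary units), the force is zero at the moment of the habit change, rises while the euphoria term fades, and then decays with the nostalgia term.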
For example, in the Fan paper discussed earlier in this section, it was postulated that there are social pressure messages which can be modeled by assuming that messages in favor of smoking or non-smoking occur in proportion to the number of people observed in these categories. This phenomenon is described by equation 9 of that paper:
$$G_{jS}(t) = k_S \, D_j(t) \qquad (9)$$
where $G_{jS}$ carries subscript S denoting that it is a social pressure infon impact function. Subscript j refers to the position the function favors, $k_S$ is a constant, and $D_j$ is the fraction of the population practicing the habit corresponding to the position indexed by j.
XI-C. Habit Trend Determinations
The basic system of this invention can be used to compute the fraction of the population with a habit by a method essentially the same as that described in step III of the preferred embodiment. However, not all people who would like to change a habit may actually do so. Therefore, the $k'_{2j'rj}$ terms in Table 6 describing the ability of a message to cause a social change would be smaller for behavior change than for opinion change. In the calculations, recidivism persuasive force functions, tailored for each subpopulation, are added to the social pressure persuasive forces and the G functions from measured messages as described above for opinion change. Since the recidivism persuasive force functions depend on the time of a prior habit change, the population needs to be divided into many small subpopulations depending on the time of the previous habit change. The computations of step III of the preferred embodiment are made for the various subpopulations and the results for the subpopulations ar
