Recording medium and character string collating apparatus for full-text character data

6260051

Abstract

All two-character chains consisting of two general characters, and all three-character chains consisting of one special character between two general characters, are detected from a registration character string in which a large number of special characters having no meaning are frequently arranged. Alternatively, all two-character chains consisting of two general or symbolic characters are detected from a converted registration character string produced by changing each special character of the registration character string to one type of symbolic character determined in correspondence with a general character adjacent to the special character. The occurrence frequencies of the general or symbolic characters of each chain are counted and stored in a recording medium together with the registration character chains. When a retrieval character string is input, retrieval character chains are detected from it in the same manner, the occurrence frequencies of the registration character chains corresponding to those retrieval character chains are read out from the recording medium and collated with each other, and a particular character string agreeing with the retrieval character string is retrieved from the registration character string. Because the occurrence frequency of a special character is not counted, or because the special characters are changed to various types of symbolic characters, the recording area required for the occurrence frequencies of the registration character chains can be reduced.
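The collation described in the abstract can be illustrated with a minimal Python sketch. It is only one plausible reading, not the patented implementation: the function names (chains, build_index, contains), the use of a space as the special character, and the choice of counting an occurrence frequency inclusively (the first "a" has frequency 1) are all assumptions made for illustration. Two consecutive chains are treated as linked when the rear frequency of the first equals the fore frequency of the second, since both then refer to the same occurrence of the shared character.

```python
from collections import defaultdict

def chains(text, special=" "):
    """Yield (chain, fore_freq, rear_freq) for each pair of neighbouring
    general characters; special characters are never counted."""
    counts = defaultdict(int)
    prev = None  # (index, frequency) of the previous general character
    for i, ch in enumerate(text):
        if ch == special:
            continue
        counts[ch] += 1
        if prev is not None:
            yield text[prev[0]:i + 1], prev[1], counts[ch]
        prev = (i, counts[ch])

def build_index(text, special=" "):
    """Map each registration chain to its list of occurrence frequency sets."""
    index = defaultdict(list)
    for chain, fore, rear in chains(text, special):
        index[chain].append((fore, rear))
    return index

def contains(index, query, special=" "):
    """Collate the chains of the retrieval string against the index."""
    query_chains = [chain for chain, _, _ in chains(query, special)]
    if not query_chains:
        return False
    rear_freqs = {rear for _, rear in index.get(query_chains[0], [])}
    for chain in query_chains[1:]:
        # Keep only occurrences whose fore character is the same occurrence
        # as the rear character of the previous chain.
        rear_freqs = {rear for fore, rear in index.get(chain, [])
                      if fore in rear_freqs}
        if not rear_freqs:
            return False
    return bool(rear_freqs)
```

In this sketch the index never stores a frequency for a space, which is the source of the storage saving the abstract mentions.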


Claims

What is claimed is:

1. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, comprising:

a first character chain recording region for recording all general two-character chains detected from a registration character string of the text, each general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, and the rear general character of a first general two-character chain placed just before a second general two-character chain agreeing with the fore general character of the second general two-character chain;

a second character chain recording region for recording all special character chains detected from the registration character string of the text, each special character chain including a fore general character, at least one special character and a rear general character arranged in that order in the registration character string, the rear general character of one special character chain placed just before one general two-character chain recorded in the first character chain recording region agreeing with the fore general character of the general two-character chain, the fore general character of one special character chain placed just after one general two-character chain recorded in the first character chain recording region agreeing with the rear general character of the general two-character chain, and the rear general character of a first special character chain placed just before a second special character chain agreeing with the fore general character of the second special character chain;

a first occurrence frequency recording region for recording a pair of occurrence frequencies of the fore and rear general characters of each general two-character chain recorded in the first character chain recording region as a general occurrence frequency set, the occurrence frequency of one general character of a particular type placed in a particular position of the registration character string denoting the number of general characters of the same particular type existing in an area between a starting position of the registration character string and the particular position of the registration character string; and

a second occurrence frequency recording region for recording a pair of occurrence frequencies of the fore and rear general characters of each special character chain recorded in the second character chain recording region as a special occurrence frequency set.
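As a rough illustration of the four recording regions of claim 1, the following Python sketch builds them in memory. It is a sketch under stated assumptions: the name build_regions, the space as the special character, and the inclusive counting of occurrence frequencies are hypothetical choices, and each chain is stored next to its frequency set rather than in physically separate regions.

```python
def build_regions(text, special=" "):
    """Return (general, special_chains); each entry is
    (chain, (fore_frequency, rear_frequency))."""
    counts, freqs = {}, []
    for ch in text:
        if ch == special:
            freqs.append(None)          # special characters are not counted
        else:
            counts[ch] = counts.get(ch, 0) + 1
            freqs.append(counts[ch])

    general, special_chains = [], []
    n = len(text)
    for i in range(n):
        if text[i] == special:
            continue
        j = i + 1
        while j < n and text[j] == special:   # skip one or more specials
            j += 1
        if j >= n:
            continue                          # no rear general character
        record = (text[i:j + 1], (freqs[i], freqs[j]))
        if j == i + 1:
            general.append(record)            # first regions
        else:
            special_chains.append(record)     # second regions
    return general, special_chains
```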

2. A recording medium according to claim 1 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

3. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, comprising:

a character chain recording region for recording all registration two-character chains detected from a converted registration character string which is produced from a registration character string of the text by converting each special character arranged in the registration character string into a particular type of symbolic character determined according to a type of a general character spaced at N characters (N is an integral number equal to or higher than 1) apart from the special character, each registration two-character chain including a fore general character or a fore symbolic character and a rear general character or a rear symbolic character arranged just after the fore character in the converted registration character string, the rear general character or the rear symbolic character of a first registration two-character chain agreeing with the fore general character or the fore symbolic character of a second registration two-character chain placed just before the first registration two-character chain; and

an occurrence frequency recording region for recording a pair of occurrence frequencies of the fore general character or the fore symbolic character and the rear general character or the rear symbolic character of each registration two-character chain recorded in the character chain recording region as a registration occurrence frequency set, the occurrence frequency of one general character or symbolic character of a particular type placed in a particular position of the converted registration character string denoting the number of general characters or symbolic characters of the same particular type existing in an area between a starting position of the converted registration character string and the particular position of the converted registration character string.
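The conversion in claim 3 might be sketched in Python as follows. The token notation "<c>" for a symbolic character, the choice of looking N characters after the special character rather than before it, and the fallback symbol "<?>" when no general character is found there are all assumptions for illustration; the claim itself does not fix them.

```python
def convert(text, n=1, special=" "):
    """Replace each special character by a symbolic token whose type depends
    on the general character n positions after it (assumed direction)."""
    tokens = []
    for i, ch in enumerate(text):
        if ch != special:
            tokens.append(ch)
            continue
        k = i + n
        neighbor = text[k] if k < len(text) and text[k] != special else "?"
        tokens.append(f"<{neighbor}>")
    return tokens

def chains_with_frequencies(tokens):
    """Return every two-token chain with its occurrence frequency set."""
    counts, freqs = {}, []
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
        freqs.append(counts[t])
    return [((tokens[i], tokens[i + 1]), (freqs[i], freqs[i + 1]))
            for i in range(len(tokens) - 1)]
```

Because the spaces are spread over several symbolic types, the counter for any single type stays smaller than a single counter for all spaces would.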

4. A recording medium according to claim 3 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

5. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, comprising:

a character chain recording region for recording all registration two-character chains detected from a converted registration character string which is produced from a registration character string of the text by replacing each special character arranged in the registration character string with a first particular type of symbolic character determined according to a type of one general character adjacent to the special character and a second particular type of symbolic character determined according to a type of the other general character adjacent to the special character, each registration two-character chain including a fore general character or a fore symbolic character and a rear general character or a rear symbolic character arranged just after the fore character in the converted registration character string, and the rear general character or the rear symbolic character of a first registration two-character chain agreeing with the fore general character or the fore symbolic character of a second registration two-character chain placed just before the first registration two-character chain in the converted registration character string; and

an occurrence frequency recording region for recording a pair of occurrence frequencies of the fore general character or the fore symbolic character and the rear general character or the rear symbolic character of each registration two-character chain recorded in the character chain recording region as a registration occurrence frequency set, the occurrence frequency of one general character or symbolic character of a particular type placed in a particular position of the converted registration character string denoting the number of general characters or symbolic characters of the same particular type existing in an area between a starting position of the converted registration character string and the particular position of the converted registration character string.
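For claim 5, where each special character is replaced by two symbolic characters, a minimal Python sketch could look like this. The token spellings "b~" and "~c", the ordering (preceding neighbour first, following neighbour second), and the "?" placeholder at the ends of the string are hypothetical; the sketch also assumes that special characters do not occur consecutively.

```python
def convert_two_symbols(text, special=" "):
    """Replace each special character by two symbolic tokens, one per
    adjacent general character."""
    tokens = []
    for i, ch in enumerate(text):
        if ch != special:
            tokens.append(ch)
            continue
        prev_ch = text[i - 1] if i > 0 else "?"
        next_ch = text[i + 1] if i + 1 < len(text) else "?"
        tokens.append(f"{prev_ch}~")   # first symbolic character
        tokens.append(f"~{next_ch}")   # second symbolic character
    return tokens

def two_character_chains(tokens):
    """All registration two-character chains of the converted string."""
    return list(zip(tokens, tokens[1:]))
```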

6. A recording medium according to claim 5 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

7. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, comprising:

a first character chain recording region for recording all general two-character chains detected from a registration character string of the text, each general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, and the rear general character of each general two-character chain agreeing with the fore general character of another general two-character chain;

a second character chain recording region for recording a plurality of two-character chain sets, respectively composed of a first two-character chain including a fore general character and a rear general character in that order, a second two-character chain including the fore general character and one special character in that order and a third two-character chain including the special character and the rear general character, each two-character chain set being produced from one of all special three-character chains detected from the retrieval character string, each special three-character chain including the fore general character, the special character and the rear general character arranged in that order in the text, the rear general character of one special three-character chain placed just before one general two-character chain recorded in the first character chain recording region agreeing with the fore general character of the general two-character chain, the fore general character of one special three-character chain placed just after one general two-character chain recorded in the first character chain recording region agreeing with the rear general character of the general two-character chain, and the rear general character of a first special three-character chain placed just before a second special three-character chain agreeing with the fore general character of the second special three-character chain;

a first occurrence frequency recording region for recording a pair of occurrence frequencies of the fore and rear general characters of each general two-character chain recorded in the first character chain recording region as a general occurrence frequency set, the occurrence frequency of one general character of a particular type placed in a particular position of the registration character string denoting the number of general characters of the same particular type existing in an area between a starting position of the registration character string and the particular position of the registration character string; and

a second occurrence frequency recording region for recording a pair of occurrence frequencies of the fore and rear general characters of each first two-character chain recorded in the second character chain recording region as a first special occurrence frequency set, recording a pair of occurrence frequencies of the fore general character and the special character of each second two-character chain recorded in the second character chain recording region as a second special occurrence frequency set on condition that the occurrence frequency of the special character is set to a fixed value, and recording a pair of occurrence frequencies of the special character and the rear general character of each third two-character chain recorded in the second character chain recording region as a third special occurrence frequency set on condition that the occurrence frequency of the special character is set to the fixed value.
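The two-character chain sets of claim 7 can be illustrated with the Python sketch below. The function name special_chain_sets, the dictionary layout, the space as the special character, and the default fixed value of 0 are assumptions for illustration only; the claim requires only that the special character's frequency be some fixed value.

```python
def special_chain_sets(text, special=" ", fixed_value=0):
    """For each special three-character chain (fore, special, rear), build
    the first, second and third two-character chains with their
    occurrence frequency sets."""
    counts, freqs = {}, []
    for ch in text:
        if ch == special:
            freqs.append(fixed_value)   # special frequency is never counted
        else:
            counts[ch] = counts.get(ch, 0) + 1
            freqs.append(counts[ch])

    sets = []
    for i in range(1, len(text) - 1):
        if text[i] != special:
            continue
        fore, rear = text[i - 1], text[i + 1]
        if fore == special or rear == special:
            continue                    # not a three-character chain
        sets.append({
            "first": (fore + rear, (freqs[i - 1], freqs[i + 1])),
            "second": (fore + special, (freqs[i - 1], fixed_value)),
            "third": (special + rear, (fixed_value, freqs[i + 1])),
        })
    return sets
```

Since the special character always carries the same value, its half of the second and third frequency sets need not grow with the length of the text.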

8. A recording medium according to claim 7 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

9. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, comprising:

a first character chain recording region for recording all general two-character chains detected from a registration character string of the text, each general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, and the rear general character of each general two-character chain agreeing with the fore general character of another general two-character chain;

a second character chain recording region for recording a plurality of two-character chain sets, respectively composed of a fore two-character chain including a fore general character and one special character in that order and a rear two-character chain including the special character and a rear general character, each two-character chain set being produced from one of all special three-character chains detected from the retrieval character string, each special three-character chain including the fore general character, the special character and the rear general character arranged in that order in the text, the rear general character of one special three-character chain placed just before one general two-character chain recorded in the first character chain recording region agreeing with the fore general character of the general two-character chain, the fore general character of one special three-character chain placed just after one general two-character chain recorded in the first character chain recording region agreeing with the rear general character of the general two-character chain, and the rear general character of a first special three-character chain placed just before a second special three-character chain agreeing with the fore general character of the second special three-character chain;

a first occurrence frequency recording region for recording a pair of occurrence frequencies of the fore and rear general characters of each general two-character chain recorded in the first character chain recording region as a general occurrence frequency set, the occurrence frequency of one general character of a particular type placed in a particular position of the registration character string denoting the number of general characters of the same particular type existing in an area between a starting position of the registration character string and the particular position of the registration character string; and

a second occurrence frequency recording region for recording a pair of occurrence frequencies of the fore general character and the special character of each fore two-character chain recorded in the second character chain recording region as a first special occurrence frequency set on condition that the occurrence frequency of the special character is set to zero, and recording a pair of occurrence frequencies of the special character and the rear general character of each rear two-character chain recorded in the second character chain recording region as a second special occurrence frequency set on condition that the occurrence frequency of the special character is set to zero.

10. A recording medium according to claim 9 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

11. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, comprising:

a character chain recording region for recording all general two-character chains and all character chain sets detected from a registration character string of the text, each general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, each character chain set being composed of a fore two-character chain and a rear two-character chain, each character chain set being obtained by detecting all special three-character chains including a fore general character, one special character and a rear general character arranged in that order in the text, converting the special character of each special three-character chain into a central general character having the same character type as that of the rear general character to produce a converted three-character chain including the fore general character, the central general character and the rear general character and decomposing each converted three-character chain into one fore two-character chain including the fore general character and the central general character and one rear two-character chain including the central general character and the rear general character, the fore general character of each special three-character chain placed just after one general two-character chain agreeing with the rear general character of the general two-character chain, the rear general character of each special three-character chain placed just before one general two-character chain agreeing with the fore general character of the general two-character chain, the rear general character of a first general two-character chain placed just before a second general two-character chain agreeing with the fore general character of the second general two-character chain, and the rear general character of a first special three-character chain placed just before a second special three-character chain agreeing with the fore general character of the second special three-character chain; and

an occurrence frequency recording region for recording a pair of occurrence frequencies of the fore general character and the rear general character of each general two-character chain recorded in the character chain recording region as an occurrence frequency set, recording a pair of occurrence frequencies of the fore general character and the central general character of the fore two-character chain of each character chain set recorded in the character chain recording region as an occurrence frequency set, and recording a pair of occurrence frequencies of the central general character and the rear general character of the rear two-character chain of each character chain set recorded in the character chain recording region as an occurrence frequency set by setting the occurrence frequency of the rear general character as that of the central general character, the occurrence frequency of one general character of a particular type placed in a particular position of the registration character string denoting the number of general characters of the same particular type existing in an area between a starting position of the registration character string and the particular position of the registration character string.
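The character chain sets of claim 11 might be produced as in the following Python sketch. The name chain_sets_with_central and the tuple layout are hypothetical, a space stands in for the special character, and occurrence frequencies are assumed to be counted inclusively over the original (unconverted) registration character string.

```python
def chain_sets_with_central(text, special=" "):
    """For each special three-character chain, copy the rear general
    character into the special position and split the result into a
    fore and a rear two-character chain."""
    counts, freqs = {}, []
    for ch in text:
        if ch == special:
            freqs.append(None)
        else:
            counts[ch] = counts.get(ch, 0) + 1
            freqs.append(counts[ch])

    out = []
    for i in range(1, len(text) - 1):
        if (text[i] == special and text[i - 1] != special
                and text[i + 1] != special):
            fore, rear = text[i - 1], text[i + 1]
            f_fore, f_rear = freqs[i - 1], freqs[i + 1]
            # The central character takes the type and the occurrence
            # frequency of the rear general character.
            out.append(((fore + rear, (f_fore, f_rear)),
                        (rear + rear, (f_rear, f_rear))))
    return out
```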

12. A recording medium according to claim 11 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

13. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, comprising:

a character chain recording region for recording all general two-character chains and all special two-character chains detected from a registration character string of the text, each general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, each special two-character chain including one special character as a fore character and a rear general character or a fore general character and one special character as a rear character arranged in that order in the registration character string, the fore character of each special two-character chain placed just after one general two-character chain agreeing with the rear general character of the general two-character chain, the rear character of each special two-character chain placed just before one general two-character chain agreeing with the fore general character of the general two-character chain, the rear general character of a first general two-character chain placed just before a second general two-character chain agreeing with the fore general character of the second general two-character chain, and the rear character of a first special two-character chain placed just before a second special two-character chain agreeing with the fore character of the second special two-character chain; and

an occurrence frequency recording region for recording a pair of occurrence frequencies of the fore general character and the rear general character of each general two-character chain recorded in the character chain recording region as an occurrence frequency set, recording an occurrence frequency of the fore or rear general character and a limited occurrence frequency of the rear or fore special character of each special two-character chain recorded in the character chain recording region as an occurrence frequency set, the occurrence frequency of each character of a particular type placed in a particular position of the registration character string denoting the number of characters of the same particular type existing in an area between a starting position of the registration character string and the particular position of the registration character string, and the limited occurrence frequency of each special character being obtained by setting a plurality of N limited values (N is an integer higher than 1) different from each other and lower than or equal to a maximum value as a set of N limited values and allocating the N limited values to each group of N special characters arranged in the registration character string on condition that each limited value selected in a predetermined order from one group of N limited values is allocated as one limited occurrence frequency to one special character selected from one group of N special characters in the order of arranging the special characters in the registration character string.

14. A recording medium according to claim 13 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

15. A recording medium according to claim 13 in which the set of N limited values is obtained by dividing an occurrence frequency of each special character by the maximum value to obtain a remainder for each special character, setting one remainder having a value of 0 to the maximum value, and setting the limited occurrence frequency of each special character to the remainder corresponding to the special character.
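The remainder rule of claim 15 can be shown with a short Python sketch. The function names and the default maximum of 3 are illustrative assumptions; a space again stands in for the special character.

```python
def limited_frequency(occurrence, maximum):
    """Remainder of occurrence / maximum, with a remainder of 0 replaced
    by the maximum value."""
    remainder = occurrence % maximum
    return maximum if remainder == 0 else remainder

def limited_special_frequencies(text, special=" ", maximum=3):
    """Limited occurrence frequency of each special character, in order."""
    out, count = [], 0
    for ch in text:
        if ch == special:
            count += 1
            out.append(limited_frequency(count, maximum))
    return out
```

With a maximum of 3, the spaces of a text receive the values 1, 2, 3, 1, 2, 3 and so on, so each value fits in a field whose size does not depend on the number of spaces.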

16. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, comprising:

a character chain recording area for recording all general two-character chains and all special two-character chains detected from a registration character string of the text, each general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, each special two-character chain including a fore special character and a rear general character or a fore general character and a rear special character arranged in that order in the registration character string, the fore character of each special two-character chain placed just after one general two-character chain agreeing with the rear general character of the general two-character chain, the rear character of each special two-character chain placed just before one general two-character chain agreeing with the fore general character of the general two-character chain, the rear general character of a first general two-character chain placed just before a second general two-character chain agreeing with the fore general character of the second general two-character chain, and the rear character of a first special two-character chain placed just before a second special two-character chain agreeing with the fore character of the second special two-character chain; and

an occurrence frequency recording area for recording a pair of occurrence frequencies of the fore general character and the rear general character of each general two-character chain recorded in the character chain recording area as an occurrence frequency set and recording a pair of occurrence frequencies of the fore character and the rear character of each special two-character chain recorded in the character chain recording area as an occurrence frequency set, the occurrence frequency of each particular special character placed in a particular position of the registration character string denoting the number of special characters existing in an area between a starting position of the registration character string and the particular position of the registration character string, the occurrence frequency of each particular general character of a particular type placed in a particular position of the registration character string denoting the number of general characters of the same particular type existing in an area between a starting position of the registration character string and the particular position of the registration character string in cases where each of the general characters including the particular general character is not adjacent to any special character, and the occurrence frequency of each particular general character being set to the same prescribed value in cases where each of the general characters including the particular general character is adjacent to one special character.

17. A recording medium according to claim 16 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

18. A recording medium according to claim 16 in which the occurrence frequency recording area comprises:

a first occurrence frequency recording region for recording one occurrence frequency of one fore general character of each general two-character chain;

a second occurrence frequency recording region for recording one occurrence frequency of one rear general character of each general two-character chain;

a third occurrence frequency recording region for recording one occurrence frequency of one fore special character of each special two-character chain having the fore special character;

a fourth occurrence frequency recording region for recording one occurrence frequency of one rear general character of each special two-character chain having the fore special character;

a fifth occurrence frequency recording region for recording one occurrence frequency of one fore general character of each special two-character chain having the rear special character; and

a sixth occurrence frequency recording region for recording one occurrence frequency of one rear special character of each special two-character chain having the rear special character,

wherein a memory size of the first occurrence frequency recording region is the same as that of the second occurrence frequency recording region, a memory size of the third occurrence frequency recording region is larger than that of the fourth occurrence frequency recording region, and a memory size of the sixth occurrence frequency recording region is larger than that of the fifth occurrence frequency recording region.

19. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types and at least two general characters exist between each pair of special characters, comprising:

a character chain recording area for recording all general two-character chains detected from a registration character string of the text and recording a special two-character chain detected from the registration character string for each special character, each general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, each special two-character chain including a fore general character placed two characters before one special character and a rear general character placed just after the special character in the registration character string, the fore character of a first general two-character chain placed just after a second general two-character chain agreeing with the rear general character of the second general two-character chain; and

an occurrence frequency recording area for recording a pair of occurrence frequencies of the fore general character and the rear general character of each general two-character chain recorded in the character chain recording area as an occurrence frequency set and recording a pair of occurrence frequencies of the fore general character and the rear general character of each special two-character chain recorded in the character chain recording area as an occurrence frequency set, the occurrence frequency of each particular general character of a particular type placed in a particular position of the registration character string denoting the number of general characters of the same particular type existing in an area between a starting position of the registration character string and the particular position of the registration character string.
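A literal reading of claim 19 is sketched below in Python. It assumes, as the claim's preamble requires, that at least two general characters lie between each pair of special characters, and it takes "two characters before" to mean the position two places before the special character; that reading, the name claim19_chains, and the use of a space as the special character are assumptions.

```python
def claim19_chains(text, special=" "):
    """Return (general, special_chains) with occurrence frequency sets.
    A special chain pairs the general character two positions before a
    special character with the one just after it."""
    counts, freqs = {}, []
    for ch in text:
        if ch == special:
            freqs.append(None)
        else:
            counts[ch] = counts.get(ch, 0) + 1
            freqs.append(counts[ch])

    general = [(text[i:i + 2], (freqs[i], freqs[i + 1]))
               for i in range(len(text) - 1)
               if text[i] != special and text[i + 1] != special]
    special_chains = [(text[i - 2] + text[i + 1],
                       (freqs[i - 2], freqs[i + 1]))
                      for i in range(2, len(text) - 1)
                      if text[i] == special]
    return general, special_chains
```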

20. A recording medium according to claim 19 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

21. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, comprising:

a first character chain recording region for recording all general two-character chains detected from a registration character string of the text, each general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, and the rear general character of one general two-character chain agreeing with the fore general character of another general two-character chain for each general two-character chain;

a second character chain recording region for recording all special character chains detected from the registration character string of the text, each special character chain including a fore general character, one special character and a rear general character arranged in that order in the registration character string, the rear general character of one special character chain placed just before one general two-character chain recorded in the first character chain recording region agreeing with the fore general character of the general two-character chain, the fore general character of one special character chain placed just after one general two-character chain recorded in the first character chain recording region agreeing with the rear general character of the general two-character chain, and the rear general character of a first special character chain placed just before a second special character chain agreeing with the fore general character of the second special character chain; and

a position number recording region for recording a position number of each general two-character chain recorded in the first character chain recording region and recording a position number of each special character chain recorded in the second character chain recording region, the position number of each character chain representing the general two-character chains and the special character chains being indicated by an occurrence position number of the fore or rear general character of the character chain, and the occurrence position number of each general character being obtained by numbering all general characters of the registration character string in the order of arranging the general characters in the registration character string.
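
The position numbering of claim 21 can be illustrated informally as follows (not part of the claim language). The sketch numbers only the general characters, in order, and indexes each chain by the position number of its fore general character. The function name `position_numbered_chains`, the space as special character, and the choice of the fore character rather than the rear character as the index are assumptions.

```python
def position_numbered_chains(text, special=" "):
    """Return (general_chains, special_chains).

    Each entry is (chain_text, position_number), where the position number
    is that of the fore general character. Only general characters are
    numbered; special characters receive no number.
    """
    numbered = []
    pos = 0
    for ch in text:
        if ch == special:
            numbered.append((ch, None))
        else:
            pos += 1
            numbered.append((ch, pos))

    general, specials = [], []
    for i in range(len(numbered) - 1):
        fore, fore_pos = numbered[i]
        if fore_pos is None:
            continue
        nxt, nxt_pos = numbered[i + 1]
        if nxt_pos is not None:
            general.append((fore + nxt, fore_pos))
        elif i + 2 < len(numbered) and numbered[i + 2][1] is not None:
            # fore general character, one special character, rear general character
            specials.append((fore + nxt + numbered[i + 2][0], fore_pos))
    return general, specials
```

For "ab cd" the general chains are ab at position 1 and cd at position 3, and the single special character chain "b c" is at position 2.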

22. A recording medium according to claim 21 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

23. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, comprising:

a character chain recording region for recording all registration two-character chains detected from a converted registration character string which is produced from a registration character string of the text by converting each special character arranged in the registration character string into a particular type of symbolic character determined according to a type of a general character spaced at N characters (N is an integral number equal to or higher than 1) apart from the special character, each registration two-character chain including a fore general character or a fore symbolic character and a rear general character or a rear symbolic character arranged just after the fore character in the converted registration character string, the rear general character or the rear symbolic character of a first registration two-character chain agreeing with the fore general character or the fore symbolic character of a second registration two-character chain placed just before the first registration two-character chain; and

a position number recording region for recording a position number of each registration two-character chain recorded in the character chain recording region, the position number of each registration two-character chain being indicated by an occurrence position number of the fore or rear character of the registration two-character chain, and the occurrence position number of each character being obtained by numbering all general characters and symbolic characters of the converted registration character string in the order of arranging the general characters and symbolic characters in the converted registration character string.
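
A minimal sketch of the conversion described in claim 23 is given below for illustration only. It assumes that the general character "N characters apart" is the one N positions after the special character, that a symbolic character can be modeled as a token such as "<c>", and that an empty symbol "<>" is used when no general character is found at that offset; all of these, and the function names, are assumptions rather than details taken from the claim.

```python
def convert_specials(text, n=1, special=" "):
    """Replace each special character by a symbolic token chosen from the
    general character n positions after it (assumed direction)."""
    tokens = []
    for i, ch in enumerate(text):
        if ch != special:
            tokens.append(ch)
            continue
        j = i + n
        neighbour = text[j] if j < len(text) and text[j] != special else ""
        tokens.append("<" + neighbour + ">")
    return tokens


def two_char_chains(tokens):
    """Return (fore, rear, position_number) for every adjacent token pair.

    Position numbers count general and symbolic characters alike and
    refer to the fore character of the chain.
    """
    return [(tokens[i], tokens[i + 1], i + 1) for i in range(len(tokens) - 1)]
```

Because the symbol depends on a neighbouring general character, the spaces of a text are spread over many symbolic character types instead of a single very frequent one.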

24. A recording medium according to claim 23 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

25. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, comprising:

a character chain recording region for recording all registration two-character chains detected from a converted registration character string which is produced from a registration character string of the text by replacing each special character arranged in the registration character string with a first particular type of symbolic character determined according to a type of one general character adjacent to the special character and a second particular type of symbolic character determined according to a type of the other general character adjacent to the special character, each registration two-character chain including a fore general character or a fore symbolic character and a rear general character or a rear symbolic character arranged just after the fore character in the converted registration character string, and the rear general character or the rear symbolic character of a first registration two-character chain agreeing with the fore general character or the fore symbolic character of a second registration two-character chain placed just before the first registration two-character chain in the converted registration character string; and

a position number recording region for recording a position number of each registration two-character chain recorded in the character chain recording region, the position number of each registration two-character chain being indicated by an occurrence position number of the fore or rear character of the registration two-character chain, and the occurrence position number of each character being obtained by numbering all general characters and symbolic characters of the converted registration character string in the order of arranging the general characters and symbolic characters in the converted registration character string.
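
The replacement of claim 25, in which each special character becomes two symbolic characters, might be sketched as follows. The token spellings "<x|" and "|y>", the function name, and the ordering (symbol for the preceding general character first) are assumptions made for the example.

```python
def convert_to_symbol_pairs(text, special=" "):
    """Replace each special character by two symbolic tokens, one chosen
    from the preceding general character and one from the following one."""
    tokens = []
    for i, ch in enumerate(text):
        if ch != special:
            tokens.append(ch)
            continue
        before = text[i - 1] if i > 0 and text[i - 1] != special else ""
        after = text[i + 1] if i + 1 < len(text) and text[i + 1] != special else ""
        tokens.append("<" + before + "|")  # first symbolic character
        tokens.append("|" + after + ">")   # second symbolic character
    return tokens
```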

26. A recording medium according to claim 25 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

27. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, comprising:

a character chain recording region for recording all general two-character chains and all character chain sets detected from a registration character string of the text, each general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, each character chain set being composed of a fore two-character chain and a rear two-character chain obtained by detecting all special three-character chains, respectively including a fore general character, one special character and a rear general character arranged in that order in the registration character string, converting the special character of each special three-character chain into a central general character having the same character type as that of the rear general character to produce a converted special three-character chain including the fore general character, the central general character and the rear general character and decomposing each converted special three-character chain into one fore two-character chain including the fore general character and the central general character as a rear general character and one rear two-character chain including the central general character as a fore general character and the rear general character, the fore general character of each special three-character chain placed just after one general two-character chain agreeing with the rear general character of the general two-character chain, the rear general character of each special three-character chain placed just before one general two-character chain agreeing with the fore general character of the general two-character chain, the rear general character of a first general two-character chain placed just before a second general two-character chain agreeing with the fore general character of the second general two-character chain, and the rear general character of a first special three-character chain placed just before a second special three-character chain agreeing with the fore general character of the second special three-character chain; and

a position number recording region for recording a position number of each general two-character chain recorded in the character chain recording region, recording a position number of the fore two-character chain of each character chain set recorded in the character chain recording region, and recording a position number of the rear two-character chain of each character chain set recorded in the character chain recording region, the position number of each two-character chain being indicated by an occurrence position number of the fore or rear general character of the two-character chain, the occurrence position number of each general character being obtained by numbering all general characters of the registration character string in the order of arranging the general characters in the registration character string, and the occurrence position number of the fore general character of each rear two-character chain being set to that of the rear general character of the rear two-character chain.
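
The decomposition of claim 27 can be illustrated with the following sketch (not part of the claim language). Each special three-character chain F, special, R is converted to F, R, R and split into a fore two-character chain FR and a rear two-character chain RR. The sketch indexes every chain by the position number of its fore character, with the central character sharing the position number of the rear general character as the claim states; the function name and the space as special character are assumptions.

```python
def chain_sets(text, special=" "):
    """Return a list of (fore_chain, rear_chain) pairs, one per special
    three-character chain; each chain is (chain_text, position_number)."""
    numbered = []
    pos = 0
    for ch in text:
        if ch == special:
            numbered.append((ch, None))
        else:
            pos += 1  # only general characters are numbered
            numbered.append((ch, pos))

    sets = []
    for i in range(1, len(numbered) - 1):
        if numbered[i][1] is not None:
            continue  # not a special character
        fore, fore_pos = numbered[i - 1]
        rear, rear_pos = numbered[i + 1]
        if fore_pos is None or rear_pos is None:
            continue  # neighbours must both be general characters
        central = rear  # special character converted to the rear character's type
        fore_chain = (fore + central, fore_pos)
        # the central character takes the rear character's position number
        rear_chain = (central + rear, rear_pos)
        sets.append((fore_chain, rear_chain))
    return sets
```

For "ab cd" the only chain set is (bc at position 2, cc at position 3).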

28. A recording medium according to claim 27 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

29. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, comprising:

a general character chain recording region for recording all general two-character chains and all character chain sets detected from a registration character string of the text, each general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, each character chain set being composed of a first two-character chain, a second two-character chain arranged just after the first two-character chain in the registration character string and a third two-character chain arranged just after the second two-character chain in the registration character string, the first, second and the third two-character chains of each character chain set being obtained by detecting all special three-character chains, respectively including a fore general character, one special character and a rear general character arranged in that order in the registration character string and decomposing each special three-character chain into one first two-character chain including the fore general character and the rear general character, one second two-character chain including the fore general character and the special character as a rear character and one third two-character chain including the special character as a fore character and the rear general character, the first two-character chains and the second two-character chains being arranged in the order of arranging the first and second two-character chains in the registration character string for each type of fore general character, the fore general character of each special three-character chain placed just after one general two-character chain agreeing with the rear general character of the general two-character chain, the rear general character of each special three-character chain placed just before one general two-character chain agreeing with the fore general character of the general two-character chain, the rear general character of a first general two-character chain placed just before a second general two-character chain agreeing with the fore general character of the second general two-character chain, and the rear general character of a first special three-character chain placed just before a second special three-character chain agreeing with the fore general character of the second special three-character chain; and

a position number recording region for recording a position number of each general two-character chain recorded in the character chain recording region, recording a position number of each first two-character chain recorded in the character chain recording region, recording a position number of each second two-character chain recorded in the character chain recording region, and recording a position number of each third two-character chain recorded in the character chain recording region, the position number of each general two-character chain being indicated by an occurrence position number of the fore general character of the general two-character chain, the position number of each first two-character chain being indicated by an occurrence position number of the fore general character of the first two-character chain, the position number of each second two-character chain being indicated by an occurrence position number of the rear general character of the second two-character chain, the occurrence position number of each general character being obtained by numbering all general characters of the registration character string in the order of arranging the general characters in the registration character string, the position number of each third two-character chain being set to a fixed value, and the position numbers of the first and second two-character chains being arranged according to the arranging order of the first and second two-character chains.

30. A recording medium according to claim 29 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

31. A recording medium for recording information of a text in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, comprising:

a character chain recording region for recording all first two-character chains detected from a converted registration character string, which is obtained from the registration character string by converting each pair of one special character and a general character placed just after the special character in the registration character string into a symbolic character determined in correspondence to a character type of the general character, to include a fore general character and a rear general character or a rear symbolic character arranged just after the fore general character in each first two-character chain, recording all second two-character chains, respectively including a fore general character placed two characters before one symbolic character and the symbolic character as a rear character, detected from the converted registration character string, and recording all special two-character chains, respectively including a fore symbolic character and a rear general character arranged in that order in the converted registration character string, detected from the converted registration character string, each group of first and second two-character chains respectively including the same type of fore general character and one group of special two-character chains respectively including one type of symbolic character determined in correspondence to the type of fore general character being arranged in one two-character chain table to produce the two-character chain table for each type of fore general character; and

a position number recording region for recording a position number of each first two-character chain recorded in the character chain recording region, recording a position number of each second two-character chain recorded in the character chain recording region, and recording a position number of each special two-character chain recorded in the character chain recording region, the position number of each two-character chain being indicated by an occurrence position number of the fore character of the two-character chain, and the occurrence position numbers of the general and symbolic characters being obtained by numbering all general and symbolic characters of the converted registration character string in the order of arranging the general and symbolic characters in the converted registration character string.

32. A recording medium according to claim 31 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

33. A character string collating apparatus for collating a retrieval character string with a registration character string of a text, in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, to retrieve a particular character string agreeing with the retrieval character string from the registration character string, comprising:

first registration character chain detecting means for detecting all registration general two-character chains existing in the registration character string of the text, each registration general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, and the rear general character of a first registration general two-character chain placed just before a second registration general two-character chain in the registration character string agreeing with the fore general character of the second registration general two-character chain;

second registration character chain detecting means for detecting a registration special character chain from the registration character string of the text for each special character, each registration special character chain including a fore general character, one special character and a rear general character arranged in that order in the text, the rear general character of one registration general two-character chain agreeing with the fore general character of one registration special character chain placed just after the registration general two-character chain in the registration character string, and the rear general character of one registration special character chain agreeing with the fore general character of one registration general two-character chain placed just after the registration special character chain in the registration character string;

first occurrence frequency calculating means for calculating a pair of occurrence frequencies of the fore and rear general characters of each registration general two-character chain detected by the first registration character chain detecting means as an occurrence frequency set, the occurrence frequency of each particular general character of a particular type placed in a particular position of the registration character string denoting the number of general characters of the same particular type existing in an area between a starting position of the registration character string and the particular position of the registration character string;

second occurrence frequency calculating means for calculating a pair of occurrence frequencies of the fore and rear general characters of each registration special character chain detected by the second registration character chain detecting means as an occurrence frequency set;

registration character chain classifying means for classifying each group of registration general two-character chains, which respectively include the same type of fore general character and the same type of rear general character, detected by the first registration character chain detecting means into one general two-character chain type, and classifying each group of registration special character chains, which respectively include the same type of fore general character and the same type of rear general character, detected by the second registration character chain detecting means into one special character chain type;

first retrieval character chain detecting means for detecting all retrieval general two-character chains existing in the retrieval character string, each retrieval general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the retrieval character string, and the rear general character of a first retrieval general two-character chain placed just before a second retrieval general two-character chain in the retrieval character string agreeing with the fore general character of the second retrieval general two-character chain;

second retrieval character chain detecting means for detecting all retrieval special character chains existing in the retrieval character string, each retrieval special character chain including a fore general character, one special character and a rear general character arranged in that order in the retrieval character string, the rear general character of one retrieval general two-character chain agreeing with the fore general character of one retrieval special character chain placed just after the retrieval general two-character chain in the retrieval character string, and the rear general character of one retrieval special character chain agreeing with the fore general character of one retrieval general two-character chain placed just after the retrieval special character chain in the retrieval character string;

control means for specifying a plurality of particular general two-character chain types and particular special character chain types, which are classified by the registration character chain classifying means, corresponding to the retrieval general two-character chains detected by the first retrieval character chain detecting means and the retrieval special character chains detected by the second retrieval character chain detecting means, detecting a retrieval chain order of arranging the retrieval general two-character chains and the retrieval special character chains in the retrieval character string, and determining a particular chain order corresponding to the retrieval chain order for the particular general two-character chain types and the particular special character chain types;

collating means for repeatedly receiving the occurrence frequency sets of one particular general two-character chain type or one particular special character chain type specified by the control means from the first occurrence frequency calculating means or the second occurrence frequency calculating means in the particular chain order for the particular general two-character chain types and the particular special character chain types, and performing a collating operation for the particular general two-character chain types and the particular special character chain types according to the occurrence frequencies of the particular general two-character chain types and the occurrence frequencies of the particular special character chain types, detecting a series of particular occurrence frequency sets of the particular general two-character chain types and the particular special character chain types on condition that a plurality of particular registration general two-character chains and particular registration special character chains having the particular occurrence frequency sets are connected in series in the registration character string; and

character string detecting means for detecting a particular character string agreeing with the retrieval character string from the registration character string according to the particular registration general two-character chains, the particular registration special character chains, the particular occurrence frequency sets of the particular registration general two-character chains and the particular occurrence frequency sets of the particular registration special character chains detected by the collating means.

34. A character string collating apparatus according to claim 33 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

35. A character string collating apparatus according to claim 33 in which the collating operation performed by the collating means is that one occurrence frequency of the fore general character in each occurrence frequency set of either a first particular general two-character chain type or a first particular special character chain type is collated with an occurrence frequency of the rear general character in a particular occurrence frequency set of either a second particular general two-character chain type or a second particular special character chain type placed just before the first particular general two-character chain type or the first particular special character chain type in the particular chain order to determine a particular occurrence frequency set of either the first particular general two-character chain type or the first particular special character chain type on condition that an occurrence frequency of the fore general character in the particular occurrence frequency set of either the first particular general two-character chain type or the first particular special character chain type agrees with the occurrence frequency of the rear general character in the particular occurrence frequency set of either the second particular general two-character chain type or the second particular special character chain type.
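
The collating operation of claim 35 can be sketched informally as follows (not part of the claim language). Consecutive chains share a character, so a chain can follow another only if the occurrence frequency of its fore character equals the occurrence frequency of the rear character of the preceding chain. The function name `collate` and the list-of-lists input layout are assumptions for the example.

```python
def collate(freq_sets_in_order):
    """Link occurrence frequency sets across chain types.

    freq_sets_in_order holds one list per chain type, in the particular
    chain order; each list contains (fore_freq, rear_freq) pairs recorded
    for that chain type. Returns every series of pairs in which each
    fore_freq equals the rear_freq of the pair before it.
    """
    if not freq_sets_in_order:
        return []
    paths = [[s] for s in freq_sets_in_order[0]]
    for sets in freq_sets_in_order[1:]:
        # index the candidate sets of this chain type by fore frequency
        by_fore = {}
        for s in sets:
            by_fore.setdefault(s[0], []).append(s)
        paths = [p + [s] for p in paths for s in by_fore.get(p[-1][1], [])]
    return paths
```

As a usage example, in the registration string "abcabd" the chain type ab has the frequency sets (1, 1) and (2, 2) and the chain type bd has (2, 1). Searching for "abd" keeps only the series (2, 2), (2, 1), since only the second b is followed by d.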

36. A character string collating apparatus according to claim 33, further comprising:

recording means for recording the general two-character chain types and the special character chain types classified by the registration character chain classifying means, the occurrence frequency sets calculated by the first occurrence frequency calculating means for each general two-character chain type and the occurrence frequency sets calculated by the second occurrence frequency calculating means for each special character chain type, the particular general two-character chain types and the particular special character chain types recorded in the recording means being specified by the control means, and the occurrence frequency sets recorded in the recording means being received by the collating means under the control of the control means.

37. A character string collating apparatus according to claim 36 in which an identifier is attached to the special character chain types to distinguish the special character chain types from the general two-character chain types.

38. A character string collating apparatus according to claim 33 in which a series of special characters arranged in the registration character string or the retrieval character string is detected as a single special character by the second registration character chain detecting means or the second retrieval character chain detecting means.
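
One simple way to obtain the behavior of claim 38, shown only as an assumption-laden sketch, is to collapse each run of special characters into a single special character before chains are detected. The function name `collapse_special_runs` and the use of a regular expression are choices made for the example, not details from the claim.

```python
import re


def collapse_special_runs(text, special=" "):
    """Replace every run of consecutive special characters by one."""
    pattern = "(?:" + re.escape(special) + ")+"
    # a callable replacement inserts the special character literally
    return re.sub(pattern, lambda m: special, text)
```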

39. A character string collating apparatus for collating a retrieval character string with a registration character string of a text, in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, to retrieve a particular character string agreeing with the retrieval character string from the registration character string, comprising:

registration character string converting means for producing a converted registration character string from the registration character string by converting each special character arranged in the registration character string into a symbolic character according to a general-symbolic character type relationship between a character type of the symbolic character and a character type of a general character spaced at N characters (N is an integral number equal to or higher than 1) apart from the special character;

registration character chain detecting means for detecting all registration two-character chains existing in the converted registration character string produced by the registration character string converting means, each registration two-character chain including a fore general character or a fore symbolic character and a rear general character or a rear symbolic character arranged just after the fore character in the converted registration character string;

occurrence frequency calculating means for calculating a pair of occurrence frequencies of the fore general character or the fore symbolic character and the rear general character or the rear symbolic character of each registration two-character chain detected by the registration character chain detecting means as an occurrence frequency set, the occurrence frequency of each particular general character or symbolic character of a particular type placed in a particular position of the converted registration character string denoting the number of general characters or symbolic characters of the same particular type existing in an area between a starting position of the converted registration character string and the particular position of the converted registration character string;

registration character chain classifying means for classifying each group of registration two-character chains, which respectively include the same type of fore general character or the same type of fore symbolic character and the same type of rear general character or the same type of rear symbolic character, detected by the registration character chain detecting means into one two-character chain type;

retrieval character string converting means for producing a converted retrieval character string from the retrieval character string by converting each special character arranged in the retrieval character string into a symbolic character according to the general-symbolic character type relationship;

retrieval character chain detecting means for detecting all retrieval two-character chains existing in the converted retrieval character string, each retrieval two-character chain including a fore general character or a fore symbolic character and a rear general character or a rear symbolic character arranged just after the fore character in the converted retrieval character string;

control means for specifying a plurality of particular two-character chain types, which are classified by the registration character chain classifying means, corresponding to the retrieval two-character chains detected by the retrieval character chain detecting means, detecting a retrieval chain order of arranging the retrieval two-character chains in the converted retrieval character string, and determining a particular chain order corresponding to the retrieval chain order for the particular two-character chain types;

collating means for repeatedly receiving the occurrence frequency sets of one particular two-character chain type specified by the control means from the occurrence frequency calculating means in the particular chain order for the particular two-character chain types, performing a collating operation for the particular two-character chain types according to the occurrence frequency sets of the particular two-character chain types, and detecting a series of particular occurrence frequency sets of a series of particular registration two-character chains corresponding to the particular two-character chain types arranged in the particular chain order on condition that the series of particular registration two-character chains having the particular occurrence frequency sets are connected in series in the converted registration character string; and

character string detecting means for detecting a particular character string agreeing with the retrieval character string from the registration character string according to the particular registration two-character chains and the particular occurrence frequency sets of the particular registration two-character chains detected by the collating means.
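The occurrence frequency calculation and chain classification recited in claim 39 can be illustrated with a minimal Python sketch. It is not the claimed apparatus: the function names are hypothetical, the converted registration character string is taken as an ordinary Python string, and the count is assumed to include the character at the particular position itself, so that the first occurrence of a character type has frequency 1.

```python
from collections import defaultdict


def occurrence_frequencies(text):
    """Return, for each position, how many characters of the same type
    occur from the start of the string up to and including that position
    (assumption: the count is inclusive)."""
    seen = defaultdict(int)
    freqs = []
    for ch in text:
        seen[ch] += 1
        freqs.append(seen[ch])
    return freqs


def build_chain_index(text):
    """Classify every two-character chain by type and collect the
    (fore frequency, rear frequency) set of each chain of that type."""
    freqs = occurrence_frequencies(text)
    index = defaultdict(list)
    for i in range(len(text) - 1):
        index[text[i:i + 2]].append((freqs[i], freqs[i + 1]))
    return dict(index)


index = build_chain_index("abcabd")
print(index["ab"])  # [(1, 1), (2, 2)]
```

In this sketch each chain type keeps its occurrence frequency sets in order of appearance, which plays the role of the per-type lists that the collating means receives.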

40. A character string collating apparatus according to claim 39 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

41. A character string collating apparatus according to claim 39 in which the collating operation performed by the collating means is that one occurrence frequency of the fore general character in each occurrence frequency set of a first particular two-character chain type is collated with an occurrence frequency of the rear general character in a particular occurrence frequency set of a second particular two-character chain type placed just before the first particular two-character chain type in the particular chain order to determine a particular occurrence frequency set of the first particular two-character chain type on condition that an occurrence frequency of the fore general character in the particular occurrence frequency set of the first particular two-character chain type agrees with the occurrence frequency of the rear general character in the particular occurrence frequency set of the second particular two-character chain type.
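The collating operation of claim 41 can be sketched as follows, under stated assumptions: the helper names `build_index` and `collate` are hypothetical, the strings are plain Python strings, and occurrence frequencies are counted inclusively from 1. A frequency set of a chain type is kept only when its fore frequency agrees with the rear frequency of the set selected for the chain type placed just before it, which means both chains share the same character at the same position.

```python
from collections import defaultdict


def build_index(text):
    """Map each two-character chain type to its (fore, rear) frequency sets."""
    seen, freqs = defaultdict(int), []
    for ch in text:
        seen[ch] += 1
        freqs.append(seen[ch])
    index = defaultdict(list)
    for i in range(len(text) - 1):
        index[text[i:i + 2]].append((freqs[i], freqs[i + 1]))
    return index


def collate(index, query):
    """Return one series of frequency sets per place where the chains of
    the query are connected in series in the indexed string."""
    chains = [query[i:i + 2] for i in range(len(query) - 1)]
    if not chains:
        return []
    # Every frequency set of the first chain type starts a candidate series.
    series = [[fs] for fs in index.get(chains[0], [])]
    for chain in chains[1:]:
        extended = []
        for s in series:
            prev_rear = s[-1][1]
            for fs in index.get(chain, []):
                # Keep the set only if its fore frequency agrees with the
                # rear frequency of the preceding chain in the series.
                if fs[0] == prev_rear:
                    extended.append(s + [fs])
        series = extended
    return series


index = build_index("abcabd")
print(collate(index, "abd"))  # [[(2, 2), (2, 1)]]
```

The nested loop is chosen for readability; because the frequency sets of each type are stored in ascending order, a merge-style scan would also work.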

42. A character string collating apparatus according to claim 39, further comprising:

recording means for recording the registration two-character chain types classified by the registration character chain classifying means and the occurrence frequency sets calculated by the occurrence frequency calculating means for each two-character chain type, the particular two-character chain type recorded in the recording means being specified by the control means, and the occurrence frequency sets of the particular two-character chain type recorded in the recording means being received by the collating means under the control of the control means.

43. A character string collating apparatus according to claim 39 in which a series of special characters arranged in the registration character string or the retrieval character string is converted into one symbolic character determined according to a character type of a general character spaced at N characters apart from the series of special characters by the registration character string converting means or the retrieval character string converting means.

44. A character string collating apparatus according to claim 39 in which the converted registration character string is produced from the registration character string by the registration character string converting means by converting each special character arranged in the registration character string into a symbolic character determined according to a type of general character adjacent to the special character.
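As a rough illustration of the conversion recited in claim 44, the sketch below replaces each special character with a symbolic character whose type depends on an adjacent general character. Several details are assumptions made only for the example: the special character is a space, the adjacent general character is taken to be the preceding one, a symbolic character is represented as the tuple `("sym", general_char)`, and the name `convert` is hypothetical.

```python
SPECIAL = " "  # assumption: the special character is a space


def convert(text, special=SPECIAL):
    """Return the converted string as a list of tokens. General characters
    are kept as they are; each special character becomes a symbolic token
    keyed by the nearest preceding general character (None at the start)."""
    tokens = []
    last_general = None
    for ch in text:
        if ch == special:
            tokens.append(("sym", last_general))
        else:
            tokens.append(ch)
            last_general = ch
    return tokens


print(convert("ab cd"))  # ['a', 'b', ('sym', 'b'), 'c', 'd']
```

Because the single space type is split into several symbolic types, each symbolic type occurs less often than the space itself, which keeps the occurrence frequencies recorded for any one type small.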

45. A character string collating apparatus for collating a retrieval character string with a registration character string of a text, in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, to retrieve a particular character string agreeing with the retrieval character string from the registration character string, comprising:

registration character string converting means for producing a converted registration character string from the registration character string by replacing each special character arranged in a registration character string of the text with a first symbolic character and a second symbolic character according to a general-symbolic character type relationship in which a character type of the first symbolic character corresponds to a character type of one general character adjacent to the special character and a character type of the second symbolic character corresponds to a character type of the other general character adjacent to the special character;

registration character chain detecting means for detecting all registration two-character chains existing in the converted registration character string produced by the registration character string converting means, each registration two-character chain including a fore general character or a fore symbolic character and a rear general character or a rear symbolic character arranged just after the fore character in the converted registration character string;

occurrence frequency calculating means for calculating a pair of occurrence frequencies of the fore general character or the fore symbolic character and the rear general character or the rear symbolic character of each registration two-character chain detected by the registration character chain detecting means as an occurrence frequency set, the occurrence frequency of each particular general character or symbolic character of a particular type placed in a particular position of the converted registration character string denoting the number of general characters or symbolic characters of the same particular type existing in an area between a starting position of the converted registration character string and the particular position of the converted registration character string;

registration character chain classifying means for classifying the registration two-character chains, which respectively include the same type of fore general character or the same type of fore symbolic character and the same type of rear general character or the same type of rear symbolic character, detected by the registration character chain detecting means into one two-character chain type;

retrieval character string converting means for producing a converted retrieval character string from the retrieval character string by replacing each special character arranged in the retrieval character string with a first symbolic character and a second symbolic character according to the general-symbolic character type relationship;

retrieval character chain detecting means for detecting all retrieval two-character chains existing in the converted retrieval character string, each retrieval two-character chain including a fore general character or a fore symbolic character and a rear general character or a rear symbolic character arranged just after the fore character in the converted retrieval character string;

control means for specifying a plurality of particular two-character chain types, which are classified by the registration character chain classifying means, corresponding to the retrieval two-character chains detected by the retrieval character chain detecting means, detecting a retrieval chain order of arranging the retrieval two-character chains in the converted retrieval character string, and determining a particular chain order corresponding to the retrieval chain order for the particular two-character chain types;

collating means for repeatedly receiving the occurrence frequency sets of one particular two-character chain type specified by the control means from the occurrence frequency calculating means in the particular chain order for the particular two-character chain types, performing a collating operation for the particular two-character chain types according to the occurrence frequency sets of the particular two-character chain types, and detecting a series of particular occurrence frequency sets of a series of particular registration two-character chains corresponding to the particular two-character chain types arranged in the particular chain order on condition that the series of particular registration two-character chains having the particular occurrence frequency sets are connected in series in the converted registration character string; and

character string detecting means for detecting a particular character string agreeing with the retrieval character string from the registration character string according to the particular registration two-character chains and the particular occurrence frequency sets of the particular registration two-character chains detected by the collating means.

46. A character string collating apparatus according to claim 45 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

47. A character string collating apparatus according to claim 45 in which the collating operation performed by the collating means is that one occurrence frequency of the fore general character in each occurrence frequency set of a first particular two-character chain type is collated with an occurrence frequency of the rear general character in a particular occurrence frequency set of a second particular two-character chain type placed just before the first particular two-character chain type in the particular chain order to determine a particular occurrence frequency set of the first particular two-character chain type on condition that an occurrence frequency of the fore general character in the particular occurrence frequency set of the first particular two-character chain type agrees with the occurrence frequency of the rear general character in the particular occurrence frequency set of the second particular two-character chain type.

48. A character string collating apparatus according to claim 45, further comprising:

recording means for recording the registration two-character chain types classified by the registration character chain classifying means and the occurrence frequency sets calculated by the occurrence frequency calculating means for each two-character chain type, the particular two-character chain types recorded in the recording means being specified by the control means, and the occurrence frequency sets recorded in the recording means being received by the collating means under the control of the control means.

49. A character string collating apparatus according to claim 45 in which a series of special characters arranged in the registration character string or the retrieval character string is replaced with a particular type of symbolic character determined according to a type of one general character adjacent to the series of special characters and another particular type of symbolic character determined according to a type of the other general character adjacent to the series of special characters.
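The replacement described in claims 45 and 49 can be sketched in Python as follows. This is only an illustration under assumptions not stated in the claims: the special character is a space, a symbolic character is represented as the tuple `("sym", general_char)`, a series at the very start or end of the string uses `None` for the missing neighbour, and the name `convert_pairs` is hypothetical.

```python
def convert_pairs(text, special=" "):
    """Replace each run of special characters with two symbolic tokens:
    one keyed by the general character before the run and one keyed by
    the general character after it."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != special:
            tokens.append(text[i])
            i += 1
            continue
        # Find the end of the run of special characters starting at i.
        j = i
        while j < n and text[j] == special:
            j += 1
        before = text[i - 1] if i > 0 else None
        after = text[j] if j < n else None
        tokens.append(("sym", before))
        tokens.append(("sym", after))
        i = j
    return tokens


print(convert_pairs("ab  cd"))  # ['a', 'b', ('sym', 'b'), ('sym', 'c'), 'c', 'd']
```

With two symbolic tokens per gap, every two-character chain that touches the gap pairs a general character with a symbolic character derived from that same character, so the chain types stay evenly distributed.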

50. A character string collating apparatus for collating a retrieval character string with a registration character string of a text, in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, to retrieve a particular character string agreeing with the retrieval character string from the registration character string, comprising:

registration character chain detecting means for detecting all registration general two-character chains existing in the registration character string of the text, each registration general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, and the rear general character of a first registration general two-character chain placed just before a second registration general two-character chain in the registration character string agreeing with the fore general character of the second registration general two-character chain;

registration character chain producing means for detecting a registration special three-character chain, including a fore general character, one special character and a rear general character arranged in that order in the registration character string, from the registration character string for each special character, and producing a first registration two-character chain including the fore general character and the rear general character in that order, a second registration two-character chain including the fore general character and the special character in that order and a third registration two-character chain including the special character and the rear general character from each registration special three-character chain, the rear general character of one registration general two-character chain placed just before one registration special three-character chain in the registration character string agreeing with the fore general character of the first registration two-character chain produced from the registration special three-character chain, and the fore general character of one registration general two-character chain placed just after one registration special three-character chain in the registration character string agreeing with the rear general character of the third registration two-character chain produced from the registration special three-character chain;

first occurrence frequency calculating means for calculating a pair of occurrence frequencies of the fore and rear general characters of each registration general two-character chain detected by the registration character chain detecting means as an occurrence frequency set, the occurrence frequency of each particular general character of a particular type placed in a particular position of the registration character string denoting the number of general characters of the same particular type existing in an area between a starting position of the registration character string and the particular position of the registration character string;

second occurrence frequency calculating means for calculating a pair of occurrence frequencies of the fore and rear general characters of each first registration two-character chain produced by the registration character chain producing means as an occurrence frequency set, determining an occurrence frequency set of each second registration two-character chain produced by the registration character chain producing means by setting an occurrence frequency of the special character of the second registration two-character chain to a fixed value and calculating an occurrence frequency of the fore general character of the second registration two-character chain, and determining an occurrence frequency set of each third registration two-character chain produced by the registration character chain producing means by setting an occurrence frequency of the special character of the third registration two-character chain to the fixed value and calculating an occurrence frequency of the rear general character of the third registration two-character chain;

registration character chain classifying means for classifying the registration general two-character chains, which respectively include the same type of fore general character and the same type of rear general character, detected by the registration character chain detecting means into one general two-character chain type, classifying the first registration two-character chains, which respectively include the same type of fore general character and the same type of rear general character, produced by the registration character chain producing means into one first two-character chain type, classifying the second registration two-character chains, which respectively include the same type of fore general character and the special character, produced by the registration character chain producing means into one second two-character chain type, and classifying the third registration two-character chains, which respectively include the special character and the same type of rear general character, produced by the registration character chain producing means into one third two-character chain type;

first retrieval character chain detecting means for detecting all retrieval general two-character chains existing in the retrieval character string, each retrieval general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the retrieval character string, and the rear general character of a first retrieval general two-character chain placed just before a second retrieval general two-character chain in the retrieval character string agreeing with the fore general character of the second retrieval general two-character chain;

second retrieval character chain detecting means for detecting all retrieval special three-character chains, respectively including a fore general character, one special character and a rear general character arranged in that order in the retrieval character string, from the retrieval character string, the rear general character of one retrieval general two-character chain placed just before one retrieval special three-character chain in the retrieval character string agreeing with the fore general character of the retrieval special three-character chain, and the fore general character of one retrieval general two-character chain placed just after one retrieval special three-character chain in the retrieval character string agreeing with the rear general character of the retrieval special three-character chain;

control means for specifying a plurality of particular general two-character chain types, particular first two-character chain types, particular second two-character chain types and particular third two-character chain types, which are classified by the registration character chain classifying means, corresponding to the retrieval general two-character chains detected by the first retrieval character chain detecting means and the retrieval special three-character chains detected by the second retrieval character chain detecting means, detecting a retrieval chain order of arranging the retrieval general two-character chains and the retrieval special three-character chains in the retrieval character string, and determining a particular chain order corresponding to the retrieval chain order for the particular general two-character chain types, the particular first two-character chain types, the particular second two-character chain types and the particular third two-character chain types;

collating means for repeatedly receiving the occurrence frequency sets of one particular general two-character chain type, one particular first two-character chain type, one particular second two-character chain type or one particular third two-character chain type specified by the control means from the first occurrence frequency calculating means or the second occurrence frequency calculating means in the particular chain order for the particular general two-character chain types, the particular first two-character chain types, the particular second two-character chain types and the particular third two-character chain types, and performing a collating operation for the particular general two-character chain types, the particular first two-character chain types, the particular second two-character chain types and the particular third two-character chain types in which the occurrence frequencies of the occurrence frequency sets of the series of particular two-character chain types detected by the control means are collated with each other to ascertain a connection between each pair of particular general two-character chain types having particular occurrence frequency sets, a connection between each particular first two-character chain type having a particular occurrence frequency set and one particular general two-character chain type having a particular occurrence frequency set and a connection between each particular third two-character chain type having a particular occurrence frequency set and one particular general two-character chain type having a particular occurrence frequency set, and a plurality of particular occurrence frequency sets of the series of particular two-character chain types are detected on condition that a plurality of particular registration two-character chains indicated by the particular occurrence frequency sets are connected with each other in series in the registration character string; and

character string detecting means for detecting a particular character string agreeing with the retrieval character string from the registration character string according to the series of particular registration two-character chains and the particular occurrence frequency sets of the series of particular registration two-character chains detected by the collating means.
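The registration side of claim 50 can be illustrated with a short Python sketch. It is written under assumptions that the claim leaves open: the special character is a space, the fixed value assigned to the special character is 0, occurrence frequencies of general characters are counted inclusively from 1, only a single special character between two general characters is handled, and the name `chains_with_special` is hypothetical.

```python
from collections import defaultdict

SPECIAL = " "  # assumption: the special character is a space
FIXED = 0      # assumption: fixed value used for every special character


def chains_with_special(text, special=SPECIAL):
    """Return four maps from chain type to occurrence frequency sets:
    general chains, and the first, second and third chains produced from
    each (general, special, general) three-character chain."""
    seen, freqs = defaultdict(int), []
    for ch in text:
        if ch == special:
            freqs.append(FIXED)  # special characters are never counted
        else:
            seen[ch] += 1
            freqs.append(seen[ch])

    general, first, second, third = (defaultdict(list) for _ in range(4))
    for i in range(len(text) - 1):
        a, b = text[i], text[i + 1]
        if a != special and b != special:
            general[a + b].append((freqs[i], freqs[i + 1]))
    for i in range(1, len(text) - 1):
        a, s, c = text[i - 1], text[i], text[i + 1]
        if s == special and a != special and c != special:
            first[a + c].append((freqs[i - 1], freqs[i + 1]))
            second[a + s].append((freqs[i - 1], FIXED))
            third[s + c].append((FIXED, freqs[i + 1]))
    return general, first, second, third


general, first, second, third = chains_with_special("ab cb")
print(dict(first))  # {'bc': [(1, 1)]}
```

Since the special character always carries the same fixed value, no growing counter has to be stored for it, which is the source of the storage saving described in the abstract.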

51. A character string collating apparatus according to claim 50 in which each special character is a space frequently occurring in the text written in Hangul language or a space frequently occurring in the text written in English to divide words.

52. A character string collating apparatus according to claim 50 in which the collating operation performed by the collating means is that one occurrence frequency of the fore general character in each occurrence frequency set of a first particular general two-character chain type is collated with an occurrence frequency of the rear general character in a particular occurrence frequency set of a second particular general two-character chain type placed just before the first particular general two-character chain type in the particular chain order to determine a particular occurrence frequency set of the first particular general two-character chain type on condition that the occurrence frequency of the fore general character in the particular occurrence frequency set of the first particular general two-character chain type agrees with the occurrence frequency of the rear general character in the particular occurrence frequency set of the second particular general two-character chain type,

one occurrence frequency of the fore general character in each occurrence frequency set of one particular first two-character chain type is collated with an occurrence frequency of the rear general character in a particular occurrence frequency set of one particular general two-character chain type placed just before the particular first two-character chain type in the particular chain order to determine a particular occurrence frequency set of the particular first two-character chain type on condition that the occurrence frequency of the fore general character in the particular occurrence frequency set of the particular first two-character chain type agrees with the occurrence frequency of the rear general character in the particular occurrence frequency set of the particular general two-character chain type,

one occurrence frequency of the fore general character in each occurrence frequency set of one particular second two-character chain type is collated with an occurrence frequency of the fore general character in the particular occurrence frequency set of the particular first two-character chain type placed just before the particular second two-character chain type in the particular chain order to determine a particular occurrence frequency set of the particular second two-character chain type on condition that the occurrence frequency of the fore general character in the particular occurrence frequency set of the particular second two-character chain type agrees with the occurrence frequency of the fore general character in the particular occurrence frequency set of the particular first two-character chain type, and

one occurrence frequency of the rear general character in each occurrence frequency set of one particular third two-character chain type is collated with an occurrence frequency of the rear general character in the particular occurrence frequency set of the particular first two-character chain type placed just before the particular third two-character chain type in the particular chain order to determine a particular occurrence frequency set of the particular third two-character chain type on condition that the occurrence frequency of the rear general character in the particular occurrence frequency set of the particular third two-character chain type agrees with the occurrence frequency of the rear general character in the particular occurrence frequency set of the particular first two-character chain type.

53. A character string collating apparatus according to claim 50, further comprising:

recording means for recording the general two-character chain types, the first two-character chain types, the second two-character chain types and the third two-character chain types classified by the registration character chain classifying means, recording the occurrence frequency sets calculated by the first occurrence frequency calculating means for each general two-character chain type, recording the occurrence frequency sets calculated by the second occurrence frequency calculating means for each first two-character chain type, recording the occurrence frequency sets calculated by the second occurrence frequency calculating means for each second two-character chain type, and recording the occurrence frequency sets calculated by the second occurrence frequency calculating means for each third two-character chain type, the particular general two-character chain types, the particular first two-character chain types, the particular second two-character chain types and the particular third two-character chain types recorded in the recording means being specified by the control means, and the occurrence frequency sets recorded in the recording means being received by the collating means under the control of the control means.

54. A character string collating apparatus according to claim 50 in which a series of special characters arranged in the registration character string or the retrieval character string is detected as a single special character by the registration character chain producing means or the second retrieval character chain detecting means.

55. A character string collating apparatus for collating a retrieval character string with a registration character string of a text, in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, to retrieve a particular character string agreeing with the retrieval character string from the registration character string, comprising:

registration character chain detecting means for detecting all registration general two-character chains existing in the registration character string of the text, each registration general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, and the rear general character of a first registration general two-character chain placed just before a second registration general two-character chain in the registration character string agreeing with the fore general character of the second registration general two-character chain;

registration character chain producing means for detecting a registration special three-character chain, including a fore general character, one special character and a rear general character arranged in that order in the registration character string, from the registration character string for each special character, and producing a fore registration two-character chain including the fore general character and the special character in that order and a rear registration two-character chain including the special character and the rear general character in that order from each registration special three-character chain, the rear general character of one registration general two-character chain placed just before one registration special three-character chain in the registration character string agreeing with the fore general character of the fore registration two-character chain produced from the registration special three-character chain, and the fore general character of one registration general two-character chain placed just after one registration special three-character chain in the registration character string agreeing with the rear general character of the rear registration two-character chain produced from the registration special three-character chain;

first occurrence frequency calculating means for calculating a pair of occurrence frequencies of the fore and rear general characters of each registration general two-character chain detected by the registration character chain detecting means as an occurrence frequency set, the occurrence frequency of each particular general character of a particular type placed in a particular position of the registration character string denoting the number of general characters of the same particular type existing in an area between a starting position of the registration character string and the particular position of the registration character string;

second occurrence frequency calculating means for determining an occurrence frequency set of each fore registration two-character chain produced by the registration character chain producing means by setting an occurrence frequency of the special character of the fore registration two-character chain to zero and calculating an occurrence frequency of the fore general character of the fore registration two-character chain, and determining an occurrence frequency set of each rear registration two-character chain produced by the registration character chain producing means by setting an occurrence frequency of the special character of the rear registration two-character chain to zero and calculating an occurrence frequency of the rear general character of the rear registration two-character chain;

registration character chain classifying means for classifying each group of registration general two-character chains, which respectively include the same type of fore general character and the same type of rear general character, detected by the registration character chain detecting means into one general two-character chain type, classifying each group of fore registration two-character chains, which respectively include the same type of fore general character and the special character, produced by the registration character chain producing means into one fore two-character chain type, and classifying each group of rear registration two-character chains, which respectively include the special character and the same type of rear general character, produced by the registration character chain producing means into one rear two-character chain type;

first retrieval character chain detecting means for detecting all retrieval general two-character chains existing in the retrieval character string, each retrieval general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the retrieval character string, and the rear general character of a first retrieval general two-character chain placed just before a second retrieval general two-character chain in the retrieval character string agreeing with the fore general character of the second retrieval general two-character chain;

second retrieval character chain detecting means for detecting all retrieval special three-character chains, respectively including a fore general character, one special character and a rear general character arranged in that order in the retrieval character string, from the retrieval character string, the rear general character of one retrieval general two-character chain placed just before one retrieval special three-character chain in the retrieval character string agreeing with the fore general character of the retrieval special three-character chain, and the fore general character of one retrieval general two-character chain placed just after one retrieval special three-character chain in the retrieval character string agreeing with the rear general character of the retrieval special three-character chain;

control means for specifying a plurality of particular general two-character chain types and particular fore and rear special two-character chain types, which are classified by the registration character chain classifying means, corresponding to the retrieval general two-character chains detected by the first retrieval character chain detecting means and the retrieval special three-character chains detected by the second retrieval character chain detecting means, detecting a retrieval chain order of arranging the retrieval general two-character chains and the retrieval special three-character chains in the retrieval character string, and determining a particular chain order corresponding to the retrieval chain order for the particular general two-character chain types and the particular fore and rear special two-character chain types;

collating means for repeatedly receiving the occurrence frequency sets of one particular general two-character chain type, one particular fore special two-character chain type or one particular rear special two-character chain type specified by the control means from the first occurrence frequency calculating means or the second occurrence frequency calculating means in the particular chain order for the particular general two-character chain types and the particular fore and rear special two-character chain types, and performing a collating operation in which the occurrence frequencies of the occurrence frequency sets of the series of particular general two-character chain types and particular fore and rear special three-character chain types detected by the control means are collated with each other to ascertain a connection between each pair of particular general two-character chain types having particular occurrence frequency sets, a connection between each particular fore two-character chain type having a particular occurrence frequency set and one particular general two-character chain type having a particular occurrence frequency set and a connection between each particular rear two-character chain type having a particular occurrence frequency set and one particular general two-character chain type having a particular occurrence frequency set, and a plurality of particular occurrence frequency sets of the particular two-character chain types are detected on condition that a series of particular registration two-character chains having the particular occurrence frequency sets are connected with each other in series in the retrieval character string; and

character string detecting means for detecting a particular character string agreeing with the retrieval character string from the registration character string according to the series of particular registration two-character chains and the particular occurrence frequency sets detected by the collating means.
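
The registration side of claim 55 can be illustrated with a minimal sketch. It is not the claimed apparatus: it assumes that an occurrence frequency is an inclusive running count of same-type general characters from the start of the string, that the special character is a space, and that chain types are keyed by a hypothetical ("general" / "fore" / "rear") tag. The function name index_with_zeroed_specials is likewise an illustrative assumption.

```python
from collections import defaultdict


def index_with_zeroed_specials(text, special=" "):
    """Build chain-type -> list of occurrence frequency sets (sketch only)."""
    # Running count per general character type; special characters get 0.
    counts = defaultdict(int)
    freq = []
    for ch in text:
        if ch == special:
            freq.append(0)
        else:
            counts[ch] += 1
            freq.append(counts[ch])

    table = defaultdict(list)

    # General two-character chains: two adjacent general characters.
    for i in range(len(text) - 1):
        a, b = text[i], text[i + 1]
        if a != special and b != special:
            table[("general", a, b)].append((freq[i], freq[i + 1]))

    # Special three-character chains (general, special, general) are split
    # into a fore chain (general, special) and a rear chain (special, general).
    for i in range(1, len(text) - 1):
        if text[i] == special and text[i - 1] != special and text[i + 1] != special:
            table[("fore", text[i - 1], special)].append((freq[i - 1], 0))
            table[("rear", special, text[i + 1])].append((0, freq[i + 1]))

    return dict(table)
```

Because every special character carries the frequency zero, no counter has to be kept for it, which is the storage saving the abstract describes.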

56. A character string collating apparatus according to claim 55 in which each special character is a space frequently occurring in the text written in the Hangul language or a space frequently occurring in the text written in English to divide words.

57. A character string collating apparatus according to claim 55 in which the collating operation performed by the collating means is that one occurrence frequency of the fore general character in each occurrence frequency set of a first particular general two-character chain type is collated with an occurrence frequency of the rear general character in a particular occurrence frequency set of a second particular general two-character chain type placed just before the first particular general two-character chain type in the particular chain order to determine a particular occurrence frequency set of the first particular general two-character chain type on condition that the occurrence frequency of the fore general character in the particular occurrence frequency set of the first particular general two-character chain type agrees with the occurrence frequency of the rear general character in the particular occurrence frequency set of the second particular general two-character chain type,

one occurrence frequency of the fore general character in each occurrence frequency set of one particular fore two-character chain type is collated with an occurrence frequency of the rear general character in a particular occurrence frequency set of one particular general two-character chain type placed just before the particular fore two-character chain type in the particular chain order to determine a particular occurrence frequency set of the particular fore registration two-character chain type on condition that the occurrence frequency of the fore general character in the particular occurrence frequency set of the particular fore two-character chain type agrees with the occurrence frequency of the rear general character in the particular occurrence frequency set of the particular general two-character chain type, and

one occurrence frequency of the fore general character in each occurrence frequency set of one particular general two-character chain type is collated with an occurrence frequency of the rear general character in a particular occurrence frequency set of one particular rear two-character chain type placed just before the particular general two-character chain type in the particular chain order to determine a particular occurrence frequency set of the particular general two-character chain type on condition that the occurrence frequency of the fore general character in the particular occurrence frequency set of the particular general two-character chain type agrees with the occurrence frequency of the rear general character in the particular occurrence frequency set of the particular rear registration two-character chain type.

58. A character string collating apparatus according to claim 55, further comprising:

recording means for recording the general two-character chain types, the fore two-character chain types and the rear two-character chain types classified by the registration character chain classifying means, recording the occurrence frequency sets calculated by the first occurrence frequency calculating means for each general two-character chain type, recording the occurrence frequency sets calculated by the second occurrence frequency calculating means for each fore two-character chain type, and recording the occurrence frequency sets calculated by the second occurrence frequency calculating means for each rear two-character chain type, the series of particular general two-character chain types and particular fore and rear special three-character chain types recorded in the recording means being specified by the control means, and the occurrence frequency sets recorded in the recording means being received by the collating means.

59. A character string collating apparatus according to claim 55 in which an identifier is attached to the fore two-character chain types and the rear two-character chain types to distinguish the fore two-character chain types and the rear two-character chain types from the general two-character chain types.

60. A character string collating apparatus according to claim 55 in which a series of special characters arranged in the registration character string or the retrieval character string is detected as a single special character by the registration character chain producing means or the second retrieval character chain detecting means.
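
One simple way to treat a series of special characters as a single special character, as recited in claim 60, is to normalize the string before chain detection. The sketch below assumes the special character is a space; the function name collapse_specials is hypothetical, and the claim itself does not say the detection is done by preprocessing.

```python
import re


def collapse_specials(text, special=" "):
    """Replace each run of the special character with one occurrence."""
    # re.escape guards against special characters that are regex metacharacters.
    return re.sub(re.escape(special) + "+", special, text)
```

For example, collapse_specials("ab   cd  e") yields "ab cd e", so each run contributes exactly one special three-character chain.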

61. A character string collating apparatus for collating a retrieval character string with a registration character string of a text, in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, to retrieve a particular character string agreeing with the retrieval character string from the registration character string, comprising:

registration character chain detecting means for detecting all registration general two-character chains existing in the registration character string of the text, each registration general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, and the rear general character of a first registration general two-character chain placed just before a second registration general two-character chain in the registration character string agreeing with the fore general character of the second registration general two-character chain;

registration character chain producing means for detecting a registration special three-character chain, including a fore general character, one special character and a rear general character arranged in that order in the registration character string, from the registration character string for each special character, converting each registration special three-character chain into a converted registration special three-character chain including the fore general character, a central general character having the same character type as that of the rear general character and the rear general character in that order, and producing a fore registration two-character chain including the fore general character and the central general character in that order and a rear registration two-character chain including the central general character and the rear general character in that order from each converted registration special three-character chain, the rear general character of one registration general two-character chain placed just before one registration special three-character chain in the registration character string agreeing with the fore general character of the registration special three-character chain, and the fore general character of one registration general two-character chain placed just after one registration special three-character chain in the registration character string agreeing with the rear general character of the registration special three-character chain;

first occurrence frequency calculating means for calculating a pair of occurrence frequencies of the fore and rear general characters of each registration general two-character chain detected by the registration character chain detecting means as an occurrence frequency set, the occurrence frequency of each particular general character of a particular type placed in a particular position of the registration character string denoting the number of general characters of the same particular type existing in an area between a starting position of the registration character string and the particular position of the registration character string;

second occurrence frequency calculating means for calculating a rear occurrence frequency of the rear general character of each registration special three-character chain in the registration character string, setting a central occurrence frequency of the central general character to the rear occurrence frequency of the rear general character placed just after the central general character in each converted registration special three-character chain, calculating a fore occurrence frequency of the fore general character of each registration special three-character chain in the registration character string, determining a set of the fore occurrence frequency and the central occurrence frequency as an occurrence frequency set of each fore registration two-character chain produced by the registration character chain producing means, and determining a set of the central occurrence frequency and the rear occurrence frequency as an occurrence frequency set of each rear registration two-character chain produced by the registration character chain producing means;

registration character chain classifying means for classifying each group of registration general two-character chains, which respectively include the same type of fore general character and the same type of rear general character, detected by the registration character chain detecting means into one general two-character chain type, classifying each group of fore registration two-character chains, which respectively include the same type of fore general character and the same type of central general character, produced by the registration character chain producing means into one fore two-character chain type, and classifying each group of rear registration two-character chains, which respectively include the same type of central general character and the same type of rear general character, produced by the registration character chain producing means into one rear two-character chain type;

first retrieval character chain detecting means for detecting all retrieval general two-character chains existing in the retrieval character string, each retrieval general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the retrieval character string, and the rear general character of a first retrieval general two-character chain placed just before a second retrieval general two-character chain in the retrieval character string agreeing with the fore general character of the second retrieval general two-character chain;

second retrieval character chain detecting means for detecting a retrieval special three-character chain, including a fore general character, one special character and a rear general character arranged in that order in the retrieval character string, from the retrieval character string for each special character, converting each retrieval special three-character chain into a converted retrieval special three-character chain including the fore general character, a central general character having the same character type as that of the rear general character and the rear general character in that order to produce a converted retrieval character string from the retrieval character string, and producing a fore retrieval two-character chain including the fore general character and the central general character in that order and a rear retrieval two-character chain including the central general character and the rear general character in that order from each converted retrieval special three-character chain, the rear general character of one retrieval general two-character chain placed just before one retrieval special three-character chain in the retrieval character string agreeing with the fore general character of the retrieval special three-character chain, and the fore general character of one retrieval general two-character chain placed just after one retrieval special three-character chain in the retrieval character string agreeing with the rear general character of the retrieval special three-character chain;

control means for specifying a plurality of particular general two-character chain types and particular fore and rear two-character chain types, which are classified by the registration character chain classifying means, corresponding to the retrieval general two-character chains detected by the first retrieval character chain detecting means and the fore and rear retrieval two-character chains detected by the second retrieval character chain detecting means, detecting a retrieval chain order of arranging the retrieval general two-character chains and the fore and rear retrieval two-character chains in the converted retrieval character string, and determining a particular chain order corresponding to the retrieval chain order for the particular general two-character chain types and the particular fore and rear two-character chain types;

collating means for repeatedly receiving the occurrence frequency sets of one particular general two-character chain type, one particular fore two-character chain type or one particular rear two-character chain type specified by the control means from the first occurrence frequency calculating means or the second occurrence frequency calculating means in the particular chain order for the particular general two-character chain types and the particular fore and rear two-character chain types, and performing a collating operation in which the occurrence frequencies of the occurrence frequency sets of the series of particular general two-character chain types and particular fore and rear retrieval special two-character chain types detected by the control means are collated with each other to ascertain a connection between each pair of particular general two-character chain types having particular occurrence frequency sets, a connection between each particular fore two-character chain type having a particular occurrence frequency set and one particular general two-character chain type having a particular occurrence frequency set and a connection between each particular rear two-character chain type having a particular occurrence frequency set and one particular general two-character chain type having a particular occurrence frequency set, and a plurality of particular occurrence frequency sets of the particular two-character chain types are detected on condition that a plurality of particular registration two-character chains having the particular occurrence frequency sets are connected with each other in series in the converted retrieval character string; and

character string detecting means for detecting a particular character string agreeing with the retrieval character string from the registration character string according to the series of particular registration two-character chains and the particular occurrence frequency sets detected by the collating means.
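
The conversion recited in claim 61 can be sketched as follows. This is an illustration under stated assumptions, not the claimed apparatus: the occurrence frequency is taken as an inclusive running count of same-type general characters in the original registration string, the special character is a space, and the ("general" / "fore" / "rear") key tags and the name index_converted are hypothetical.

```python
from collections import defaultdict


def index_converted(text, special=" "):
    """Build chain-type -> occurrence frequency sets with converted specials."""
    # Occurrence frequencies are counted in the original string only;
    # special characters themselves are never counted.
    counts = defaultdict(int)
    freq = []
    for ch in text:
        if ch == special:
            freq.append(None)
        else:
            counts[ch] += 1
            freq.append(counts[ch])

    table = defaultdict(list)

    # General two-character chains: two adjacent general characters.
    for i in range(len(text) - 1):
        a, b = text[i], text[i + 1]
        if a != special and b != special:
            table[("general", a, b)].append((freq[i], freq[i + 1]))

    # Each (fore, special, rear) chain becomes (fore, central, rear), where the
    # central character has the rear character's type and its frequency.
    for i in range(1, len(text) - 1):
        if text[i] == special and text[i - 1] != special and text[i + 1] != special:
            fore, rear = text[i - 1], text[i + 1]
            central_freq = freq[i + 1]
            table[("fore", fore, rear)].append((freq[i - 1], central_freq))
            table[("rear", rear, rear)].append((central_freq, freq[i + 1]))

    return dict(table)
```

Since the central frequency equals the rear frequency, a fore chain and the rear chain that follows it can be linked by the same frequency comparison used between general chains.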

62. A character string collating apparatus according to claim 61 in which each special character is a space frequently occurring in the text written in the Hangul language or a space frequently occurring in the text written in English to divide words.

63. A character string collating apparatus according to claim 61 in which the collating operation performed by the collating means is that one occurrence frequency of the fore general character in each occurrence frequency set of a first particular general two-character chain type is collated with an occurrence frequency of the rear general character in a particular occurrence frequency set of a second particular general two-character chain type placed just before the first particular general two-character chain type in the particular chain order to determine a particular occurrence frequency set of the first particular general two-character chain type on condition that the occurrence frequency of the fore general character in the particular occurrence frequency set of the first particular general two-character chain type agrees with the occurrence frequency of the rear general character in the particular occurrence frequency set of the second particular general two-character chain type,

one occurrence frequency of the fore general character in each occurrence frequency set of one particular fore two-character chain type is collated with an occurrence frequency of the rear general character in a particular occurrence frequency set of one particular general two-character chain type placed just before the particular fore two-character chain type in the particular chain order to determine a particular occurrence frequency set of the particular fore two-character chain type on condition that the occurrence frequency of the fore general character in the particular occurrence frequency set of the particular fore two-character chain type agrees with the occurrence frequency of the rear general character in the particular occurrence frequency set of the particular general two-character chain type,

one occurrence frequency of the fore general character in each occurrence frequency set of one particular rear two-character chain type is collated with an occurrence frequency of the rear general character in a particular occurrence frequency set of one particular fore two-character chain type placed just before the particular rear two-character chain type in the particular chain order to determine a particular occurrence frequency set of the particular rear two-character chain type on condition that the occurrence frequency of the fore general character in the particular occurrence frequency set of the particular rear two-character chain type agrees with the occurrence frequency of the rear general character in the particular occurrence frequency set of the particular fore two-character chain type, and

one occurrence frequency of the fore general character in each occurrence frequency set of one particular general two-character chain type is collated with an occurrence frequency of the rear general character in a particular occurrence frequency set of one particular rear two-character chain type placed just before the particular general two-character chain type in the particular chain order to determine a particular occurrence frequency set of the particular general two-character chain type on condition that the occurrence frequency of the fore general character in the particular occurrence frequency set of the particular general two-character chain type agrees with the occurrence frequency of the rear general character in the particular occurrence frequency set of the particular rear two-character chain type.
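
The collating rule of claim 63 reduces to one comparison applied along the particular chain order: an occurrence frequency set survives only if its fore frequency agrees with the rear frequency of a surviving set of the chain type just before it. A minimal sketch, assuming the frequency sets of each particular chain type have already been read out in chain order as lists of (fore, rear) pairs; the function name collate is hypothetical.

```python
def collate(freq_sets_in_order):
    """Return every series of frequency sets that links up end to end.

    freq_sets_in_order holds, for each particular chain type in the particular
    chain order, the list of (fore_frequency, rear_frequency) sets recorded
    for that type.
    """
    if not freq_sets_in_order:
        return []

    # Every set of the first chain type starts a candidate series.
    paths = [[fs] for fs in freq_sets_in_order[0]]

    for candidates in freq_sets_in_order[1:]:
        extended = []
        for path in paths:
            prev_rear = path[-1][1]
            for fs in candidates:
                # Connection test: fore frequency of this chain must agree
                # with the rear frequency of the preceding chain.
                if fs[0] == prev_rear:
                    extended.append(path + [fs])
        paths = extended

    return paths
```

As a usage example, for the registration string "abcab" the chain type "ca" has the set (1, 2) and "ab" has the sets (1, 1) and (2, 2); collating the retrieval string "cab" keeps only the series (1, 2), (2, 2), which locates the second "ab".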

64. A character string collating apparatus according to claim 61, further comprising:

recording means for recording the general two-character chain types, the fore two-character chain types and the rear two-character chain types classified by the registration character chain classifying means, recording the occurrence frequency sets calculated by the first occurrence frequency calculating means for each general two-character chain type, recording the occurrence frequency sets calculated by the second occurrence frequency calculating means for each fore two-character chain type, and recording the occurrence frequency sets calculated by the second occurrence frequency calculating means for each rear two-character chain type, the series of particular general two-character chain types and particular fore and rear special three-character chain types recorded in the recording means being specified by the control means, and the occurrence frequency sets recorded in the recording means being received by the collating means under the control of the control means.

65. A character string collating apparatus according to claim 61 in which a series of special characters arranged in the registration character string or the retrieval character string is detected as a single special character by the registration character chain producing means or the second retrieval character chain detecting means.

66. A character string collating apparatus for collating a retrieval character string with a registration character string of a text, in which each of a plurality of special characters of the same character type is intermittently arranged in a plurality of general characters classified into a plurality of character types, to retrieve a particular character string agreeing with the retrieval character string from the registration character string, comprising:

registration general character chain detecting means for detecting all registration general two-character chains existing in the registration character string of the text, each registration general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the registration character string, and the rear general character of a first registration general two-character chain placed just before a second registration general two-character chain in the registration character string agreeing with the fore general character of the second registration general two-character chain;

registration special character chain detecting means for detecting all registration special two-character chains, respectively including one special character as a fore character and a rear general character or a fore general character and one special character as a rear character arranged in that order in the registration character string, the fore character of each registration special two-character chain placed just after one registration general two-character chain agreeing with the rear general character of the registration general two-character chain, the rear character of each registration special two-character chain placed just before one registration general two-character chain agreeing with the fore general character of the registration general two-character chain, and the rear character of a first registration special two-character chain placed just before a second registration special two-character chain agreeing with the fore character of the second registration special two-character chain;

first occurrence frequency calculating means for calculating a pair of occurrence frequencies of the fore and rear general characters of each registration general two-character chain detected by the registration general character chain detecting means as an occurrence frequency set, the occurrence frequency of each character of a particular type placed in a particular position of the registration character string denoting the number of characters of the same particular type existing in an area between a starting position of the registration character string and the particular position of the registration character string;

second occurrence frequency calculating means for calculating an occurrence frequency of the fore or rear general character and a limited occurrence frequency of the rear or fore special character of each registration special two-character chain detected by the registration special character chain detecting means as an occurrence frequency set, the limited occurrence frequency of each special character being obtained by setting a plurality of N limited values (N is an integer higher than 1) different from each other and lower than or equal to a maximum value as a set of N limited values and allocating the N limited values to each group of N special characters arranged in the registration character string on condition that each limited value selected in a predetermined order from one group of N limited values is allocated as one limited occurrence frequency to one special character selected from one group of N special characters in the order of arranging the special characters in the registration character string;

registration character chain classifying means for classifying each group of registration general two-character chains, which respectively include the same type of fore general character and the same type of rear general character, detected by the registration general character chain detecting means into one general two-character chain type, classifying each group of registration special two-character chains, which respectively include one special character of the same limited occurrence frequency as one fore character, detected by the registration special character chain detecting means into one first special two-character chain type, and classifying each group of registration special two-character chains, which respectively include one special character of the same limited occurrence frequency as one rear character, detected by the registration special character chain detecting means into one second special two-character chain type;

registration special two-character chain table producing means for producing a first special two-character chain table in which a plurality of registration special two-character chains respectively including one special character of the same limited occurrence frequency as one fore character and the occurrence frequency sets of the registration special two-character chains are arranged in the order of arranging the registration special two-character chains in the retrieval character string, and producing a second special two-character chain table in which a plurality of registration special two-character chains respectively including one special character of the same limited occurrence frequency as one rear character and the occurrence frequency sets of the registration special two-character chains are arranged in the order of arranging the registration special two-character chains in the retrieval character string;

first retrieval character chain detecting means for detecting all retrieval general two-character chains existing in the retrieval character string, each retrieval general two-character chain including a fore general character and a rear general character arranged just after the fore general character in the retrieval character string, and the rear general character of a first retrieval general two-character chain placed just before a second retrieval general two-character chain in the retrieval character string agreeing with the fore general character of the second retrieval general two-character chain;

second retrieval character chain detecting means for detecting all retrieval special two-character chains, respectively including one special character as a fore character and a rear general character or a fore general character and one special character as a rear character arranged in that order in the retrieval character string, the fore character of each retrieval special two-character chain placed just after one retrieval general two-character chain agreeing with the rear general character of the retrieval general two-character chain, the rear character of each retrieval special two-character chain placed just before one retrieval general two-character chain agreeing with the fore general character of the retrieval general two-character chain, and the rear character of a first retrieval special two-character chain placed just before a second retrieval special two-character chain agreeing with the fore charac