Language processing method and apparatus4805100Abstract When a Japanese or Korean bunsetsu-phrase lattice, that is, a set of linguistic units called bunsetsu-phrases each element of which having various starting and ending positions, and numerical values representing the reliability of each bunsetsu-phrase are given, optimum sequences of bunsetsu-phrases as Japanese or Korean clauses or sentences are selected under a criterion of degree of acceptability of a clause or sentence with considerably small amount of computation; the syntactic structures and the degree of acceptability of the optimum sequences are calculated simultaneously; the gist of the invention residing in that processing proceeds from shorter sequences to longer sequences storing the results for shorter sequences and utilizing later the results in calculating for longer sequences thus systematically avoiding the repeated calculation of partially the same quantity. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
TABLE 1
______________________________________
M
N 5 10
______________________________________
5 5.8 .times. 10.sup.4
1.6 .times. 10.sup.6
10 8.0 .times. 10.sup.10
6.3 .times. 10.sup.13
15 1.7 .times. 10.sup.17
3.9 .times. 10.sup.21
20 4.6 .times. 10.sup.23
2.9 .times. 10.sup.29
______________________________________
The numbers of elements in the set .orgate. ((S.sub.0,S.sub.1, . . .
,S.sub.m) .epsilon. D(1,N))[KB(B(S.sub.0 + 1,S.sub.1), B(S.sub.1 +
1,S.sub.2), . . . ,B(S.sub.m-1 + 1,S.sub.m))].
Here, the number of elements in B(i,j) is supposed to be the same for all i and j(1.ltoreq.i.ltoreq.j.ltoreq.N), and is represented by M. It is seen that in the case of the enumeration method, the number of arithmetic operations grows rapidly as N increases, and becomes an extremely large number even for moderate value of N so that it is extremely difficult to apply the enumeration method to the above-mentioned problem in practice. It is an object of the present invention to overcome the defects encountered in the prior art methods. More specifically, it is an object of the present invention to provide a language processing method which is remarkably efficient as compared with the prior art methods for the reason that the number of arithmetic operations is of the order of a polynomial with respect to the length of the phonetic symbol string and the number of elements of each bunsetsu-phrase set. It is another object of the present invention to provide a language processing apparatus which performs the language processing method of the present invention in an efficient manner to overcome the defects of the prior art. It is apparent that the present invention can be applied not only to the Japanese or Korean language but also to any languages which have grammatical structure based on the dependency relation between words or phrases. 1. Theoretical Foundations of the Invention 1.1. Fundamental Recurrence Equations 1.1.1. The Case of General Bunsetsu-Phrase Lattice Prior to the description of the present invention, recurrence equations which play fundamental rolls in the present invention will be described. First, the following definitions are made: [D6] For a fixed natural number N, and for i, j, x satisfying 1.ltoreq.i.ltoreq.m.ltoreq.j.ltoreq.N, x.epsilon.B(m,j), ##EQU7## When m=i, D(i,m-1) is not defined. In this case, the following convention is adopted: ##EQU8## In case of (2), there exist many dependency structures X which minimize P(X)+S(X) in general, so that OPTKS(i,j,m,x) becomes a set. In the present invention, the following two recurrence equations [T1] and [T2] for OPTPS and OPTKS, respectively, play the fundamental roles. [T1] For 1.ltoreq.i.ltoreq.m.ltoreq.j.ltoreq.N, x.epsilon.B(m,j), (1) if i=m, then OPTPS(i,j,m,x)=S(x), and (2) if i<m, then OPTPS(i,j,m,x) =min(i.ltoreq.n.ltoreq.k.ltoreq.m-1)[min(y.epsilon.B(n,k)) [OPTPS(i,k,n,y)+PEN(y,x)]+OPTPS(k+1,j,m,x)]. [T2] For 1.ltoreq.i.ltoreq.m.ltoreq.j.ltoreq.N, x.epsilon.B(m,j), (1) if i=m, then OPTKS (i,j,m,x)={<x>}, and (2) if i<m, then OPTKS(i,j,m,x) =.orgate.((n,k,y).epsilon.KTS(i,j,m,x)) [{X.sym.Y.vertline.X.epsilon.OPTKS(i,k,n,y),Y.epsilon.OPTKS(k+1,j,m,x)}], where KTS (i,j,m,x) is the set of all the triplets (n,k,y) of n, k, y which attain the minimum in (2) of [T1]. Next, these recurrence equations are proved. To this end, the following [E3] is shown first: ##EQU9## (proof) Let Z.epsilon..orgate.((s.sub.0,s.sub.1,s.sub.2, . . . , s.sub.p).epsilon.D(i,m-1)) [KB(B(s.sub.0 +1,s.sub.1),B(s.sub.1 +1,s.sub.2), . . . ,B(s.sub.p-1 +1,s.sub.p),{x})], then there exist (s.sub.0,s.sub.1, s.sub.2, . . . , s.sub.p).epsilon.D(i,m-1) and x.sub.1 .epsilon.B(s.sub.0 +1, s.sub.1), x.sub.2 .epsilon.B(s.sub.1 +1,s.sub.2), . . . , xp.epsilon.B(s.sub.p), such that Z.epsilon.K(x.sub.1 x.sub.2 . . . x.sub.p x). According to [E2], there exist t satisfying 1.ltoreq.t.ltoreq.p, and Y, X satisfying Y.epsilon.K(x.sub.1 x.sub.2 . . . x.sub.t), X.epsilon.K(x.sub.t+1 x.sub.t+2 . . . x.sub.p x) such that Z=Y.sym.X. Let n=s.sub.t-1 +1, k=s.sub.t, y=x.sub.t, then ##EQU10## This shows that the left side of the equation to be proved is included in the right side. The fact that the right side is included in the left side can be shown in a manner substantially similar to that described above. Next, T1 is proved by using E3. (proof of T1) ##EQU11## Thus [T1] is proved. (proof of T2) Since (1) is obvious from its definition, only (2) will be shown. First, it should be noted that for X satisfying X.epsilon..orgate.((s.sub.0,s.sub.1 s.sub.2, . . . s.sub.p).epsilon.D(i,m-1)) [KB(B(s.sub.0 +1,s.sub.1),B(s.sub.1 +1,s.sub.2), . . . ,B(s.sub.p-1 +1,s.sub.p),{x})], X.epsilon.OPTKS(i,j,m,x) is equivalent with P(X)+S(X)=OPTPS(i,j,m,x). In order to prove (2), it suffices to show (a) OPTKS(i,j,m,x) .epsilon..orgate.((n,k,y).epsilon.KTS(i,j,m,x)) [{Y.sym.X.vertline.Y.epsilon.OPTKS(i,k,n,y), X.epsilon.OPTKS(k+l,j,m,x)}] and (b) OPTKS(i,j,m,x) .epsilon..orgate.((n,k,y).epsilon.KTS(i,j,m,x)) [{Y.sym.X.vertline.Y.epsilon.OPTKS(i,k,n,y),X.epsilon.OPTKS(k+l,j,m,x)}]. Now (a) can be shown as follows: Let Z.epsilon..orgate.((n,k,y).epsilon.KTS(i,j,m,x)) [{Y.sym.X.vertline.Y.epsilon.OPTKS(i,k,n,y),X.epsilon.OPTKS(k+l,j,m,x)}], then there exist (n, k, y).epsilon.KTS(i,j,m,x), Y.epsilon.OPTKS(i,k, n, y), X.epsilon.OPTKS(k+l,j,m,x) such that Z=Y.sym.X. Obviously, ##EQU12## Therefore Z.epsilon.OPTKS(i,j,m,x). In order to show (b), let Z.epsilon.OPTKS(i,j,m,x). Since ##EQU13## according to [E3], there exist n, k which satisfy i.ltoreq.n.ltoreq.k.ltoreq.m-1, and y, Y, X satisfying y.epsilon.B(n,k), ##EQU14## such that Z=Y.sym.X On the other hand, P(Z)+S(Z)=P(Y)+P(X)+S(Y)+S(X)+PEN(y,z), and since Z minimizes the left side of this equation, Y and X must also minimize the right side of the same equation. Therefore, as is apparent from the proof of [T1] (n, k, y).epsilon.KTS(i,j,m,x) and Y.epsilon.OPTKS(i,k, n, y), X.epsilon.OPTKS(k+1,j,m,x)}] Hence, ##EQU15## Thus, [T2] is proved. OPTPS(i,j,m,x) has four variables i, j, m and x. Since m represents the starting position of the bunsetsu-phrase x, m is uniquely determined depending on x. Therefore, if we define [D7] (1) IN(x)=(the starting position of bunsetsu-phrase x), (2) B(i,j)=.orgate.(i.ltoreq.k.ltoreq.j)B(k,j), and (3) for x.epsilon.B(i,j) OPT(i,j,x)=OPTPS(i,j,IN(x),x), then, [T1] may be rewritten as follows: [T1'] For 1.ltoreq.i.ltoreq.j.ltoreq.N and x.epsilon.B(i,j), (1) if i=IN(x), then OPT(i,j,x)=S(x) and (2) if i<IN(x), then OPT(i,j,x)=min(i.ltoreq.k.ltoreq.IN(x)-1) [min(y.epsilon.B(i,k))[OPT(i,k,y)+PEN(y,x)] +OPT(k+1,j,k)] As to OPTKS, we define similarly [D8] For 1.ltoreq.i.ltoreq.j.ltoreq.N and x.epsilon.B(i,j), OPTK(i,j,x)=OPTKS(i,j,IN(x),x). Furthermore, we denote the set of pairs (k,y) of k and y which attain the minimum value in (2) of [T1'] by KT(i,j,x), Then, [T2] may be rewritten as follows: [T2'] For 1.ltoreq.i.ltoreq.j.ltoreq.N, x.epsilon.B(i,j) (1) if i=IN(x), then OPTK(i,j,x)=}<x>} and (2) if i<IN(x), then OPTK(i,j,x) =.orgate.((k,y).epsilon.KT(i,j,x)) [{Y.sym.X.vertline.Y.epsilon.OPTK(i,k,y), X.epsilon.OPTK(k+1,j,x))}]. When (k,y) is an element of KT(i,j,x), k is called an optimum segmentation point for i, j and x, while y is called an optimum bunsetsu-phrase corresponding to k. 1.1.2. The Case in Which the Bunsetsu-Phrase Lattice Becomes a Sequence of Bunsetsu-Phrase Sets. As is apparent from the definitions of OPT(i,j,x) and OPTK(i,j,x), the value of the first variable is significant only if it is equal to the starting position of a bunsetsu-phrase which is not a dummy and is equal to the number obtained by adding 1 to the ending position of a bunsetsu-phrase which is not a dummy, or if it is equal to 1. We can therefore renumber, beginning with 1, the value which can be taken by the first variable, after removing the irrelevant values. In the same way, the value of the second variable is significant only if it is equal to the ending position of bunsetsu-phrase which is not a dummy and is equal to the number obtained by subtracting 1 from the starting position of a bunsetsu-phrase which is not a dummy, or if it is equal to N. We can therefore renumber, beginning with 1, the value which can be taken by the second variable after removing the irrelevant values. An example in which the above-mentioned renumbering of the phonetic symbol positions is very effective will be described below. Assume that relative to the phonetic symbol positions 1 through N, a segmentation 0=s.sub.0 <s.sub.1 < . . . <Sn=N exists such that every bunsetsu-phrase in the bunsetsu-phrase lattice starts from s.sub.i-1 +1 and ends at s.sub.i for some i (1.ltoreq.i.ltoreq.n). That is, in this case, the bunsetsu-phrase lattice consists of a sequence of bunsetsu-phrase sets B(s.sub.0 +1,1,s.sub.1), B(s.sub.1 +1,s.sub.2), . . . , B(s.sub.n-1 +1,s.sub.n). The output from a word processor or a speech recognizer which accepts bunsetsu-phrases separated with spaces or pauses takes this form. It this case, for the value of the first variable of OPT(i,j,x) and OPTK(i,j,x), only s.sub.0 +1, s.sub.1 +1, . . . , s.sub.n-1 +1 are significant, and for the value of the second variable, only s.sub.1, s.sub.2, . . . , s.sub.n are significant. When n is replaced by N and when these significant values are renumbered from 1 to N, the first and second variables of OPT(i,j,x) and OPTK(i,j,x) have the values ranging from 1 to N. Furthermore, these numbers indicate the order of bunsetsu-phrase sets. Let B.sub.i =B(s.sub.i-1 +1,s.sub.i), (1.ltoreq.i.ltoreq.N), then [T1'] and [T2'] may be rewritten as follows as is apparent from the above description: [T1"] For 1.ltoreq.i.ltoreq.j.ltoreq.N and x.epsilon.B.sub.j (1) if i=j, then OPT(i,j,x)=S(x), and (2) if i<j, then OPT(i,j,x) =min (i.ltoreq.k.ltoreq.j-1)[min(y.epsilon.B.sub.k)[OPT(i,k,y) +PEN(y,x)]+OPT(k+1,j,x)]. [T2"] For 1.ltoreq.i.ltoreq.j.ltoreq.N, x.epsilon.B.sub.j (1) if i=j, then OPTK(i,j,k)={<x>}, and (2)if i<j, then OPTK(i,j,k) =.orgate.((k,y).epsilon.KT(i,j,x))[{Y.sym.X.vertline.Y.epsilon.OPTK(i,k,y), X.orgate.OPTK(k+1,j,x)}], where KT(i,j,x) represents the set of the pairs (k,y) of k and y which attain the minimum value in (2) of [T1"]. 1.2. Methods for Determining the Value of OPT and the Pair of Optimum Segmenting Point and Optimum Bunsetsu-Phrase: 1.2.1. The Case of General Bunsetsu-Phrase Lattice The part (1) of [T1'] shows the fact that when i=IN(x), the value of OPT(i,j,x) is determined to be the value of S(x), while (2) of [T1'] shows the fact that when i<IN(x), if OPT(i,k,y) and OPT(k+1, j,x) (i.ltoreq.k.ltoreq.IN(x)-1, y.epsilon.B(i,k) have been already calculated, the value of OPT (i,j,x) can be obtained by solving a minimization problem with one variable twice. By these facts, the calculation of OPT(i,j,x) (1.ltoreq.i.ltoreq.N, x.epsilon.B(i,j)) can proceed from intervals with i=j to larger intervals including the previous intervals. In the process of determining the value of OPT (i,j,x), the pairs of optimum segmenting point and optimum bunsetsu-phrase are also determined. When the value of OPT (1,N,x) is calculated for each x.epsilon.B(1,N), this phase of calculation is completed. 1.2.2. The Case of a Sequence of Bunsetsu-Phrase Sets The part (1) of [T1"] shows the fact that when i=j, OPT(i,j,x) is determined to be the value of S(x), while (2) of [T1"] shows the fact that when i<j, if OPT(i,k,y) and OPT(k+1,j,x) (i.ltoreq.k.ltoreq.j-1, y.epsilon.B.sub.k) have been already calculated, OPT(i,j,x) can be obtained by solving a minimization problem with one variable twice. By these facts, the calculation of OPT(i,j,x)(1.ltoreq.i.ltoreq.j.ltoreq.N, x.epsilon.B.sub.j) can proceed from intervals with i=j to larger intervals including the previous intervals. In the process of determining the value of OPT (i,j,x), the pairs of optimum segmenting point and optimum bunsetsu-phrase are also determined. When the value of OPT (i,N,x) is calculated for each x.epsilon.B.sub.N, this phase of calculation is completed. 1.3. Methods for Determining the Optimum Dependency Structure and the Degree of Acceptability of This Structure For the sake of simplicity in explanation, the case in which the pair of the optimum segmenting point and the optimum bunsetsu-phrase is uniquely determined will be described. In this case, OPTK(i,j,x) is equal to only one dependency structure. 1.3.1. The Case of General Bunsetsu-Phrase Lattice Since ##EQU16## by calculating the right side of this equation, the degree of acceptability of the optimum dependency structure on the optimum bunsetsu-phrase sequence can be obtained. Furthermore, let x.sub.0 =argmin(x.epsilon.B(1,N))[OPT(1,N,x)] then the optimum dependency structure on the optimum bunsetsu-phrase sequence is given by OPTK(1,N,x.sub.0). The determination of the structure of OPTK (1,N,x.sub.0) is carried out as follows. If IN(x.sub.0)=1, then according to (1) of [T2'], OPTK(1,N,x.sub.0)=<x.sub.0 > so the optimum dependency structure is determined. ##EQU17## where k.sub.1 is the optimum segmenting point and x.sub.1 is the optimum bunsetsu-phrase for 1, N, x.sub.0, If IN(x.sub.1).noteq.1, by using the optimum segmenting point k.sub.2 and the optimum bunsetsu-phrase for 1,k.sub.1,x.sub.1, OPTK(1,k.sub.1,x.sub.1) can be further decompsed as follows: OPTK(1,k.sub.1,x.sub.1) =OPTK(1,k.sub.2,x.sub.2).sym.OPTK(k.sub.2 +1,k.sub.1,x.sub.1) If IN(x.sub.0).noteq.k.sub.1 +1, by using the optimum segmenting point k.sub.3 and the optimum bunsetsu-phrase x.sub.3 for k.sub.1 +1,N,x.sub.0, OPTK(k.sub.1 +1,k.sub.2,x.sub.3) can be also decomposed as follows: ##EQU18## Such decomposition operations are carried out until IN(x)=i holds for all OPTK(i,j,x) which appear in the process. The OPTK(i,j,x) in which IN(x)=i holds is replaced by the dependency structure consisting of only one bunsetsu-phrase according to (1) of [T2'], and then the insertion operations are carried out in the reverse order of the decomposition operations. Thus, the optimum bunsetsu-phrase sequence and the optimum dependency structure thereon can be obtained simultaneously. When there are more than one pairs of the optimum segmenting point and the optimum bunsetsu-phrase, the same operations are carried out for all such pairs, and OPTK(1,N,x.sub.0) consists of all the dependency structures thus obtained. 1.3.2. The Case of a Sequence of Bunsetsu-Phrase Sets. First, as in the case of the general bunsetsu-phrase lattice, by calculating min(x.epsilon.B.sub.N)[OPT(1,N,x)] the degree of acceptability for the optimum dependency structure on the optimum bunsetsu-phrase sequence is calculated. Furthermore, let x.sub.0 =argmin(x.epsilon.B.sub.N)[OPT(1,N,x)] then the optimum dependency structure on the optimum bunsetsu-phrase sequence is given by OPTK(1,N,x.sub.0) The determination of the structure of OPTK(1,N,x.sub.0) is carried out as follows: If N=1, then according to (1) of [T2"], OPTK(1,N,x.sub.0)=<x.sub.0 > so the optimum dependency structure is determined. If N.noteq.1, then according to (2) of [T2"], OPTK(1,N,x.sub.0) =OPTK(1,k.sub.1,x.sub.1).sym.OPTK(k.sub.1 +1,N,x.sub.0) where k.sub.1 is the optimum segmenting point and x.sub.1 is the optimum bunsetsu-phrase for 1, N, x.sub.0. If k.sub.1 .noteq.1, by using the optimum segmenting point k.sub.2 and the optimum bunsetsu-phrase x.sub.2 for 1,k.sub.1,x.sub.1, OPTK(1,k.sub.1,x.sub.1) can be decomposed as follows: OPTK(1,k.sub.1,x.sub.1) =OPTK(1,k.sub.2,k.sub.2).sym.OPTK(k.sub.2 +1,k.sub.1,x.sub.1) In a similar manner, if N.noteq.k.sub.1 +1, by using the optimum segmenting point k.sub.3 and the optimum bunsetsu-phrase x.sub.3 for k.sub.1 +1,N,x.sub.0,OPTK(k.sub.1 +1,N,x.sub.0) can be decomposed as follows: ##EQU19## Such decomposition operations are carried out until i=j holds for all OPTK(i,j,x,) which appear in the process. The OPTK(i,j,x) in which i=j, by using (1) of [T2"], is replaced by a dependency structure consisting of only one bunsetsu-phrase according to (1) of [T2"], and then insertion operations are carried out in the reverse order of the decomposition operations. Thus, the optimum bunsetsu-phrase sequence and the optimum dependency structure thereon can be obtained simultaneously. When there are more than one pairs of the optimum segmentation point and the optimum bunsetsu-phrase, the same operations are carried out for all such pairs, and OPTK(1,N,x.sub.0) consists of all the dependency structures thus obtained as in the case of the general bunsetsu-phrase lattice. The above and other objects, effects, features and advantages of the present invention will become more apparent from the following description of preferred embodiments thereof taken in conjunction with the accompanying drawings. BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a block diagram showing the first embodiment of an apparatus for carrying out the present invention; FIGS. 2A and 2B show a flowchart representing an example of the control sequence in the first embodiment; FIGS. 3A and 3B show an example of the construction of tables required for carrying out the control sequence defined in the flowchart shown in FIGS. 2A and 2B or 5A and 5B; FIG. 4 is a block diagram showing the second embodiment of an apparatus for carrying out the presence invention; and FIGS. 5A and 5B show a flowchart showing an example of the control sequence in the second embodiment. DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS Apparatus and Sequences to Carry out the Invention 2.1 The Case of General Bunsetsu-Phrase Lattice The first embodiment of an apparatus to carry out the present invention based on item 1.2.1. is shown in FIG. 1. In the following description, the phonetic symbol positions are represented by 1, 2, . . . , N; the number of elements of the bunsetsu-phrase set B(i,j) is represented by NUM(i,j); and the elements of B(i,j) are represented by x.sub.i,j,1, x.sub.i,j,2, . . . , x.sub.i,j, NUM(i,j). In FIG. 1, SC designates a buffer memory such as a RAM for storing the value S(x.sub.i,j,q), i.e. the degree of reliability of each bunsetsu-phrase, transferred through an input terminal i.sub.1 ; and BUF, a buffer memory such as a RAM for holding therein bunsetsu-phrase sets transferred through a bunsetsu-phrase input terminal i.sub.2. When the present invention is applied to speech recognition, for instance, each bunsetsu-phrase of the bunsetsu-phrase lattice coming out from the speech recognizer is applied to the bunsetsu-phrase input terminal i.sub.2, while the degree of realibility of each bunsetsu-phrase coming out from the speech recognizer is applied to the input terminal i.sub.1. When the present invention is applied to Japanese word processing of the type in which a non-segmented Kana sentence is converted into a sentence consisting of Kana and Kanji, a morphological analysis of a given Kana string a.sub.1, a.sub.2, . . . , a.sub.N is carried out by any suitable prior art method and bunsetsu-phrase candidates for each i,j(1.ltoreq.i.ltoreq.j.ltoreq.N) each having a substring a.sub.i, a.sub.i+1, . . . , a.sub.j of the original Kana string are all enumerated and applied to the input terminal i.sub.2. In this case, the degree of reliability of each bunsetsu-phrase which is determined by such information as the frequency of use of a word is applied to the input terminal i.sub.1. PE is a unit for calculating the degree of dependency between two bunsetsu-phrases x and y read out from BUF. T1 and T2 are RAMS for realizing tables TABLE1 and TABLE2 in the flowchart as shown in FIGS. 2A and 2B, respectively. INIT is a detector unit for detecting whether the starting position of bunsetsu-phrase x.sub.i,j,q is equal to i or not; that is, whether or not IN(x.sub.i,j,q)=i. SEL is a data selected unit for selecting S(x.sub.i,j,q) from SC in response to the reception of a signal from INIT representing that the starting position of a bunsetsu-phrase x.sub.i,j,q is equal to i, and writing the value into TABLE1(i,j,q) realized as T1. ADD1 is an adder for adding a value stored in the TABLE1(i,k,p) to the value of PEN(x.sub.i,k,p,x.sub.i,j,q). MIN1 is a minimum value detector for detecting a minimum value of the output from the adder ADD1 when said p is varied, and for detecting p which gives said minimum value. ADD2 is an adder for adding the output from the minimum value detector MIN1 to the value in TABLE1(k+1,j,q). MIN2 is a minimum value detector for detecting a minimum value of the output from the adder ADD2 when said k is varied, and for detecing k which gives said minimum value. CONT designates a control unit for controlling the whole apparatus to work in the predetermined operation sequence, and, for instance, comprises a central processing unit CPU, a memory MEM1 in the form of a ROM for storing therein the control sequence and a working memory MEM2 in the form of a RAM. The results of the calculation written into RAMs T1 and T2 are read out from output terminals O.sub.1 and O.sub.2, respectively. FIGS. 2A and 2B show a flowchart illustrating an example of control sequence stored in MEM1 beforehand in the first embodiment shown in FIG. 1 for obtaining the pair of the optimum segmenting point and the optimum bunsetsu-phrase used to decide the optimum dependency structure on the optimum bunsetsu-phrase sequence and its degree of acceptability. First this flowchart will be described. In addition to the flowchart as shown in FIGS. 2A and 2B, two three-dimensional tables TABLE1(i,j,q) and TABLE2(i,j,q) having, as shown in FIGS. 3A and 3B, the same number of columns and rows as the total number N of the phonetic symbol positions under consideration and the same number of sections as the number NUM(i,j) of elements of bunsetsu-phrase sets B(i,j) (1.ltoreq.i.ltoreq.j.ltoreq.N,1.ltoreq.q.ltoreq.NUM(i,j)) are required. The suffixes of each table represents the positions of corresponding row, column and section from the left to right, respectively. TABLE1(i,j,q) is used to store the values of OPT(i,j,x.sub.i,j,q) while TABLE2(i,j,q) is used to store the pairs of the optimum segmenting point and the optimum bunsetsu-phrase number for i,j,x.sub.i,j,q. Operations in the flowchart proceed as follows: .circle.1 In steps S1-S13 of the flowchart as shown in FIGS. 2A and 2B, the column number j of each table is incremented from 1 to N and the operation .circle.2 to be described below is carried out for each column. .circle.2 In steps S2-S11, the row number i is decremented from j to 1 and then the following operation .circle.3 is executed. .circle.3 In steps S3 to S9, q is incremented from 1 to NUM(i,j) and the following operations (1) and (2) are executed. (1) If IN(x.sub.i,j,q)=1 in step S4, then the following operation [F1] is executed in step S7: [F1] The value of S(x.sub.i,j,q) is stored in TABLE1(i,j,q). (2) If IN(x.sub.i,j,q)>i in step S4, [F2] and [F3] below are executed in step S5 and step S6, respectively. [F2] ##EQU20## is calculated and the value is stored in TABLE1(i,j,q). [F3] The pair (k,p) of k and p which attain the minimum value in [F2] is stored in TABLE2(i,j,q). According to the above-mentioned procedure, each row, each column and each section in TABLE1 and TABLE2 are filled with the calculated values sequentially. When j>N in step S13, the calculation is completed, and the values of OPT(1,N,x.sub.1,N,q) (1.ltoreq.p.ltoreq.NUM(1,N)) are stored in TABLE1(1,N,q). Since the information concerning the optimum segmenting points and optimum bunsetsu-phrase numbers is stored in TABLE2, the optimum bunsetsu-phrase sequence and the optimum dependency structure thereon can be composed using the above information in the manner described in item 1.3.1. 2.2. The Case of a Sequence of Bunsetsu-Phrase Sets The second embodiment of an apparatus to carry out the present invention is shown FIG. 4. Since a sequence of bunsetsu-phrase sets is a special form of a bunsetsu-phrase lattice, it is apparent that the above-described method for the general bunsetsu-phrase lattice is also applicable to the present case. However, the following method according to item 1.2.2 is more efficient. In the following description, the positions of bunsetsu-phrase sets are represented by 1, 2, . . . , N, the number of elements of the bunsetsu-phrase set B.sub.j, by NUM(j); and the elements of B.sub.j, by x.sub.j,1, x.sub.j,2, . . . , x.sub.j,NUM(j). In FIG. 4, SC designates a buffer memory such as a RAM for storing the value of the degree of reliability for each bunsetsu-phrase transferred through an input terminal i.sub.1 ; and BUF, a buffer memory such as a RAM for storing bunsetsu-phrase sets transferred through a bunsetsu-phrase input terminal i.sub.2. When the present invention is applied to speech recognition, for instance, each bunsetsu-phrase candidate derived from a speech recognizer is transferred through the input terminal i.sub.2, while the degree of reliability of the corresponding bunsetsu-phrase is transferred through the terminal i.sub.1, PE is a unit for calculating the degree of dependency PEN(x,y) between two bunsetsu-phrases x and y read out from BUF. T1 and T2 are RAMs for realizing tables TABLE1 and TABLE2 in the flowchart shown in FIGS. 5A and 5B. ADD1 is an adder for adding a value stored in TABEL1(i,k,p) to the value of PEN (x.sub.k,p,x.sub.j,q). MIN1 is a minimum value detector for detecting a minimum value of the output from ADD1 when said p is varied, and for detecting p which attains the minimum value. ADD2 is another adder for adding the output from the minimum value detector MIN1 to the value in TABLE1(k+1,j,q). MIN2 is a minimum value detector for detecting the minimum value of the output from the adder ADD2 when said k is varied, and for detecting k which attains the minimum value. CONT is a control unit for controlling the whole apparatus to work in the predetermined operation sequence and, for instance, comprises a central processing unit CPU, a memory MEM1 in the form of a ROM for storing the control sequence and a working memory MEM2 in the form of a RAM. The results of calculation written into RAM T1 and T2 are read out from output terminals O.sub.1 and O.sub.2, respectively. FIGS. 5A and 5B show a flowchart illustrating an example of the control sequence stored in the MEM1 for obtaining the pair of the optimum segmenting point and the optimum bunsetsu-phrase used to decide the optimum dependency structure on the optimum bunsetsu-phrase sequence, and its degree of acceptability. This flowchart will be described below. In addition to the flowchart as shown in FIG. 5, two three-dimentional tables TABLE1(i,j,q) and TABLE2(i,j,q) (1.ltoreq.i.ltoreq.j.ltoreq.N,1.ltoreq.q.ltoreq.NUM(j)) having the same number of columns and rows as the length N of the sequence of bunsetsu-phrase sets under consideration and the same number of sections as the number of elements NUM(j) of the j-th bunsetsu-phrase set. The suffixes of each table represent the position of corresponding column, row and section from left to right, respectively. TABLE1(i,j,q) stores the values of OPT(i,j,x.sub.j,q) while TABLE2(i,j,q) stores the pairs of the optimum segmenting point and the optimum bunsetsu-phrase number for i,j,x.sub.j,q. Operations in the flowchart proceed as follows: .circle.1 In steps S1-S13 of the flowchart as shown in FIGS. 5A and 5B, the column number j of each table is incremented from 1 to N and the operation .circle.2 to be described below is carried out for each column. .circle.2 In steps S2-S11, the row number i is decremented from j to 1 and .circle.3 below is executed. .circle.3 In steps S3-S9, q is incremented from 1 to NUM(j) and (1) and (2) to be described below are executed. (1) If j=i in step S4, then the following operation is executed in step S7: [F1'] The value of S(x.sub.j,q) is stored in TABLE1(i,j,q). (2) if j>i in step S4, [F2'] to be described below is executed in step S5 and [F3'] to be described below is executed in step S6. [F2' min(i.ltoreq.k.ltoreq.j-1)[min(1.ltoreq.p.ltoreq.NUM(k)) [TABLE1(i,k,p)+PEN(x.sub.k,p, x.sub.j,q)]+TABLE1(k+1,j,q)] is calculated and the value is stored in TABLE1(i,j,q). [F3']. The pair (k,p) of k and p which attain the minimum value in [F2'] is stored in TABLE2(i,j,q). Each column, each row and each section of TABLE1 and TABLE2 are filled with the calculated value sequentially as described above. When j>N in step S13, the calculation is completed and the values of OPT(1,N,x.sub.N,q) (1.ltoreq.q.ltoreq.NUM(N)) are stored in TABLE1(1,N,q). Since the information concerning the optimum segmenting points and the optimum bunsetsu-phrase numbers is stored in TABLE2, the optimum bunsetsu-phrase sequence and optimum dependency structure thereon can be composed according to the method described in the above item 1.3.2. 2.3. Composition of the Optimum Dependency Structure In order to actually carry out the present invention, in addition to the flowchart as shown in FIGS. 2A and 2B or in FIGS. 5A and 5B, a mechanism for composing the optimum bunsetsu-phrase sequence and the optimum dependency structure thereon is needed, but the gist of the present invention is to calculate the contents in TABLE1 and TABLE2 so that the description of the mechanism for composing the optimum bunsetsu-phrase sequence and the optimum dependency structure thereon is limited to the scope of item 1.3.1 or 1.3.2. It should be noted, however, that when the calculation of the contents in TABLE1 and TABLE2 is accomplished, the greater part of the computation required for obtaining the optimum bunsetsu-phrase sequence and the optimum dependency structure thereon is completed. 2.4. Remarks on Non-uniqueness of the Pair of Optimum Segmenting Point and Optimum Bunsetsu-Phrase Sometimes, there exist more than one pairs of k and p which attain the minimum value in [F2] or [F2']. In this case, TABLE2(i,j,q) is so designed and constructed as to store more than one pairs of numerical values, and all such pairs are to be stored in TABLE2(i,j,q) in [F3] or [F3']. Even if the flowchart as shown in FIG. 2 or 5 is so modified as to implement this, the amount of computation remains almost unchanged. 2.5. Features of the Invention As described above, in selecting the optimum bunsetsu-phrase sequence from the given bunsetsu-phrase sets B(i,j) (1.ltoreq.i.ltoreq.j.ltoreq.N) corresponding to phonetic symbol position i and j satisfying the condition that the starting position of the first bunsetsu-phrase equals 1, the ending position of the last bunsetsu-phrase equals N and the ending position of a bunsetsu-phrase except the last bunsetsu-phrase added with 1 equals the starting position of the succeeding bunsetsu-phrase, and in obtaining the optimum dependency structure thereon and the degree of acceptability thereof, the feature of the present invention resides in that: the acceptability of the optimum dependency structure and the information required for composing the optimum dependency structure are calculated and stored progressively from shorter intervals to longer intervals with the last bunsetsu-phrase fixed; and in calculating the acceptability of the optimum dependency structure and in obtaining the information required for composing the optimum dependency structure with the last bunsetsu-phrase fixed as x.sub.i,j,q for the interval [i, j] (1.ltoreq.i, j.ltoreq.N), the same kind of information already calculated and stored for the interval [i, k], the same kind of information already calculated and stored for the interval [k+1, j] and the degree of dependency between the bunsetsu-phrases x.sub.i,k,p .epsilon.B(i,k) and x.sub.i,j,q for every possible k and p are the only information referred. The above embodiments have been described in conjunction with the process for obtaining a minimum value because it is assumed that a smaller value of S implies a higher degree of reliability and a smaller value of PEN implies a higher a degree of dependency between two bunsetsu-phrases. However, when a greater value of S implies a higher degree of reliability and a greater value of PEN implies a higher degree of dependency, the process for obtaining a maximum value should be carried out instead of obtaining a minimum value. 3. Advantageous Effects of the Invention The fundamental operations to be performed in the present invention are a comparison operation and addition operation so that the number of operations carried out by the method in accordance with the present invention is compared with that of the enumeration method which is the prior art method. In order to evaluate the number of operations, the followings are assumed: (1) In order to calculate PEN(x,y), J addition operations are required; (2) In order to add m numerical values, m-1 addition operations are required; (3) In order to find a minimum value in m numerical values, m-1 comparison operations are required; and (4) the number of elements in bunsetsu-phrase set (in the case of a sequence of bunsetsu-phrase set, B.sub.k : and in the case of bunsetsu-phrase lattice, B(i,j)) are the same for all k, or for all i and j. Then, the number of operations is determined by the following three parameters: M: In the case of a bunsetsu-phrase lattice, the number of elements in B(i,j), and in the case of a sequence of bunsetsu-phrase sets, the number of elements in B.sub.k. N: In the case of a bunsetsu-phrase lattice, the total length of the phonetic symbol string while in the case of a sequence of bunsetsu-phrase sets, the total length of the sequence. The number of operations required for calculating PEN(x,y) expressed in terms of the number of addition operations. Under the above-mentioned assumption, the number of operations is calculated as follows: THE CASE OF GENERAL BUNSETSU-PHRASE LATTICE 3.1.1. The Present Invention (1) The number of addition operations =M.sup.2 (J+1)(N-1)N(N+1)(N+3)/120 +M(N-1)N(N+1)(N+2)/24 (2) The number of comparison operations =M.sup.2 (N-1)N(N+1)(N+2)(N+3)/120 -M(N-1)N(N+1)(N+2)/24 3.1.2. The Enumeration Method (The Prior Art Method) (1) The number of addition operations =.SIGMA.(0.ltoreq.n.ltoreq.N-1) [.sub.N-1 C.sub.n .multidot.{knum(n+1).multidot.(J+1).multidot.n+n}.multidot.M.sup.n+1 ] (2) The number of comparison operations =.SIGMA.(0.ltoreq.n.ltoreq.N-1)[.sub.N-1 C.sub.n .multidot.knum(n+1).multidot.M.sup.n+1)-1 where knum(n) represents the number of dependency structures on a sequence of bunsetsu-phrase which has a length n and is defined in Summary of Invention. The results of the calculations for these numbers of addition operations and comparison operations when J=1, M=5,10 and N=5,10,15 and 20 are shown in TABLE 2.
TABLE 2
______________________________________
The numbers of operations in the case of
the bunsetsu-phrase lattice
M
5 10
N addition comparison
addition
comparison
______________________________________
The 5 3.0 .times. 10.sup.3
1.3 .times. 10.sup.3
1.2 .times. 10.sup.4
5.4 .times. 10.sup.3
present
10 6.7 .times. 10.sup.4
3.1 .times. 10.sup.4
2.6 .times. 10.sup.5
1.3 .times. 10.sup.5
invention
15 4.4 .times. 10.sup.5
2.1 .times. 10.sup.5
1.7 .times. 10.sup.6
8.5 .times. 10.sup.5
20 1.7 .times. 10.sup.6
8.3 .times. 10.sup.5
6.8 .times. 10.sup.6
3.4 .times. 10.sup.6
The 5 4.5 .times. 10.sup.5
5.8 .times. 10.sup.4
1.3 .times. 10.sup.7
1.6 .times. 10.sup.6
enumer-
10 .sup. 1.4 .times. 10.sup.12
.sup. 8.0 .times. 10.sup.10
.sup. 1.1 .times. 10.sup.15
.sup. 6.3 .times. 10.sup.13
ation 15 .sup. 4.6 .times. 10.sup.18
.sup. 1.7 .times. 10.sup.17
.sup. 1.1 .times. 10.sup.23
.sup. 3.9 .times. 10.sup.21
method 20 .sup. 1.7 .times. 10.sup.25
.sup. 4.6 .times. 10.sup.23
.sup. 1.1 .times. 10.sup.31
.sup. 2.9 .times. 10.sup.29
______________________________________
THE CASE OF A SEQUENCE OF BUNSETSU-PHRASE SETS 3.2.1. The Present Invention (1) The number of addition operations =((J+1)M+1)MN(N-1)(N+1)/6 (2) The number of comparison operations =M.sup.2 N(N-1)(N+1)/6-MN(N-1)/2 3.2.2. The Enumeration Method (The Prior Art Method) (1) The number of addition operations =(knum(N).multidot.(J+1).multidot.(N-1)+(N-1)).multidot.M.sup.N (2) The number of comparison operations =knum(N).multidot.M.sup.N -1
TABLE 3
______________________________________
The numbers of operations in the case of
a sequence of bunsetsu-phrase sets
M
5 10
N addition comparison
addition
comparison
______________________________________
The 5 1.1 .times. 10.sup.3
4.5 .times. 10.sup.2
4.2 .times. 10.sup.3
1.9 .times. 10.sup.3
present
10 9.1 .times. 10.sup.3
3.9 .times. 10.sup.3
3.5 .times. 10.sup.4
1.6 .times. 10.sup.4
invention
15 3.1 .times. 10.sup.4
1.3 .times. 10.sup.4
1.2 .times. 10.sup.5
5.5 .times. 10.sup.4
20 7.3 .times. 10.sup.4
3.2 .times. 10.sup.4
2.8 .times. 10.sup.5
1.3 .times. 10.sup.5
The 5 3.6 .times. 10.sup.5
4.4 .times. 10.sup.4
1.2 .times. 10.sup.7
1.4 .times. 10.sup.6
enumer-
10 .sup. 8.5 .times. 10.sup.11
.sup. 4.7 .times. 10.sup.10
.sup. 8.8 .times. 10.sup.14
.sup. 4.9 .times. 10.sup.13
ation 15 .sup. 2.3 .times. 10.sup.18
.sup. 8.2 .times. 10.sup.16
.sup. 7.5 .times. 10.sup.22
.sup. 2.7 .times. 10.sup.21
method 20 .sup. 6.4 .times. 10.sup.24
.sup. 1.7 .times. 10.sup.23
.sup. 6.7 .times. 10.sup.30
.sup. 1.8 .times. 10.sup.29
______________________________________
The results of calculations for these numbers of addition operations and comparison operations when J=1, M=5 and 10, N=5,10,15 and 20 are shown in TABLE 3. As is apparent from TABLE 2 and TABLE 3, the efficiency of the present invention becomes higher as the values of M and N become larger. For instance, in the case of J=1, M=10 an N=20, the present invention brings the number of operations down to about one 10.sup.24 -th to 10.sup.25 -th of the enumeration method.
|
Same subclass Same class Consider this |
||||||||||
