Method and apparatus for "wrong word" spelling error detection and correction
5258909
Abstract
A method of detecting and correcting an error in a string of information signals. When each information signal represents a word, the method detects and corrects spelling errors. In particular, the method detects and corrects an error that is a properly spelled word but is the wrong (not intended) word. For example, the method is capable of detecting and correcting a misspelling of "HORSE" as "HOUSE". In the spelling error detection and correction method, a first word in an input string of words is changed to a second word, different from the first word, to form a candidate string of words. The spellings of both the first word and the second word are in the spelling dictionary. The probability of occurrence of the input string of words is compared to the product of the probability of occurrence of the candidate string of words and the probability of misrepresenting the candidate string of words as the input string of words. If the former is greater than or equal to the latter, no correction is made. If the former is less than the latter, the candidate string of words is selected as a spelling correction.
Claims
We claim:
Description
BACKGROUND OF THE INVENTION
TABLE 1
__________________________________________________________________________
Input word string (T_i = W_i)
                                              Components of ln P(T_i | W_i) for
Word       Trigram              ln P(trigram) P_t=0.9999  P_t=0.999   P_t=0.99    P_t=0.9
__________________________________________________________________________
I          _ _ I                     -3.47634   -0.00010   -0.00100   -0.01005   -0.10536
submit     _ I submit                -8.47750   -0.00010   -0.00100   -0.01005   -0.10536
that       I submit that             -1.23049   -0.00010   -0.00100   -0.01005   -0.10536
is         submit that is            -4.74311   -0.00010   -0.00100   -0.01005   -0.10536
what       that is what              -3.04882   -0.00010   -0.00100   -0.01005   -0.10536
is         is what is                -3.07193   -0.00010   -0.00100   -0.01005   -0.10536
happening  what is happening         -4.88977   -0.00010   -0.00100   -0.01005   -0.10536
in         is happening in           -1.72564   -0.00010   -0.00100   -0.01005   -0.10536
this       happening in this         -3.84228   -0.00010   -0.00100   -0.01005   -0.10536
case       in this case              -2.49284   -0.00010   -0.00100   -0.01005   -0.10536
.          this case .               -2.05863   -0.00010   -0.00100   -0.01005   -0.10536
ln P(W_i) =                         -39.05735
ln P(T_i | W_i) =                                -0.0011    -0.0110    -0.1106    -1.1590
__________________________________________________________________________
Candidate word string (W_c)
                                              Components of ln P(T_i | W_c) for
Word       Trigram              ln P(trigram) P_t=0.9999  P_t=0.999   P_t=0.99    P_t=0.9
__________________________________________________________________________
a          _ _ a                     -3.96812   -9.21034   -6.90776   -4.60517   -2.30259
submit     _ a submit               -10.20667   -0.00010   -0.00100   -0.01005   -0.10536
that       a submit that             -3.69384   -0.00010   -0.00100   -0.01005   -0.10536
is         submit that is            -4.74311   -0.00010   -0.00100   -0.01005   -0.10536
what       that is what              -3.04882   -0.00010   -0.00100   -0.01005   -0.10536
is         is what is                -3.07193   -0.00010   -0.00100   -0.01005   -0.10536
happening  what is happening         -4.88977   -0.00010   -0.00100   -0.01005   -0.10536
in         is happening in           -1.72564   -0.00010   -0.00100   -0.01005   -0.10536
this       happening in this         -3.84228   -0.00010   -0.00100   -0.01005   -0.10536
case       in this case              -2.49284   -0.00010   -0.00100   -0.01005   -0.10536
.          this case .               -2.05863   -0.00010   -0.00100   -0.01005   -0.10536
ln P(W_c) =                         -43.74165
ln P(T_i | W_c) =                                -9.2113    -6.9178    -4.7057    -3.3562
__________________________________________________________________________
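The totals at the foot of the input half of Table 1 can be reproduced directly from its rows. The following is a minimal Python sketch, not code from the patent: the names `INPUT_TRIGRAM_LOGPROBS`, `sentence_log_prob`, and `typing_log_prob` are hypothetical, and the sketch assumes, as the table's columns suggest, that ln P(W_i) is the sum of the tabulated trigram logarithms and that each of the 11 correctly typed tokens contributes ln P_t.

```python
import math

# (trigram, ln P(trigram)) pairs copied from the input half of Table 1.
# "_" stands for the sentence-boundary marker shown in the table.
INPUT_TRIGRAM_LOGPROBS = [
    ("_ _ I", -3.47634),
    ("_ I submit", -8.47750),
    ("I submit that", -1.23049),
    ("submit that is", -4.74311),
    ("that is what", -3.04882),
    ("is what is", -3.07193),
    ("what is happening", -4.88977),
    ("is happening in", -1.72564),
    ("happening in this", -3.84228),
    ("in this case", -2.49284),
    ("this case .", -2.05863),
]


def sentence_log_prob(trigram_logprobs):
    """Sum the per-trigram logarithms to get ln P(W) for the whole string."""
    return math.fsum(log_p for _, log_p in trigram_logprobs)


def typing_log_prob(n_words, p_t):
    """ln P(T | W) when every one of n_words is typed as intended.

    Assumption: each word is typed correctly with probability p_t,
    independently of the others.
    """
    return n_words * math.log(p_t)


ln_p_w = sentence_log_prob(INPUT_TRIGRAM_LOGPROBS)
print("ln P(W_i) =", round(ln_p_w, 5))
for p_t in (0.9999, 0.999, 0.99, 0.9):
    print("P_t =", p_t, "ln P(T_i | W_i) =", round(typing_log_prob(11, p_t), 4))
```

Under these assumptions the row sum equals the tabulated ln P(W_i), and the per-P_t totals agree with the table to four decimal places.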
EXAMPLE II
In this example, the input word string T_i = W_i is "I submit that is what is happening in this case." The first word T_1 = W_1 whose spelling is being checked is "submit". The word "submit" has two simple misspellings: "summit" or "submits". In this example, the second word W_2 is selected to be "summit". Therefore, the candidate word string W_c (the candidate sentence) is "I summit that is what is happening in this case." Table 3 shows the logarithms of the probabilities, and Table 4 provides the totals for Table 3. Again, for each value of P_t, the original sentence is selected over the candidate.
TABLE 3
__________________________________________________________________________
Input word string (T_i = W_i)
                                              Components of ln P(T_i | W_i) for
Word       Trigram              ln P(trigram) P_t=0.9999  P_t=0.999   P_t=0.99    P_t=0.9
__________________________________________________________________________
I          _ _ I                     -3.47634   -0.00010   -0.00100   -0.01005   -0.10536
submit     _ I submit                -8.47750   -0.00010   -0.00100   -0.01005   -0.10536
that       I submit that             -1.23049   -0.00010   -0.00100   -0.01005   -0.10536
is         submit that is            -4.74311   -0.00010   -0.00100   -0.01005   -0.10536
what       that is what              -3.04882   -0.00010   -0.00100   -0.01005   -0.10536
is         is what is                -3.07193   -0.00010   -0.00100   -0.01005   -0.10536
happening  what is happening         -4.88977   -0.00010   -0.00100   -0.01005   -0.10536
in         is happening in           -1.72564   -0.00010   -0.00100   -0.01005   -0.10536
this       happening in this         -3.84228   -0.00010   -0.00100   -0.01005   -0.10536
case       in this case              -2.49284   -0.00010   -0.00100   -0.01005   -0.10536
.          this case .               -2.05863   -0.00010   -0.00100   -0.01005   -0.10536
ln P(W_i) =                         -39.05735
ln P(T_i | W_i) =                                -0.0011    -0.0110    -0.1106    -1.1590
__________________________________________________________________________
Candidate word string (W_c)
                                              Components of ln P(T_i | W_c) for
Word       Trigram              ln P(trigram) P_t=0.9999  P_t=0.999   P_t=0.99    P_t=0.9
__________________________________________________________________________
I          _ _ I                     -3.47634   -0.00010   -0.00100   -0.01005   -0.10536
summit     _ I summit               -18.48245   -9.90349   -7.60090   -5.29832   -2.99573
that       I summit that             -5.49443   -0.00010   -0.00100   -0.01005   -0.10536
is         summit that is            -3.50595   -0.00010   -0.00100   -0.01005   -0.10536
what       that is what              -3.04882   -0.00010   -0.00100   -0.01005   -0.10536
is         is what is                -3.07193   -0.00010   -0.00100   -0.01005   -0.10536
happening  what is happening         -4.88977   -0.00010   -0.00100   -0.01005   -0.10536
in         is happening in           -1.72564   -0.00010   -0.00100   -0.01005   -0.10536
this       happening in this         -3.84228   -0.00010   -0.00100   -0.01005   -0.10536
case       in this case              -2.49284   -0.00010   -0.00100   -0.01005   -0.10536
.          this case .               -2.05863   -0.00010   -0.00100   -0.01005   -0.10536
ln P(W_c) =                         -52.08908
ln P(T_i | W_c) =                                -9.9045    -7.6109    -5.3988    -4.0493
__________________________________________________________________________
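The selection in Example II can be checked from the Table 3 totals alone. The sketch below is a hedged illustration rather than the patented implementation: `select_string` and `TYPING_TOTALS` are hypothetical names, and the comparison is assumed to be between ln P(W_i) + ln P(T_i | W_i) and ln P(W_c) + ln P(T_i | W_c), with a tie keeping the original as the abstract specifies.

```python
def select_string(ln_p_wi, ln_p_ti_wi, ln_p_wc, ln_p_ti_wc):
    """Pick the string with the larger joint log score.

    A tie keeps the original, so the candidate must be strictly better.
    """
    original_score = ln_p_wi + ln_p_ti_wi
    candidate_score = ln_p_wc + ln_p_ti_wc
    return "candidate" if candidate_score > original_score else "original"


# Totals copied from Table 3.
LN_P_WI = -39.05735  # "I submit that is what is happening in this case."
LN_P_WC = -52.08908  # "I summit that is what is happening in this case."

# P_t -> (ln P(T_i | W_i), ln P(T_i | W_c)), also copied from Table 3.
TYPING_TOTALS = {
    0.9999: (-0.0011, -9.9045),
    0.999: (-0.0110, -7.6109),
    0.99: (-0.1106, -5.3988),
    0.9: (-1.1590, -4.0493),
}

decisions = {
    p_t: select_string(LN_P_WI, ti_wi, LN_P_WC, ti_wc)
    for p_t, (ti_wi, ti_wc) in TYPING_TOTALS.items()
}
print(decisions)
```

With these totals the candidate loses on both terms, the language-model term and the typing term, which is why the original is kept for every listed value of P_t.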
EXAMPLE III
In this example, the input word string T_i = W_i (the original typed sentence) is now "a submit that is what is happening in this case." The first word T_1 = W_1 whose spelling is being checked is "a". The word "a" has the following ten simple misspellings: "I", "at", "as", "an", "am", "ad", "ab", "pa", "or", "ha". The second word W_2 is selected to be "I". Therefore, the candidate word string W_c is "I submit that is what is happening in this case." The logarithms of the individual probabilities are shown in Table 5. Note that the probability P(T_1 | W_2) is equal to (1 - P_t)/M, where M equals 10; for P_t = 0.9 this gives ln(0.01) = -4.60517, as tabulated. Table 6 provides the totals from Table 5. For all values of P_t except P_t = 0.9, the original sentence is selected over the candidate. When P_t = 0.9, the candidate is selected over the original.
TABLE 5
__________________________________________________________________________
Input word string (T_i = W_i)
                                              Components of ln P(T_i | W_i) for
Word       Trigram              ln P(trigram) P_t=0.9999  P_t=0.999   P_t=0.99    P_t=0.9
__________________________________________________________________________
a          _ _ a                     -3.96812   -0.00010   -0.00100   -0.01005   -0.10536
submit     _ a submit               -10.20667   -0.00010   -0.00100   -0.01005   -0.10536
that       a submit that             -3.69384   -0.00010   -0.00100   -0.01005   -0.10536
is         submit that is            -4.74311   -0.00010   -0.00100   -0.01005   -0.10536
what       that is what              -3.04882   -0.00010   -0.00100   -0.01005   -0.10536
is         is what is                -3.07193   -0.00010   -0.00100   -0.01005   -0.10536
happening  what is happening         -4.88977   -0.00010   -0.00100   -0.01005   -0.10536
in         is happening in           -1.72564   -0.00010   -0.00100   -0.01005   -0.10536
this       happening in this         -3.84228   -0.00010   -0.00100   -0.01005   -0.10536
case       in this case              -2.49284   -0.00010   -0.00100   -0.01005   -0.10536
.          this case .               -2.05863   -0.00010   -0.00100   -0.01005   -0.10536
ln P(W_i) =                         -43.74165
ln P(T_i | W_i) =                                -0.0011    -0.0110    -0.1106    -1.1590
__________________________________________________________________________
Candidate word string (W_c)
                                              Components of ln P(T_i | W_c) for
Word       Trigram              ln P(trigram) P_t=0.9999  P_t=0.999   P_t=0.99    P_t=0.9
__________________________________________________________________________
I          _ _ I                     -3.47634  -11.51293   -9.21034   -6.90776   -4.60517
submit     _ I submit                -8.47750   -0.00010   -0.00100   -0.01005   -0.10536
that       I submit that             -1.23049   -0.00010   -0.00100   -0.01005   -0.10536
is         submit that is            -4.74311   -0.00010   -0.00100   -0.01005   -0.10536
what       that is what              -3.04882   -0.00010   -0.00100   -0.01005   -0.10536
is         is what is                -3.07193   -0.00010   -0.00100   -0.01005   -0.10536
happening  what is happening         -4.88977   -0.00010   -0.00100   -0.01005   -0.10536
in         is happening in           -1.72564   -0.00010   -0.00100   -0.01005   -0.10536
this       happening in this         -3.84228   -0.00010   -0.00100   -0.01005   -0.10536
case       in this case              -2.49284   -0.00010   -0.00100   -0.01005   -0.10536
.          this case .               -2.05863   -0.00010   -0.00100   -0.01005   -0.10536
ln P(W_c) =                         -39.05735
ln P(T_i | W_c) =                               -11.5139    -9.2203    -7.0083    -5.6588
__________________________________________________________________________
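The typing-probability column of Table 5 is consistent with a simple per-word model: ln P_t for each word typed as intended, and ln((1 - P_t)/M) for the one word replaced by one of its M simple misspellings. The Python sketch below illustrates that reading; the function names are hypothetical, the model is inferred from the tabulated values rather than quoted from the patent, and the sentence length of 11 tokens is taken from the number of table rows.

```python
import math


def typing_log_prob(n_words, p_t, m=None):
    """ln P(T | W) under the assumed per-word typing model.

    m=None: all n_words are typed as intended (n_words * ln p_t).
    m=int:  exactly one word was mistyped as one of its m simple
            misspellings, contributing ln((1 - p_t) / m).
    """
    if m is None:
        return n_words * math.log(p_t)
    return (n_words - 1) * math.log(p_t) + math.log((1.0 - p_t) / m)


# Language-model totals copied from Table 5.
LN_P_WI = -43.74165  # typed:     "a submit that is what is happening in this case."
LN_P_WC = -39.05735  # candidate: "I submit that is what is happening in this case."
N_WORDS = 11         # rows per half of Table 5
M = 10               # number of simple misspellings stated in Example III


def decide(p_t):
    """Return which string wins at a given P_t (ties keep the original)."""
    original = LN_P_WI + typing_log_prob(N_WORDS, p_t)
    candidate = LN_P_WC + typing_log_prob(N_WORDS, p_t, M)
    return "candidate" if candidate > original else "original"


for p_t in (0.9999, 0.999, 0.99, 0.9):
    print(p_t, round(typing_log_prob(N_WORDS, p_t, M), 4), decide(p_t))
```

Under these assumptions the computed candidate totals match the ln P(T_i | W_c) row of Table 5 to four decimal places, and the decision changes from the original to the candidate only at P_t = 0.9, as Example III states.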
EXAMPLE IV
In this example, the input word string T_i = W_i is "I summit that is what is happening in this case." The first word T_1 = W_1 whose spelling is being checked is "summit". The word "summit" has two simple misspellings, one of which is "submit". The second word W_2 is selected to be "submit". Therefore, the candidate word string W_c is "I submit that is what is happening in this case." Table 7 shows the logarithms of the estimated probabilities of the trigrams and of correctly spelling or incorrectly spelling each word. Since M = 2, the probability P(T_1 | W_2) = (1 - P_t)/2. Table 8 provides the totals from Table 7. For all values of P_t, the candidate sentence is selected over the original typed sentence. A correction is therefore made in all cases.
TABLE 7
__________________________________________________________________________
Input word string (T_i = W_i)
                                              Components of ln P(T_i | W_i) for
Word       Trigram              ln P(trigram) P_t=0.9999  P_t=0.999   P_t=0.99    P_t=0.9
__________________________________________________________________________
I          _ _ I                     -3.47634   -0.00010   -0.00100   -0.01005   -0.10536
summit     _ I summit               -18.48245   -0.00010   -0.00100   -0.01005   -0.10536
that       I summit that             -5.49443   -0.00010   -0.00100   -0.01005   -0.10536
is         summit that is            -3.50595   -0.00010   -0.00100   -0.01005   -0.10536
what       that is what              -3.04882   -0.00010   -0.00100   -0.01005   -0.10536
is         is what is                -3.07193   -0.00010   -0.00100   -0.01005   -0.10536
happening  what is happening         -4.88977   -0.00010   -0.00100   -0.01005   -0.10536
in         is happening in           -1.72564   -0.00010   -0.00100   -0.01005   -0.10536
this       happening in this         -3.84228   -0.00010   -0.00100   -0.01005   -0.10536
case       in this case              -2.49284   -0.00010   -0.00100   -0.01005   -0.10536
.          this case .               -2.05863   -0.00010   -0.00100   -0.01005   -0.10536
ln P(W_i) =                         -52.08908
ln P(T_i | W_i) =                                -0.0011    -0.0110    -0.1106    -1.1590
__________________________________________________________________________
Candidate word string (W_c)
                                              Components of ln P(T_i | W_c) for
Word       Trigram              ln P(trigram) P_t=0.9999  P_t=0.999   P_t=0.99    P_t=0.9
__________________________________________________________________________
I          _ _ I                     -3.47634   -0.00010   -0.00100   -0.01005   -0.10536
submit     _ I submit                -8.47750   -9.90349   -7.60090   -5.29832   -2.99573
that       I submit that             -1.23049   -0.00010   -0.00100   -0.01005   -0.10536
is         submit that is            -4.74311   -0.00010   -0.00100   -0.01005   -0.10536
what       that is what              -3.04882   -0.00010   -0.00100   -0.01005   -0.10536
is         is what is                -3.07193   -0.00010   -0.00100   -0.01005   -0.10536
happening  what is happening         -4.88977   -0.00010   -0.00100   -0.01005   -0.10536
in         is happening in           -1.72564   -0.00010   -0.00100   -0.01005   -0.10536
this       happening in this         -3.84228   -0.00010   -0.00100   -0.01005   -0.10536
case       in this case              -2.49284   -0.00010   -0.00100   -0.01005   -0.10536
.          this case .               -2.05863   -0.00010   -0.00100   -0.01005   -0.10536
ln P(W_c) =                         -39.05735
ln P(T_i | W_c) =                                -9.9045    -7.6109    -5.3988    -4.0493
__________________________________________________________________________
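One way to see why Example IV corrects at every listed P_t while Example III corrects only at P_t = 0.9 is to solve for the break-even P_t. This is a derivation from the tabulated model, not a formula stated in the source: assuming every unchanged word contributes ln P_t to both strings and the changed word contributes ln((1 - P_t)/M) to the candidate, those common terms cancel, and the candidate wins exactly when ln P(W_c) - ln P(W_i) > ln(M P_t / (1 - P_t)). The Python sketch below, with the hypothetical function name `break_even_p_t`, evaluates that threshold from the ln P(W_i) and ln P(W_c) totals of Tables 5 and 7.

```python
import math


def break_even_p_t(ln_p_wi, ln_p_wc, m):
    """P_t at which original and candidate score equally.

    Derived under the assumed model: the candidate is selected when
    exp(ln_p_wc - ln_p_wi) > m * p_t / (1 - p_t), i.e. when
    p_t < odds / (m + odds) with odds = exp(ln_p_wc - ln_p_wi).
    """
    odds = math.exp(ln_p_wc - ln_p_wi)
    return odds / (m + odds)


# Example III: typed "a ...", candidate "I ...", M = 10 (Table 5 totals).
example_iii = break_even_p_t(-43.74165, -39.05735, 10)

# Example IV: typed "I summit ...", candidate "I submit ...", M = 2 (Table 7 totals).
example_iv = break_even_p_t(-52.08908, -39.05735, 2)

print("Example III break-even P_t:", example_iii)
print("Example IV break-even P_t:", example_iv)
```

Under these assumptions the Example III threshold falls between 0.9 and 0.99, and the Example IV threshold lies above 0.9999, which agrees with the decisions reported in both examples.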
