Kana-Kanji conversion system and a method for producing a Kana-Kanji conversion dictionary
5745881
Abstract
A conversion function for a large Kana-Kanji conversion dictionary. According to the present invention, a Kana-Kanji conversion dictionary is prepared as a program that includes a search function. In particular, in an operating system that supports dynamic link libraries (DLLs), a Kana-Kanji conversion dictionary program is compiled as a DLL, and the dictionary search function is exported so that it can be used by other programs.
Claims
What we claim is:
Description
BACKGROUND OF THE INVENTION
TABLE 1
______________________________________
typedef struct tagDictInfo {
    unsigned char *key;
    unsigned char *data;
    unsigned short grammar;
    unsigned short freq;
    unsigned short reserved;
} DICT_INFO, *PDICT_INFO;
______________________________________
In this structure, "key" denotes the key (phonemic characters) that is employed for searching; "data" denotes the data (phrase) that corresponds to the key; "grammar" denotes a grammatical category (part of speech) that is used by the presumption engine of the Kana-Kanji conversion system; and "freq" denotes the frequency at which a phrase appears. The contents of a dict.c source file, which are described below, include the array dict_data of actual dictionary entries, and a function SearchDictData that searches through the dictionary and returns a pointer to the dictionary entry that is found. The dictionary entries and the search function may also be placed in separate source files.
TABLE 2
______________________________________
#include "dict.h"
DICT_INFO dict_data[70000] = {
    { "ji·do·u·sha", "|jidousha|", NOUN, HIGH, 0 },
    { "no·ru", "|no|·ru", VERB, HIGH, 0 },
    . . .
};
PDICT_INFO SearchDictData(key)
unsigned char *key;
{
    int num; // int represents 32 bits in a 32-bit mode
    . . .
    return &dict_data[num];
}
______________________________________
In the above code, { "ji·do·u·sha", "|jidousha|", NOUN, HIGH, 0 } is one of the dictionary entries. In terms of the structure DICT_INFO, ji·do·u·sha corresponds to the "phonemic characters," |jidousha| corresponds to the "phrase," NOUN corresponds to grammar, and HIGH corresponds to freq. In this embodiment, freq takes the values HIGH and LOW, which indicate high frequency and low frequency, respectively. Besides NOUN (noun), grammar can have other values, such as VERB (verb) or ADJ (adjective). Although the values that grammar and freq can take are not listed here, they are defined in advance in a header file by using enum.
Given a pointer to a series of characters, "key," that is provided as an argument, the function SearchDictData(key) calculates the index, num, of the entry in the dictionary array dict_data whose key (phonemic characters) matches the supplied "key," and returns the address &dict_data[num], thereby permitting dict_data[num] to be accessed directly.
One method for implementing SearchDictData(key) sorts the entries of dict_data[70000] in advance by their "phonemic character" portion. Then, based on the pointer to the series of characters, "key," that is provided as an argument, a two-character search is performed on the "phonemic character" portion of the dict_data[70000] entries, and the index, num, of an entry that contains the corresponding phonemic characters is acquired. According to another method, the dictionary entries are sorted in advance by their "phonemic character" portion, an index table is prepared that returns the entry number of the first dictionary entry whose phonemic characters begin with a given pair of leading phonemic characters (e.g., ji·do in ji·do·u·sha), and a linear search is begun from that entry number. Various other methods for fast access are also possible.
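The two lookup strategies described above can be sketched in C as follows. This is a minimal illustration, not the patent's implementation. It assumes romanized ASCII keys in place of kana, a five-entry array in place of dict_data[70000], and a reduced Entry structure. Reading the "two-character search" as a binary search over the sorted array is also an assumption. The names Entry, search_binary, build_index, search_indexed, prefix_of, and first_entry are hypothetical, and "first two phonemic characters" is approximated here by the first two bytes of the key.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for DICT_INFO (grammar/freq omitted). */
typedef struct {
    const char *key;   /* phonemic reading, romanized for this sketch */
    const char *data;  /* phrase; |...| marks kanji as in the tables */
} Entry;

/* Toy dictionary; entries must be sorted by key in strcmp order. */
static const Entry dict_data[] = {
    { "ji",       "|ji|" },
    { "jidou",    "|jidou|" },
    { "jidousha", "|jidousha|" },
    { "noru",     "|no|ru" },
    { "sha",      "|sha|" },
};
enum { DICT_COUNT = sizeof dict_data / sizeof dict_data[0] };

/* Method 1 (assumed reading): binary search on the sorted keys. */
static const Entry *search_binary(const char *key)
{
    int lo = 0, hi = DICT_COUNT - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(key, dict_data[mid].key);
        if (cmp == 0)
            return &dict_data[mid];
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return NULL;
}

/* Method 2: index table keyed by the first two bytes of the key.
 * first_entry[p] holds the number of the first entry with prefix p,
 * or -1 when no entry starts with that prefix. */
static int first_entry[256 * 256];

static unsigned prefix_of(const char *key)
{
    unsigned char c0 = (unsigned char)key[0];
    unsigned char c1 = c0 ? (unsigned char)key[1] : 0;
    return ((unsigned)c0 << 8) | c1;
}

static void build_index(void)
{
    int i;
    for (i = 0; i < 256 * 256; i++)
        first_entry[i] = -1;
    for (i = 0; i < DICT_COUNT; i++) {
        unsigned p = prefix_of(dict_data[i].key);
        if (first_entry[p] < 0)
            first_entry[p] = i;
    }
}

/* Linear search that starts at the indexed entry and stops as soon as
 * the two-byte prefix changes (entries are sorted, so matches are
 * contiguous). */
static const Entry *search_indexed(const char *key)
{
    unsigned p = prefix_of(key);
    int i = first_entry[p];
    if (i < 0)
        return NULL;
    for (; i < DICT_COUNT && prefix_of(dict_data[i].key) == p; i++) {
        if (strcmp(key, dict_data[i].key) == 0)
            return &dict_data[i];
    }
    return NULL;
}
```

A caller would invoke build_index() once at startup and then call either search function with a key such as "jidousha". Both return a pointer into the array, or NULL when no entry matches, which mirrors how SearchDictData returns &dict_data[num].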
To achieve these methods, dictionary program source code, such as dict.c, can be generated automatically by a tool that extracts entries from a previously prepared dictionary database, sorts the extracted entries by their phonemic characters, and prepares an index table in advance based on those entries.
The contents of dict.def are as follows. In this module definition file, the function SearchDictData is exported so that it is available to other programs.
TABLE 3
______________________________________
LIBRARY DICT
CODE LOADONCALL MOVEABLE DISCARDABLE
DATA LOADONCALL MOVEABLE SINGLE
EXPORTS WEP @1 RESIDENTNAME
        SearchDictData @2
______________________________________
These are compiled and linked together to prepare, for example, dict.dll, a dynamic link library that includes both the contents of the dictionary and the search function. It should be noted that the SearchDictData line that follows EXPORTS in dict.def indicates that a program into which dict.dll is loaded can use the SearchDictData function.
B2. Example program for calling a dictionary program
An example program for calling a dictionary program is as follows.
TABLE 4
______________________________________
#include "dict.h"
HANDLE hLib;
void Init(void)
{
    .....
    hLib = LoadLibrary(DLL dictionary name);
    IpSearchDictData =
        GetProcAddress(hLib, "SearchDictData");
    .....
}
void End(void)
{
    .....
    FreeLibrary(hLib);
    .....
}
void main()
{
    Init();
    .....
    while (bProcess) {
        .....
        /* a character was input at the keyboard */
        if (input character is a conversion key) {
            if ((prDictData = (*IpSearchDictData)(key)) != NULL) {
                /* process when data corresponding to key is present */
            }
            else {
                /* process when data corresponding to key is not present */
            }
        }
        else {
            /* add the input character to key */
        }
        .....
    }
    .....
    End();
}
______________________________________
The processing of the present invention will now be described with reference to the above program code and the flowchart in FIG. 2.
In the program code, the DLL dictionary name in hLib=LoadLibrary(DLL dictionary name) in Init() is, for example, the file "dict.dll" that was prepared above. This statement loads the file dict.dll into the main memory, and corresponds to the procedure at step 202 in the flowchart in FIG. 2. IpSearchDictData=GetProcAddress(hLib,"SearchDictData") is a procedure for acquiring the address of the function that is to be called, SearchDictData in dict.dll. FreeLibrary(hLib) in End() is a procedure for releasing the DLL dictionary that was loaded by hLib=LoadLibrary(DLL dictionary name).
In the program code, bProcess is a flag that remains set to "1" throughout the operation of the Kana-Kanji conversion system. In other words, during the operation of the Kana-Kanji conversion system, the statements that are enclosed in the "while" loop, between step 204 and step 218, are executed repeatedly. In the while loop, when a conversion key is depressed after a character is input at the keyboard (step 202 in FIG. 2), the determination at step 206 is affirmative, and at step 208 the DLL dictionary is searched for the entry by calling the function prDictData=(* IpSearchDictData)(key). If the result is not NULL, it is assumed at step 212 that the dictionary entry that corresponds to the key was found, and a process that is employed when data corresponding to the key is found, e.g., a process for adding the resultant data to a phrase, is executed at step 214.
Although it is not represented in the above program code, the series of characters that is pointed to by "key" in prDictData=(* IpSearchDictData)(key) is not always the entire series of characters that was input by keys. Generally, a program called a presumption engine is also included in the above program code, or in other source code that is normally compiled.
The presumption engine divides the entire series of characters input by keys into appropriate segments, and accesses a Kana-Kanji conversion dictionary multiple times by employing the individual segments of the series of characters as "keys." If the result obtained by calling the function prDictData=(* IpSearchDictData)(key) is NULL, it is assumed that the dictionary entry that corresponds to the key has not been found. At step 216, therefore, another segment of the series of characters is selected by the presumption engine, or another process that is employed when the key does not correspond to a dictionary entry is executed.
If, at step 206, the input character is not a code that was produced by the depression of a conversion key, only a process that adds the input character to the series of characters pointed to by "key" is performed (step 210), and program control returns to step 204. The termination of character input at step 218 is performed, for example, by again depressing a Kanji key. Then, the flag bProcess is set to "0," and the process exits the while loop. In the flowchart in FIG. 2, program control advances from step 218 to step 220. At step 220, the Kana-Kanji conversion program DLL is released by calling End().
In this embodiment, a single Kana-Kanji dictionary DLL is loaded and released; however, a plurality of Kana-Kanji dictionary DLLs can be loaded sequentially, with the GetProcAddress function being employed to acquire an address for calling the search function of each individual Kana-Kanji dictionary DLL, and at step 212 the above described search can be performed for each Kana-Kanji dictionary DLL that has been loaded. In this case, at step 220, all the loaded Kana-Kanji dictionary DLLs are released.
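The multiple-dictionary search described above can be sketched in portable C as follows. This is only an illustration under stated assumptions: ordinary functions stand in for the search functions whose addresses would be obtained with GetProcAddress from each loaded DLL, the two toy dictionaries and the "kanja" entry are invented for the example, and the names Entry, SearchFn, scan, SearchBasic, SearchMedical, and search_all are hypothetical. Returning the first hit is also a simplifying assumption; a real system might instead collect candidates from every dictionary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for DICT_INFO. */
typedef struct {
    const char *key;   /* phonemic reading, romanized for this sketch */
    const char *data;  /* phrase; |...| marks kanji as in the tables */
} Entry;

/* Signature shared by every dictionary's exported search function. */
typedef const Entry *(*SearchFn)(const char *key);

/* Toy "basic" and "special" dictionaries (illustrative contents only). */
static const Entry basic_dict[] = {
    { "jidousha", "|jidousha|" },
    { "noru",     "|no|ru" },
};
static const Entry medical_dict[] = {
    { "kanja", "|kanja|" },
};

/* Simple linear scan used by both toy dictionaries. */
static const Entry *scan(const Entry *tab, size_t n, const char *key)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (strcmp(tab[i].key, key) == 0)
            return &tab[i];
    }
    return NULL;
}

/* Stand-ins for each DLL's exported SearchDictData. */
static const Entry *SearchBasic(const char *key)
{
    return scan(basic_dict, sizeof basic_dict / sizeof basic_dict[0], key);
}

static const Entry *SearchMedical(const char *key)
{
    return scan(medical_dict, sizeof medical_dict / sizeof medical_dict[0], key);
}

/* Try each loaded dictionary in order; return the first entry found,
 * or NULL when no loaded dictionary contains the key. */
static const Entry *search_all(const SearchFn *fns, size_t count,
                               const char *key)
{
    size_t i;
    for (i = 0; i < count; i++) {
        const Entry *hit = fns[i](key);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}
```

With an array such as `SearchFn loaded[] = { SearchBasic, SearchMedical };`, the call `search_all(loaded, 2, "kanja")` finds the entry in the second dictionary, while passing a count of 1 models the case in which only the basic dictionary is loaded.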
This process is required when one or more special Kana-Kanji dictionaries, for medical terms, computer terms, etc., are employed in addition to a basic Kana-Kanji dictionary. A basic Kana-Kanji dictionary DLL may usually be loaded, and other special Kana-Kanji dictionary DLLs may be loaded or released at the user's discretion or as specified in the user's setup. Further, a comparatively small user-defined Kana-Kanji conversion dictionary, which enables a user to arbitrarily add "phonemic characters" or "phrases," is provided not in a program form like a DLL, but as a data file that is a part of the conventional Kana-Kanji conversion system.
C. Other embodiment
In the above described embodiment, the Kana-Kanji dictionary program is provided as a DLL, which is loaded into the main memory as needed by using the standard function of the operating system. A standard, single-task operating system such as MS-DOS (Microsoft trademark) does not include a function for loading a DLL into the main memory. However, MS-DOS systems that employ a CPU, such as the 80386 or the 80486, that can use an extended memory of 1 MB or larger, that include a main memory of at least 4 MB, and that provide as an API a function, such as a DOS extender, for loading a program into an extended memory, can use a Kana-Kanji dictionary program in an EXE form that is generated by compiling a source file that includes the following dictionary entries and search function.
TABLE 5
______________________________________
#include "dict.h"
DICT_INFO dict_data[70000] = {
    { "ji·do·u·sha", "|jidousha|", NOUN, HIGH, 0 },
    { "no·ru", "|no|·ru", VERB, HIGH, 0 },
    .....
};
PDICT_INFO SearchDictData(key)
unsigned char *key;
{
    int num;
    .....
    return &dict_data[num];
}
void main()
{
    .....
    while (bProcess) {
        .....
        /* a character was input at the keyboard */
        if (an input character is a conversion key) {
            if ((prDictData = (*SearchDictData)(key)) != NULL) {
                /* process when data corresponding to the key is present */
            }
            else {
                /* process when data corresponding to the key is not present */
            }
        }
        else {
            /* add the input character to the key */
        }
        .....
    }
    .....
}
______________________________________
D. Specific example of a search process
When a user inputs ji·do·u·sha by using keys and depresses a conversion key, the presumption engine divides the series of input characters into appropriate segments and searches the Kana-Kanji DLL.
TABLE 6
______________________________________
Key                 Data
______________________________________
ji do u sha         |ji| |do| |u| |sha|
ji do·u sha         |ji| |dou| |sha|
ji do·u·sha         |ji| |dousha|
ji·do u sha         (a key for ji·do is not present)
ji·do u·sha         (a key for ji·do is not present)
ji·do·u·sha         |jidou| |sha|
ji·do·u·sha         |jidousha|
______________________________________
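The kind of segmentation trial shown in the table can be sketched in C as follows. This is a hedged illustration, not the presumption engine itself: it simply enumerates every way of joining adjacent syllables and checks each resulting segment against a toy key list. The key list (which omits "jido" and "usha"), the romanized syllables, and the names toy_keys, has_key, segmentation_matches, and count_matching are all assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SEG 64

/* Toy set of keys assumed to exist in the dictionary. */
static const char *toy_keys[] = {
    "do", "dou", "dousha", "ji", "jidou", "jidousha", "sha", "u",
};

/* Return 1 when key is in the toy dictionary, 0 otherwise. */
static int has_key(const char *key)
{
    size_t i;
    for (i = 0; i < sizeof toy_keys / sizeof toy_keys[0]; i++) {
        if (strcmp(toy_keys[i], key) == 0)
            return 1;
    }
    return 0;
}

/* Bit i of mask set means syllables i and i+1 belong to the same
 * segment. Returns 1 when every segment of this division is a key. */
static int segmentation_matches(const char *const *syl, int n, unsigned mask)
{
    char seg[MAX_SEG];
    int i;

    seg[0] = '\0';
    for (i = 0; i < n; i++) {
        if (strlen(seg) + strlen(syl[i]) >= sizeof seg)
            return 0;               /* segment too long for this sketch */
        strcat(seg, syl[i]);
        if (i == n - 1 || !(mask & (1u << i))) {
            if (!has_key(seg))      /* segment boundary: look it up */
                return 0;
            seg[0] = '\0';
        }
    }
    return 1;
}

/* Count the divisions of n syllables (n >= 1) whose segments all
 * have dictionary entries; there are 2^(n-1) divisions in total. */
static int count_matching(const char *const *syl, int n)
{
    unsigned mask, limit = 1u << (n - 1);
    int count = 0;

    for (mask = 0; mask < limit; mask++) {
        if (segmentation_matches(syl, n, mask))
            count++;
    }
    return count;
}
```

For the syllables ji, do, u, sha, a mask of 7 joins all four into the single key "jidousha", while a mask of 1 produces the segment "jido", for which the toy dictionary has no key, so that division is rejected.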
Through the process steps that are shown in the above table, |jidousha| is finally selected. When the conversion key is depressed again, another appropriate phrase (if present) is searched for.
As described above, according to the present invention, since a Kana-Kanji dictionary can be prepared in an executable program form, such as a DLL or an EXE, the following substantial effects can be obtained.
(1) Even a Kana-Kanji dictionary that is larger than the actually available storage can be loaded into the main memory by using the standard virtual storage control function of an operating system. A designer of a Kana-Kanji conversion system does not have to write the special code that is required to prepare an independent virtual storage control function.
(2) Since as much as possible of even a large Kana-Kanji dictionary is loaded into the main memory, a search process can be executed at high speed.
(3) Since the same language that is used for describing the presumption engine can be used to prepare a Kana-Kanji dictionary, the presumption engine can easily interface with the Kana-Kanji dictionary.
(4) A special dictionary preparation tool is not required for preparing a Kana-Kanji dictionary, and a common compiler and linker can be used.
While the invention has been shown and described with reference to particular embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in form and detail may be made therein without departing from the spirit and scope of the invention.
