System, method, and program for testing translatability of software by using English multi-byte transliteration creating double-wide characters
6425123
Abstract
A mock translation system, method and program is provided which converts single-byte base-language data and performs a mock translation on it to produce internationalization test data which takes the form of the corresponding base-language data transliterated into and displayed using a double-byte character set to create double-wide characters. The double-wide characters take into account the spacing, i.e., field length, needed to perform an actual translation. This data is stored in localization files and displayed in a software application in place of the English or foreign-language text. By visually inspecting each screen, the programmer or proofreader is able to easily recognize many internationalization errors, without requiring the ability to read any foreign languages. These errors include truncation, alignment, or other formatting errors, and programming errors such as text that is hard-coded, localization files missing from the program build, and text missing from localization files.
Claims
What is claimed is:
Description
BACKGROUND OF THE INVENTION
Number of Characters in English Text    Additional Characters Added
Up to 10                                20
11-20                                   20
21-30                                   24
31-50                                   30
51-70                                   28
Over 70                                 30% of the number of characters in the English text
This allocation of additional characters accommodates the greatest number of extra characters needed for given ranges, according to the IBM internationalization guidelines. This provides a testing method which will be effective for the widest range of potential language translations. Of course, in practice, those of skill in the art may vary these figures to fit the particular translations that will be made on the software. Any character can be used as the placeholder character, as long as it can be easily distinguished from the text which would normally be present. In one embodiment, the tilde character (~) is used, since this character is easy to distinguish, rarely appears in standard English text, and multiple tildes are virtually never placed together in any common English usage. The process of converting the English text into this output is referred to as a mock translation, since the output is stored in the localization file as if it were a translation according to conventional methods. The localization file is then used as if it were a standard file with a foreign translation, but the software application will display the mock translation data instead of the original text or a foreign translation. Referring now to FIG. 2, since the mock translation has distinct beginning and end characters, it becomes a simple process for the user or programmer to check each screen of the executing application to determine if any characters are missing from the beginning or end of the "mock-translated" text. FIG. 2 shows an exemplary computer display 200, which has been built using mock-translated localization files. In this figure, the "Administrators" label 210 appears as it should after mock-translation. Note that the label begins with an open bracket, then a series of tilde placeholder characters appears before the English text, and then a close bracket ends the label. Here, we see that after translation to a foreign language, label 210 will display correctly.
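The expansion table and the bracket-and-tilde layout described above can be sketched together as follows. This is a minimal Python illustration, not the implementation of the described system: the names `additional_chars` and `mock_translate` are hypothetical, square brackets are assumed as the field-boundary characters, and rounding up for text over 70 characters is an assumption, since the table does not say how fractions are handled.

```python
# Expansion table from the text: (upper bound on English length, characters to add).
EXPANSION_TABLE = [(10, 20), (20, 20), (30, 24), (50, 30), (70, 28)]


def additional_chars(length: int) -> int:
    """Return the number of placeholder characters for English text of this length."""
    for upper, extra in EXPANSION_TABLE:
        if length <= upper:
            return extra
    # Over 70 characters: 30% of the English length.
    # Rounding up (integer ceiling of 3 * length / 10) is an assumption.
    return (length * 3 + 9) // 10


def mock_translate(text: str, placeholder: str = "~") -> str:
    """Build [<placeholders><text>], the layout described for label 210."""
    return "[" + placeholder * additional_chars(len(text)) + text + "]"


# "Administrators" has 14 characters, so 20 placeholders are added.
print(mock_translate("Administrators"))
```

Placing the close bracket last means that any truncation of the displayed field removes it first, which is what makes a missing bracket a usable truncation signal.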
Conversely, the "UserLocator" label 220 has not been properly mock-translated, as it appears normal, without brackets or placeholder characters. Since this is the case, it is clear that label 220 would not properly translate to a foreign language; it would appear in English exactly as it does here. The mock translation has allowed this problem to be seen much earlier in the internationalization process, well before the software is actually translated. From such a visual inspection, the error can be identified as one in which the text may have been "hard-coded" into the program. With reference now to FIG. 3, another exemplary computer display 300 is shown. In this figure, note that the "interps" label 310 has been properly mock-translated, as described above. Label 320, however, has not been properly translated. Here, it is immediately apparent that the "Objects" label has been truncated after expansion; the open bracket and placeholder characters are present, but the English text is truncated and no close bracket appears. This type of error indicates that the programmer has not allocated enough room on the display for the translated label; while it appears correctly in English, in some languages it would show an error. Note that even if the entire English word were present, the absence of a closing bracket would indicate that in actual translation, at least the last character of the translated word could be truncated. Label 330 shows a similar problem. Note that here, only the open bracket and the placeholder tildes are shown; this indicates that the text itself has been forced to scroll off the screen. This label must therefore be moved within the software application if it is to appear properly in the final translated product. Again, the error in label 330 is clearly apparent after the mock translation of the preferred embodiment has been performed. 
Without using the mock translation method, this error would simply not have appeared on-screen until after translation, and the error would therefore be very difficult to detect until very late in the software development process. FIG. 4 shows a sample application display screen 400. Note that this screen is entirely in standard English, including each of the "buttons" at the bottom of the screen, e.g., button 410, and including menu options 420. Referring now to FIG. 5 (and with reference also to FIG. 4), since the mock translation has distinct beginning and end characters, it becomes a simple process for the user or programmer to check each screen of the executing application to determine if any characters are missing from the beginning or end of the "mock-translated" text. Furthermore, since the mock-translated text has been expanded, using placeholder characters, to meet internationalization guidelines, it is also now a simple matter to examine each screen for alignment errors or other formatting errors. Any hard-coded text, which has not been put through the mock-translation process, will also be apparent since there will be no beginning or end markers or placeholder characters. Note, for example, the "Add With Defaults" button. In FIG. 4, of course, this button 410 is all plain English text. In mock-translated FIG. 5, however, it is clear that corresponding button 510 has been mock translated, since brackets and placeholder characters are visible. Menu items 420 in FIG. 4 are similarly mock-translated as menu items 520 in FIG. 5. Note, conversely, that the "Universal" menu item 530 appears exactly as in FIG. 4 as menu item 430; this text has therefore been hard-coded, and this error can be easily spotted and repaired. Another common error, not shown here, which may be easily detected using this mock-translation technique, is the presence of labels or other text which is composed of two or more separately-translated text strings. 
Because many foreign languages, when translated from English, will rearrange the word order of subject, objects, and verbs, each phrase to be translated should be translated as a whole if it is to be displayed correctly in other languages. For this reason, composed text must be eliminated. Using the mock-translation techniques described herein, it is a simple matter for the software programmer or developer to spot composed text, since placeholder characters will appear between words of a single string of mock-translated text. If brackets or other indicators are used to denote the beginning and/or end of the mock-translated text, the appearance of these indicators with each separate piece of text will indicate text composed of piecemeal parts. Note that in FIGS. 4 and 5, the tilde placeholder has been replaced with a double-wide dash (--). This illustrates another innovative mock-translation technique, useful when the software is to be translated into Japanese or other languages that use double-byte character sets. The United States and other countries which use a standard ASCII character set require only a single byte to identify individual characters. Some other languages, because they have far more characters than English, use a double-byte character set. Translation of single-byte languages into a double-byte character set for foreign use involves additional concerns because it is possible that a double-byte character may be read as two single-byte characters. One specific (and notorious) example is the "5C" problem; many double-byte characters have "5C" as the second byte, but "5C" represents a backslash character (\) in a single-byte character set. Therefore, many double-byte characters may be incorrectly displayed as a different character followed by a backslash.
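The "5C" problem can be reproduced in a few lines. The sketch below is illustrative only: it assumes Shift-JIS as a representative double-byte encoding (the text does not name one at this point), and the helper names `has_5c_trail_byte` and `misread_as_single_bytes` are hypothetical. Latin-1 decoding stands in for software that treats every byte as a separate character.

```python
BACKSLASH = 0x5C


def has_5c_trail_byte(ch: str, encoding: str = "shift_jis") -> bool:
    """True if ch encodes to two bytes whose second byte is 0x5C."""
    raw = ch.encode(encoding)
    return len(raw) == 2 and raw[1] == BACKSLASH


def misread_as_single_bytes(text: str, encoding: str = "shift_jis") -> str:
    """Simulate byte-unaware software: treat every byte as its own character."""
    # Latin-1 maps each byte to exactly one code point, so it models
    # single-byte handling of the same byte sequence.
    return text.encode(encoding).decode("latin-1")


sample = "表示"  # the first character encodes as 0x95 0x5C in Shift-JIS

print(has_5c_trail_byte("表"))                  # True
print("\\" in misread_as_single_bytes(sample))  # True: a stray backslash appears
```

Any code path that scans bytes for a backslash (path separators, escape processing) will trip on such characters, which is why a placeholder with 5C as its second byte is a useful probe.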
The mock translation system provides a solution to this problem, by performing a mock translation as described above, but using double-byte characters for the brackets and placeholder characters. By using a double-byte, double-wide dash character (character 815C) as the placeholder character, double-byte translation problems will also be evident on visual inspection. The double-wide dash character itself is subject to the "5C" problem, so if the display of double-byte characters is problematic, backslash characters will be visible in the placeholder character field. Note that in FIG. 5, translated menu items 520 appear correctly with placeholder dashes, and no backslash characters are visible; this indicates that the mock-translation (for these items) was performed correctly. Further, in this embodiment, the double-byte, double-wide open and close brackets can be used as field boundaries. This process provides the advantages of the basic mock-translation system, with additional capabilities for detecting double-byte problems. Because, after mock translation, the double-byte characters will be present in the visual display of the program, any errors in the program relating to double-byte characters are immediately apparent, instead of much further down the development process, when translation of the program is usually performed. Again, the localization files remain readable to English-speakers, and now allow the software developer to easily check for internationalization problems. Referring now to FIG. 7, a flowchart of a process according to the above embodiments is shown. To test the internationalization of software which uses localization files, the mock translation system first opens each of the localization files (step 700). Each entry in the file is then mock translated; first, a number of placeholder characters is added, according to internationalization guidelines (step 710). 
Depending on whether the double-byte technique described above is used, the placeholder characters may be single-byte characters such as the standard tilde, or may be a character such as the double-byte, double-wide dash. Next, field-boundary characters, e.g., open and close brackets, are added to the beginning and end of the entry (step 720). Again, these characters may be either single- or double-byte characters. Finally, the translated entries are written back to the localization files (step 730). Now, when the software application is run for testing, the mock-translated text will appear in place of the original text. The preferred embodiment of this invention involves another approach to solving the double-byte problem using mock-translation techniques to replace single-byte English characters with their double-byte equivalents. Most double-byte character sets provide corresponding double-byte English characters, but these characters appear on the screen as double-wide characters, making it easy to distinguish between a single-byte English character and its double-byte equivalent. This characteristic of the double-byte character sets is exploited to reveal internationalization problems. In this embodiment, instead of using placeholder characters, the original English text is replaced with the double-byte equivalent. This produces a visible text string that is twice as wide as the original text, as shown in FIG. 6. The double-wide characteristic accounts for the extra space that is typically needed for translated text. The double-wide characters are used in lieu of the table described above and referred to in the IBM National Language Design Guide: Designing Internationalized Products (IBM 4th Ed. 1996). FIG. 6 shows another exemplary display screen 600, which corresponds to the untranslated screen in FIG. 4. Note menu items 620; these characters are displayed as double-wide and illustrate proper mock-translation according to this embodiment.
Contrast this with button text 610; this text appears as standard, single-width English text. Therefore, the software developer can tell at a glance that some text (the single-width text) has not been properly translated, and whether the translated, double-width text is properly displayed. For example, if the displayed text includes single-width text, this is a visual indication that the single-width text may have been hard-coded in the program and was not a part of the localization file. The displayed mock-translated text having double-wide characters will visually indicate whether the text field for translation is properly aligned and formatted. Also, if part of the double-wide text is missing from view within the displayed text area, this may be a visual indication that not enough space was allocated on the displayed area for translated text. For further assurance that no truncation errors have occurred due to missing text that is not recognized by a viewer as missing, beginning and/or ending text indicators such as double-wide brackets or dashes can be used. If the ending indicator is not displayed, this is a visual indication that the text has been truncated. As such, a later actual translation of the text may not be completely visible on the screen. Likewise, beginning and/or ending text indicators interspersed between individual words or phrases may be a visual indication that the textual phrase improperly comprises piecemeal messages and is not a single message. In addition, double-byte dashes having a "5C" as the second byte, or any other character having a "5C" as one of the bytes, may be used as the beginning and/or ending indicator to further enable a visual inspection for double-byte processing problems. If such a problem did occur, the visual display would indicate unwanted characters in place of the dash or other beginning or ending indicator.
Even without using a dash as the beginning or ending character, double-byte processing problems can also be visually determined since some of the double-byte characters contain a "5C" as one of the bytes. As such, the preferred embodiment of this invention provides a mock translation environment for testing software that will be run using double-byte character sets. Since Asian countries use double-byte character sets, the translation of English (single-byte) software into Asian languages often results in many problems specific to double-byte text. By transliterating English text into double-byte characters that look like wide English characters, problems that would occur with the translation into real Asian languages can be identified early in the development/testing process by English-speaking programmers. Some of these problems include a) build process problems; b) localization file retrieval mechanism problems; c) presentation software problems; d) display problems on the GUI or command line such as i) "hard-coded" English strings; ii) "expansion and alignment" errors; iii) missing localization files; and iv) piecemeal messages if beginning and/or ending characters are used; and e) functional and display problems that often happen when the second byte of a double-byte character is mistaken for a single-byte character. An example is the well-known "5C" problem: 5C is often the second byte of a double-byte character, yet represents a backslash (\) in single-byte software. With reference now to FIG. 8, a flowchart of a process according to the previous embodiment is shown. To test the internationalization of software which uses localization files, particularly those which will be translated to double-byte languages, the mock translation system first opens each of the localization files (step 800). Each entry in the file is then mock translated by converting each single-byte character to its double-byte, double-width equivalent (step 810).
Finally, the translated entries are written back to the localization files (step 820). Now, when the software application is run for testing, the mock-translated, double-wide text will appear in place of the original text. The mock-translation of data in the localization files can be done in many ways. For example, many localization files are stored in a compiled message catalog format called XPG4. Often, internationalized software will rely on thousands of message catalogs, and if there is an overall change to the data stored in the message catalogs, then it is important to have an automated parser system. According to the preferred embodiment, if the message catalogs have already been compiled before the software is put through mock-translation testing, a parsing tool is provided which can decompile the message catalogs, process them, then recompile them back to the usable message catalog form. For example, in the case of XPG4-format message catalogs, at run-time the message catalogs will already have been compiled by the "gencat" program defined by X/Open. The parser will decompile the catalogs using, for example, the "dumpmsg" program available from Alfalfa Software Incorporated. The parser will then parse the decompiled file by reading each line of the file and determining whether it is a set number, a comment, a key, the only line of a message, the last line of a message, or the middle line of a message. Then, the required insertion can be made to the beginning of every first line of a message, or whichever place is necessary. After all files are processed this way, the parser will then recompile the message catalogs by a call to the "gencat" program, and the recompiled message catalogs are ready to run with the software application.
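The line-by-line processing described above, combined with the double-wide conversion of step 810, might look like the following Python sketch. It rests on several assumptions that are not taken from the text: the decompiled file is assumed to resemble gencat message source (lines starting with "$" are directives or comments, message lines are a key followed by a space and the text, and a trailing backslash continues a message), Unicode full-width forms stand in for the double-byte English characters, and the function names are hypothetical. The decompile and recompile steps themselves are outside the sketch.

```python
def to_fullwidth(text: str) -> str:
    """Replace printable ASCII with Unicode full-width forms; keep backslash escapes."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # Copy the backslash and the escaped character unchanged (e.g. \n);
            # a trailing continuation backslash is copied on its own.
            out.append(text[i:i + 2])
            i += 2
            continue
        if ch == " ":
            out.append("\u3000")  # ideographic (full-width) space
        elif "!" <= ch <= "~":
            out.append(chr(ord(ch) + 0xFEE0))  # maps to U+FF01..U+FF5E
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def mock_translate_catalog(source: str) -> str:
    """Transliterate message text in gencat-style source, leaving structure intact."""
    result = []
    continuing = False  # True while inside a multi-line message
    for line in source.splitlines():
        if not continuing and (not line.strip() or line.startswith("$")):
            # Blank lines, "$set" directives, and "$ " comments pass through.
            result.append(line)
            continue
        if continuing:
            # Middle or last line of a message: all of it is message text.
            result.append(to_fullwidth(line))
        else:
            # First (or only) line of a message: keep the key, convert the text.
            key, sep, text = line.partition(" ")
            result.append(key + sep + to_fullwidth(text))
        continuing = line.endswith("\\")
    return "\n".join(result)


print(mock_translate_catalog("$set 1\n1 Add With Defaults"))
```

The sketch does not protect printf-style format specifiers or a quote character set by a "$quote" directive; a production tool would need to leave those untouched so the catalog still compiles and the messages still format correctly.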
Of course, the processing of XPG4 files and the specific examples of compiler and decompiler programs are not limiting examples; this process may be performed on any number of localization file or message catalog file formats using many different software tools. In addition, although the preferred embodiment utilizes localization files, the invention can be implemented by parsing the program for any displayable text strings and replacing such strings with the corresponding mock-translation string as disclosed herein. It is important to note that while the present invention has been described in the context of a fully functional data processing system and/or network, those skilled in the art will appreciate that the mechanism of the present invention is capable of being distributed in the form of a computer usable medium of instructions in a variety of forms, and that the present invention applies equally regardless of the particular type of signal bearing medium used to actually carry out the distribution. Examples of computer usable mediums include: nonvolatile, hard-coded type mediums such as read only memories (ROMs) or erasable, electrically programmable read only memories (EEPROMs), recordable type mediums such as floppy disks, hard disk drives and CD-ROMs, and transmission type mediums such as digital and analog communication links. While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.