Method and system for implementing user-defined codeset conversions in a computer system
6708310

Abstract
A method and system for performing user-defined code conversions in a computer system. A utility accepts a text file from a user program. This text file contains a series of conditional rules that define a protocol for converting character data between codesets. The utility parses this file and converts it to a binary table format that is then stored in a code conversion table database. The user program then invokes functions contained in the operating system to convert data in accordance with the stored binary table.

Claims
What is claimed is:

Description
BACKGROUND OF THE INVENTION
direction {
    condition {
        between 0x64...0x7f;
    } operation predefined_operation
}
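To make the condition-action pairing concrete, the following is a minimal Python sketch of how a BETWEEN condition could select an action. It is an illustration only, not the utility's implementation: the names `between`, `run_direction`, and the pass-through body of `predefined_operation` are hypothetical, and the first-match ordering is an assumption.

```python
from typing import Callable, List, Optional, Tuple

Condition = Callable[[int], bool]
Action = Callable[[int], int]


def between(*ranges: Tuple[int, int]) -> Condition:
    """Build a condition that holds when a value lies in any inclusive range."""
    def condition(value: int) -> bool:
        return any(low <= value <= high for low, high in ranges)
    return condition


def predefined_operation(value: int) -> int:
    """Hypothetical stand-in for the named operation; it copies the value."""
    return value


def run_direction(pairs: List[Tuple[Condition, Action]], value: int) -> Optional[int]:
    """Apply the first action whose condition matches (assumed ordering rule)."""
    for condition, action in pairs:
        if condition(value):
            return action(value)
    return None  # no condition matched


# Mirrors: condition { between 0x64...0x7f; } operation predefined_operation
direction = [(between((0x64, 0x7f)), predefined_operation)]
```

Calling `run_direction(direction, 0x70)` runs the action because 0x70 lies inside the range, while a value such as 0x63 matches no condition and yields `None`.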
The ESCAPE SEQUENCE condition expression element can be used to define one or more comma-separated escape sequence designators. For example, the following equates an escape sequence to ESC $ ) C and the Shift-Out control character code, 0x0e:
escapeseq 0x1b242943, 0x0e;
The EXPRESSION elements are numerous and are detailed in Appendix A, which is incorporated herein by reference. An illustrative example is the following multiplication expression:
0x20 * 10
Operation Conversion Definition Elements
Operation elements can comprise the following operation expression elements: (1) IF-ELSE operation expressions; (2) OUTPUT operation expressions; and (3) CONTROL operation expressions. An operation can be composed of any number or combination of operation expression elements.
The IF-ELSE operation expression element defines a conversion rule that depends on the boolean result of the IF statement. If the result is true, the task that follows the IF statement is executed. If false, the task that follows the ELSE statement is executed. IF-ELSE statements can be nested to create more complex conversion rules. The following is representative syntax that generates an error message if the remaining output buffer is less than a predefined minimum. Otherwise, the rule generates an output codeset character by performing a bitwise AND on the input codeset character and the hexadecimal value 0x7f:
if (remaining_output_buffer < minimum) {
    error E2BIG
} else {
    output = input[current_character] & 0x7f
}
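The effect of this rule can be sketched in Python as follows. This is an illustrative equivalent under stated assumptions, not the code the utility generates: the function name `convert_character` and the default `minimum` of 1 are hypothetical, and the E2BIG error is modeled as an `OSError` carrying `errno.E2BIG`.

```python
import errno


def convert_character(input_buffer: bytes, current_character: int,
                      remaining_output_buffer: int, minimum: int = 1) -> int:
    """Return one output byte, or raise E2BIG when the output buffer is too small."""
    if remaining_output_buffer < minimum:
        # Corresponds to: error E2BIG
        raise OSError(errno.E2BIG, "output buffer too small")
    # Corresponds to: output = input[current_character] & 0x7f
    return input_buffer[current_character] & 0x7f
```

For instance, the input byte 0xc1 has its high bit cleared and becomes 0x41 whenever at least `minimum` bytes of output space remain.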
The OUTPUT operation expression element assigns the right-hand side of the expression to the output buffer. For example, the following would save 0x8080 to the output buffer:
output = 0x8080
In the preferred embodiment, the CONTROL operation expression can be used to (1) return error messages, (2) discard bytes from the input buffer and move the input buffer pointer accordingly, (3) stop the execution of the current operation, (4) execute an initialization operation and set all variables to zero, (5) execute a reset operation and set all variables to zero, (6) execute a predefined named operation, e.g., "operation ISO8859-1_to_ISO8859-2;", (7) execute a predefined direction, or (8) execute a predefined mapping. The syntax for CONTROL operation expression elements is shown in the attached Appendix A, which has been incorporated herein by reference. With "//" denoting a comment line, a representative syntax for conducting a reset operation is:
operation reset {
// Emit state reset sequence
output = predefined_reset_sequence
}
Mapping Conversion Definition Elements
In a preferred embodiment, mappings can specify a direct code conversion mapping by using one or more map pairs. Five possible pairings are (1) HEXADECIMAL-HEXADECIMAL, (2) HEXADECIMAL RANGE-HEXADECIMAL RANGE, (3) `default`-HEXADECIMAL, (4) `default`-`no_change_copy`, and (5) HEXADECIMAL-ERROR. Respectively, these mappings can be used to (1) convert a specified hexadecimal value to another hexadecimal value, (2) convert a specified range of hexadecimal values to another range of hexadecimal values, (3) convert an undefined input character to a default hexadecimal value, (4) leave an undefined input character unchanged, and (5) return an error message when a particular input character is encountered.
Each map element can also have comma-separated attribute elements. For example, the mapping can be encoded as one of the following table types: dense, hash, binary search tree, index, or an automatically defined type. Illustrative syntax is:
// map with a hash mapping table, using hash factor of 10
map maptype = hash:10 {
    // multiple or range mapping
    // convert hexadecimal range of (0x0 to 0x7f) to (0x10 to 0x8f)
    0x0...0x7f 0x10;
    // convert hex value of 0xa to 0xb
    0xa 0xb;
    // convert undefined input codeset values to 0x20
    default 0x20;
    // return error if input buffer contains 0x80
    0x80 error;
}
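The map pairs in this example can be pictured as a lookup table. The Python sketch below is a simplified model, not the binary table format itself: the tuple encoding of pairs and the names `build_table` and `convert` are hypothetical, and it assumes that a later pair overrides an earlier one for the same input value and that a map with no default pair copies the input unchanged.

```python
ERROR = object()           # marks a HEXADECIMAL-ERROR pair
NO_CHANGE_COPY = object()  # marks the default-no_change_copy pair


def build_table(pairs):
    """Expand map pairs into a lookup dict plus a default entry."""
    table = {}
    default = NO_CHANGE_COPY  # assumed behaviour when no default pair is given
    for pair in pairs:
        kind = pair[0]
        if kind == "single":
            _, source, target = pair
            table[source] = target
        elif kind == "range":
            _, low, high, target_start = pair
            for offset in range(high - low + 1):
                table[low + offset] = target_start + offset
        elif kind == "default":
            default = pair[1]
        elif kind == "error":
            table[pair[1]] = ERROR
        else:
            raise ValueError(f"unknown map pair kind: {kind!r}")
    return table, default


def convert(value, table, default):
    """Look up one input value, falling back to the default entry."""
    result = table.get(value, default)
    if result is ERROR:
        raise ValueError(f"input 0x{value:x} is mapped to an error")
    if result is NO_CHANGE_COPY:
        return value
    return result


# The four pairs of the illustrative map, in the order they appear.
PAIRS = [
    ("range", 0x0, 0x7f, 0x10),
    ("single", 0xa, 0xb),
    ("default", 0x20),
    ("error", 0x80),
]
TABLE, DEFAULT = build_table(PAIRS)
```

A dict corresponds loosely to the hash table type; a dense table would instead be an array indexed directly by the input value.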
Thus, the preferred embodiment is not limited to simple one-to-one mappings. As a result, mappings can be efficiently defined.
Text File Syntax Summary
A summarized version of the syntax for the code conversion definition file is provided here. Again, a more detailed version of this syntax is provided in Backus-Naur Form in Appendix A.
Conversion Definition, e.g., US-ASCII%ISO8859-1
    Conversion Definition Elements
        Direction
            Condition: Action Pairs
                Condition
                    Between
                    Escape Sequence
                    Expression
                Action
                    Direction
                    Operation
                    Map
        Condition
        Operation
            Operation: Expression Elements
                If-Else Operation Expression
                Output Operation Expression
                Control Operation Expression
        Mapping
            Map Attributes
                index
                hash
                binary
                dense
            Map Pairs
                Single Mapping
                Multiple Mapping
                Default Mapping
                Error Mapping
The use of the operation-based definition elements allows the preferred embodiment to provide for conversions not only between two single-byte codesets but also between multi-byte codesets, as well as bidirectional conversion between single- and multi-byte codesets. This is so because the operational elements of the code conversion definition file reduce the space needed to define the conversions. The text file 114 is also advantageous because of its ease of use: the file 114 is composed of concise declaration statements, not complicated instructions written in a programming language such as C, which requires compilation before use. As such, the text file, when applied to the other inventive concepts disclosed herein, is a valuable tool for the computer industry. Complete examples of user-defined code conversion text files 114 are provided in Appendix B, which has been incorporated herein by reference.
FIG. 2 Functional Overview
An overview of logical steps employed in an embodiment of the present invention to produce a code conversion binary table file from a user-defined code conversion text file and to convert codeset data in accordance with that binary table file is shown in FIG. 2. This process begins by defining the code conversion definition text file 114. Once this is done, it is passed to the geniconvtbl( ) utility 122, which interprets the text file 114 and converts it to a binary table file 124. This file transformation takes place to render the text file into a format that the iconv subsystem 202, represented by the functions iconv_open 128, iconv( ) 130, and iconv_close 132, can understand. Once the binary file 124 is produced, it is stored in the database 126. The user-defined codeset conversion is now available for use. Actual conversion begins when character data in an input file 204 is transferred to the operating system by the user program.
The iconv subsystem 202 recognizes that the user is requesting a conversion and instantiates the shared object 134 to retrieve the appropriate binary file 124 from the database 126. The subsystem 202 then interfaces with the shared object 134 to translate the data according to the protocol set forth in the binary file 124. The translated data is placed in an output file 206.
FIG. 3 Defining the Codeset Conversion
The processing steps used in a preferred embodiment of the present invention to store a table representing user-defined code conversion rules are illustrated in FIG. 3. The embodiment begins with a preprocessing step: defining the codeset conversion rules via a code conversion definition file (step 302). Next the conversion definer 118 sends the text file 114 to the geniconvtbl( ) utility 122, which then parses the file for errors (steps 304, 306). If an error is found, the utility 122 returns an error message to the invoking user program 118. Otherwise, the utility 122 converts the text file 114 to a binary file 124 (step 306). Processing is then returned to the calling program 118, which stores the binary file 124 in the database 126 (step 308).
FIG. 4 Performing the Codeset Conversion
The steps used in an embodiment of the present invention to convert codeset data are shown in FIG. 4. The processing begins when conversion requester 120 invokes iconv_open 128 to obtain a conversion descriptor for the desired conversion (step 402). In brief, this conversion descriptor contains pointers and data that the calling program 120 uses to later invoke iconv( ) 130. The conversion requester 120 invokes iconv_open 128 by passing to it the "from" and "to" codesets of the desired conversion. The function 128 then searches the database for a corresponding binary file (step 404). If no such file exists, an error message is returned to the invoking program 120. Otherwise, the function 128 instantiates the shared object and returns control to the invoking program 120 (step 406).
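The store-then-look-up relationship between FIG. 3 (step 308) and FIG. 4 (steps 402 to 406) can be sketched as a small in-memory model. This Python sketch is only an illustration under stated assumptions: the class `ConversionDatabase`, its methods, and the dictionary-shaped descriptor are hypothetical, and keying the stored table by a "from%to" name is an assumption borrowed from the conversion definition naming (e.g., US-ASCII%ISO8859-1), not a documented storage layout.

```python
class ConversionDatabase:
    """Toy model of the code conversion table database."""

    def __init__(self):
        self._tables = {}

    @staticmethod
    def _key(from_code: str, to_code: str) -> str:
        # Assumed naming scheme: "<from>%<to>"
        return f"{from_code}%{to_code}"

    def store(self, from_code: str, to_code: str, binary_table: bytes) -> None:
        """Step 308: store a binary table produced from a definition text file."""
        self._tables[self._key(from_code, to_code)] = binary_table

    def open(self, from_code: str, to_code: str) -> dict:
        """Steps 402 to 406: find the table and return a conversion descriptor."""
        key = self._key(from_code, to_code)
        if key not in self._tables:
            # Step 404 failed: no binary file exists for this codeset pair.
            raise LookupError(f"no conversion table for {key}")
        return {"name": key, "table": self._tables[key]}
```

A requester would call `open("US-ASCII", "ISO8859-1")` only after a definer has stored a table for that pair; any other pair raises `LookupError`, mirroring the error returned to the invoking program.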
The conversion requester 120 next calls iconv( ) 130, which interfaces with the shared object to perform the conversion (steps 408, 410). The shared object performs the conversion by accessing the binary file stored in the database 126. Once the conversion is complete, control is again returned to the invoking program 120. The process is completed when the requester 120 calls iconv_close 132 to release the shared object and the conversion descriptor (steps 412, 414). In this manner, the preferred embodiment is able to efficiently convert data between differing codesets.
The above description will enable those skilled in the art to make numerous modifications to the described embodiments without departing from the spirit and scope of the claimed invention. Indeed, the chosen embodiments were selected so others could best utilize the invention by making such modifications to tailor the invention to their particular needs. The description therefore should not be read as limiting the invention to the embodiments explained herein.
For example, this disclosure has outlined several operation-based definition elements that can be incorporated into the code conversion definition text file. Using the inventive concepts disclosed herein, those skilled in the art may be able to create further such elements that, although not explicitly disclosed, are implicitly disclosed by the concepts of the invention.
Another possible modification is to store the text file, as opposed to the binary file, for use in subsequent conversions. In such a scenario, the utility would recall the text file from a database when a conversion is requested. The utility would then convert the text file to a binary file so that conversion could be effectuated.
Those skilled in the art will also recognize that the present invention is not limited to any particular CPU 102 or processing technology. Nor is the invention limited to any particular operating system 110.
Rather, the invention could be utilized in any operating environment, such as WINDOWS 95, WINDOWS 98, UNIX, MacOS, or any JAVA runtime environment, which refers to the operating environment typified by a JAVA virtual machine and associated JAVA class libraries. JAVA is a registered trademark of SUN MICROSYSTEMS, Inc.
Similarly, the inventive concepts were described in part as being contained within random access memory and a hard disk. Those skilled in the art will recognize that these concepts can also be stored on and invoked from any media that can store data or have data read from it. Examples of such media include floppy disks, magnetic tapes, phase discs, carrier waves sent across a network, and various forms of ROM, such as DVDs and CDs. Thus, the present invention anticipates the use of all computer readable media.
Descriptive terms used in the above disclosure also do not limit the invention. For example, the term "user program" was used to describe a preferred embodiment. Those skilled in the art will realize that this term encompasses a variety of application programs and related technologies. Similarly, those skilled in the art will appreciate that the term "database" encompasses similar terms, such as file directories. The above disclosure also used the terms "operation-based conversion elements" and "conditional conversion elements" to explain inventive properties. While helpful for an understanding of the invention, these terms should not be read to limit the invention, as doing so would elevate form over substance. These descriptive terms do not by themselves define the invention. Rather, it is the properties that these terms represent, in conjunction with the entire disclosure, that are important. So long as these properties are being used, it makes little difference what terminology is used. The invention therefore is not circumvented by alterations in terminology or other modifications that read on the appended claims and their equivalents.
APPENDIX A
BACKUS-NAUR FORM FOR CODE CONVERSION DEFINITION TEXT FILE
iconv_conversion_definition
    : CONVERSION_NAME '{' definition_element_list '}'
    ;
definition_element_list
    : definition_element ';'
    | definition_element_list definition_element ';'
    ;
definition_element
    : direction
    | condition
    | operation
    | map
    ;
direction
    : 'direction' NAME '{' direction_unit_list '}'
    | 'direction' '{' direction_unit_list '}'
    ;
direction_unit_list
    : direction_unit
    | direction_unit_list direction_unit
    ;
direction_unit
    : condition action ';'
    | condition NAME ';'
    | NAME action ';'
    | NAME NAME ';'
    | 'true' action ';'
    | 'true' NAME ';'
    ;
action
    : direction
    | map
    | operation
    ;
condition
    : 'condition' NAME '{' condition_list '}'
    | 'condition' '{' condition_list '}'
    ;
condition_list
    : condition_expr ';'
    | condition_list condition_expr ';'
    ;
condition_expr
    : 'between' range_list
    | expr
    | 'escapeseq' escseq_list ';'
    ;
range_list
    : range_pair
    | range_list ',' range_pair
    ;
range_pair
    : HEXADECIMAL '...' HEXADECIMAL
    ;
escseq_list
    : escseq
    | escseq_list ',' escseq
    ;
escseq
    : HEXADECIMAL
    ;
map
    : 'map' NAME '{' map_list '}'
    | 'map' '{' map_list '}'
    | 'map' NAME map_attribute '{' map_list '}'
    | 'map' map_attribute '{' map_list '}'
    ;
map_attribute
    : map_type ',' 'output_byte_length' '=' DECIMAL
    | map_type
    | 'output_byte_length' '=' DECIMAL ',' map_type
    | 'output_byte_length' '=' DECIMAL
    ;
map_type
    : 'maptype' '=' map_type_name ':' DECIMAL
    | 'maptype' '=' map_type_name
    ;
map_type_name
    : 'automatic'
    | 'index'
    | 'hash'
    | 'binary'
    | 'dense'
    ;
map_list
    : map_pair
    | map_list map_pair
    ;
map_pair
    : HEXADECIMAL HEXADECIMAL
    | HEXADECIMAL '...' HEXADECIMAL HEXADECIMAL
    | 'default' HEXADECIMAL
    | 'default' 'no_change_copy'
    | HEXADECIMAL 'error'
    ;
operation
    : 'operation' NAME '{' op_list '}'
    | 'operation' '{' op_list '}'
    | 'operation' 'init' '{' op_list '}'
    | 'operation' 'reset' '{' op_list '}'
    ;
op_list
    : op_unit
    | op_list op_unit
    ;
op_unit
    : ';'
    | expr ';'
    | 'error' ';'
    | 'error' expr ';'
    | 'discard' ';'
    | 'discard' expr ';'
    | 'output' '=' expr ';'
    | 'direction' NAME ';'
    | 'operation' NAME ';'
    | 'operation' 'init' ';'
    | 'operation' 'reset' ';'
    | 'map' NAME ';'
    | 'map' NAME expr ';'
    | op_if_else ';'
    | 'return' ';'
    | 'printch' expr ';'
    | 'printhd' expr ';'
    | 'printint' expr ';'
    ;
op_if_else
    : 'if' '(' expr ')' '{' op_list '}'
    | 'if' '(' expr ')' '{' op_list '}' 'else' op_if_else
    | 'if' '(' expr ')' '{' op_list '}' 'else' '{' op_list '}'
    ;
expr
    : '(' expr ')'
    | NAME
    | HEXADECIMAL
    | DECIMAL
    | 'input' '[' expr ']'
    | 'outputsize'
    | 'inputsize'
    | 'true'
    | 'false'
    | 'input' '==' expr
    | expr '==' 'input'
    | '!' expr          // false expression
    | '~' expr          // bitwise complement expression
    | '-' expr          // unary minus expression
    | expr '+' expr
    | expr '-' expr
    | expr '*' expr     // multiplication expression
    | expr '/' expr
    | expr '%' expr     // remainder expression
    | expr '<<' expr    // left-shift expression
    | expr '>>' expr    // right-shift expression
    | expr '|' expr     // bitwise OR expression
    | expr '^' expr     // exclusive OR expression
    | expr '&' expr     // bitwise AND expression
    | expr '==' expr    // equal-to expression
    | expr '!=' expr    // inequality expression
    | expr '>' expr
    | expr '>=' expr
    | expr '<' expr
    | expr '<=' expr
    | NAME '=' expr
    | expr '||' expr    // logical OR
    | expr '&&' expr    // logical AND
    ;
APPENDIX B
CODE CONVERSION DEFINITION TEXT FILE EXAMPLES
Example 1: Code Conversion from ISO 8859-1 to ISO 646
// iconv code conversion from ISO 8859-1 to ISO 646
ISO8859-1%ISO646
{
    // Use dense-encoded internal data structure
    map maptype = dense
    {
        default 0x3f;
        0x0...0x7f 0x0;
    };
}
Example 2: Code Conversion from eucJP to ISO-2022-JP
// iconv code conversion from eucJP to ISO-2022-JP
#include <sys/errno.h>
eucJP%ISO-2022-JP {
    operation init {
        // set codesubset to ASCII
        codesubset = 0;
    };
    operation reset {
        if (codesubset != 0) {
            // Emit state reset sequence, ESC ( J, for ISO-2022-JP
            output = 0x1b284a;
        }
        operation init;
    };
    direction {
        condition { // JIS X 0201 Latin (ASCII)
            between 0x00...0x7f;
        } operation {
            if (codesubset != 0) {
                // we will emit four bytes
                if (outputsize <= 3) {
                    error E2BIG;
                }
                // Emit state reset sequence, ESC ( J.
                output = 0x1b284a;
                // set codesubset to ASCII
                codesubset = 0;
            } else {
                if (outputsize <= 0) {
                    error E2BIG;
                }
            }
            output = input[0];
            // Move input buffer pointer one byte.
            discard;
        };
        condition { // JIS X 0208
            between 0xa1a1...0xfefe;
        } operation {
            if (codesubset != 1) {
                if (outputsize <= 4) {
                    error E2BIG;
                }
                // Emit JIS X 0208 sequence, ESC $ B.
                output = 0x1b2442;
                codesubset = 1;
            } else {
                if (outputsize <= 1) {
                    error E2BIG;
                }
            }
            output = input[0] & 0x7f;
            output = input[1] & 0x7f;
            // Move input buffer pointer two bytes.
            discard 2;
        };
        condition { // JIS X 0201 Kana
            between 0x8ea1...0x8edf;
        } operation {
            if (codesubset != 2) {
                if (outputsize <= 3) {
                    error E2BIG;
                }
                // Emit JIS X 0201 Kana sequence, ESC ( I.
                output = 0x1b2849;
                codesubset = 2;
            } else {
                if (outputsize <= 0) {
                    error E2BIG;
                }
            }
            output = input[1] & 127;
            // Move input buffer pointer two bytes.
            discard 2;
        };
        condition { // JIS X 0212
            between 0x8fa1a1...0x8ffefe;
        } operation {
            if (codesubset != 3) {
                if (outputsize <= 5) {
                    error E2BIG;
                }
                // Emit JIS X 0212 sequence, ESC $ ( D.
                output = 0x1b242844;
                codesubset = 3;
