Method for extracting knowledge from online documentation and creating a glossary, index, help database or the like

6212494

Abstract

A method involving computer-mediated linguistic analysis of online technical documentation to extract and catalog from the documentation knowledge essential to, for example, creating an online help database useful in providing online assistance to users in performing a task. The method comprises stripping markup tags from the documentation, linguistically analyzing and annotating the text, including the steps of morphologically and lexically analyzing the text, disambiguating between possible parts-of-speech for each word, and syntactically analyzing and labeling each word. The method further comprises the steps of combining the linguistically analyzed, annotated, and labeled text and previously stripped markup information into a merged file, mining the merged file for domain knowledge, including the steps of identifying and creating a list of technical terminology, mining the merged file for manifestations of domain primitives and maintaining a list of manifestations of such domain primitives in an observations file, analyzing the discourse context of each sentence or phrase in the merged file, analyzing the frequency of manifestations of domain primitives in the observations file to determine those that are important, expanding the list of key terms by searching for terms sanctioned by a domain primitive deemed important in the previous step, and searching the merged file for larger relations by searching for particular lexico-syntactic patterns involving key terms and manifestations of domain primitives previously identified. The method further comprises the steps of structuring the knowledge thus mined and building a domain catalog.


Claims

I claim:

1. A machine-readable medium having stored thereon sequences of instructions, which when executed by a processor cause the processor to:

a) linguistically analyze and annotate text of online documentation to create a linguistically analyzed and annotated text;

b) mine said linguistically analyzed and annotated text for text representative of said online documentation, including:

i) searching for syntactic patterns indicative of key terms and maintaining a list of said key terms,

ii) searching for syntactic patterns indicative of manifestations of a domain primitive involving one of said key terms and maintaining a list of said manifestations, and

iii) analyzing said list of said manifestations to determine said manifestations that are representative of said online documentation on the basis of frequency of their occurrence; and

c) combining said list of said key terms and said list of said manifestations that are representative of said online documentation in a domain catalog.

2. A machine-readable medium having stored thereon sequences of instructions, which when executed by a processor cause the processor to:

(a) linguistically analyze and annotate a source code document in a computer system;

(b) mine the linguistically analyzed and annotated source code for code representative of a library of variables and procedure calls, wherein mining further comprises sequences of instructions that cause the processor to:

(i) search for syntactic patterns indicative of key terms comprising variables and procedure calls, and maintaining a list of the key terms,

(ii) search for syntactic patterns indicative of manifestations of a primitive involving one of the key terms and maintaining a list of the manifestations, and

(iii) analyze the list of the manifestations to determine the manifestations that are representative of the variables and procedure calls on the basis of frequency of occurrence;

(c) combine the list of the key terms and the list of the manifestations that are representative of the variables and procedure calls in a library file; and

(d) use the library file to create the library of variables and procedure calls.

3. A machine-readable medium having stored thereon sequences of instructions, which when executed by a processor cause the processor to:

(a) linguistically analyze and annotate text of an online technical document to create a linguistically analyzed and annotated text, further comprising sequences of instructions that cause the processor to:

(i) lexically and morphologically analyze the text, and

(ii) disambiguate between possible parts-of-speech for each word of the text and identify a syntactic function for each word of the text;

(b) mine the linguistically analyzed and annotated text for text representative of the online technical document, further comprising sequences of instructions that cause the processor to:

(i) search for syntactic patterns indicative of technical terms and maintain a list of the technical terms and frequency with which the technical terms occur in the text, and

(ii) search for syntactic patterns indicative of manifestations of a domain primitive involving one of said technical terms and maintain a list of the manifestations,

(iii) analyze the list of the manifestations to determine the manifestations that are representative of the online technical documentation on the basis of frequency of occurrence,

(iv) expand the list of technical terms by searching the list of the manifestations for syntactic patterns involving additional terms not presently in said list of said technical terms that are adjunct to the manifestations,

(v) search the online technical documentation for lexico-syntactic patterns indicative of larger relations involving the technical terms, the additional terms and the manifestations; and

(c) combine the technical terms, the additional terms, the manifestations, and the larger relations in a domain catalog.

4. A machine-readable medium having stored thereon sequences of instructions, which when executed by a processor cause the processor to:

(a) translate an American Standard Code for Information Interchange (ASCII) data file having stored therein an online technical document and information regarding a proprietary internal representation for the online technical document to a second ASCII data file having stored therein the online technical document and information regarding standard internal representation of the online technical document;

(b) separate text of the online technical document from the information regarding the standard internal representation for the online document;

(c) linguistically analyze and annotate the text of the online technical document to create a linguistically analyzed and annotated text, further comprising sequences of instructions that cause the processor to:

(i) lexically and morphologically analyze the text to determine possible lexical and morphological features, and

(ii) disambiguate between possible parts-of-speech for each word of the text and identify a syntactic function for each word of the text,

(d) label each word of the text and the annotations;

(e) combine the linguistically analyzed and annotated text with the information regarding the standard internal representation for the online document into a merged file;

(f) mine the merged file for text representative of the online technical document further comprising sequences of instructions that cause the processor to:

(i) identify key terms further comprising sequences of instructions that cause the processor to:

(A) search for syntactic patterns indicative of technical terms, and

(B) maintaining a list of the key terms comprising the technical terms and the frequency with which the technical terms occur,

(ii) search for syntactic patterns indicative of manifestations of a domain primitive involving one of the key terms and maintaining a list of the manifestations and the frequency with which the manifestations occur,

(iii) discourse analyze each phrase in the merged file to resolve conjunctions and antecedent basis of pronouns to more accurately record the manifestations and the frequency with which the manifestations occur,

(iv) analyze the list of the manifestations to determine the manifestations that are representative of the online document on the basis of the frequency with which the manifestations occur,

(v) expand the list of key terms by searching the list of the manifestations for syntactic patterns involving additional terms not presently in the list of key terms that are adjunct to one of the manifestations, and

(vi) search the online technical document for lexico-syntactic patterns indicative of larger relations involving one of the key terms and one of the manifestations and maintain a list of the larger relations; and

(g) combine the key terms, the manifestations, and the larger relations in a domain catalog.

5. The machine-readable medium of claim 4 wherein said frequency with which said technical terms occur in said merged file is weighted according to said information regarding said standard internal representation for said online document.

6. The machine-readable medium of claim 4 wherein analyzing said list of manifestations to determine said manifestations that are representative of said online document on the basis of said frequency with which said manifestations occur is weighted according to said information regarding said standard internal representation for said online document.

7. The machine-readable medium of claim 4 wherein the sequences of instructions that cause the processor to combine said key terms, said manifestations, and said larger relations in a domain catalog further comprise sequences of instructions that cause the processor to:

a) cluster said key terms for similarity of use on the basis of repeated manifestations of said manifestations of a domain primitive occurring in a similar syntactic context;

b) cluster said key terms on the basis of proximity of said key terms in said merged file with respect to other said key terms; and

c) index said domain catalog by said key terms such that each of said key terms is an index therein, and incorporate said manifestations and said larger relations according to said indexed key terms.


Description

BACKGROUND OF THE INVENTION

The present invention relates to natural language processing of textual information in a data processing system. Specifically, the invention relates to a process comprising computer-mediated linguistic analysis of online technical documentation and extraction of representative text from the documentation to acquire knowledge essential to, for example, providing assistance to users in performing a task.

Reference books, user guides, instructional manuals, and similar types of technical documentation have long been a main source of background information (as opposed to foreground information, e.g., as found in newspapers) useful to individuals in developing the knowledge necessary to perform some task such as operating an apparatus or item of equipment, for example, a digital computer. The primary purpose of this genre of text is to assist a user of the apparatus to which the material is applicable in operating the apparatus.

More recently, with the proliferation of digital computers in all facets of modern society, and, more specifically, with the advent of desktop computers in the home and the workplace, such assistance has usually taken the form of an online help facility, that is, information useful in assisting the user in performing some task is made available at the user display device of the desktop computer by means of electronic retrieval. This type of assistance is commonly referred to as online assistance or online help. The text of the information may be stored locally in a database file (which may also be referred to as an online help database, or simply, help database) in electronic media on a memory storage device such as a hard disk drive or optical drive coupled to the desktop computer. Alternatively, the text of the information may be stored in a file on a memory storage device coupled to a server which the desktop computer accesses by way of a data network to which the desktop computer, participating as a client in the data network, may be coupled. In either case, the information may be retrieved from the memory storage device and displayed on the user display device as directed by commands input by the user from an input device such as a keyboard, mouse, pen device, etc. In a desktop computing environment, some form of online assistance is provided, usually with respect to some aspect of operating the desktop computer or performing a specific task involving an application program, e.g., a word processor or spreadsheet application.

In the context of online assistance, early versions of assistance generally provide information regarding what tasks or functions can be accomplished with the tools and commands of a computer operating system or software application, and/or what is the proper syntax or procedure for invoking such a command. For example, an early form of online assistance termed Balloon Help (in which explanatory text is displayed in a small pop-up window shaped like the balloons used for dialog in comic strips) is provided on Apple Macintosh computers running System 7 and later versions of the Apple Macintosh Operating System. Using Balloon Help, a user of an Apple Macintosh computer can determine the function of potentially any command, symbol, window, icon, or object visible on the user display device, i.e., the screen of the Apple Macintosh computer. When a user enables this form of online assistance, short, descriptive text messages appear on the screen describing the function performed by a particular command, symbol, or object whenever the user places the cursor on the command, symbol, or object in question.

More recent versions of online assistance provide a more comprehensive form of online assistance that not only provides assistance regarding functions of objects, but also what tasks can be accomplished with these objects, as well as how to accomplish the tasks. For example, with reference to FIG. 1, a novel metaphor of online assistance termed Apple Guide is provided on Apple Macintosh computers running System 7.5 and later versions of the Apple Macintosh Operating System. Apple Guide provides online interactive instructions in response to user questions. An answer is provided to a user inquiry by leading the user through a series of interactive windows to a window or sequence of panels that contains explanatory text. An online help database behind the Apple Guide user interface provides the explanatory (coaching) text. Referring to FIG. 1, the user may begin the navigation through a series of windows upon selecting assistance by topic 102, index 103 or "look for" 104 (where an attempt is made to map a free form user query onto an appropriate answer script from the help database) from an access window 101 (here, the Full Access window as displayed by Macintosh Guide). Using Apple Guide, users of an Apple Macintosh computer are able to obtain online assistance in different forms, including task-oriented procedures on a software application's features, tutorials, advanced features for sophisticated users, and reference material of the type found on quick reference cards.

In early versions of online assistance such as the Balloon Help previously described, the process of determining the content of the database file (referred to herein as the help database) in which is stored the text of information that may be retrieved by online assistance is relatively straightforward. Essentially, the content of the help database is governed by the commands that appear on the user display device or that can be invoked by the user from a user input device. It should be noted that the term command is used here to encompass any object through which a user can control the system or application software running on the digital computer, including, for example, a window, icon, symbol, or text string. The creator, or "author," of the help database simply catalogs each command and provides a short description of its function, or the appropriate syntax for invoking the command, thereby providing a complete enumeration of commands arranged systematically with descriptive details.

In the more recent versions of online assistance, the process of determining the content of the online help database is an arduous, time-consuming, and iterative task, typically involving a team of instructional designers. Whereas in earlier versions of online assistance, the author simply cataloged all possible commands and the like, in more recent versions of online assistance, the instructional designers or persons acting in that capacity are not provided with such finite boundaries regarding what information is important and, thus, should be included in the help database. Providing online assistance to questions such as, "how do I do this task?" involves more than just cataloging and describing the functionality of every possible command. The designers need to determine, for example, what task-oriented procedures, what tutorials, what advanced features, and what reference material should be included. This process is one of introspection by the instructional designers. Decisions are made typically on the basis of accumulated experience and intuition acquired primarily by trial and error. One way to proceed is to first determine the key terms in the application domain (which may be composed of one or more words, i.e., which may be phrasal units), the properties thereof, and the relations (i.e., actions) that can be performed on or with the objects defined by the key terms. For example, with reference to FIG. 1, the instructional design team may determine that the term "disk" shown highlighted at 105 in window 101 is important, and thus, should be a key term included in the help database. They may further determine that actions involving the disk such as preparing, ejecting, erasing (displayed in the right half of window 101 at 106) are sufficiently important to include and relate to the key term "disk" in the help database.

Key terms, as well as relations and properties involving those key terms, essentially define the domain, i.e., the topic or application, for which online assistance is being developed. These key terms, relations and properties may be cataloged and then expanded upon in creating the help database. A domain catalog (i.e., a catalog comprising key terms, properties thereof, and relations involving those key terms, which essentially define an application domain) from which the help database is created also provides the basis for a suitable index, list of subtopics, or other means by which a user can initiate an inquiry into the help database. This process of determining the content and index to the help database comprises a substantial, nontrivial component of the design and delivery of online assistance for user tasks. It should be noted that determining the content of the help database essentially comprises the steps of 1) determining the core of key terms, relations and properties involving the key terms, e.g., "disk", "ejecting a disk", and "name of disk", and 2) writing definitions for key terms and their relations, e.g., defining "disk" and describing the sequence for "ejecting a disk". As will be seen, it is the first step of the process of determining the content of the help database to which the present invention is directed.
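
By way of illustration only, a domain catalog of the kind just described might be modeled as follows. This is a minimal Python sketch built around the "disk" example above; the names `CatalogEntry`, `DomainCatalog`, `add_relation`, and `add_property` are hypothetical and chosen for this example, and no particular representation is prescribed by the present description.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One key term plus the relations and properties recorded for it."""
    key_term: str
    relations: set[str] = field(default_factory=set)   # actions, e.g. "eject"
    properties: set[str] = field(default_factory=set)  # attributes, e.g. "name"


class DomainCatalog:
    """Hypothetical container of catalog entries, indexed by key term."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def entry(self, key_term: str) -> CatalogEntry:
        # Create the entry on first use so facts can be added in any order.
        return self._entries.setdefault(key_term, CatalogEntry(key_term))

    def add_relation(self, key_term: str, action: str) -> None:
        self.entry(key_term).relations.add(action)

    def add_property(self, key_term: str, name: str) -> None:
        self.entry(key_term).properties.add(name)

    def index(self) -> list[str]:
        # The sorted key terms can seed an index or a list of subtopics.
        return sorted(self._entries)


catalog = DomainCatalog()
for action in ("prepare", "eject", "erase"):
    catalog.add_relation("disk", action)
catalog.add_property("disk", "name")
```

Indexing the catalog by key term mirrors the observation that the same catalog can serve both as the source of help content and as the basis for an index into it.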

The same difficulty in determining the content of an online help database to be accessed by an online user assistance facility occurs in other contexts as well. For example, in the publishing industry, determining the content of the index or glossary to a reference manual, textbook, or instructional guide involves the same arduous process of determining the key terms, relations, i.e., actions, and properties which are considered sufficiently important to place into the index or glossary.

In a computing environment, for example, the desktop computer environment referred to earlier, the same difficulty arises when providing online delivery of technical documentation, that is, online access to an electronic copy of the technical documentation itself, not a help database derived therefrom. To provide this feature, a facility must exist for mapping a user query onto the appropriate position in the text in the online documentation. This necessitates, at the very least, the creation of an index or catalog of the type discussed above that additionally possesses a mapping or linking of the key terms, relations and properties to the location, e.g., the chapter or section number, page number, paragraph, and potentially, the line number, in the online text document at which they occur.
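
The linking of key terms to document locations described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not a description of any actual facility: the `Location` fields, the `LocationIndex` class, the sample section and page numbers, and the naive substring matching used in `lookup` are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Location(NamedTuple):
    """Position of an occurrence in the online document (illustrative fields)."""
    section: str
    page: int
    paragraph: int
    line: Optional[int] = None  # the line number is optional


class LocationIndex:
    """Hypothetical index linking key terms and relations to locations."""

    def __init__(self) -> None:
        self._hits: dict[str, list[Location]] = defaultdict(list)

    def record(self, term: str, where: Location) -> None:
        self._hits[term.lower()].append(where)

    def lookup(self, query: str) -> list[Location]:
        # Naive mapping of a query onto the index: collect the locations of
        # every indexed term that appears verbatim in the query text.
        q = query.lower()
        found = []
        for term, places in self._hits.items():
            if term in q:
                found.extend(places)
        return sorted(found, key=lambda loc: (loc.page, loc.paragraph))


index = LocationIndex()
index.record("disk", Location("2.1", 14, 3, 7))
index.record("ejecting a disk", Location("2.3", 18, 1))
hits = index.lookup("How do I go about ejecting a disk?")
```

Plain substring matching would also match "disk" inside "diskette"; a practical facility would need a more careful match, which is outside the scope of this sketch.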

In a programming environment where it is desired to exchange information or otherwise communicate in some manner between separate software programs or routines, e.g., a mail program and a calendar program, elicitation of the type and format of information operated on and derivation of the basic processes each application is capable of executing is necessary to develop a set of procedures for successful interapplication or interprocess communication. Here, too, software engineers must determine the key terms, relations and properties of each application in order to design appropriate software procedures for successful communication therebetween.

Finally, although this discussion is not intended to set forth an exhaustive list of the environments in which it is necessary to boil down the technical information to its key terms and relations, another environment to which the same process applies is that of information management involving a digital computer, e.g., a desktop computer. For example, a user has access to a file containing a short technical document. The filename or title associated with the file in which the document is stored may not readily convey its content. Furthermore, the content of the document may not be readily discernible without fully reading the document. A content stamp of the document, on the other hand, contains key terms, relations and properties such that it is clear what the document is generally about, without having to read it to determine its content. By content stamping documents then, one is able to more accurately and efficiently manage information accessible from the desktop, whether the documents reside, for example, on a local hard disk or a hard disk of a server accessible via a data network. However, creating a content stamp requires reading a document to pull out the key information which comprises the stamp.
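
As a rough sketch of the idea, a content stamp could be assembled from the most frequent key terms and relations once they have been mined. The Python function below is hypothetical, and the counts in the example are invented solely for illustration; it assumes the frequencies have already been gathered by some other means.

```python
from collections import Counter


def content_stamp(term_counts, relation_counts, top_n=3):
    """Return a short summary built from the most frequent mined items."""
    terms = [t for t, _ in Counter(term_counts).most_common(top_n)]
    relations = [r for r, _ in Counter(relation_counts).most_common(top_n)]
    return {"key_terms": terms, "relations": relations}


# Invented counts, for illustration only.
stamp = content_stamp(
    {"disk": 12, "startup disk": 5, "icon": 9, "folder": 2},
    {"eject disk": 6, "erase disk": 4, "open folder": 1},
    top_n=2,
)
```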

From the foregoing discussion, it can be seen that it is desirable to develop a method of extracting pertinent information from technical documentation which does not require or rely on the discretion of, for example, a team of instructional designers, and which facilitates the creation of a domain catalog containing the information, i.e., the key terms, properties thereof, and relations (activities related to or involving key terms) of the domain. It is further apparent that this desire for another method of extracting and cataloging pertinent information from technical documentation exists regardless of how this cataloged information is put to use, whether it be to fashion the content of a help database for online user assistance, to create an index or glossary for a reference manual, textbook, or instructional guide, or some other use, including, for example, those uses discussed above.

As will be seen, given online technical documentation, the present invention overcomes the above mentioned difficulty in creating the domain catalog from which, for example, the content of a help database underlying an online assistance tool may be determined and generated.

SUMMARY OF THE INVENTION

Described herein is a method involving computer-mediated linguistic analysis of online technical documentation to extract and catalog from the documentation knowledge essential to, for example, creating an online help database useful in providing online assistance to users in performing a task.

An embodiment of the method for creating a catalog comprising key terms, properties thereof, and relations involving those key terms for a given topic, i.e., for a given domain, comprises:

1) translating an ASCII data file of online technical documentation having a proprietary internal representation for document structure to a standard internal representation for document structure, for example, a standard internal representation generally conforming with the Standard Generalized Markup Language (SGML),

2) generating a stream of straight ASCII text free of markup information by stripping and saving markup tags for later processing,

3) linguistically analyzing and annotating the ASCII text, including the steps of:

a) lexically and morphologically analyzing each word of the text to determine its possible lexical and morphological features,

b) disambiguating between two or more possible parts of speech that each word may take on within the context of the sentence or phrase in which the word appears, and syntactically analyzing and labeling each word of the text,

4) explicitly labeling the text and linguistic annotations of each word as such to facilitate subsequent mining,

5) combining the linguistically analyzed, annotated, and labeled text and the markup tags stripped in step 2 into a merged file,

6) mining the merged file for knowledge, including the steps of:

a) identifying and creating a list of technical terminology (primarily multi-token key terms) and the frequency with which each key term occurs by searching for particular syntactic patterns or sequences,

b) mining the merged file for manifestations of domain primitives, i.e., looking for terms, relations and properties that are syntactically related to key terms by searching for particular syntactic patterns and maintaining a list of such manifestations of domain primitives in an observations file,

c) analyzing the discourse context of each sentence or phrase in the merged file to more accurately record in the observations file the list of manifestations of domain primitives and avoid incorrect analysis of linguistic observations,

d) analyzing the frequency of manifestations of domain primitives in the observations file to determine those that are important and those that are not,

e) expanding the list of key terms by searching for terms related to a domain primitive deemed important in the previous step by correlating particular syntactic patterns, and

f) given the key terms and manifestations of domain primitives already identified in the previous steps, searching the merged file for larger relations by searching for particular lexico-syntactic patterns involving key terms and manifestations of domain primitives previously identified, and

7) structuring the knowledge thus mined, including the steps of:

a) clustering key terms for similarity of use on the basis of repeated manifestations of domain primitives occurring in identical or at least similar syntactic contexts,

b) clustering key terms on the basis of proximity in terms of their relative position in the text, and

c) building the domain catalog by incorporating, for each key term, those observations which are deemed to be important on the basis of frequency of occurrence in the observations file.
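
Three of the steps above (stripping and saving markup tags, identifying multi-token terms by syntactic pattern, and filtering by frequency) can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the embodiment itself: the function names are hypothetical, the part-of-speech labels `ADJ` and `NOUN` are assumed to come from some upstream tagger, and the adjective/noun run used in `find_terms` is a stand-in pattern chosen for this example rather than a pattern taken from the present description.

```python
import re
from collections import Counter

TAG = re.compile(r"<[^>]+>")


def strip_markup(marked_up):
    """Step 2: return plain text plus the stripped tags with text offsets."""
    text_parts, tags, pos, out_len = [], [], 0, 0
    for m in TAG.finditer(marked_up):
        chunk = marked_up[pos:m.start()]
        text_parts.append(chunk)
        out_len += len(chunk)
        tags.append((out_len, m.group()))  # offset into the stripped text
        pos = m.end()
    text_parts.append(marked_up[pos:])
    return "".join(text_parts), tags


def find_terms(tagged_tokens):
    """Step 6a: count runs of adjectives/nouns that end in a noun and span
    at least two tokens (an assumed stand-in for a term pattern)."""
    counts, run = Counter(), []

    def flush():
        # Trim trailing adjectives so the candidate ends in a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if len(run) >= 2:
            counts[" ".join(w.lower() for w, _ in run)] += 1
        run.clear()

    for word, pos in tagged_tokens:
        if pos in ("ADJ", "NOUN"):
            run.append((word, pos))
        else:
            flush()
    flush()
    return counts


def important(counts, min_count=2):
    """Step 6d style filter: keep items seen at least min_count times."""
    return {item for item, n in counts.items() if n >= min_count}


text, tags = strip_markup("<h1>Startup disk</h1>\n<p>Eject the startup disk.</p>")
# Hand-written tags standing in for the output of the linguistic analysis.
tokens = [("Startup", "NOUN"), ("disk", "NOUN"), ("Eject", "VERB"),
          ("the", "DET"), ("startup", "NOUN"), ("disk", "NOUN"), (".", "PUNCT")]
term_counts = find_terms(tokens)
key_terms = important(term_counts)
```

Saving each tag together with its offset into the stripped text is one simple way to allow the markup to be recombined with the annotated text later, as step 5 requires.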

By performing linguistic analysis upon online documentation, it is an object of the present invention to facilitate the arduous process of determining and generating the content of a help database useful in delivering online assistance or reference by automatically creating a domain catalog, that is, a list of what information is important to include in the help database, as indicated by the set of key terms, the relations they participate in, and the properties they display in the domain catalog.

Although an embodiment of the present invention, as it is set forth herein below, is described primarily with reference to, or in the context of determining the content of a help database that drives an online assistance tool such as Apple Guide, it should be noted that this context merely provides an illustration of an environment in which the method of the present invention may be applied.

Another object of the present invention is to facilitate the process of developing a set of interprocess communication procedures. By applying the linguistic analyses and natural language technologies according to the method of the present invention to the task of determining the set of procedure calls, function calls, subroutines, data structures, variables, arguments, and other components of software routines or applications, it is possible to derive a software library containing a core set of data elements and procedures for exchanging information including such data elements which may be used to develop interapplication or interprocess communication between software routines. In facilitating the interprocess communication software development process, the present invention is able to linguistically analyze and mine for knowledge (i.e., variables, procedure calls, software routines, etc.) the source code of a software application in the same way it would linguistically analyze and mine for pertinent information in an online technical document. The aforementioned and further objects, features and advantages of the present invention will be apparent from the description and figures which follow.
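
To give a rough sense of mining source code for variables and procedure calls by frequency, the Python sketch below counts them in a short sample program. It substitutes the parser in Python's standard `ast` module for the linguistic analysis described above, so it illustrates only the frequency-based cataloging idea and not the method itself; the function name `mine_source` and the sample mail and calendar code are hypothetical.

```python
import ast
from collections import Counter


def mine_source(source):
    """Count plain procedure calls and assigned variable names in Python source."""
    calls, variables = Counter(), Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            calls[node.func.id] += 1          # e.g. send_mail(...)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            variables[node.id] += 1           # e.g. msg = ...
    return calls, variables


# Hypothetical sample source; it is parsed, never executed.
SAMPLE = """
msg = compose(subject, body)
send_mail(msg)
event = make_event(msg)
send_mail(event)
"""

calls, variables = mine_source(SAMPLE)
frequent_calls = {name for name, n in calls.items() if n >= 2}
```

Method calls such as `obj.send()` are ignored here for brevity; only calls to plain names are counted.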

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the following figures:

FIG. 1 is an example of an online help facility of the kind to which an embodiment of the present invention may be applied.

FIG. 2 is a block diagram of an embodiment of a digital computer system of the present invention.

FIG. 3 is a flow diagram of an embodiment of a method of the present invention.

FIG. 4 is an embodiment of a method step of the present invention.

FIG. 5 is an embodiment of a data structure of the present invention.

FIG. 6 is a flow diagram of an embodiment of a method step of the present invention.

FIG. 7 is a representative page of online technical documentation used by an embodiment of the present invention.

FIG. 8 is a diagram of a syntactic pattern and data structure of an embodiment of a method step of the present invention.

FIG. 9 is an embodiment of a method step of the present invention.

FIG. 10 is a diagram of syntactic patterns and data structure of an embodiment of a method step of the present invention.

FIG. 11 is an embodiment of a method step of the present invention.

FIG. 12 is an embodiment of a method step of the present invention.

FIG. 13 is a diagram of syntactic patterns and data structure of an embodiment of a method step of the present invention.

FIG. 14 is a diagram of a data structure of an embodiment of the present invention.

Reference numerals in all of the accompanying drawings typically are in the form "figure number" followed by two digits, xx; for example, reference numerals on FIG. 1 may be numbered 1xx; on FIG. 2, reference numerals may be numbered 2xx. In certain cases, a reference numeral may be introduced on one drawing and the same reference numeral may be utilized on other drawings to refer to the same item.

DETAILED DESCRIPTION OF THE PRESENT INVENTION

Overview

The present invention describes a method involving computer-mediated linguistic analysis of online technical documentation for automatically generating a catalog of pertinent information defining, in a concise formal structure, the domain, i.e., the topic or application about which the online documentation provides detailed background information. In the following description, numerous specific details are set forth describing specific representations of data, specific processing steps, etc., in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art to which the present invention pertains, or with which it is most nearly connected, that the present invention may be practiced without the specific details disclosed herein. In other instances, well known systems or processes have not been shown in detail in order not to unnecessarily obscure the present invention.

The present description includes material protected by copyrights, such as illustrations of graphical user interface images or text which the assignee of the present invention owns. The assignee hereby reserves its rights, including copyright, in these materials, and each such material should be regarded as bearing the following notice: Copyright Apple Computer, Inc. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office file or records, but otherwise reserves all copyrights whatsoever.

Referring now to FIG. 2, certain aspects of embodiments of the present invention may be performed by a data processing system, such as a digital computer system. FIG. 2 illustrates such a system. In particular, computer system 200 includes a processor 201 and a memory 202 which are interconnected by a system bus 203. A display controller 204 and a display device 205 such as a CRT, liquid crystal or plasma display device, are coupled to processor 201 via system bus 203. A mass storage device 206, depicted here as a local device which may be a hard disk that stores information in magnetic media or optical media, is coupled to processor 201 and memory 202 through system bus 203. Typically, a computer system includes input and output devices in addition to a display device. For example, an output device may be a hard copy printer. Numerous input devices are also well known such as keyboards, mice, trackballs, touchpads, and pens. These input devices communicate with processor 201 and memory 202 via a controller such as input/output (I/O) controller 207. Furthermore, computer system 200 may be linked to other computer systems in a data network via network interface controller 208 in order to transmit and receive information with other computer systems in the data network, for example, in a client/server computing environment.

In a typical embodiment, textual documents such as reference guides, user guides, scientific or technical documents, instructional manuals, etc., may be stored on mass storage device 206 of computer system 200 or another computer system in a data network to which computer system 200, participating as a client in the data network, may be coupled via network interface controller 208. This potentially rich source of online background information may be applied directly or indirectly to any one of a number of uses, including, for example, assisting a user in performing some task by allowing the user to access and display the information on display device 205 as directed by commands input by the user from an input device coupled to I/O controller 207. The documentation might provide the basis for creation of an online help database, thereby providing users with the flexibility to access, and query, the database for assistance.

As was discussed previously, to the extent online assistance is provided, it is generally with respect to the operation of some aspect of computer system 200 or application software installed thereon, e.g., a wordprocessor or spreadsheet application. In such a situation, the textual documents comprise computer operating system or application software reference material, e.g., a reference manual or user guide. However, it may be readily envisioned that online textual documentation may be wholly unrelated to the operation of the computer system or application software installed thereon. The technical documentation could provide information on conceivably any science or technology.

In a desktop publishing environment, technical documentation may be authored and formatted on computer system 200, in which case, an embodiment of the present invention may be utilized, for example, to create the index and/or glossary of terms at the back of the technical book.

Furthermore, the creation of a domain catalog, i.e., a formal structure of the key terms, relations and properties defining the domain or topic to which the documentation is directed, facilitates exploitation of the technical information by user assistance tasks which may somehow rely upon or require such a formal structure. For example, given a technical document available online, a corresponding domain catalog can be used to identify the sections in the document which may be appropriate, i.e., provide information pertaining to the answer of a free format user query. The domain catalog may further provide a direct mapping of the technical vocabulary in the catalog as well as an index to an online help database.

The major difficulty in deriving the content of the domain catalog from the technical documentation is the process of determining exactly what information is sufficiently representative of the technical documentation to comprise the essence, or gist, of the technical document and thus, be included in the catalog. Once this determination is made, the rest of the process of creating, for example, an online help database, is relatively straightforward. An instructional design team may, for each term in the domain catalog, create a definition, check the validity of automatically generated cross-references between related terms, possibly add more cross-references, and create descriptions for the actions identified as being related to the terms.

It is the first step of the above process, that of constructing the domain catalog from which the information contained therein may be expanded upon or put to any number of uses, to which an embodiment of the present invention is directed. In bringing a computer-mediated process involving linguistic analysis to bear on the task of constructing a domain catalog, the process is thereby automated. By carrying out appropriate processing of text documenting a given application, a set of key terms and their properties, and actions involving those key terms may be identified. This information may serve as a prompting device, a kind of "crib sheet" to indicate to, for example, the author of a help database the span of topics that will need to be covered for effective online user assistance.

The focus here is on technical documentation because it lends itself well to the natural language technologies employed by the process involving linguistic analysis described herein. Moreover, as an embodiment of the present invention described herein is implemented in the form of a software application executed by a computer system such as computer system 200, the technical documentation necessarily resides online, that is, it may be retrieved from, for example, mass storage device 206 and provided as input to the software application.

Technical documents represent a well defined genre of text, sharing common features of style, form, content and presentation. As will be seen, acknowledging and accounting for such expository features found in such documentation allows for certain types of linguistic analysis to be applied in a particular way to map the text of a document onto a concise, formal structure of linguistic objects representative of the key terms and their properties, as well as the relations (i.e., actions) between them, found in the domain to which the documentation is directed.

The primary purpose of linguistic processing of an online technical document is to identify and extract important, or key, words or phrases, collectively referred to as key terms, in the text of the document. In an embodiment of the present invention, key terms deemed sufficiently representative of the text are incorporated into what is referred to herein as a lexical network corresponding to the essence of the document. A structure may then be imposed upon the lexical network such that it may be viewed as a conceptual map of the functionality of the domain, i.e., the topic or application, described in the original technical document.

Prior to describing various aspects of the present invention in detail, it should be understood that the present invention is constructed of a cascade of individual linguistic processing modules, so constructed with the goal of extracting from online, background technical documentation lexical knowledge which defines a conceptual map of a domain, as well as represents in a normalized form the functionality of an associated application. In other words, the linguistic processing modules, each of which individually performs a specific task (that in many cases is an analysis or refinement of the output recorded in a previous step of the cascade), collectively determine the key terms, their properties, and the actions related thereto.

Moreover, the present invention involves a number of stages over which the linguistic processing modules span. With reference to FIG. 3, the stages may be loosely grouped into a linguistic analysis stage 300 followed by a knowledge mining stage 330. The linguistic analysis stage is generally concerned with identifying the parts of speech that make up each sentence or phrase in the online technical documentation, as well as the syntactic function that each word in the sentence is performing, a nontrivial task because of the context sensitive, inherently ambiguous nature of natural language. The knowledge mining stage generally determines what information should be extracted and stored in the domain catalog. By removing, to the extent possible, ambiguity of language in the text during the linguistic analysis stage, the knowledge mining stage has a linguistically rich base from which it can determine what information is to be extracted with relatively less difficulty.
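
The notion of a cascade, in which each module consumes the output recorded by the previous one, might be sketched as follows. This is only an illustrative Python sketch: the function run_cascade and the three placeholder modules (strip_tags, tokenize, lowercase) are assumptions introduced here and do not correspond to the actual modules of the embodiment.

```python
import re
from functools import reduce


def strip_tags(text):
    """Placeholder module: remove anything that looks like a markup tag."""
    return re.sub(r"<[^>]*>", "", text)


def tokenize(text):
    """Placeholder module: split plain text into whitespace-delimited tokens."""
    return text.split()


def lowercase(tokens):
    """Placeholder module: normalize the case of each token."""
    return [token.lower() for token in tokens]


def run_cascade(document, modules):
    """Pass the document through each module in order.

    Each module receives the output of the module before it, which is the
    defining property of a cascade.
    """
    return reduce(lambda data, module: module(data), modules, document)


result = run_cascade("</para> Put only one copy </epara>",
                     [strip_tags, tokenize, lowercase])
print(result)
```

Because every module has the same shape (one input, one output), a module can be added to, or removed from, the list without changing the others.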

It will be apparent from the following discussion of the present invention that, given online technical documentation, a domain catalog is constructed not by developing an understanding of what knowledge is contained in the document. Understanding the content of the technical documentation would require, inter alia, a level of robust language analysis beyond the power of natural language processing systems today, as well as the kinds of interpretation and reasoning systems still under development by those skilled in the art. Rather, the added value of the natural language technologies embodied in the present invention derives from processing the textual component of online technical documentation to the extent that lexically, syntactically, and otherwise structurally prominent characteristics of the documentation are able to be identified, extracted, and then incorporated by any of the aforesaid uses to which the present invention may be directed. (Importantly, as will be seen, the techniques employed by an embodiment of the present invention detect and exploit a correspondence between semantically important fragments in the domain and the way in which those fragments manifest themselves linguistically, e.g., as a particular syntactic structure with particular discourse properties.) Thus, most of the techniques and technologies described below depend on the availability of online technical documentation in text form. In what follows, an instance of such a piece of text will be simply referred to as a document.

Technological Background and Nomenclature

The natural language technologies implemented in an embodiment of the present invention provide the following linguistic capabilities: lexical access, morphological analysis, part-of-speech disambiguation, and syntactic function identification.

Lexical Access

Lexical access to a substantial core lexicon of English is necessary. A lexicon is a data structure containing a list of base forms of words for a given language, and inflections and derivations thereof. The lexicon should provide syntactic annotations for words (i.e., annotations regarding the way in which a word occurs in a phrase or sentence), at least for part-of-speech, for example:

display: Noun, Verb

use: Noun, Verb

Furthermore, the lexicon should provide subcategorization frames (i.e., knowledge about the syntactic environments in which a word can validly occur), for example:

display: Noun+Prep [of]; Verb+NP

use: Noun+Prep [of, for]; Verb+NP

Additionally, the lexicon may be augmented with a robust part-of-speech guesser for those words outside the core lexical coverage. For example, even though there might be no explicit listing for "chooser" or "iconization", based on their endings, a reliable guess could be made that these are nouns.
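
The lexicon entries and the ending-based guesser described above might be sketched as follows. This is a minimal Python illustration rather than the implementation of the embodiment; the names LEXICON, NOUN_SUFFIXES and parts_of_speech, and the particular list of endings, are assumptions introduced here.

```python
# Core lexicon: base form -> part-of-speech tags and subcategorization frames.
# The two entries mirror the "display" and "use" examples given in the text.
LEXICON = {
    "display": {
        "pos": ["Noun", "Verb"],
        "frames": ["Noun+Prep [of]", "Verb+NP"],
    },
    "use": {
        "pos": ["Noun", "Verb"],
        "frames": ["Noun+Prep [of, for]", "Verb+NP"],
    },
}

# Hypothetical noun-forming endings consulted by the guesser (an assumption).
NOUN_SUFFIXES = ("er", "ation", "ment", "ness", "ity")


def parts_of_speech(word):
    """Return the listed tags for a word, or a guess based on its ending."""
    entry = LEXICON.get(word.lower())
    if entry is not None:
        return list(entry["pos"])
    if word.lower().endswith(NOUN_SUFFIXES):
        return ["Noun"]  # guessed from the ending, not listed in the lexicon
    return []  # neither listed nor guessable


print(parts_of_speech("display"))      # ['Noun', 'Verb']
print(parts_of_speech("chooser"))      # ['Noun']
print(parts_of_speech("iconization"))  # ['Noun']
```

A production guesser would weigh many more endings and would likely rank competing guesses; the single suffix list here only shows where such a component sits relative to the lexicon lookup.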

Morphological Processor

A morphological processor looks at the formation of each word in the document and attempts to perform a mapping of an inflected word (i.e., a word modified from its base form to mark such distinctions as those of case, gender, number, tense, person, mood or voice) to its base form, for example:

                           uses -> use
                         ground -> ground
                         ground -> grind


A morphological processor further looks at the formation of each word in the document and attempts to perform a mapping of a derivational word (i.e., a word modified from its base form as by the addition of a noninflectional affix) to its base form, for example:

                         reinitialize -> initialize
                       reinitializing -> initialize


The ability to perform morphological processing enables the present invention to, for example, derive:

[initialize] [disk]

from "reinitializing the disk". The morphological processor further returns the possible part-of-speech tags for a word, for example:

                     uses -> use+[NounPlural]
                     uses -> use+[VerbPast]
                   ground -> ground+[Verb]
                   ground -> ground+[NounSingular]
                   ground -> grind+[VerbPast]
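
The mapping of inflected and derived forms to base forms might be sketched as follows. This Python sketch is a deliberately small approximation under stated assumptions: BASE_FORMS stands in for the core lexicon, IRREGULAR lists forms that suffix stripping cannot recover, and only a handful of suffixes and the prefix "re" are handled. None of these names comes from the embodiment.

```python
# Known base forms (a stand-in for the core lexicon; an assumption).
BASE_FORMS = {"use", "ground", "grind", "initialize", "disk"}

# Irregular inflected forms that cannot be derived by suffix stripping.
IRREGULAR = {"ground": ["grind"]}

# (suffix, replacement) pairs tried for inflectional analysis.
SUFFIX_RULES = (("s", ""), ("ing", "e"), ("ing", ""), ("ed", "e"), ("ed", ""))


def base_forms(word):
    """Return every known base form that `word` may be a form of."""
    candidates = set()
    if word in BASE_FORMS:
        candidates.add(word)
    candidates.update(IRREGULAR.get(word, []))
    # Inflection: "uses" -> "use", "reinitializing" -> "reinitialize".
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidates.add(word[: -len(suffix)] + replacement)
    # Derivation: strip the prefix "re", e.g. "reinitialize" -> "initialize".
    for form in list(candidates) + [word]:
        if form.startswith("re"):
            candidates.add(form[2:])
    # Keep only candidates that are actual base forms.
    return sorted(c for c in candidates if c in BASE_FORMS)


def normalize(phrase):
    """Reduce a phrase to base forms, skipping words with no known base form."""
    return [forms[0] for forms in map(base_forms, phrase.split()) if forms]


print(base_forms("uses"))                     # ['use']
print(base_forms("ground"))                   # ['grind', 'ground']
print(normalize("reinitializing the disk"))   # ['initialize', 'disk']
```

Note that "the" is dropped only because it is absent from the toy BASE_FORMS set; a fuller system would filter function words by part of speech instead.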


Part of Speech Tagger (For Lexical Disambiguation)

The primary function of such a component is to disambiguate among sets of parts-of-speech annotations, i.e., syntactic tags. For example, while every content word in the phrase, "to display files, view by size" would be lexically analyzed and marked both as a noun and a verb, local syntactic context is sufficient to disambiguate between the individual parts-of-speech:

to/[Inf]

display/[Verb]

files/[NounPlural]

view/[Verb]

by/[Prep]

size/[Noun]
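
How local syntactic context can settle the choice between candidate tags might be sketched as follows. The four context rules in PREFERRED_AFTER are illustrative assumptions, chosen only so that the example phrase resolves as shown above; they are not the rules of any particular tagger, and the names CANDIDATES and disambiguate are hypothetical.

```python
# Candidate tags per word, as a lexical and morphological analysis might return.
CANDIDATES = {
    "to": ["Inf"],
    "display": ["Noun", "Verb"],
    "files": ["NounPlural", "Verb"],
    "view": ["Noun", "Verb"],
    "by": ["Prep"],
    "size": ["Noun", "Verb"],
}

# Preferred word class for an ambiguous word, keyed by the previous token's class.
PREFERRED_AFTER = {
    "Inf": "Verb",    # "to display"
    "Verb": "Noun",   # a verb followed by its object: "display files"
    "Prep": "Noun",   # "by size"
    "Punct": "Verb",  # clause-initial imperative: ", view"
}


def disambiguate(tokens):
    """Choose one tag per word using only the class of the preceding token."""
    tagged = []
    previous = "Punct"  # treat the start of the phrase as a clause boundary
    for token in tokens:
        if token == ",":
            previous = "Punct"
            continue
        options = CANDIDATES[token]
        if len(options) == 1:
            choice = options[0]
        else:
            wanted = PREFERRED_AFTER.get(previous)
            # "NounPlural" satisfies a preference for "Noun", hence startswith.
            choice = next(
                (o for o in options if wanted and o.startswith(wanted)),
                options[0],
            )
        tagged.append((token, choice))
        previous = "Noun" if choice.startswith("Noun") else choice
    return tagged


for word, tag in disambiguate(["to", "display", "files", ",", "view", "by", "size"]):
    print(f"{word}/[{tag}]")
```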

Shallow Syntactic Analyzer

Syntactic analysis (parsing) is the process of resolving a sentence into component parts of speech, describing them grammatically, and identifying structural relationships between words and phrases in a sentence. For example, a noun phrase might be made up from a determiner followed by a noun; a verb phrase might be identified as a verb optionally followed by a noun phrase, and a possible sentence structure might be a noun phrase functioning as the subject, followed by a verb phrase, wherein the verb is the main verb of the sentence, and the noun phrase within the verb phrase is the object.

Presently, full syntactic analysis over real instances of text is not feasible for a number of reasons, including the complexity of the parsing process, the high degree of lexical ambiguity, failure to cope robustly with unfamiliar input items, and inadequate coverage of existing grammatical descriptions of natural language. However, present technology does make it possible, on the basis of locally defined rules for syntactically allowed contexts, to perform a shallow form of syntactic analysis in which certain linguistic annotations of text are possible. Assuming part-of-speech disambiguation analysis has already been performed by, for example, the part-of-speech tagger described above, valid sequences of syntactic tags can be identified, for example, the grammatical sequence [[Det] [Adj]] is highly common, in contrast to [[Adj] [Det]]. Furthermore, it is possible to associate the words in a sentence with the syntactic functions they play within the particular context ("@Subject", "@Object", "@Complement-Modifier", etc.), as well as indicate the structural constraints between words. For example, the [Adj] preceding a [Noun] is dominated by that noun; in a sequence of two or more [Noun]s, the rightmost one acts as the head; etc.
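
The two structural constraints just mentioned (an adjective is dominated by the noun it precedes, and the rightmost noun in a noun sequence acts as the head) might be sketched as follows, together with a check on tag sequences. This is a Python sketch under assumptions: the set DISALLOWED treats [[Adj] [Det]] as invalid for simplicity, and the names assign_heads and is_valid_sequence are hypothetical.

```python
# Tag bigrams treated as invalid in this sketch (an assumption).
DISALLOWED = {("Adj", "Det")}


def is_valid_sequence(tags):
    """Return True if no adjacent pair of tags is in the disallowed set."""
    return all(pair not in DISALLOWED for pair in zip(tags, tags[1:]))


def assign_heads(tagged):
    """Map the index of each premodifier to the index of its head noun.

    Within a maximal run of adjectives and nouns, the rightmost noun is the
    head, and every adjective or noun to its left depends on it.
    """
    heads = {}
    i = 0
    while i < len(tagged):
        j = i
        while j < len(tagged) and tagged[j][1] in ("Adj", "Noun"):
            j += 1
        run = range(i, j)
        nouns = [k for k in run if tagged[k][1] == "Noun"]
        if nouns:
            head = nouns[-1]
            for k in run:
                if k < head:
                    heads[k] = head
        i = max(j, i + 1)  # always advance, even past a non-matching token
    return heads


phrase = [("a", "Det"), ("separate", "Adj"), ("type", "Noun"),
          ("of", "Prep"), ("layout", "Noun"), ("window", "Noun")]
for dependent, head in assign_heads(phrase).items():
    print(f"{phrase[dependent][0]} -> {phrase[head][0]}")
```

For the phrase shown, "separate" is attached to "type" and "layout" to "window". No phrase boundaries or parse tree are built, which is what keeps the analysis shallow.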

Shallow syntactic analysis differs from syntactic analysis in that a complete parse tree representation is not constructed--phrase boundaries are not identified, nor are relationships between phrases recovered. However, individual lexical items are assigned, where appropriate, syntactic functions. For example, as a result of processing the sentence,

The application requires the use of a separate type of layout window for modifying user templates.

"application" would be analyzed as the main subject; "layout" would be tagged as a [Noun] and associated as a dependent (premodifier) to "window"; both "window" and "templates" would be identified as nouns in complement positions, with "use" and "modify" marked as the dominant heads to which, respectively, those nouns act as direct objects. The significance of being able to identify these relationships will be discussed below.

Thus, because natural language is highly complex and ambiguous, full syntactic analysis for an entire language is impossible given present technology. Shallow syntactic analysis, however, is possible. While not developing a complete parse tree, shallow syntactic analysis attempts to identify and generate a pointer to different structures in a sentence, including, but not limited to, for example, a subject, verb, object, complement, adjunct, etc.

DETAILED DESCRIPTION

Referring now to FIG. 3, a detailed description of an embodiment of the present invention follows.

Linguistic Analysis Stage

A data file stored on, for example, mass storage device 206 and containing technical documentation may have been created by any one of a number of commercially available desktop publishing or wordprocessing software applications. The internal representation of the data file is, for the most part, governed by the software application that created the file, e.g., Microsoft Word. Commonly, the various desktop publishing or wordprocessing software applications have their own proprietary internal data representation for keeping track of the various features of a document, e.g., the typographical, visual, and layout characteristics of the document. Natural language processing software cannot adequately deal with the arbitrary format of documents created by different software applications. Thus, the present invention assumes a uniform framework for representing, storing and accessing the document in a way which preserves the majority of typographical, visual and layout information in the data file containing the document. This is accomplished by mapping, or exporting, the document into a stream of ASCII text to which the natural language technologies of the present invention can be applied. This prerequisite is fulfilled according to application-specific means outside the scope of the present invention. Essentially, what this means is that wordprocessing or desktop publishing application software must create an ASCII-based representation of the internal data format. In doing so, typically all that occurs is the internal representation of the file is converted from binary format to ASCII format--markup tags providing information regarding document structure may still be in a proprietary format, e.g., Microsoft Word Rich Text Format (RTF).

Furthermore, in addition to extruding ASCII text from, e.g., a Microsoft Word file containing a document having a proprietary internal representation for document structure (a process which will yield a text corpus), it is essential when exporting the document, for reasons discussed below, that this text corpus retain the markup information contained therein concerning the logical and physical structure of the document. Not all markup information may be important. The key in retaining markup information is to strip formatting information that is not important but maintain that which is, along with the text to which it applies. Markup information is information in the form of tags interspersed throughout the document which is used to (conceptually) drive a typesetting machine. To the extent there is important textual information in a sentence, there is equally important information in the way the text is visually organized and presented on a page. As will be seen, the fact that a phrase appears in a subject or chapter heading makes it much more important than if it were embedded in the middle of a long paragraph of text, and thus, more likely that it merits incorporation in the domain catalog. Unlike prior art text processing technologies, the present invention seeks to appreciate the context, in linguistic, document layout and structure terms, in which a phrase, e.g., a noun phrase, appears, and thus, markup information should be maintained in the ASCII text stream.

At step 301 in the cascade of individual linguistic processing modules, the present invention translates the data file containing the ASCII text of a document having a proprietary internal representation for document structure (created by application software outside the scope of the present invention as discussed above) to a data file containing the ASCII text of a document having a standard internal representation for document structure, which may, for example, generally conform to SGML (Standard Generalized Markup Language). The purpose of this step is to provide a standard internal representation for document structure information (i.e., markup tags) in the file containing the document, one which is understood by the natural language processes of the present invention. Thus, subsequent modules in the cascade need only understand one standard file format.

An example of a standard file format is set forth below, hereinafter referred to as example A. It can be seen that markup tags containing information such as the beginning and ending of chapter headings, lists and the items listed therein, subsections, paragraphs, and different text typefaces, such as bold or italics, are interspersed throughout the ASCII text. The example is taken from a portion of an online copy of the Apple Macintosh Reference guide, Chapter 1. Additional references herein related to this text document represent output generated at various stages of the cascade of linguistic processing modules using the same source of technical documentation.

    </chapter>Setting Up Your Programs </echapter>
    </para> This chapter describes how to set up the programs that you use when
    you work with your computer. </epara>
    </section> Installing your application programs </esection>
    </para> Most application programs come on floppy disks, and you install them
    by copying them from the floppy disks to your hard disk. Some programs have
    special installation instructions. See the documentation that came with your
    programs. </epara>
    </para> To use your programs most effectively: </epara>
    </list>
    </item> Put only one copy of each program on your hard disk. Having more than
    one copy can cause errors. </eitem>
    </item> Whenever you copy a program disk to your hard disk, be careful not to
    copy a System Folder. Always check to see what you've copied, and drag any
    extra System Folders to the Trash. </eitem>
    </item> If a program malfunctions consistently, try installing a fresh copy.
    If that does not help, find out from the software manufacturer whether your
    version of the program is compatible with the system software you're using.
    </eitem>
    </item> Put frequently used programs (or aliases for those programs) in the
    Apple menu so you can open the programs more conveniently. See Chapter 5,
    "Adapting Your Computer to Your Own Use." </eitem>
    </item> To open a program automatically each time you start up, you can put
    the program (or its alias) into the Startup Items folder. See Chapter 5,
    "Adapting Your Computer to Your Own Use." </eitem>
    </elist>


EXAMPLE A

As part of this translation process, visual characteristics of the text are mapped to the logical function that the characteristics perform, e.g., red text may indicate a chapter heading, bold text may mean a subsection heading, a string of 12-point Helvetica text may indicate a paragraph of text. This logical function information representing markup information is retained in the form of a tag at the beginning of each record in the data file. Maintaining such logical function information entails, for example, identification of chapter, section, subsection and other headings, as well as parsing of lists and sublists. To reiterate, the rationale behind this is not only that, for example, section and subsection headings are good places to identify technical terms, but more interestingly, the structure of a running discourse of technical text is itself quite revealing with respect to offering clues to information that describe the domain or application to which the content of the document is directed. For example, definitions of terms are typically found at the beginning of introductory paragraphs, section units typically are concerned with describing the functionality of closely related components, and phrases that are emphasized (e.g., by bold or italic font) are clearly important, etc.
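
The mapping from visual characteristics to logical function might be sketched as follows. This Python sketch is an assumption-laden illustration: the style dictionaries, the rule order, the fallback tag "other", and the names logical_tag and to_record are all hypothetical, and the output merely imitates the tag convention visible in example A.

```python
def logical_tag(style):
    """Map the visual attributes of a text run to a logical tag name.

    The rules follow the examples in the text: red text marks a chapter
    heading, bold text a section heading, 12-point Helvetica a paragraph.
    """
    if style.get("color") == "red":
        return "chapter"
    if style.get("bold"):
        return "section"
    if style.get("font") == "Helvetica" and style.get("size") == 12:
        return "para"
    return "other"  # hypothetical fallback for unrecognized styling


def to_record(style, text):
    """Wrap a text run in opening and closing tags, in the style of example A."""
    tag = logical_tag(style)
    return f"</{tag}> {text} </e{tag}>"


runs = [
    ({"color": "red"}, "Setting Up Your Programs"),
    ({"bold": True}, "Installing your application programs"),
    ({"font": "Helvetica", "size": 12}, "To use your programs most effectively:"),
]
for style, text in runs:
    print(to_record(style, text))
```

In practice such a mapping is specific to the style sheet of each source document, so the rule table would be supplied per document or per publisher rather than fixed in code.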

Like step 301, step 302 is primarily a preparatory step in anticipation of step 303 and later steps in the cascade of linguistic processes. Linguistic analysis of text at step 303 assumes the document contains only ASCII text. The markup tags, therefore, must be stripped, as demonstrated below in example B:

    Setting Up Your Programs **
    This chapter describes how to set up the programs that you use when
    you work with your computer.
    Installing your application programs **
    Most application programs come on floppy disks, and you install them
    by copying them from the floppy disks to your hard disk. Some programs have
    special installation instructions. See the documentation that came with your
    programs.
    To use your programs most effectively:
    Put only one copy of each program on your hard disk. Having more than
    one copy can cause errors.
    Whenever you copy a program disk to your hard disk, be careful not to
    copy a System Folder. Always check to see what you've copied, and drag any
    extra System Folders to the Trash.
    If a program malfunctions consistently, try installing a fresh copy.
    If that does not help, find out from the software manufacturer whether your
    version of the program is compatible with the system software you're using.
    Put frequently used programs (or aliases for those programs) in the
    Apple menu so you can open the programs more conveniently. See Chapter 5,
    "Adapting Your Computer to Your Own Use."
    To open a program automatically each time you start up, you can put
    the program (or its alias) into the Startup Items folder. See Chapter 5,
    "Adapting Your Computer to Your Own Use."


EXAMPLE B

However, as was previously mentioned, the stripped markup information is subsequently used by the present invention, so it is not discarded, but saved in a temporary file and merged back into the text stream at a later step in the cascade, as will be discussed below.
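
Stripping the markup while keeping enough information to merge it back later might be sketched as follows. This Python sketch records each tag with its character offset in the stripped text. That representation, the in-memory list used in place of a temporary file, and the names strip_markup and merge_markup are assumptions made for illustration. The merge shown simply restores the tags to the plain text, whereas the embodiment merges them with the annotated text.

```python
import re

# Tags in example A have the form </name> (opening) and </ename> (closing).
TAG = re.compile(r"</(\w+)>")


def strip_markup(marked_up):
    """Return (plain_text, markup), where markup is a list of (offset, tag).

    Each offset is the position in plain_text at which the tag occurred.
    """
    plain = []
    markup = []
    offset = 0
    position = 0
    for match in TAG.finditer(marked_up):
        chunk = marked_up[position:match.start()]
        plain.append(chunk)
        offset += len(chunk)
        markup.append((offset, match.group(1)))
        position = match.end()
    plain.append(marked_up[position:])
    return "".join(plain), markup


def merge_markup(plain, markup):
    """Reinsert the saved tags into the plain text at their recorded offsets."""
    out = []
    position = 0
    for offset, tag in markup:
        out.append(plain[position:offset])
        out.append(f"</{tag}>")
        position = offset
    out.append(plain[position:])
    return "".join(out)


sample = ("</chapter>Setting Up Your Programs </echapter>\n"
          "</section> Installing your application programs </esection>")
text, saved = strip_markup(sample)
print(text)
print(saved)
```

Because the offsets refer to the stripped text, the pair returned by strip_markup is sufficient to reconstruct the original stream exactly.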

The ASCII text free of markup information produced at step 302 is next analyzed with respect to its lexical and morphological content at step 303. Each word is annotated to include its lexical and morphological features, including a part-of-speech tag for each morphological context, and a possible syntactic label obtained by way of shallow syntactic analysis. For example, with reference to FIG. 4, text phrase 401, after analysis and annotation, appears as annotated phrase 400.

Below is an example, hereinafter referred to as example C, of such analysis and annotation as performed on the ASCII text provided in example B. Each record comprises a word of text and its annotations. Each word appears in its original form as used in the document and in its base form, both in double quotes. Lexical annotations follow and are encapsulated by < >. The morphological annotation follows, in uppercase. Where more than one possible part-of-speech tag exists, each tag is shown annotated on a separate row. For example, the word "set" in the sentence "this chapter describes how to set up the programs that you use when you work with your computer", as set forth in the example below, has 7 possible part-of-speech tags: it may be interpreted as, among other things, a past tense verb in finite form (V PAST VFIN), a present tense, non-third-person-singular finite verb (V PRES -SG3 VFIN), or a verb in its infinitive form (V INF). The possible syntactic function is provided in the form of a syntactic label, if present, and is the last annotation affixed to each word.

    ("<*setting>"
         ("set" <*> <SVOC/A> <SVO> <SVOO> <SV> <P/on> PCP1 ))
    ("<*up>"
         ("up" <*> PREP)
         ("up" <*> ADV ADVL (@ADVL)))
    ("<*your>"
         ("you" <*> PRON PERS GEN SG2/PL2 (@GN> ) ))
    ("<*programs>"
         ("program" <*> <SVO> V PRES SG3 VFIN (@+FMAINV))
         ("program" <*> N NOM PL))
    ("<$HEAD>")
    ("<*this>"
         ("this" <*> DET CENTRAL DEM SG (@DN>))
         ("this" <*> ADV AD-A> (@AD-A>))
         ("this" <*> PRON DEM SG ))
    ("<chapter>"
         ("chapter" N NOM SG))
    ("<describes>"
         ("describe" <as/SVOC/A> <SVO> V PRES SG3 VFIN (@+FMAINV) ))
    ("<how>"
         ("how" <**CLB> ADV WH ))
    ("<to>"
         ("to" PREP)
         ("to" INFMARK> (@INFMARK>) ))
    ("<set>"
         ("set" <SVOC/A> <SVO> <SVOO> <SV> <P/on> PCP2)
         ("set" <SVOC/A> <SVO> <SVOO> <SV> <P/on> V PAST VFIN (@+FMAINV))
         ("set" <SVOC/A> <SVO> <SVOO> <SV> <P/on> V SUBJUNCTIVE VFIN (@+FMAINV))
         ("set" <SVOC/A> <SVO> <SVOO> <SV> <P/on> V IMP VFIN (@+FMAINV))
         ("set" <SVOC/A> <SVO> <SVOO> <SV> <P/on> V INF)
         ("set" <SVOC/A> <SVO> <SVOO> <SV> <P/on> V PRES -SG3 VFIN (@+FMAINV))
         ("set" N NOM SG))
    ("<up>"
         ("up" PREP)
         ("up" ADV ADVL (@ADVL)))
    ("<the>"
         ("the" <Def> DET CENTRAL ART SG/PL (@DN>) ))
    ("<programs>"
         ("program" <SVO> V PRES SG3 VFIN (@+FMAINV) )
         ("program" N NOM PL))
    ("<that>"
         ("that" <**CLB> CS (@CS) )
         ("that" DET CENTRAL DEM SG (@DN>) )
         ("that" ADV AD-A> (@AD-A>) )
         ("that" PRON DEM SG )
         ("that" <NonMod> <**CLB> <Rel> PRON SG/PL))
    ("<you>"
         ("you" <NonMod> PRON PERS NOM SG2/PL2)
         ("you" <NonMod> PRON PERS ACC SG2/PL2))
    ("<use>"
         ("use" N NOM SG)
         ("use" <as/SVOC/A> <SVO> <SV> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("use" <as/SVOC/A> <SVO> <SV> V IMP VFIN (@+FMAINV) )
         ("use" <as/SVOC/A> <SVO> <SV> V INF )
         ("use" <as/SVOC/A> <SVO> <SV> V PRES -SG3 VFIN (@+FMAINV) ))
    ("<when>"
         ("when" <**CLB> ADV WH (@ADVL) ))
    ("<you>"
         ("you" <NonMod> PRON PERS NOM SG2/PL2)
         ("you" <NonMod> PRON PERS ACC SG2/PL2))
    ("<work>"
         ("work" <SV> <SVO> <P/in> <P/on> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("work" <SV> <SVO> <P/in> <P/on> V IMP VFIN (@+FMAINV) )
         ("work" <SV> <SVO> <P/in> <P/on> V INF )
         ("work" <SV> <SVO> <P/in> <P/on> V PRES -SG3 VFIN (@+FMAINV) )
         ("work" N NOM SG ))
    ("<with>"
         ("with" PREP ))
    ("<your>"
         ("you" PRON PERS GEN SG2/PL2 (@GN>) ))
    ("<computer>"
         ("computer" <DER:er> N NOM SG ))
    ("<$.>")
    ("<*installing>"
         ("instal" <*> <SVO> PCP1 ))
    ("<your>"
         ("you" PRON PERS GEN SG2/PL2 (@GN>) ))
    ("<application>"
         ("application" N NOM SG ))
    ("<programs>"
         ("program" <SVO> V PRES SG3 VFIN (@+FMAINV) )
         ("program" N NOM PL ))
    ("<$HEAD>")
    ("<*most>"
         ("much" <*> ADV SUP)
         ("much" <*> <Quant> PRON SUP SG)
         ("much" <*> <Quant> DET POST SUP SG (@QN>))
         ("many" <*> <Quant> PRON SUP PL)
         ("many" <*> <Quant> DET POST SUP PL (@QN>)))
    ("<application>"
         ("application" N NOM SG ))
    ("<programs>"
         ("program" <SVO> V PRES SG3 VFIN (@+FMAINV) )
         ("program" N NOM PL ))
    ("<come>"
         ("come" <SVC/A> <SV> <P/for> PCP2)
         ("come" <SVC/A> <SV> <P/for> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("come" <SVC/A> <SV> <P/for> V IMP VFIN (@+FMAINV) )
         ("come" <SVC/A> <SV> <P/for> V INF)
         ("come" <SVC/A> <SV> <P/for> V PRES -SG3 VFIN (@+FMAINV) ))
    ("<on>"
         ("on" PREP)
         ("on" ADV ADVL (@ADVL ) ))
    ("<floppy_disks>"
         ("floppy_disk" N NOM PL ))
    ("<$\,>")
    ("<and>"
         ("and" CC (@CC ) ))
    ("<you>"
         ("you" <NonMod> PRON PERS NOM SG2/PL2 )
         ("you" <NonMod> PRON PERS ACC SG2/PL2 ))
    ("<install>"
         ("install" <SVO> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("install" <SVO> V IMP VFIN (@+FMAINV) )
         ("install" <SVO> V INF)
         ("install" <SVO> V PRES -SG3 VFIN (@+FMAINV) ))
    ("<them>"
         ("they" <NonMod> PRON PERS ACC PL3 ))
    ("<by>"
         ("by" PREP )
         ("by" ADV ADVL (@ADVL) ))
    ("<copying>"
         ("copy" <SVO> <SV> <P/of> PCP1 ))
    ("<them>"
         ("they" <NonMod> PRON PERS ACC PL3 ))
    ("<from>"
         ("from" PREP ))
    ("<the>"
         ("the" <Def> DET CENTRAL ART SG/PL (@DN>) ))
    ("<floppy_disks>"
         ("floppy_disk" N NOM PL ))
    ("<to>"
         ("to" PREP)
         ("to" INFMARK> (@INFMARK>) ))
    ("<your>"
         ("you" PRON PERS GEN SG2/PL2 (@GN>) ))
    ("<hard_disk>"
         ("hard_disk" N NOM SG ))
    ("<$.>")
    ("<*some>"
         ("some" <*> <Quant> DET CENTRAL SG/PL (@QN>) )
         ("some" <*> ADV )
         ("some" <*> <NonMod> <Quant> PRON SG/PL ))
    ("<programs>"
         ("program" <SVO> V PRES SG3 VFIN (@+FMAINV) )
         ("program" N NOM PL ))
    ("<have>"
         ("have" <SVO> <SVOC/A> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("have" <SVO> <SVOC/A> V PRES -SG3 VFIN)
         ("have" <SVO> <SVOC/A> V INF )
         ("have" <SVO> <SVOC/A> V IMP VFIN (@+FMAINV) ))
    ("<special>"
         ("special" A ABS ))
    ("<installation>"
         ("installation" N NOM SG ))
    ("<instructions>"
         ("instruction" N NOM PL ))
    ("<$.>")
    ("<*see>"
         ("see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V IMP VFIN (@+FMAINV) )
         ("see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V INF)
         ("see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V PRES -SG3 VFIN (@+FMAINV) ))
    ("<the>"
         ("the" <Def> DET CENTRAL ART SG/PL (@DN>) ))
    ("<documentation>"
         ("documentation" <-Indef> N NOM SG ))
    ("<that>"
         ("that" <**CLB> CS (@CS) )
         ("that" DET CENTRAL DEM SG (@DN>) )
         ("that" ADV AD-A> (@AD-A>) )
         ("that" PRON DEM SG )
         ("that" <NonMod> <**CLB> <Rel> PRON SG/PL ))
    ("<came>"
         ("come" <SVC/A> <SV> <P/for> V PAST VFIN (@+FMAINV) ))
    ("<with>"
         ("with" PREP ))
    ("<your>"
         ("you" PRON PERS GEN SG2/PL2 (@GN>) ))
    ("<programs>"
         ("program" <SVO> V PRES SG3 VFIN (@+FMAINV) )
         ("program" N NOM PL ))
    ("<$.>")
    ("<*to>"
         ("to" <*> PREP)
         ("to" <*> INFMARK> (@INFMARK>) ))
    ("<use>"
         ("use" N NOM SG)
         ("use" <as/SVOC/A> <SVO> <SV> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("use" <as/SVOC/A> <SVO> <SV> V IMP VFIN (@+FMAINV) )
         ("use" <as/SVOC/A> <SVO> <SV> V INF )
         ("use" <as/SVOC/A> <SVO> <SV> V PRES -SG3 VFIN (@+FMAINV) ))
    ("<your>"
         ("you" PRON PERS GEN SG2/PL2 (@GN>) ))
    ("<programs>"
         ("program" <SVO> V PRES SG3 VFIN (@+FMAINV) )
         ("program" N NOM PL ))
    ("<most>"
         ("much" ADV SUP )
         ("much" <Quant> PRON SUP SG )
         ("much" <Quant> DET POST SUP SG (@QN>) )
         ("many" <Quant> PRON SUP PL )
         ("many" <Quant> DET POST SUP PL (@QN>) ))
    ("<effectively>"
         ("effective" <DER:ive> <DER:ly> ADV ))
    ("<$\:>")
    ("<*put>"
         ("put" <*> <SVO> PCP2 )
         ("put" <*> <SVO> V PAST VFIN (@+FMAINV) )
         ("put" <*> <SVO> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("put" <*> <SVO> V IMP VFIN (@+FMAINV) )
         ("put" <*> <SVO> V INF )
         ("put" <*> <SVO> V PRES -SG3 VFIN (@+FMAINV) ))
    ("<only>"
         ("only" ADV)
         ("only" A ABS ))
    ("<one>"
         ("one" NUM CARD )
         ("one" PRON NOM SG ))
    ("<copy>"
         ("copy" <SVO> <SV> <P/of> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("copy" <SVO> <SV> <P/of> V IMP VFIN (@+FMAINV) )
         ("copy" <SVO> <SV> <P/of> V INF )
         ("copy" <SVO> <SV> <P/of> V PRES -SG3 VFIN (@+FMAINV) )
         ("copy" N NOM SG ))
    ("<of>"
         ("of" PREP ))
    ("<each>"
         ("each" <Quant> DET CENTRAL SG (@QN>))
         ("each" <NonMod> <Quant> PRON SG ))
    ("<program>"
         ("program" <SVO> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("program" <SVO> V IMP VFIN (@+FMAINV) )
         ("program" <SVO> V INF )
         ("program" <SVO> V PRES -SG3 VFIN (@+FMAINV) )
         ("program" N NOM SG ))
    ("<on>"
         ("on" PREP)
         ("on" ADV ADVL (@ADVL) ))
    ("<your>"
         ("you" PRON PERS GEN SG2/PL2 (@GN>) ))
    ("<hard_disk>"
         ("hard_disk" N NOM SG ))
    ("<$.>")
    ("<*having>"
         ("have" <*> <SVO> <SVOC/A> PCP1 ))
    ("<more=than>"
         ("more=than" <CompPP> PREP )
         ("more=than" <**CLB> CS (@CS) )
         ("more=than" ADV ))
    ("<one>"
         ("one" NUM CARD)
         ("one" PRON NOM SG ))
    ("<copy>"
         ("copy" <SVO> <SV> <P/of> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("copy" <SVO> <SV> <P/of> V IMP VFIN (@+FMAINV) )
         ("copy" <SVO> <SV> <P/of> V INF )
         ("copy" <SVO> <SV> <P/of> V PRES -SG3 VFIN (@+FMAINV) )
         ("copy" N NOM SG ))
    ("<can>"
         ("can" N NOM SG)
         ("can" V AUXMOD VFIN (@+FAUXV) ))
    ("<cause>"
         ("cause" N NOM SG)
         ("cause" <SVO> <SVOO> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("cause" <SVO> <SVOO> V IMP VFIN (@+FMAINV) )
         ("cause" <SVO> <SVOO> V INF )
         ("cause" <SVO> <SVOO> V PRES -SG3 VFIN (@+FMAINV) ))
    ("<errors>"
         ("error" N NOM PL ))
    ("<$.>")
    ("<*whenever>"
         ("whenever" <*> <**CLB> ADV WH (@ADVL) ))
    ("<you>"
         ("you" <NonMod> PRON PERS NOM SG2/PL2 )
         ("you" <NonMod> PRON PERS ACC SG2/PL2 ))
    ("<copy>"
         ("copy" <SVO> <SV> <P/of> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("copy" <SVO> <SV> <P/of> V IMP VFIN (@+FMAINV) )
         ("copy" <SVO> <SV> <P/of> V INF )
         ("copy" <SVO> <SV> <P/of> V PRES -SG3 VFIN (@+FMAINV) )
         ("copy" N NOM SG ))
    ("<a>"
         ("a" <Indef> DET CENTRAL ART SG (@DN>) ))
    ("<program>"
         ("program" <SVO> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("program" <SVO> V IMP VFIN (@+FMAINV) )
         ("program" <SVO> V INF )
         ("program" <SVO> V PRES -SG3 VFIN (@+FMAINV) )
         ("program" N NOM SG ))
    ("<disk>"
         ("disk" <SVO> <Rare> V IMP VFIN (@+FMAINV) )
         ("disk" <SVO> <Rare> V INF )
         ("disk" N NOM SG ))
    ("<to>"
         ("to" PREP )
         ("to" INFMARK> (@INFMARK>) ))
    ("<your>"
         ("you" PRON PERS GEN SG2/PL2 (@GN>) ))
    ("<hard_disk>"
         ("hard_disk" N NOM SG ))
    ("<$\,>")
    ("<be>"
         ("be" <SV> <SVC/N> <SVC/A> V SUBJUNCTIVE VFIN )
         ("be" <SV> <SVC/N> <SVC/A> V INF )
         ("be" <SV> <SVC/N> <SVC/A> V IMP VFIN (@+FMAINV) ))
    ("<careful>"
         ("careful" A ABS ))
    ("<not>"
         ("not" NEG-PART (@NEG) ))
    ("<to>"
         ("to" PREP)
         ("to" INFMARK> (@INFMARK>) ))
    ("<copy>"
         ("copy" <SVO> <SV> <P/of> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("copy" <SVO> <SV> <P/of> V IMP VFIN (@+FMAINV) )
         ("copy" <SVO> <SV> <P/of> V INF )
         ("copy" <SVO> <SV> <P/of> V PRES -SG3 VFIN (@+FMAINV) )
         ("copy" N NOM SG ))
    ("<a>"
         ("a" <Indef> DET CENTRAL ART SG (@DN>) ))
    ("<*system>"
         ("system" <*> N NOM SG ))
    ("<*folder>"
         ("folder" <*> <DER:er> N NOM SG ))
    ("<$.>")
    ("<*always>"
         ("always" <*> ADV ADVL (@ADVL) ))
    ("<check>"
         ("check" <SVO> <SV> <P/for> <P/with> <P/on> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("check" <SVO> <SV> <P/for> <P/with> <P/on> V IMP VFIN (@+FMAINV) )
         ("check" <SVO> <SV> <P/for> <P/with> <P/on> V INF )
         ("check" <SVO> <SV> <P/for> <P/with> <P/on> V PRES -SG3 VFIN (@+FMAINV) )
         ("check" N NOM SG ))
    ("<to>"
         ("to" PREP )
         ("to" INFMARK> (@INFMARK>) ))
    ("<see>"
         ("see" <as/SVOC/A> <SVO> <SV> <InfComp> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("see" <as/SVOC/A> <SVO> <SV> <InfComp> V IMP VFIN (@+FMAINV) )
         ("see" <as/SVOC/A> <SVO> <SV> <InfComp> V INF )
         ("see" <as/SVOC/A> <SVO> <SV> <InfComp> V PRES -SG3 VFIN (@+FMAINV) ))
    ("<what>"
         ("what" <NonMod> <**CLB> PRON WH SG/PL )
         ("what" <**CLB> DET PRE WH SG/PL (@DN>) ))
    ("<you_>"
         ("you" <NonMod> PRON PERS NOM SG2/PL2 SUBJ (@SUBJ) ))
    ("<_'ve>"
         ("have" <SVO> V PRES -SG3 VFIN ))
    ("<copied>"
         ("copy" <SVO> <SV> <P/of> PCP2 )
         ("copy" <SVO> <SV> <P/of> V PAST VFIN (@+FMAINV) ))
    ("<$\,>")
    ("<and>"
         ("and" CC (@CC) ))
    ("<drag>"
         ("drag" <SVO> <SV> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("drag" <SVO> <SV> V IMP VFIN (@+FMAINV) )
         ("drag" <SVO> <SV> V INF )
         ("drag" <SVO> <SV> V PRES -SG3 VFIN (@+FMAINV) )
         ("drag" N NOM SG ))
    ("<any>"
         ("any" <Quant> DET CENTRAL SG/PL (@QN>) )
         ("any" ADV AD-A> (@AD-A>) )
         ("any" <NonMod> <Quant> PRON SG/PL ))
    ("<extra>"
         ("extra" A ABS ))
    ("<*system>"
         ("system" <*> N NOM SG ))
    ("<*folders>"
         ("folder" <*> <DER:er> N NOM PL ))
    ("<to>"
         ("to" PREP)
         ("to" INFMARK> (@INFMARK>) ))
    ("<the>"
         ("the" <Def> DET CENTRAL ART SG/PL (@DN>) ))
    ("<*trash>"
         ("trash" <*> <SVO> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("trash" <*> <SVO> V IMP VFIN (@+FMAINV) )
         ("trash" <*> <SVO> V INF )
         ("trash" <*> <SVO> V PRES -SG3 VFIN (@+FMAINV) )
         ("trash" <*> <-Indef> N NOM SG ))
    ("<$.>")
    ("<*if>"
         ("if" <*> <**CLB> CS (@CS) ))
    ("<a>"
         ("a" <Indef> DET CENTRAL ART SG (@DN>) ))
    ("<program>"
         ("program" <SVO> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("program" <SVO> V IMP VFIN (@+FMAINV) )
         ("program" <SVO> V INF )
         ("program" <SVO> V PRES -SG3 VFIN (@+FMAINV) )
         ("program" N NOM SG ))
    ("<malfunctions>"
         ("malfunction" <SV> V PRES SG3 VFIN (@+FMAINV) ))
    ("<consistently>"
         ("consistent" <DER:ly> ADV ))
    ("<$\,>")
    ("<try>"
         ("try" <SVO> <SV> <P/for> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("try" <SVO> <SV> <P/for> V IMP VFIN (@+FMAINV) )
         ("try" <SVO> <SV> <P/for> V INF )
         ("try" <SVO> <SV> <P/for> V PRES -SG3 VFIN (@+FMAINV) )
         ("try" N NOM SG ))
    ("<installing>"
         ("instal" <SVO> PCP1 ))
    ("<a>"
         ("a" <Indef> DET CENTRAL ART SG (@DN>) ))
    ("<fresh>"
         ("fresh" A ABS ))
    ("<copy>"
         ("copy" <SVO> <SV> <P/of> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("copy" <SVO> <SV> <P/of> V IMP VFIN (@+FMAINV) )
         ("copy" <SVO> <SV> <P/of> V INF )
         ("copy" <SVO> <SV> <P/of> V PRES -SG3 VFIN (@+FMAINV) )
         ("copy" N NOM SG ))
    ("<$.>")
    ("<*if>"
         ("if" <*> <**CLB> CS (@CS) ))
    ("<that>"
         ("that" <**CLB> CS (@CS) )
         ("that" DET CENTRAL DEM SG (@DN>) )
         ("that" ADV AD-A> (@AD-A>) )
         ("that" PRON DEM SG )
         ("that" <NonMod> <**CLB> <Rel> PRON SG/PL ))
    ("<does>"
         ("do" <SVO> <SVOO> <SV> V PRES SG3 VFIN ))
    ("<not>"
         ("not" NEG-PART (@NEG) ))
    ("<help>"
         ("help" <SVO> <SV> <InfComp> <P/with> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("help" <SVO> <SV> <InfComp> <P/with> V IMP VFIN (@+FMAINV) )
         ("help" <SVO> <SV> <InfComp> <P/with> V INF )
         ("help" <SVO> <SV> <InfComp> <P/with> V PRES -SG3 VFIN (@+FMAINV) )
         ("help" N NOM SG ))
    ("<$\,>")
    ("<find>"
         ("find" <SVOO> <SVOC/N> <SVOC/A> <SVO> <SV> <P/for> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("find" <SVOO> <SVOC/N> <SVOC/A> <SVO> <SV> <P/for> V IMP VFIN (@+FMAINV) )
         ("find" <SVOO> <SVOC/N> <SVOC/A> <SVO> <SV> <P/for> V INF )
         ("find" <SVOO> <SVOC/N> <SVOC/A> <SVO> <SV> <P/for> V PRES -SG3 VFIN (@+FMAINV) )
         ("find" N NOM SG ))
    ("<out>"
         ("out" ADV ADVL (@ADVL) ))
    ("<from>"
         ("from" PREP ))
    ("<the>"
         ("the" <Def> DET CENTRAL ART SG/PL (@DN>) ))
    ("<software>"
         ("software" <-Indef> N NOM SG ))
    ("<manufacturer>"
         ("manufacturer" <DER:er> N NOM SG ))
    ("<whether>"
         ("whether" <**CLB> CS (@CS) ))
    ("<your>"
         ("you" PRON PERS GEN SG2/PL2 (@GN>) ))
    ("<version>"
         ("version" N NOM SG ))
    ("<of>"
         ("of" PREP ))
    ("<the>"
         ("the" <Def> DET CENTRAL ART SG/PL (@DN>) ))
    ("<program>"
         ("program" <SVO> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("program" <SVO> V IMP VFIN (@+FMAINV) )
         ("program" <SVO> V INF )
         ("program" <SVO> V PRES -SG3 VFIN (@+FMAINV) )
         ("program" N NOM SG ))
    ("<is>"
         ("be" <SV> <SVC/N> <SVC/A> V PRES SG3 VFIN ))
    ("<compatible>"
         ("compatible" <DER:ble> A ABS ))
    ("<with>"
         ("with" PREP ))
    ("<the>"
         ("the" <Def> DET CENTRAL ART SG/PL (@DN>) ))
    ("<system>"
         ("system" N NOM SG ))
    ("<software>"
         ("software" <-Indef> N NOM SG ))
    ("<you_>"
         ("you_" <NonMod> PRON PERS NOM SG2/PL2 SUBJ (@SUBJ) ))
    ("<_'re>"
         ("be" <SV> <SVC/N> <SVC/A> V PRES -SG1,3 VFIN ))
    ("<using>"
         ("use" <as/SVOC/A> <SVO> <SV> PCP1 ))
    ("<$.>")
    ("<*put>"
         ("put" <*> <SVO> PCP2 )
         ("put" <*> <SVO> V PAST VFIN (@+FMAINV) )
         ("put" <*> <SVO> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("put" <*> <SVO> V IMP VFIN (@+FMAINV) )
         ("put" <*> <SVO> V INF )
         ("put" <*> <SVO> V PRES -SG3 VFIN (@+FMAINV) ))
    ("<frequently>"
         ("frequent" <DER:ly> ADV ))
    ("<used>"
         ("use" <as/SVOC/A> <SVO> <SV> PCP2 )
         ("use" <as/SVOC/A> <SVO> <SV> V PAST VFIN (@+FMAINV) ))
    ("<programs>"
         ("program" <SVO> V PRES SG3 VFIN (@+FMAINV) )
         ("program" N NOM PL ))
    ("<$\(>")
    ("<or>"
         ("or" CC (@CC) ))
    ("<aliases>"
         ("alias" N NOM PL ))
    ("<for>"
         ("for" PREP)
         ("for" <**CLB> CS (@CS) ))
    ("<those>"
         ("that" DET CENTRAL DEM PL (@DN>) )
         ("that" PRON DEM PL ))
    ("<programs>"
         ("program" <SVO> V PRES SG3 VFIN (@+FMAINV) )
         ("program" N NOM PL ))
    ("<$\)>")
    ("<in>"
         ("in" PREP )
         ("in" ADV ADVL (@ADVL) ))
    ("<the>"
         ("the" <Def> DET CENTRAL ART SG/PL (@DN>) ))
    ("<*apple>"
         ("apple" <*> N NOM SG ))
    ("<menu>"
         ("menu" N NOM SG ))
    ("<so>"
         ("so" <**CLB> CS (@CS) )
         ("so" ADV ))
    ("<you>"
         ("you" <NonMod> PRON PERS NOM SG2/PL2 )
         ("you" <NonMod> PRON PERS ACC SG2/PL2 ))
    ("<can>"
         ("can" N NOM SG )
         ("can" V AUXMOD VFIN (@+FAUXV) ))
    ("<open>"
         ("open" <SVO> <SV> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("open" <SVO> <SV> V IMP VFIN (@+FMAINV) )
         ("open" <SVO> <SV> V INF )
         ("open" <SVO> <SV> V PRES -SG3 VFIN (@+FMAINV) )
         ("open" A ABS ))
    ("<the>"
         ("the" <Def> DET CENTRAL ART SG/PL (@DN>) ))
    ("<programs>"
         ("program" <SVO> V PRES SG3 VFIN (@+FMAINV) )
         ("program" N NOM PL ))
    ("<more>"
         ("much" ADV CMP )
         ("much" <Quant> PRON CMP SG )
         ("much" <Quant> DET POST CMP SG (@QN>) )
         ("many" <Quant> PRON CMP PL )
         ("many" <Quant> DET POST CMP PL (@QN>) ))
    ("<conveniently>"
         ("convenient" <DER:ly> ADV ))
    ("<$.>")
    ("<*see>"
         ("see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V IMP VFIN (@+FMAINV) )
         ("see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V INF )
         ("see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V PRES -SG3 VFIN (@+FMAINV) ))
    ("<*chapter>"
         ("chapter" <*> N NOM SG ))
    ("<5>"
         ("5" NUM CARD )
         ("5" NUM CARD ))
    ("<$\,>")
    ("<$\">")
    ("<*adapting>"
         ("adapt" <*> <SVO> <SV> <P/for> PCP1 ))
    ("<*your>"
         ("you" <*> PRON PERS GEN SG2/PL2 (@GN>) ))
    ("<*computer>"
         ("computer" <*> <DER:er> N NOM SG ))
    ("<to>"
         ("to" PREP)
         ("to" INFMARK> (@INFMARK>) ))
    ("<*your>"
         ("you" <*> PRON PERS GEN SG2/PL2 (@GN>) ))
    ("<*own>"
         ("own" <*> <SVO> <SV> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("own" <*> <SVO> <SV> V IMP VFIN (@+FMAINV) )
         ("own" <*> <SVO> <SV> V INF )
         ("own" <*> <SVO> <SV> V PRES -SG3 VFIN (@+FMAINV) )
         ("own" <*> A ABS ))
    ("<*use>"
         ("use" <*> N NOM SG )
         ("use" <*> <as/SVOC/A> <SVO> <SV> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("use" <*> <as/SVOC/A> <SVO> <SV> V IMP VFIN (@+FMAINV) )
         ("use" <*> <as/SVOC/A> <SVO> <SV> V INF )
         ("use" <*> <as/SVOC/A> <SVO> <SV> V PRES -SG3 VFIN (@+FMAINV) ))
    ("<$.>")
    ("<$\">")
    ("<*to>"
         ("to" <*> PREP )
         ("to" <*> INFMARK> (@INFMARK>) ))
    ("<open>"
         ("open" <SVO> <SV> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("open" <SVO> <SV> V IMP VFIN (@+FMAINV) )
         ("open" <SVO> <SV> V INF )
         ("open" <SVO> <SV> V PRES -SG3 VFIN (@+FMAINV) )
         ("open" A ABS ))
    ("<a>"
         ("a" <Indef> DET CENTRAL ART SG (@DN>) ))
    ("<program>"
         ("program" <SVO> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("program" <SVO> V IMP VFIN (@+FMAINV) )
         ("program" <SVO> V INF )
         ("program" <SVO> V PRES -SG3 VFIN (@+FMAINV) )
         ("program" N NOM SG ))
    ("<automatically>"
         ("automatical" <DER:ic> <DER:al> <DER:ly> ADV ))
    ("<each>"
         ("each" <Quant> DET CENTRAL SG (@QN>) )
         ("each" <NonMod> <Quant> PRON SG ))
    ("<time>"
         ("time" N NOM SG )
         ("time" <SVO> <Rare> V IMP VFIN (@+FMAINV) )
         ("time" <SVO> <Rare> V INF ))
    ("<you>"
         ("you" <NonMod> PRON PERS NOM SG2/PL2 )
         ("you" <NonMod> PRON PERS ACC SG2/PL2 ))
    ("<start>"
         ("start" <SV> <SVO> <P/on> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("start" <SV> <SVO> <P/on> V IMP VFIN (@+FMAINV) )
         ("start" <SV> <SVO> <P/on> V INF )
         ("start" <SV> <SVO> <P/on> V PRES -SG3 VFIN (@+FMAINV) )
         ("start" N NOM SG ))
    ("<up>"
         ("up" PREP)
         ("up" ADV ADVL (@ADVL) ))
    ("<$\,>")
    ("<you>"
         ("you" <NonMod> PRON PERS NOM SG2/PL2 )
         ("you" <NonMod> PRON PERS ACC SG2/PL2 ))
    ("<can>"
         ("can" N NOM SG)
         ("can" V AUXMOD VFIN (@+FAUXV) ))
    ("<put>"
         ("put" <SVO> PCP2 )
         ("put" <SVO> V PAST VFIN (@+FMAINV) )
         ("put" <SVO> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("put" <SVO> V IMP VFIN (@+FMAINV) )
         ("put" <SVO> V INF )
         ("put" <SVO> V PRES -SG3 VFIN (@+FMAINV) ))
    ("<the>"
         ("the" <Def> DET CENTRAL ART SG/PL (@DN>) ))
    ("<program>"
         ("program" <SVO> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("program" <SVO> V IMP VFIN (@+FMAINV) )
         ("program" <SVO> V INF )
         ("program" <SVO> V PRES -SG3 VFIN (@+FMAINV) )
         ("program" N NOM SG ))
    ("<$\(>")
    ("<or>"
         ("or" CC (@CC) ))
    ("<its>"
         ("it" PRON GEN SG3 ))
    ("<alias>"
         ("alias" <Rare> V IMP VFIN (@+FMAINV) )
         ("alias" <Rare> V INF )
         ("alias" N NOM SG ))
    ("<$\)>")
    ("<into>"
         ("into" PREP ))
    ("<the>"
         ("the" <Def> DET CENTRAL ART SG/PL (@DN>) ))
    ("<*startup>"
         ("startup" <*> N NOM SG ))
    ("<*items>"
         ("item" <*> N NOM PL ))
    ("<folder>"
         ("folder" <DER:er> N NOM SG ))
    ("<$.>")
    ("<*see>"
         ("see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V IMP VFIN (@+FMAINV) )
         ("see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V INF )
         ("see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V PRES -SG3 VFIN (@+FMAINV) ))
    ("<*chapter>"
         ("chapter" <*> N NOM SG ))
    ("<5>"
         ("5" NUM CARD )
         ("5" NUM CARD ))
    ("<$\,>")
    ("<$\">")
    ("<*adapting>"
         ("adapt" <*> <SVO> <SV> <P/for> PCP1 ))
    ("<*your>"
         ("you" <*> PRON PERS GEN SG2/PL2 (@GN>) ))
    ("<*computer>"
         ("computer" <*> <DER:er> N NOM SG ))
    ("<to>"
         ("to" PREP )
         ("to" INFMARK> (@INFMARK>) ))
    ("<*your>"
         ("you" <*> PRON PERS GEN SG2/PL2 (@GN>) ))
    ("<*own>"
         ("own" <*> <SVO> <SV> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("own" <*> <SVO> <SV> V IMP VFIN (@+FMAINV) )
         ("own" <*> <SVO> <SV> V INF )
         ("own" <*> <SVO> <SV> V PRES -SG3 VFIN (@+FMAINV) )
         ("own" <*> A ABS ))
    ("<*use>"
         ("use" <*> N NOM SG )
         ("use" <*> <as/SVOC/A> <SVO> <SV> V SUBJUNCTIVE VFIN (@+FMAINV) )
         ("use" <*> <as/SVOC/A> <SVO> <SV> V IMP VFIN (@+FMAINV) )
         ("use" <*> <as/SVOC/A> <SVO> <SV> V INF )
         ("use" <*> <as/SVOC/A> <SVO> <SV> V PRES -SG3 VFIN (@+FMAINV) ))
    ("<$.>")



EXAMPLE C

Each word of the text is then further analyzed to disambiguate, where appropriate, among its possible parts of speech and to determine its syntactic function, as set forth in the example below, hereinafter referred to as Example D (in which part-of-speech disambiguation has been performed). Each of the linguistic processes is discussed in turn below.

    "<*setting>" "set" <*> <SVOC/A> <SVO> <SVOO> <SV> <P/on> PCP1
    "<*up>" "up" <*> ADV ADVL @ADVL
    "<*your>" "you" <*> PRON PERS GEN SG2/PL2 @GN>
    "<*programs>" "program" <*> N NOM PL
    "<$HEAD>"
    "<*this>" "this" <*> DET CENTRAL DEM SG @DN>
    "<chapter>" "chapter" N NOM SG
    "<describes>" "describe" <as/SVOC/A> <SVO> V PRES SG3 VFIN @+FMAINV
    "<how>" "how" <**CLB> ADV WH
    "<to>" "to" INFMARK> @INFMARK>
    "<set>" "set" <SVOC/A> <SVO> <SVOO> <SV> <P/on> V INF
    "<up>" "up" ADV ADVL @ADVL
    "<the>" "the" <Def> DET CENTRAL ART SG/PL @DN>
    "<programs>" "program" N NOM PL
    "<that>" "that" <**CLB> CS @CS "that" <NonMod> <**CLB> <Rel> PRON SG/PL
    "<you>" "you" <NonMod> PRON PERS NOM SG2/PL2
    "<use>" "use" N NOM SG "use" <as/SVOC/A> <SVO> <SV> V PRES -SG3 VFIN @+FMAINV
    "<when>" "when" <**CLB> ADV WH @ADVL
    "<you>" "you" <NonMod> PRON PERS NOM SG2/PL2
    "<work>" "work" <SV> <SVO> <P/in> <P/on> V PRES -SG3 VFIN @+FMAINV
    "<with>" "with" PREP
    "<your>" "you" PRON PERS GEN SG2/PL2 @GN>
    "<computer>" "computer" <DER:er> N NOM SG
    "<$.>"
    "<*installing>" "instal" <*> <SVO> PCP1
    "<your>" "you" PRON PERS GEN SG2/PL2 @GN>
    "<application>" "application" N NOM SG
    "<programs>" "program" N NOM PL
    "<$HEAD>"
    "<*most>" "much" <*> <Quant> DET POST SUP SG @QN> "many" <*> <Quant> DET POST SUP PL @QN>
    "<application>" "application" N NOM SG
    "<programs>" "program" N NOM PL
    "<come>" "come" <SVC/A> <SV> <P/for> V PRES -SG3 VFIN @+FMAINV
    "<on>" "on" PREP "on" ADV ADVL @ADVL
    "<floppy_disks>" "floppy_disk" N NOM PL
    "<$,>"
    "<and>" "and" CC @CC
    "<you>" "you" <NonMod> PRON PERS NOM SG2/PL2
    "<install>" "install" <SVO> V PRES -SG3 VFIN @+FMAINV
    "<them>" "they" <NonMod> PRON PERS ACC PL3
    "<by>" "by" PREP
    "<copying>" "copy" <SVO> <SV> <P/of> PCP1
    "<them>" "they" <NonMod> PRON PERS ACC PL3
    "<from>" "from" PREP
    "<the>" "the" <Def> DET CENTRAL ART SG/PL @DN>
    "<floppy_disks>" "floppy_disk" N NOM PL
    "<to>" "to" PREP
    "<your>" "you" PRON PERS GEN SG2/PL2 @GN>
    "<hard_disk>" "hard_disk" N NOM SG
    "<$.>"
    "<*some>" "some" <*> <Quant> DET CENTRAL SG/PL @QN>
    "<programs>" "program" N NOM PL
    "<have>" "have" <SVO> <SVOC/A> V PRES -SG3 VFIN
    "<special>" "special" A ABS
    "<installation>" "installation" N NOM SG
    "<instructions>" "instruction" N NOM PL
    "<$.>"
    "<*see>" "see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V IMP VFIN @+FMAINV
    "<the>" "the" <Def> DET CENTRAL ART SG/PL @DN>
    "<documentation>" "documentation" <-Indef> N NOM SG
    "<that>" "that" <NonMod> <**CLB> <Rel> PRON SG/PL
    "<came>" "come" <SVC/A> <SV> <P/for> V PAST VFIN @+FMAINV
    "<with>" "with" PREP
    "<your>" "you" PRON PERS GEN SG2/PL2 @GN>
    "<programs>" "program" N NOM PL
    "<$.>"
    "<*to>" "to" <*> INFMARK> @INFMARK>
    "<use>" "use" <as/SVOC/A> <SVO> <SV> V INF
    "<your>" "you" PRON PERS GEN SG2/PL2 @GN>
    "<programs>" "program" N NOM PL
    "<most>" "much" ADV SUP "much" <Quant> PRON SUP SG "many" <Quant> PRON SUP PL
    "<effectively>" "effective" <DER:ive> <DER:ly> ADV
    "<$:>"
    "<*put>" "put" <*> <SVO> PCP2
    "<only>" "only" ADV
    "<one>" "one" NUM CARD
    "<copy>" "copy" N NOM SG
    "<of>" "of" PREP
    "<each>" "each" <Quant> DET CENTRAL SG @QN>
    "<program>" "program" N NOM SG
    "<on>" "on" PREP
    "<your>" "you" PRON PERS GEN SG2/PL2 @GN>
    "<hard_disk>" "hard_disk" N NOM SG
    "<$.>"
    "<*having>" "have" <*> <SVO> <SVOC/A> PCP1
    "<more=than>" "more=than" ADV
    "<one>" "one" NUM CARD
    "<copy>" "copy" N NOM SG
    "<can>" "can" V AUXMOD VFIN @+FAUXV
    "<cause>" "cause" <SVO> <SVOO> V INF
    "<errors>" "error" N NOM PL
    "<$.>"
    "<*whenever>" "whenever" <*> <**CLB> ADV WH @ADVL
    "<you>" "you" <NonMod> PRON PERS NOM SG2/PL2
    "<copy>" "copy" <SVO> <SV> <P/of> V PRES -SG3 VFIN @+FMAINV
    "<a>" "a" <Indef> DET CENTRAL ART SG @DN>
    "<program>" "program" N NOM SG
    "<disk>" "disk" N NOM SG
    "<to>" "to" PREP
    "<your>" "you" PRON PERS GEN SG2/PL2 @GN>
    "<hard_disk>" "hard_disk" N NOM SG
    "<$,>"
    "<be>" "be" <SV> <SVC/N> <SVC/A> V SUBJUNCTIVE VFIN
    "<careful>" "careful" A ABS
    "<not>" "not" NEG-PART @NEG
    "<to>" "to" INFMARK> @INFMARK>
    "<copy>" "copy" <SVO> <SV> <P/of> V INF
    "<a>" "a" <Indef> DET CENTRAL ART SG @DN>
    "<*system>" "system" <*> N NOM SG
    "<*folder>" "folder" <*> <DER:er> N NOM SG
    "<$.>"
    "<*always>" "always" <*> ADV ADVL @ADVL
    "<check>" "check" <SVO> <SV> <P/for> <P/with> <P/on> V IMP VFIN @+FMAINV "check" N NOM SG
    "<to>" "to" INFMARK> @INFMARK>
    "<see>" "see" <as/SVOC/A> <SVO> <SV> <InfComp> V INF
    "<what>" "what" <NonMod> <**CLB> PRON WH SG/PL
    "<you_>" "you_" <NonMod> PRON PERS NOM SG2/PL2 SUBJ @SUBJ
    "<_'ve>" "have" <SVO> V PRES -SG3 VFIN
    "<copied>" "copy" <SVO> <SV> <P/of> PCP2
    "<$,>"
    "<and>" "and" CC @CC
    "<drag>" "drag" <SVO> <SV> V IMP VFIN @+FMAINV "drag" <SVO> <SV> V INF "drag" <SVO> <SV> V PRES -SG3 VFIN @+FMAINV
    "<any>" "any" <Quant> DET CENTRAL SG/PL @QN>
    "<extra>" "extra" A ABS
    "<*system>" "system" <*> N NOM SG
    "<*folders>" "folder" <*> <DER:er> N NOM PL
    "<to>" "to" PREP
    "<the>" "the" <Def> DET CENTRAL ART SG/PL @DN>
    "<*trash>" "trash" <*> <-Indef> N NOM SG
    "<$.>"
    "<*if>" "if" <*> <**CLB> CS @CS
    "<a>" "a" <Indef> DET CENTRAL ART SG @DN>
    "<program>" "program" N NOM SG
    "<malfunctions>" "malfunction" <SV> V PRES SG3 VFIN @+FMAINV
    "<consistently>" "consistent" <DER:ly> ADV
    "<$,>"
    "<try>" "try" <SVO> <SV> <P/for> V IMP VFIN @+FMAINV "try" N NOM SG
    "<installing>" "instal" <SVO> PCP1
    "<a>" "a" <Indef> DET CENTRAL ART SG @DN>
    "<fresh>" "fresh" A ABS
    "<copy>" "copy" N NOM SG
    "<$.>"
    "<*if>" "if" <*> <**CLB> CS @CS
    "<that>" "that" PRON DEM SG
    "<does>" "do" <SVO> <SVOO> <SV> V PRES SG3 VFIN
    "<not>" "not" NEG-PART @NEG
    "<help>" "help" <SVO> <SV> <InfComp> <P/with> V INF
    "<$,>"
    "<find>" "find" <SVOO> <SVOC/N> <SVOC/A> <SVO> <SV> <P/for> V IMP VFIN @+FMAINV "find" <SVOO> <SVOC/N> <SVOC/A> <SVO> <SV> <P/for> V INF
    "<out>" "out" ADV ADVL @ADVL
    "<from>" "from" PREP
    "<the>" "the" <Def> DET CENTRAL ART SG/PL @DN>
    "<software>" "software" <-Indef> N NOM SG
    "<manufacturer>" "manufacturer" <DER:er> N NOM SG
    "<whether>" "whether" <**CLB> CS @CS
    "<your>" "you" PRON PERS GEN SG2/PL2 @GN>
    "<version>" "version" N NOM SG
    "<of>" "of" PREP
    "<the>" "the" <Def> DET CENTRAL ART SG/PL @DN>
    "<program>" "program" N NOM SG
    "<is>" "be" <SV> <SVC/N> <SVC/A> V PRES SG3 VFIN
    "<compatible>" "compatible" <DER:ble> A ABS
    "<with>" "with" PREP
    "<the>" "the" <Def> DET CENTRAL ART SG/PL @DN>
    "<system>" "system" N NOM SG
    "<software>" "software" <-Indef> N NOM SG
    "<you_>" "you_" <NonMod> PRON PERS NOM SG2/PL2 SUBJ @SUBJ
    "<_'re>" "be" <SV> <SVC/N> <SVC/A> V PRES -SG1,3 VFIN
    "<using>" "use" <as/SVOC/A> <SVO> <SV> PCP1
    "<$.>"
    "<*put>" "put" <*> <SVO> PCP2 "put" <*> <SVO> V IMP VFIN @+FMAINV
    "<frequently>" "frequent" <DER:ly> ADV
    "<used>" "use" <as/SVOC/A> <SVO> <SV> PCP2
    "<programs>" "program" N NOM PL
    "<$(>"
    "<or>" "or" CC @CC
    "<aliases>" "alias" N NOM PL
    "<for>" "for" PREP
    "<those>" "that" DET CENTRAL DEM PL @DN>
    "<programs>" "program" N NOM PL
    "<$)>"
    "<in>" "in" PREP
    "<the>" "the" <Def> DET CENTRAL ART SG/PL @DN>
    "<*apple>" "apple" <*> N NOM SG
    "<menu>" "menu" N NOM SG
    "<so>" "so" <**CLB> CS @CS
    "<you>" "you" <NonMod> PRON PERS NOM SG2/PL2
    "<can>" "can" V AUXMOD VFIN @+FAUXV
    "<open>" "open" <SVO> <SV> V INF
    "<the>" "the" <Def> DET CENTRAL ART SG/PL @DN>
    "<programs>" "program" N NOM PL
    "<more>" "much" ADV CMP
    "<conveniently>" "convenient" <DER:ly> ADV
    "<$.>"
    "<*see>" "see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V IMP VFIN @+FMAINV
    "<*chapter>" "chapter" <*> N NOM SG
    "<5>" "5" NUM CARD
    "<$,>"
    "<$">"
    "<*adapting>" "adapt" <*> <SVO> <SV> <P/for> PCP1
    "<*your>" "you" <*> PRON PERS GEN SG2/PL2 @GN>
    "<*computer>" "computer" <*> <DER:er> N NOM SG
    "<to>" "to" PREP
    "<*your>" "you" <*> PRON PERS GEN SG2/PL2 @GN>
    "<*own>" "own" <*> A ABS
    "<*use>" "use" <*> N NOM SG
    "<$.>"
    "<$">"
    "<*to>" "to" <*> INFMARK> @INFMARK>
    "<open>" "open" <SVO> <SV> V INF
    "<a>" "a" <Indef> DET CENTRAL ART SG @DN>
    "<program>" "program" N NOM SG
    "<automatically>" "automatical" <DER:ic> <DER:al> <DER:ly> ADV
    "<each>" "each" <Quant> DET CENTRAL SG @QN>
    "<time>" "time" N NOM SG
    "<you>" "you" <NonMod> PRON PERS NOM SG2/PL2
    "<start>" "start" <SV> <SVO> <P/on> V PRES -SG3 VFIN @+FMAINV
    "<up>" "up" ADV ADVL @ADVL
    "<$,>"
    "<you>" "you" <NonMod> PRON PERS NOM SG2/PL2
    "<can>" "can" V AUXMOD VFIN @+FAUXV
    "<put>" "put" <SVO> V INF
    "<the>" "the" <Def> DET CENTRAL ART SG/PL @DN>
    "<program>" "program" N NOM SG
    "<$(>"
    "<or>" "or" CC @CC
    "<its>" "it" PRON GEN SG3
    "<alias>" "alias" N NOM SG
    "<$)>"
    "<into>" "into" PREP
    "<the>" "the" <Def> DET CENTRAL ART SG/PL @DN>
    "<*startup>" "startup" <*> N NOM SG
    "<*items>" "item" <*> N NOM PL
    "<folder>" "folder" <DER:er> N NOM SG
    "<$.>"
    "<*see>" "see" <*> <as/SVOC/A> <SVO> <SV> <InfComp> V IMP VFIN @+FMAINV
    "<*chapter>" "chapter" <*> N NOM SG
    "<5>" "5" NUM CARD
    "<$,>"
    "<$">"
    "<*adapting>" "adapt" <*> <SVO> <SV> <P/for> PCP1
    "<*your>" "you" <*> PRON PERS GEN SG2/PL2 @GN>
    "<*computer>" "computer" <*> <DER:er> N NOM SG
    "<to>" "to" PREP
    "<*your>" "you" <*> PRON PERS GEN SG2/PL2 @GN>
    "<*own>" "own" <*> A ABS
    "<*use>" "use" <*> N NOM SG
    "<$.>"


EXAMPLE D

At step 303, the linguistic analysis begins by morphologically and lexically analyzing each word of the text to determine its possible morphological and lexical features. Morphological analysis involves, among other things, mapping a word to its base form. Morphological analysis takes each word and, either through derivation (e.g., "re-initialize" maps to "initialize") or inflection (e.g., "initializing" maps to "initialize"), reduces it to its base form. For example, with reference to FIG. 4, a lookup in the lexicon of the word "initializing" 403 is performed by lexical analysis and fails. Morphological analysis reduces the word to its base form "initialize" by stripping the "ing" ending and adding "e", as indicated by annotation 402 ("</base "initialize">"). Using this base form of the word, lexical analysis then performs a successful lookup in the lexicon to determine the lexical features of the word. Lexical analysis determines that the word is a verb. More specifically, lexical analysis determines that the word's part-of-speech is a verb present participle, as indicated by part-of-speech annotation 405 ("</pos PCP1>"); it also determines that the word participates in a subject-verb-object construction, as indicated by lexical features annotation 404 ("</lfeats <SVO>>"). Furthermore, the morphological features, e.g., part-of-speech, are determined. Morphological features provide an inference of the linguistic properties of a word (e.g., tense, person, mood) based on how the word is used in a particular context. In the case of the word "initializing" 403, there are no morphological features, as indicated by an empty morphological features annotation 406. However, morphological features may be inferred from the fact that lexical analysis has identified the word as a present participle (due to the "ing" ending), as is indicated by part-of-speech annotation 405.
Finally, it should be noted that since lexical analysis determined "initialize" is a verb in a subject-verb-object construction, it will search for an object following the verb. In this way, syntactic labeling is possible, at least to the extent that an association of the words in a sentence with the syntactic functions they play within the particular context ("@Subject", "@Object") may be determined.
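By way of illustration only, the lookup-then-reduce behavior described above can be sketched in Python as follows. The two-entry lexicon, the function names `candidate_bases` and `analyze`, and the two reduction rules are assumptions made for this sketch; they are not the lexicon or rule set of the described embodiment.

```python
# Hypothetical toy lexicon: base form -> (part of speech, lexical features).
LEXICON = {
    "initialize": ("V", ["<SVO>"]),
    "use": ("V", ["<SVO>", "<SV>"]),
}


def candidate_bases(word):
    """Yield possible base forms of `word`, the surface form itself first."""
    yield word
    if word.startswith("re-"):
        # Derivation: "re-initialize" -> "initialize".
        yield word[len("re-"):]
    if word.endswith("ing"):
        # Inflection: strip "ing", then try adding "e" ("initializing" -> "initialize").
        stem = word[:-3]
        yield stem
        yield stem + "e"


def analyze(word):
    """Return base form, part of speech and lexical features, or None if no lookup succeeds."""
    lowered = word.lower()
    for base in candidate_bases(lowered):
        entry = LEXICON.get(base)
        if entry is None:
            continue  # lookup failed; try the next candidate base form
        pos, lfeats = entry
        if base != lowered and lowered.endswith("ing") and pos == "V":
            pos = "PCP1"  # a verb reached by stripping "ing" is a present participle
        return {"word": word, "base": base, "pos": pos, "lfeats": lfeats}
    return None


print(analyze("initializing"))
```

In this sketch the direct lookup of "initializing" fails, the reduced form "initialize" succeeds, and the result carries the PCP1 part of speech and the `<SVO>` lexical feature, mirroring annotations 402, 404 and 405.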

As another example, "use" 407 may be a noun, as indicated by part-of-speech annotation 408 ("</pos N>"), in which case it takes on the morphological features NOM (nominative) and SG (singular, as opposed to plural in the case of "uses"), as indicated by morphological features annotation 409 ("</mfeats NOM SG>"). However, "use" 407 may also be a verb, as indicated by part-of-speech annotation 410 ("</pos V>"), in which case it has the morphological features of an imperative (IMP), i.e., "you use", and a finite verb (VFIN), as indicated by morphological features annotation 411 ("</mfeats IMP VFIN>"). "[U]se" may also be used as a normal present tense, non-third-person-singular finite verb, i.e., "they use", as indicated by morphological features annotation 412 ("</mfeats PRES -SG3 VFIN>"). Thus, "use" 407 has three possible readings: one as a noun and two as a verb.
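One possible in-memory representation of these competing readings is sketched below in Python. The `Reading` class and the helper functions are hypothetical names introduced for illustration; only the tag values (N, V, NOM, SG, IMP, PRES, -SG3, VFIN) are taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One candidate analysis of a word: base form, part of speech, morphological features."""
    base: str
    pos: str
    mfeats: tuple


# The three readings of "use" discussed in the text.
USE_READINGS = [
    Reading("use", "N", ("NOM", "SG")),
    Reading("use", "V", ("IMP", "VFIN")),
    Reading("use", "V", ("PRES", "-SG3", "VFIN")),
]


def is_ambiguous(readings):
    """A word is part-of-speech ambiguous if its readings disagree on the part of speech."""
    return len({r.pos for r in readings}) > 1


def readings_with_pos(readings, pos):
    """Keep only the readings carrying the given part of speech."""
    return [r for r in readings if r.pos == pos]


print(is_ambiguous(USE_READINGS))
print(readings_with_pos(USE_READINGS, "V"))
```

Keeping every reading until a later step discards some of them is what allows disambiguation to be performed as a separate pass.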

As will be seen, it is important to identify the correct use of each word in order to perform knowledge mining. For example, knowledge mining attempts to identify technical terms to be included in the domain catalog by searching for particular syntactic patterns representative of technical terms, e.g., a noun phrase. If it sees that "disk", "repair" and "program" in phrase 401 are nouns, then it recognizes the three nouns as constituting a noun phrase, and thus, potentially a technical term. However, "repair" and "program" can also be verbs, so lexical analysis must first determine that the words are, in this context, nouns. Notice, however, with reference to FIG. 4, that "repair" has a noun analysis, as indicated by part-of-speech annotation 413 ("</pos N>"). "[P]rogram" has a verb and a noun analysis, as indicated by part-of-speech annotations 414 and 415, respectively. What has happened here already is a certain amount of part-of-speech disambiguation analysis, i.e., the lexical and morphological analyses have together determined, on the basis of local constraints and knowledge about how combinations of words are formed, the proper part-of-speech for certain terms. Using phrase 401 as an example, the analysis, at a certain level of abstraction, proceeds as follows: the first occurrence of "disk" is unambiguously a noun; "program", however, can be a noun or a verb, but because it precedes a preposition ("for") and follows a verb sequence ("use" "commercial"), it is very likely a noun; if "disk" is a noun and "program" is a noun, then "repair" is most likely a noun as well.
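
The search for a noun-sequence pattern mentioned above may be sketched, purely as an illustration, as a scan for runs of consecutive nouns in text whose parts-of-speech have already been resolved. The function name `noun_sequences`, the `min_len` parameter, and the tagged fragment (which only approximates phrase 401, since FIG. 4 is not reproduced here) are assumptions of this sketch.

```python
def noun_sequences(tagged, min_len=2):
    """Return runs of at least min_len consecutive nouns as candidate terms.

    tagged is a list of (word, part-of-speech) pairs, already disambiguated.
    """
    candidates, run = [], []
    for word, pos in tagged:
        if pos == "N":
            run.append(word)
            continue
        if len(run) >= min_len:
            candidates.append(" ".join(run))
        run = []
    if len(run) >= min_len:  # flush a run that ends the input
        candidates.append(" ".join(run))
    return candidates


# Hypothetical tagged fragment in the spirit of phrase 401.
fragment = [
    ("use", "V"), ("a", "DET"), ("commercial", "A"),
    ("disk", "N"), ("repair", "N"), ("program", "N"), ("for", "PREP"),
]
print(noun_sequences(fragment))  # ['disk repair program']
```

The sketch only works if "repair" and "program" have already been resolved to nouns, which is why the part-of-speech determination must come first.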

The linguistic analysis performed at step 303 does not generate a complete syntactic analysis of the sentence, but it is able to, in some instances, identify components of sentence structure, e.g., subject, verb, and object. In this way, extraction of semantically important terms and conceptually interesting data from the document is possible on the basis of their syntactic identity without requiring full syntactic analysis.
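
As a hedged illustration of extraction on the basis of syntactic identity alone, the following sketch collects subject-verb-object triples from a flat sequence of words carrying syntactic labels of the kind shown in example E ("@SUBJ", "@+FMAINV", "@OBJ"). The function name `svo_triples` and the simple left-to-right matching strategy are assumptions of this sketch, not the method of the described embodiment.

```python
def svo_triples(tokens):
    """Collect (subject, verb, object) triples from syntactically labeled words.

    tokens is a list of (word, list of syntactic labels) pairs. No parse tree
    is built; the labels alone drive the extraction.
    """
    triples = []
    subj = verb = None
    for word, labels in tokens:
        if "@SUBJ" in labels:
            subj, verb = word, None      # a new subject restarts the pattern
        elif "@+FMAINV" in labels and subj is not None:
            verb = word                  # finite main verb after a subject
        elif "@OBJ" in labels and verb is not None:
            triples.append((subj, verb, word))
            subj = verb = None
    return triples


# Labels as they appear in example E for "you install them by ...".
tokens = [
    ("you", ["@SUBJ"]), ("install", ["@+FMAINV"]),
    ("them", ["@OBJ"]), ("by", ["@ADVL"]),
]
print(svo_triples(tokens))  # [('you', 'install', 'them')]
```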

The lexical, morphological, part-of-speech disambiguation and syntactic labeling processes performed in the linguistic analysis stage are neither strictly concurrent nor strictly sequential with respect to each other. Lexical analysis and morphological analysis are performed essentially in parallel. Part-of-speech disambiguation is coupled closely to morphological analysis. Determination of syntactic functions and syntactic labeling follows closely behind part-of-speech disambiguation.

Part-of-speech disambiguation decides between two or more possible analyses of a word. For example, with reference to example C, as a result of lexical and morphological analysis, the word "up" in the sentence "this chapter describes how to set up the programs that you use when you work with your computer" is determined to be either a preposition (PREP) or an adverb (ADV ADVL). A syntactic label of adverbial (@ADVL) is affixed to the latter possibility. Given the context of the sentence in which the word appears, part-of-speech disambiguation analysis is able to determine that "up" functions as an adverbial. Thus, as set forth in example D, "up" is unambiguously analyzed as an adverb.

Thus, while lexical and morphological analysis generally operate at the word level, part-of-speech disambiguation analysis is concerned with a phrase or sentence, and reduces, to the extent possible, each word of the phrase or sentence to a single, and therefore unambiguous, analysis. Part-of-speech disambiguation analysis looks at a string of words and, on the basis of certain knowledge about the construction of some of those words (namely, that knowledge acquired through lexical and morphological analysis) and the order in which they occur in the sentence, infers the likely construction of other words in the sentence. For example, given the sequence of three words "disk repair program" found in phrase 401, once it has been determined that "disk" is a noun and that "program" is a noun, part-of-speech disambiguation analysis recognizes "repair" must also be a noun. Once part-of-speech disambiguation is completed, syntactic analysis determines and labels each word of text with an appropriate syntactic function.
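
The inference described above may be sketched as repeated application of local constraints to the sets of candidate parts-of-speech. The two rules below (an ambiguous word directly before a preposition is taken to be a noun; an ambiguous word between two nouns is taken to be a noun) are simplified assumptions drawn from the "disk repair program" discussion, and the function name `disambiguate` and the candidate sets are hypothetical; a practical disambiguator would use a far larger rule set.

```python
def disambiguate(words):
    """Narrow candidate parts-of-speech using two local constraints.

    words is a list of (token, set of candidate parts-of-speech) pairs.
    Rules are reapplied until no candidate set changes.
    """
    cands = [set(c) for _, c in words]
    changed = True
    while changed:
        changed = False
        for i, current in enumerate(cands):
            if len(current) == 1 or "N" not in current:
                continue  # already unambiguous, or a noun reading is impossible
            prev = cands[i - 1] if i > 0 else set()
            nxt = cands[i + 1] if i + 1 < len(cands) else set()
            before_prep = nxt == {"PREP"}
            between_nouns = prev == {"N"} and nxt == {"N"}
            if before_prep or between_nouns:
                cands[i] = {"N"}
                changed = True
    return [(tok, sorted(c)) for (tok, _), c in zip(words, cands)]


# Assumed candidate sets for the sequence discussed above.
phrase = [
    ("disk", {"N"}), ("repair", {"N", "V"}),
    ("program", {"N", "V"}), ("for", {"PREP"}),
]
print(disambiguate(phrase))
```

In this sketch "program" is resolved first, because it precedes the preposition, and only then can "repair" be resolved as lying between two nouns, which mirrors the order of reasoning given for phrase 401.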

In an embodiment of the present invention, implementation of the foregoing linguistic analyses and annotations, including lexical and morphological analysis, part-of-speech disambiguation analysis and syntactic labeling, may be accomplished by way of commercially available application software from, for example, LingSoft, Incorporated, of Helsinki, Finland.

As can be seen with reference to example D, each record of linguistically analyzed, annotated and disambiguated text, i.e., each word of the text and its associated linguistic annotations, comprises an arbitrary number of tokens. In some cases, a certain annotation may be missing altogether, e.g., morphological features may not be discerned or present for a particular word. Furthermore, any one field of the record may further comprise an arbitrary number of tokens, e.g., it is not uncommon for lexical analysis to generate an annotation comprising anywhere from zero to five tokens. Thus, at step 304, the linguistically analyzed, annotated and disambiguated text, as well as the annotations themselves, are explicitly labeled to indicate which tokens refer to which annotations, thereby facilitating subsequent mining. Referring to the example below, hereinafter referred to as example E, the annotations to which the tokens belong are more readily discernible than in the case of example D.
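
Because every annotation carries an explicit label, a record of the kind shown in example E can be split into fields mechanically, however many tokens each field holds. The following is a minimal sketch under stated assumptions: the function name `parse_record` and the dictionary layout are hypothetical, each record is assumed to have been joined onto a single line, and a new "</base" label is assumed to start a new reading of the same word.

```python
import re

# The five annotation labels that appear in example E.
LABEL = re.compile(r"</(base|lfeats|pos|mfeats|syn)\b")


def parse_record(record):
    """Split one labeled record into (word, list of readings).

    Each reading maps a label to its list of tokens; an empty annotation
    such as "</mfeats >" yields an empty list.
    """
    matches = list(LABEL.finditer(record))
    if not matches:
        return record.strip(), []        # e.g. punctuation or a HEAD marker
    word = record[:matches[0].start()].strip()
    readings = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(record)
        body = record[m.end():end].strip()
        if body.endswith(">"):
            body = body[:-1]             # drop the ">" that closes the field
        if m.group(1) == "base" or not readings:
            readings.append({})          # a new base form opens a new reading
        readings[-1][m.group(1)] = body.split()
    return word, readings


record = ('set </base "set"> </lfeats <SVOC/A> <SVO> <SVOO> <SV> <P/on>> '
          '</pos V> </mfeats INF> </syn @-FMAINV>')
word, readings = parse_record(record)
print(word, readings[0]["pos"], readings[0]["syn"])  # set ['V'] ['@-FMAINV']
```

Only the final ">" of each field is removed, so tokens that themselves end in ">" (for example "@DN>") are preserved intact.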

    Setting </base "set"> </lfeats <*> <SVOC/A> <SVO> <SVOO> <SV> <P/on>> </pos
     PCP1>
    </mfeats > </syn @NPHR @-FMAINV>
    Up </base "up"> </lfeats <*>> </pos ADV> </mfeats ADVL> </syn @ADVL>
    Your </base "you"> </lfeats <*>> </pos PRON> </mfeats PERS GEN SG2/PL2>
     </syn @NPHR @OBJ>
    Programs </base "program"> </lfeats <*>> </pos N> </mfeats NOM PL> </syn
     @NPHR @OBJ>
    HEAD
    This </base "this"> </lfeats <*>> </pos DET> </mfeats CENTRAL DEM SG> </syn
     @DN>>
    chapter </base "chapter"> </lfeats > </pos N> </mfeats NOM SG> </syn @SUBJ>
    describes </base "describe"> </lfeats <as/SVOC/A> <SVO>> </pos V> </mfeats
     PRES SG3
    VFIN> </syn @+FMAINV>
    how </base "how"> </lfeats <**CLB>> </pos ADV> </mfeats WH> </syn @ADVL>
    to </base "to"> </lfeats > </pos INFMARK>> </mfeats > </syn @INFMARK>>
    set </base "set"> </lfeats <SVOC/A> <SVO> <SVOO> <SV> <P/on>> </pos V>
     </mfeats INF>
    </syn @-FMAINV>
    up </base "up"> </lfeats > </pos ADV> </mfeats ADVL> </syn @ADVL>
    the </base "the"> </lfeats <Def>> </pos DET> </mfeats CENTRAL ART SG/PL>
     </syn @DN>>
    programs </base "program"> </lfeats > </pos N> </mfeats NOM PL> </syn @OBJ
     @I-OBJ>
    that </base "that"> </lfeats <**CLB>> </pos CS> </mfeats > </syn @CS>
     </base "that"> </lfeats
    <NonMod> <**CLB> <Rel>> </pos PRON> </mfeats SG/PL> </syn @SUBJ @OBJ @I-OBJ
    @PCOMPL-O>
    you </base "you"> </lfeats <NonMod>> </pos PRON> </mfeats PERS NOM SG2/PL2>
     </syn
    @SUBJ>
    use </base "use"> </lfeats > </pos N> </mfeats NOM SG> </syn @OBJ> </base
     "use"> </lfeats
    <as/SVOC/A> <SVO> <SV>> </pos V> </mfeats PRES -SG3 VFIN> </syn @+FMAINV>
    when </base "when"> </lfeats <**CLB>> </pos ADV> </mfeats WH> </syn @ADVL>
    you </base "you"> </lfeats <NonMod>> </pos PRON> </mfeats PERS NOM SG2/PL2>
     </syn
    @SUBJ>
    work </base "work"> </lfeats <SV> <SVO> <P/in> <P/on>> </pos V> </mfeats
     PRES -SG3 VFIN>
    </syn @+FMAINV>
    with </base "with"> </lfeats > </pos PREP> </mfeats > </syn @ADVL>
    your </base "you"> </lfeats > </pos PRON> </mfeats PERS GEN SG2/PL2> </syn
     @GN>>
    computer </base "computer"> </lfeats <DER:er>> </pos N> </mfeats NOM SG>
     </syn @<P>
    .
    Installing </base "instal"> </lfeats <*> <SVO>> </pos PCP1> </mfeats >
     </syn @NPHR @-
    FMAINV>
    your </base "you"> </lfeats > </pos PRON> </mfeats PERS GEN SG2/PL2> </syn
     @GN>>
    application </base "application"> </lfeats > </pos N> </mfeats NOM SG>
     </syn @NPHR @NN>>
    programs </base "program"> </lfeats > </pos N> </mfeats NOM PL> </syn @NPHR
     @OBJ>
    HEAD
    Most </base "much"> </lfeats <*> <Quant>> </pos DET> </mfeats POST SUP SG>
     </syn
    @QN>> </base "many"> </lfeats <*> <Quant>> </pos DET> </mfeats POST SUP PL>
     </syn
    @QN>>
    application </base "application"> </lfeats > </pos N> </mfeats NOM SG>
     </syn @NN>>
    programs </base "program"> </lfeats> </pos N> </mfeats NOM PL> </syn @SUBJ>
    come </base "come"> </lfeats <SVC/A> <SV> <P/for>> </pos V> </mfeats PRES
     -SG3 VFIN>
    </syn @+FMAINV>
    on </base "on"> </lfeats > </pos PREP> </mfeats > </syn @ADVL> </base "on">
     </lfeats > </pos
    ADV> </mfeats ADVL> </syn @ADVL>
    floppy_disks </base "floppy_disk"> </lfeats > </pos N> </mfeats NOM PL>
     </syn @<P>
    ,
    and </base "and"> </lfeats > </pos CC> </mfeats> </syn @CC>
    you </base "you"> </lfeats <NonMod>> </pos PRON> </mfeats PERS NOM SG2/PL2>
     </syn
    @SUBJ>
    install </base "install"> </lfeats <SVO>> </pos V> </mfeats PRES -SG3 VFIN>
     </syn
    @+FMAINV>
    them </base "they"> </lfeats <NonMod>> </pos PRON> </mfeats PERS ACC PL3>
     </syn
    @OBJ>
    by </base "by"> </lfeats> </pos PREP> </mfeats> </syn @ADVL>
    copying </base "copy"> </lfeats <SVO> <SV> <P/of>> </pos PCP1> </mfeats >
     </syn @<P-
    FMAINV>
    them </base "they"> </lfeats <NonMod>> </pos PRON> </mfeats PERS ACC PL3>
     </syn
    @OBJ>
    from </base "from"> </lfeats > </pos PREP> </mfeats > </syn @ADVL>
    the </base "the"> </lfeats <Def>> </pos DET> </mfeats CENTRAL ART SG/PL>
     </syn @DN>>
    floppy_disks </base "floppy_disk"> </lfeats > </pos N> </mfeats NOM PL>
     </syn @<P>
    to </base "to"> </lfeats > </pos PREP> </mfeats> </syn @<NOM @ADVL>
    your </base "you"> </lfeats > </pos PRON> </mfeats PERS GEN SG2/PL2> </syn
     @GN>>
    hard_disk </base "hard_disk"> </lfeats > </pos N> </mfeats NOM SG> </syn
     @<P>
    .
    Some </base "some"> </lfeats <*> <Quant>> </pos DET> </mfeats CENTRAL
     SG/PL> </syn
    @QN>>
    programs </base "program"> </lfeats> </pos N> </mfeats NOM PL> </syn @SUBJ>
    have </base "have"> </lfeats <SVO> <SVOC/A>> </pos V> </mfeats PRES -SG3
     VFIN> </syn
    @+FMAINV>
    special </base "special"> </lfeats > </pos A> </mfeats ABS> </syn @AN>>
    installation </base "installation"> </lfeats > </pos N> </mfeats NOM SG>
     </syn @OBJ @NN>>
    instructions </base "instruction"> </lfeats > </pos N> </mfeats NOM PL>
     </syn @OBJ>
    .
    See </base "see"> </lfeats <*> <as/SVOC/A> <SVO> <SV> <InfComp>> </pos V>
     </mfeats IMP
    VFIN> </syn @+FMAINV>
    the </base "the"> </lfeats <Def>> </pos DET> </mfeats CENTRAL ART SG/PL>
     </syn @DN>>
    documentation </base "documentation"> </lfeats <-Indef>> </pos N> </mfeats
     NOM SG> </syn
    @OBJ>
    that </base "that"> </lfeats <NonMod> <**CLB> <Rel>> </pos PRON> </mfeats
     SG/PL> </syn
    @SUBJ>
    came </base "come"> </lfeats <SVC/A> <SV> <P/for>> </pos V> </mfeats PAST
     VFIN> </syn
    @+FMAINV>
    with </base "with"> </lfeats > </pos PREP> </mfeats> </syn @ADVL>
    your </base "you"> </lfeats > </pos PRON> </mfeats PERS GEN SG2/PL2> </syn
     @GN>>
    programs </base "program"> </lfeats > </pos N> </mfeats NOM PL> </syn @<P>
    .
    To </base "to"> </lfeats <*>> </pos INFMARK>> </mfeats > </syn @INFMARK>>
    use </base "use"> </lfeats <as/SVOC/A> <SVO> <SV>> </pos V> </mfeats INF>
     </syn @-
    FMAINV>
    your </base "you"> </lfeats > </pos PRON> </mfeats PERS GEN SG2/PL2> </syn
     @GN>>
    programs </base "program"> </lfeats > </pos N> </mfeats NOM PL> </syn @OBJ>
    most </base "much"> </lfeats > </pos ADV> </mfeats SUP> </syn @ADVL @AD-A>>
     </base
    "much"> </lfeats <Quant>> </pos PRON> </mfeats SUP SG> </syn @OBJ
     @PCOMPL-O>
    </base "many"> </lfeats <Quant>> </pos PRON> </mfeats SUP PL> </syn @OBJ
     @PCOMPL-O>
    effectively </base "effective"> </lfeats <DER:ive> <DER:ly>> </pos ADV>
     </mfeats > </syn
    @ADVL>
    :
    Put </base "put"> </lfeats <*> <SVO>> </pos PCP2> </mfeats > </syn @NPHR
     @PCOMPL-O>
    only </base "only"> </lfeats > </pos ADV> </mfeats > </syn @AD-A>>
    one </base "one"> </lfeats > </pos NUM> </mfeats CARD> </syn @QN>>
    copy </base "copy"> </lfeats > </pos N> </mfeats NOM SG> </syn @NPHR @OBJ>
    of </base "of"> </lfeats> </pos PREP> </mfeats > </syn @<NOM-OF>
    each </base "each"> </lfeats <Quant>> </pos DET> </mfeats CENTRAL SG> </syn
     @QN>>
    program </base "program"> </lfeats > </pos N> </mfeats NOM SG> </syn @<P>
    on </base "on"> </lfeats > </pos PREP> </mfeats> </syn @<NOM @ADVL>
    your </base "you"> </lfeats > </pos PRON> </mfeats PERS GEN SG2/PL2> </syn
     @GN>>
    hard_disk </base "hard_disk"> </lfeats > </pos N> </mfeats NOM SG> </syn
     @<P>
    .
    Having </base "have"> </lfeats <*> <SVO> <SVOC/A>> </pos PCP1> </mfeats >
     </syn @-
    FMAINV>
    more=than </base "more=than"> </lfeats > </pos ADV> </mfeats > </syn @ADVL
     @AD-A>>
    one </base "one"> </lfeats > </pos NUM> </mfeats CARD> </syn @QN>>
    copy </base "copy"> </lfeats > </pos N> </mfeats NOM SG> </syn @SUBJ>
    can </base "can"> </lfeats > </pos V> </mfeats AUXMOD VFIN> </syn @+FAUXV>
    cause </base "cause"> </lfeats <SVO> <SVOO>> </pos V> </mfeats INF> </syn
     @-FMAINV>
    errors </base "error"> </lfeats> </pos N> </mfeats NOM PL> </syn @OBJ>
    .
    Whenever </base "whenever"> </lfeats <*> <**CLB>> </pos ADV> </mfeats WH>
     </syn @ADVL>
    you </base "you"> </lfeats <NonMod>> </pos PRON> </mfeats PERS NOM SG2/PL2>
     </syn
    @SUBJ>
    copy </base "copy"> </lfeats <SVO> <SV> <P/of>> </pos V> </mfeats PRES -SG3
     VFIN> </syn
    @+FMAINV>
    a </base "a"> </lfeats <Indef>> </pos DET> </mfeats CENTRAL ART SG> </syn
     @DN>>
    program </base "program"> </lfeats > </pos N> </mfeats NOM SG> </syn @NN>>
    disk </base "disk"> </lfeats> </pos N> </mfeats NOM SG> </syn @OBJ>
    to </base "to"> </lfeats > </pos PREP> </mfeats > </syn @<NOM @ADVL>
    your </base "you"> </lfeats > </pos PRON> </mfeats PERS GEN SG2/PL2> </syn
     @GN>>
    hard_disk </base "hard_disk"> </lfeats> </pos N> </mfeats NOM SG> </syn
     @<P>
    ,