Methods and system for using web browser to search large collections of documents

6055538

Abstract

A system for rapidly and easily searching large collections of documents using standard web browser programs as the user interface. The present invention parses a collection of text documents to identify symbols therein and builds a database file which identifies the file and line locations of each symbol identified. The database file is constructed to permit rapid searching for symbols to permit interactive use of the present invention as a search tool. A database client process interacts with the web browser via standard CGI techniques to convert browser commands and queries into appropriate server process requests. A server process receives such requests and manipulates the database files in response to the requests. Query results returned to the client process are then reformatted by the client process to return a document with hypertext links in place of search keys located in the database (e.g., an HTML page). The system of the present invention thereby provides for rapid searching of large collections of text documents which is not coupled to a specific toolset used to create any one of the documents and which uses a simple and well-known user interface, namely, web browsers.

Claims

What is claimed is:

Description

BACKGROUND OF THE INVENTION
______________________________________
#include <stdio.h>
#include <stdlib.h>

// OutFile (the output FILE *) and CurOffset (the running byte offset) are
// assumed to be declared elsewhere in the database builder.

void
WriteCompInt(unsigned int v)    // compression encoding function
{
    // We cannot deal with numbers so large that their upper bits
    // become tags for numbers presumed smaller.
    //
    if ((v & 0xe0000000) != 0) {
        printf("Numbers are too large to create a database.\n"
               "Try building a smaller database.\n");
        exit(1);
    }
    if ((v & ~0x7f) == 0) {
        // Number can fit in one byte with the top bits flagged as 1
        putc(v | 0x80, OutFile);
        ++CurOffset;
    }
    else if ((v & ~0x3fff) == 0) {
        // Number can fit in two bytes with the top bits flagged as 01
        putc((v >> 8) | 0x40, OutFile);
        putc(v & 0xff, OutFile);
        CurOffset += 2;
    }
    else if ((v & ~0x1fffff) == 0) {
        // Number can fit in three bytes with the top bits flagged as 001
        putc((v >> 16) | 0x20, OutFile);
        putc((v >> 8) & 0xff, OutFile);
        putc(v & 0xff, OutFile);
        CurOffset += 3;
    }
    else {
        // Number can fit in four bytes with the top bits flagged as 000
        putc((v >> 24) & 0x1f, OutFile);
        putc((v >> 16) & 0xff, OutFile);
        putc((v >> 8) & 0xff, OutFile);
        putc(v & 0xff, OutFile);
        CurOffset += 4;
    }
    return;
}

int
TScanDB::ReadCompInt(           // compression decoding function
    unsigned char **FilePtr
    )
//
// Read and decompress an integer from the current file pointer location
// in the memory mapped data base.
//
{
    if (**FilePtr & 0x80) {
        // byte starts with 1, numbers up to 2^7 - 1 = 127
        *FilePtr += 1;
        return ((*FilePtr)[-1] & 0x7f);
    }
    if (**FilePtr & 0x40) {
        // byte starts with 01, numbers up to 2^14 - 1 = 16,383
        *FilePtr += 2;
        return (((*FilePtr)[-2] & 0x3f) << 8) | (*FilePtr)[-1];
    }
    if (**FilePtr & 0x20) {
        // byte starts with 001, numbers up to 2^21 - 1 = 2,097,151
        *FilePtr += 3;
        return (((*FilePtr)[-3] & 0x1f) << 16) |
               ((*FilePtr)[-2] << 8) | (*FilePtr)[-1];
    }
    // byte starts with 000, numbers up to 2^29 - 1 = 536,870,911
    *FilePtr += 4;
    return ((*FilePtr)[-4] << 24) | ((*FilePtr)[-3] << 16) |
           ((*FilePtr)[-2] << 8) | ((*FilePtr)[-1]);
}
______________________________________
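The encoding in the listing above, together with the predecessor-relative offsets discussed in the text that follows, can be exercised in memory. The following is a minimal sketch only, not the code of the preferred embodiment: it writes to a caller-supplied buffer instead of OutFile, it reports an oversized value by returning 0 rather than exiting, and the names encode_comp_int, decode_comp_int, encode_offsets, and decode_offsets are hypothetical. The sketch further assumes the offsets in a sequence are non-decreasing and that the buffer holds at least four bytes per value.

```c
#include <assert.h>
#include <stddef.h>

/* Encode v with the tag-bit scheme of the listing above.
 * Returns the number of bytes written, or 0 if v needs more than 29 bits. */
static size_t encode_comp_int(unsigned int v, unsigned char *buf)
{
    if (v & 0xe0000000u)
        return 0;                                   /* would collide with tag bits */
    if ((v & ~0x7fu) == 0) {                        /* tag 1: 7 data bits */
        buf[0] = (unsigned char)(v | 0x80);
        return 1;
    }
    if ((v & ~0x3fffu) == 0) {                      /* tag 01: 14 data bits */
        buf[0] = (unsigned char)((v >> 8) | 0x40);
        buf[1] = (unsigned char)(v & 0xff);
        return 2;
    }
    if ((v & ~0x1fffffu) == 0) {                    /* tag 001: 21 data bits */
        buf[0] = (unsigned char)((v >> 16) | 0x20);
        buf[1] = (unsigned char)((v >> 8) & 0xff);
        buf[2] = (unsigned char)(v & 0xff);
        return 3;
    }
    buf[0] = (unsigned char)((v >> 24) & 0x1f);     /* tag 000: 29 data bits */
    buf[1] = (unsigned char)((v >> 16) & 0xff);
    buf[2] = (unsigned char)((v >> 8) & 0xff);
    buf[3] = (unsigned char)(v & 0xff);
    return 4;
}

/* Decode one value and advance *p past it. */
static unsigned int decode_comp_int(const unsigned char **p)
{
    const unsigned char *b = *p;
    if (b[0] & 0x80) { *p += 1; return b[0] & 0x7fu; }
    if (b[0] & 0x40) { *p += 2; return ((b[0] & 0x3fu) << 8) | b[1]; }
    if (b[0] & 0x20) {
        *p += 3;
        return ((b[0] & 0x1fu) << 16) | ((unsigned int)b[1] << 8) | b[2];
    }
    *p += 4;
    return ((unsigned int)b[0] << 24) | ((unsigned int)b[1] << 16) |
           ((unsigned int)b[2] << 8) | b[3];
}

/* Store the first offset in full and each later offset as a delta from its
 * immediate predecessor. Returns total bytes written, or 0 on error. */
static size_t encode_offsets(const unsigned int *offsets, size_t n,
                             unsigned char *buf)
{
    size_t used = 0;
    unsigned int prev = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = encode_comp_int(offsets[i] - prev, buf + used);
        if (len == 0)
            return 0;
        used += len;
        prev = offsets[i];
    }
    return used;
}

/* Rebuild n absolute offsets by accumulating the stored deltas. */
static void decode_offsets(const unsigned char *buf, size_t n, unsigned int *out)
{
    unsigned int prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev += decode_comp_int(&buf);
        out[i] = prev;
    }
}
```

For the sequence 100, 200, 500, 1000 the stored deltas 100, 100, 300, 500 occupy six bytes under this scheme, whereas the four absolute values would occupy seven.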
The above integer compression encoding technique provides compression ratios of the database file anywhere between 1X and approximately 4X. Still further integer compression may be achieved with a second compression technique applied in conjunction with the above. As discussed above with respect to FIG. 3, sequences of integer offset values are concatenated in the compressed, preferred physical embodiment of the database file of the present invention. As noted above, the integer encoding technique provides some compression by eliminating leading zero bits in integer numbers. The second compression technique reduces each offset value which follows a first offset in a sequence of concatenated offset values to a relative offset. Each relative offset is a delta integer value from the immediately preceding offset value. The first offset value in such a sequence of offset values is the full integer value. The next offset value is relative to the first, the third is relative to the second, the fourth to the third, etc. For example, the sequence of values 100, 200, 500, 1000 is encoded as the sequence 100, 100, 300, 500. As an alternative, each subsequent integer offset value in a sequence may be relative to the first such value. For example, the same sequence 100, 200, 500, 1000 may be encoded as 100, 100, 400, 900. The former approach yields smaller values, hence superior compression, and is therefore preferred. The latter method may require less computation in that the sequence of compressed numbers need not be completely parsed. Only the leading bits need be parsed to determine the number of bytes (as described above). The remaining bytes which encode the integer value need not be accessed to determine the value of a later value in the sequence.

Web Browser Operation

FIG. 9 is a flowchart describing the operation of a standard web browser as modified to work in conjunction with the methods, processes, and structures of the present invention.
Those skilled in the art will recognize that FIG. 9 does not describe operation of web browsers in general as presently known in the art. Rather, FIG. 9 describes only the specific features of a web browser as adapted to utilize the present invention. In particular, element 900 awaits and accepts user input to specify search parameters to be applied to the collection of text documents. Not shown are the processing steps which serve to identify the collection of documents by path names or the steps to provide the pre-built database file path. Such processing is well-known to those skilled in the art. Element 902 is next operable to transmit the search parameters accepted by processing of element 900 to the database client process for further processing. Element 904 then awaits return of the search results from the processing initiated via the database client process. Lastly, element 906 displays the HTML formatted search results (or other format having hyperlinks therein) as returned from the database client process. Processing continues by looping back to element 900 to await further search parameters.

Those skilled in the art will recognize that standard web browser processing techniques may invoke further processing by clicking a hyperlink in the search results displayed by operation of element 906 and as returned by operation of the database client process. Further, it will be recognized that linear search techniques within standard web browsers may be invoked to further refine the search of the information returned and displayed on the web browser's computer display screen.

Database Client Process Operation

FIG. 10 is a flowchart describing methods operable within the database client process of the present invention. As noted above, a web browser invokes the services of the present invention via the database client process using the Common Gateway Interface (CGI) standard.
The database client process, in turn, communicates with the database server process to effectuate the query operations requested by the web browser. Those skilled in the art will recognize that the features of the present invention may be implemented with or without such a client/server architecture. As noted above, a web browser (via communications with a web server process) may invoke a database manipulation program which directly accesses the database file rather than doing so through a database server process. The client/server model of the present invention provides benefits in coordinating multiple shared simultaneous access to the database file. In addition, the database client/server model preferred in the current invention permits the web browser, web server, and database server processes to be distributed over independent computing nodes. In other words, the client/server model preferred in the present invention is more easily integrated into a distributed computing environment wherein processes communicate in a standardized manner regardless of the physical computing node on which they are operating. Lastly, the database server process of the present invention, as discussed in additional detail below, supports a query command stream and returns its results essentially in ASCII text. This allows the database server process to be developed, tested, and debugged independent of the database client process.

Element 1000 is first operable to receive search parameters of a query request from the web browser. As noted above, the web browser constructs a search request by accepting search parameters from an interactive user. Those search parameters are transmitted directly to the database client process utilizing the CGI interfacing techniques. Element 1002 is next operable to transform the search parameters received in the search request into appropriately formatted search commands supported by the database server process of the present invention.
Details of the search commands so supported are provided herein below. Element 1004 is next operable to transmit the transformed (re-formatted) search command to the database server process of the present invention. As noted above, the database client process and database server process of the present invention preferably communicate using well-known inter-process communication techniques. Such communication techniques simplify coordination of shared access to the database file. Element 1006 is next operable to await receipt of results of processing the transformed (re-formatted) search command previously transmitted to the database server process. As above, the search results are returned from the database server process to the database client process utilizing well-known network inter-process communication techniques. Element 1008 is then operable to transform the search command results returned from the database server process into an appropriate page including hyperlinks indicative of the search results. The search results as returned from the database server process are formatted in an internal tokenized form as presented herein below. Tokenized symbols are transformed by element 1008 into hyperlinks for generating further query commands potentially of interest to the user. Some queries return results which have a pre-defined format as discussed below wherein tokenizing is not performed by the server process. Rather, elements (symbols or keywords) which may be of interest for further search processing are clearly defined by the format of the query response. Element 1010 is next operable to transmit the reformatted search command results back to the web browser which initiated the search request. As noted above, the web browser and database client process communicate utilizing well-known CGI techniques.
Processing then continues by looping back to element 1000 to await receipt of further search requests and associated search parameters from an associated web browser.

Database Server Process Operation

FIG. 11 is a flowchart describing the processing performed by database server process 104 of the present invention. Element 1100 is first operable to receive a search command from the database client process. As noted above, search commands received from the database client process are formatted in an internal format supported and defined by the database server process as discussed herein below. Element 1102 is next operable to spawn a thread for processing of the received search command. Well-known multi-threaded programming techniques are applied to permit multiple search commands to be processed on behalf of multiple database client processes. The multi-threaded programming technique also permits the server process to more easily "cleanup" on behalf of a failed processing thread. For example, failure of a single thread processing a particular search request does not impact concurrent processing by other threads of other search requests on behalf of other database client processes. The multi-threaded aspect of the database server processing is depicted in FIG. 11 by the multiple arrows exiting from processing of element 1102. The newly spawned thread continues processing with elements 1104-1110 through to completion. The main line database server process continues processing by looping back to element 1100 to await receipt of another search request from another database client process. The newly spawned thread of the database server process continues with element 1104 to process the search request received from the database client process. Element 1106 is next operable to determine if the query was for the contents of a file (a file contents query as generated by the browser program).
If not, processing continues with element 1110 to transmit the search results to the database client process for further processing on behalf of the web browser program. If the query is a file contents query, processing continues with element 1108 to tokenize the file contents query results. As noted above, each symbol in a file contents query is tokenized by the database server process. In particular, each symbol in the file content text stream is delimited by the TOKEN_START and TOKEN_END characters as discussed below. The tokenized results are then transmitted to the database client by operation of element 1110 for further processing on behalf of the web browser program.

Those skilled in the art will readily recognize that elements 1104-1110 (a single thread of the database server process) may operate concurrently to provide streaming of the resultant data back to the database client process requesting the search. In other words, as element 1104 continues to process the search command thereby generating search results, elements 1106-1110 may concurrently operate to transmit those results already generated back to the database client process. In this manner, the web browser, the database client process, and database server process may all overlap their processing to provide the desired rapid response to the interactive user of the web browser. Early results of the query process are viewable at the user's computer display even as later results are yet to be generated. The search results are said to be streamed from the database server process through to the web browser for display on the user's computer screen.

Element 1104 is described above as performing the specified query through use of the database file. Such an operation is well understood by those skilled in the art in view of the logical description of the database file presented above with respect to FIG. 2.
With respect to the preferred physical embodiment of the database file as discussed above in conjunction with FIG. 3, the following pseudo-code listing is helpful in understanding the detailed operation of element 1104.
__________________________________________________________________________
// MAKE_PTR turns an internal file offset into a real C-language pointer
// for direct dereference by the C-runtime environment. If the offset was
// zero, MAKE_PTR returns NULL, otherwise it adds the offset to the base
// of the memory mapped database in memory.
// ReadCompInt( ) reads a compressed integer from the database and converts
// it to a normal integer. ReadCompInt( ) also advances FilePtr to the next
// byte after the compressed integer.
Lookup(Key, MatchCase, OutsideCurlies)
    HashIndex = HashKey(Key)
    FilePtr = MAKE_PTR(HashTable[HashIndex])
    if FilePtr is NULL, this hash chain is empty, so no matches, so exit
    // This outermost loop is executed once for each match, the inner
    // loop loops across non-matches within the chain between matches.
    // Could still be zero matches if token doesn't exist.
    // Could be 1 match if token exists and we're matching case.
    // Could be many matches if not matching case.
    loop, to find the next match
        loop, to skip over non-matches
            KeyOffset = ReadCompInt(&FilePtr)
            if KeyOffset is zero, then no match, exit this loop
            KeyOffset = MAKE_PTR(KeyOffset)
            DataOffset = ReadCompInt(&FilePtr)
            if MatchCase and case sensitive match between *KeyOffset and Key, or
               not MatchCase and case insensitive match between *KeyOffset and Key
                then match found, exit this loop
        end loop to skip over non-matches
        if no match found, then exit
        NextKeyOffset = FilePtr
        FilePtr = MAKE_PTR(DataOffset)
        loop, to traverse the list of files
            FileNum = ReadCompInt(&FilePtr)
            if FileNum is zero, done, exit this loop
            LinesOffset = ReadCompInt(&FilePtr)
            NextFileOffset = FilePtr
            FilePtr = MAKE_PTR(LinesOffset)
            loop, to traverse the list of lines with the token
                LineNum = ReadCompInt(&FilePtr)
                if LineNum is not zero, and either OutsideCurlies is not set
                   or the LineNum is tagged as OutsideCurlies
                    then MATCH FOUND: process match
            end loop when LineNum is zero
            FilePtr = NextFileOffset
        end loop to traverse list of files
        FilePtr = NextKeyOffset
    end loop to find next match
__________________________________________________________________________
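The file and line traversal at the heart of the listing above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the implementation of the preferred embodiment: the names make_ptr, read_comp_int, count_lines, db_base, and demo_db are hypothetical, the hand-built demo_db image uses only one-byte encodings, line numbers are stored as plain values rather than relative offsets, and the hash chain and the OutsideCurlies tag are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Base of the (memory mapped) database image; hypothetical global. */
static const unsigned char *db_base;

/* MAKE_PTR equivalent: offset zero means "no entry". */
static const unsigned char *make_ptr(unsigned int offset)
{
    return offset ? db_base + offset : NULL;
}

/* Read one compressed integer and advance *p past it. */
static unsigned int read_comp_int(const unsigned char **p)
{
    const unsigned char *b = *p;
    if (b[0] & 0x80) { *p += 1; return b[0] & 0x7fu; }
    if (b[0] & 0x40) { *p += 2; return ((b[0] & 0x3fu) << 8) | b[1]; }
    if (b[0] & 0x20) {
        *p += 3;
        return ((b[0] & 0x1fu) << 16) | ((unsigned int)b[1] << 8) | b[2];
    }
    *p += 4;
    return ((unsigned int)b[0] << 24) | ((unsigned int)b[1] << 16) |
           ((unsigned int)b[2] << 8) | b[3];
}

/* Walk the zero-terminated file list at data_offset and count the lines
 * recorded for file_num in that file's zero-terminated line list. */
static int count_lines(unsigned int data_offset, unsigned int file_num)
{
    const unsigned char *p = make_ptr(data_offset);
    int count = 0;
    if (p == NULL)
        return 0;
    for (;;) {
        unsigned int f = read_comp_int(&p);
        if (f == 0)
            break;                               /* end of the file list */
        unsigned int lines_offset = read_comp_int(&p);
        const unsigned char *next_file = p;      /* resume point in file list */
        p = make_ptr(lines_offset);
        while (p != NULL && read_comp_int(&p) != 0) {
            if (f == file_num)
                count++;                         /* a matching line */
        }
        p = next_file;
    }
    return count;
}

/* Hand-built image. Offset 0 is padding so that no entry has offset zero.
 *   1..5  file list: (file 1, lines at 6), (file 2, lines at 9), 0
 *   6..8  lines for file 1: 10, 20, 0
 *   9..10 lines for file 2: 5, 0                                      */
static const unsigned char demo_db[] = {
    0x00,
    0x81, 0x86, 0x82, 0x89, 0x80,
    0x8A, 0x94, 0x80,
    0x85, 0x80
};
```

Saving the resume pointer before following LinesOffset mirrors the NextFileOffset bookkeeping in the listing: the line list lives elsewhere in the image, so the traversal must return to the file list afterward.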
Web Browser/Database Client Protocol

The present invention provides for various search requests (also referred to herein as queries) between the web browser and the database client process. As noted above, the web browser and database client process preferably communicate using the CGI standards. A query is communicated from the web browser to the database client in response to the user entering input search symbols or keywords and clicking a button to initiate the search processing. The type of search and various parameters relating to the selected search type are then transmitted to the database client process. The database client process and database server process then communicate as discussed herein to process the query and to return results thereof to the web browser in the form of an HTML page (or other format having hyperlinks). The present invention includes the following four query types.

File Contents

The file contents query is generated by the web browser to return the contents of one file from the collection of text documents in the database. The file contents are retrieved and displayed by the web browser. The results retrieved and returned by the database client and server processing include hyperlinks for each symbol that is indexed in the database file for the returned text document. Each hypertext link invokes a "symbol in files" query for the corresponding symbol, as described herein below.

Substring in Paths

A substring in paths query is generated by the web browser to request a list of filenames for text documents in the collection of text documents which match a specified string. Each filename returned by the query results is displayed by the web browser as a hyperlink which specifies a file contents query for the corresponding file (as described above).

Substring in Symbols

A substring in symbols query command is generated by the web browser to request the list of symbol names that contain a specified substring.
As returned from the database client process, each symbol name matching the substring is a hypertext link to a symbol in files query as described herein below.

Symbol in Files

A symbol in files query is generated by the web browser to request a list of lines that contain a specified symbol. Each line returned by the database client process includes the filename, line number, and line of text including the requested symbol. The filename as returned from the database client is a hyperlink specifying a file contents query for the corresponding file as described above.

Those skilled in the art will readily recognize that other query commands may be included within the scope of the present invention. Furthermore, those skilled in the art will recognize that subsets of the above described queries as well as other substituted queries relating to symbols within collections of text documents are within the scope of the present invention. The above identified four query commands are intended as examples of a useful set of queries to permit rapid user searching for symbols or keywords in collections of text documents.

As noted above with respect to element 1008, certain search results are preferably returned from the server process in a tokenized format. In particular, in the preferred embodiment of the present invention, the results of a file contents query are returned in tokenized form such that all symbols in the database file are tokenized in the file contents returned from the database server to the database client. Other exemplary queries listed above generate search results from the server in a predefined format. Element 1008 above accepts all such formats for returned search results (tokenized and non-tokenized pre-defined formats), and converts them to pages having hyperlinks for items therein having likely interest for the user's next search request. The hyperlinks define a query for more information regarding the corresponding symbol or keyword.
As noted herein below, several of the above identified queries (in particular the Symbol in files query command) permit options to be specified to control the searching performed by the database server process to satisfy the query command. Certain such parameters are meaningful for particular types of text documents as processed by optimized parsers (as discussed above with respect to database build processing techniques). For example, in searching for symbols in C language source programs, it is often useful to search for a symbol with or without case sensitive matching. Further, it may be useful to search for a symbol outside of curly braces (i.e., to identify global symbol declarations as opposed to symbol references). Other such search parameters may include stripping leading underscore characters from symbols when matching for a requested substring. Another parameter may specify that the returned results should not be tokenized (as described herein below) and hence returned faster. For example, if the user requests the contents of a large text document or queries for a symbol likely to be found frequently in the collection of text documents, the user may realize in advance that the links are not required for subsequent searches. Specifying the "not tokenized" parameter allows the query results to be returned more quickly. The "not tokenized" parameter also permits the returned results to be usable for other than web browsing. For example, the returned information may be saved in a file.

Those skilled in the art will recognize a wide variety of such options that may be supported by the database server process and hence supported in the interface between the web browser and the database client process. The above list is intended merely as exemplary of the types of search parameters which may be specified in addition to the symbol or keyword search terms specified by the web browser user.
Present web browsers typically allow a displayed page (e.g., an HTML page) to specify that it is or is not cacheable. The database client process of the present invention therefore sets appropriate attributes on records returned to the web browser to ensure that the records are cached locally by the web browser. Subsequent requests for other lines in a file may be satisfied locally by the web browser recognizing the information as resident in its local cache. In particular, for example, if the user issues a query identical to an earlier query, the web browser can recognize the earlier search results in its cache and speed the presentation of the results to the user. Or, for example, if a user issues a query to display an entire text file, the file is presented to the user at a starting line number indicated by the user. If a subsequent query requests the same file, but perhaps a different starting line number, the web browser will recognize that the entire file is already cached and speed the display of the requested portion of the file to the user.

Database Client/Database Server Protocol

As noted herein above, the database client process re-formats the search command and parameters supplied to it by the web browser into an internal request and response format defined and supported by the database server process. The database server process defines a stateless version of the commands supported as well as a state based version of the supported search commands. In the state based version, the server process retains some state information regarding the processing requested by the database client process. For example, a first command describes the database file path name to be used for processing of queries. A second command specifies, for example, a particular query to be performed.
When processing the second command, the database server process uses saved state information from prior commands to identify, for example, the path name of the database file to be used in satisfying the query. In a state based model such as this, a connection with a particular client process requires saved state information regarding that connection. In other words, the server process must maintain state information for each presently active client connection. Further, an active client connection must be "closed" to recover the resources in the server dedicated to that open connection.

In a stateless model, the best presently known mode for practicing the present invention, the database server process retains no such state information. Rather, each command (query request) received provides all information necessary to process the command (e.g., the database path name plus all values and parameters needed to process the query request). The connection with a client process exists only for the duration of processing that request. No state information is retained between such requests. The state based mode of practicing the present invention is however useful, as noted above, for development, testing, and debug of the database server process independent of the database client process (i.e., using a simple ASCII text command interface wherein saved state information need not be re-entered for testing of each command).

An exemplary preferred embodiment of the protocol used in communicating between a database client process and a database server process is described below. First, commands designed around a state based model of the interface are presented, followed by the equivalent commands for the stateless model. In all cases below, a request format is shown with the label "REQ" and the associated response is labeled as "RESP."
Responses generated by many of the commands listed below are "tokenized" in that all symbols or keywords in the search results are returned as tokens (delimited by TOKEN_START and TOKEN_END delimiter bytes). Specifically, a file contents query issued by the user generates a QFILE server query command (as described below). The server process returns the entire file contents as an ASCII text stream wherein each symbol in the ASCII text stream is delimited as a token. The database client process, in turn, translates each token so delimited in the search results into a hyperlink for performing a further query on that symbol. Still more specifically, the tokens in the search results are preferably transformed by the database client process into hyperlinks for locating associated information rapidly.
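The client-side translation of a tokenized stream into hyperlinks might be sketched as follows. This is an illustration under stated assumptions, not the database client process itself: the function names tokens_to_html and append and the link prefix "search?sym=" are hypothetical, the delimiter values are those given in the protocol listing that follows, token text is assumed to consist of identifier characters that need no URL escaping, and tokens are assumed shorter than 128 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOKEN_START 1               /* delimiter values as given in the listing */
#define TOKEN_END   1
#define LINK_PREFIX "search?sym="   /* hypothetical URL of a symbol-in-files query */

/* Append s to out (capacity cap, current length *len). Returns -1 if full. */
static int append(char *out, size_t cap, size_t *len, const char *s)
{
    size_t n = strlen(s);
    if (*len + n >= cap)
        return -1;
    memcpy(out + *len, s, n + 1);   /* copy including the terminating NUL */
    *len += n;
    return 0;
}

/* Convert n bytes of tokenized text into HTML in which every token becomes
 * a hyperlink. Returns the output length, or -1 on overflow or on an
 * unterminated token. */
static int tokens_to_html(const unsigned char *in, size_t n,
                          char *out, size_t cap)
{
    size_t len = 0, i = 0;
    if (cap == 0)
        return -1;
    out[0] = '\0';
    while (i < n) {
        if (in[i] == TOKEN_START) {
            char sym[128];
            size_t start = ++i, tlen;
            while (i < n && in[i] != TOKEN_END)
                i++;
            tlen = i - start;
            if (i == n || tlen >= sizeof sym)
                return -1;
            memcpy(sym, in + start, tlen);
            sym[tlen] = '\0';
            if (append(out, cap, &len, "<a href=\"" LINK_PREFIX) ||
                append(out, cap, &len, sym) ||
                append(out, cap, &len, "\">") ||
                append(out, cap, &len, sym) ||
                append(out, cap, &len, "</a>"))
                return -1;
            i++;                                /* skip the closing delimiter */
        } else {
            /* Ordinary text: escape the characters HTML treats specially. */
            char ch[2] = { (char)in[i], '\0' };
            const char *text = ch;
            if (in[i] == '<')      text = "&lt;";
            else if (in[i] == '>') text = "&gt;";
            else if (in[i] == '&') text = "&amp;";
            if (append(out, cap, &len, text))
                return -1;
            i++;
        }
    }
    return (int)len;
}
```

Because the scanner tracks whether it is inside a token, it works whether the two delimiters share one byte value or use distinct values.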
__________________________________________________________________________
EOF_MARKER  = 0
TOKEN_START = 1
TOKEN_END   = 1

< > == replaced by described information/parameters
[ ] == optional information in protocol
{ } == annotation, not part of client/server protocol
 |  == alternate selections

REQ:  DBPATH <database path>
RESP: 1 | 0[: <error message>]
This command and reply essentially establish a connection between a client
process and a server process and specify the path name for the database file
to be used for queries processed in this open connection. The reply simply
indicates success or failure. In the case of failure an error message may be
appended.
REQ:  QFILES <keyword>
RESP: F File1
    : F File2
    : F File3
    : <repeat F . . . >
REQ:  QPATHS <keyword> {same as QFILES}
RESP: F File1
    : F File2
    : F File3
    : <repeat F . . . >
    :
These commands (essentially synonyms) return a list of file names in the
presently open database file whose path names include the specified keyword
substring.
REQ:  QLINES <keyword> <case_sensitive: 0 or 1> <outside_of_{ }: 0 or 1>
RESP: C <common_path>
    : R <relative_path> {may be null}
    : L <line #> <line_contents>
    {repeat R and L records for all lines in all files}
    :
This command returns a list of files and lines in each file where the
specified keyword is located in the collection of text documents associated
with the presently open database file. The case_sensitive parameter specifies
whether the case of the keyword is to be considered in performing the search.
The outside_of_{ } parameter specifies that the search is to locate only
matching keywords that are outside the scope of all C programming language
blocks (delimited by pairs of curly braces).
REQ:  QSYMS <keyword> <case_sensitive: 0 or 1>
RESP: S <symname>
    {repeat S records for all matching symbols}
    :
This command returns a list of all symbols found in the database which include
(as a substring) the supplied keyword. As above, the case_sensitive parameter
may be specified to indicate the relevance of the case of the keyword
parameter in the search.
REQ:  QFILE <full_pathname: common_path + relative_path>
RESP: 1 | 0[: <error message>]
    : B <number of bytes>
    : <bytes of data>
This command returns the entire contents of a file specified by its full file
name as a parameter. First the number of bytes to be returned is returned
(i.e., the length of the tokenized byte stream to follow) followed by the
tokenized byte stream as described below.
<bytes of data> ==
<data><TOKEN_START><token_data><TOKEN_END><data> { repeat }
The tokenized byte stream format described above is the entire content of a
requested file where each symbol in the text stream which was parsed by the
database builder process and hence entered into the database file is
identified as a token. As noted elsewhere, the database client process
transforms this tokenized stream into a corresponding page with hyperlinks for
display. Each token is transformed into parameters appropriate for a QLINES
command.
REQ:  Q VERSION
RESP: V <version string>
This command merely returns a version number for the database server process.
This allows the database client to adapt to upgrades in the features of the
server process.
REQ:  QUIT
RESP: {none - socket disconnected}
This command terminates an open connection to a client (in the state model).

The following commands represent extensions to the client/server protocol of
the present invention which provide for stateless operation as is preferred.

REQ:  Q PATHS <database path length> <database path> <keyword length>
      <keyword>
RESP: 1 | 0[: <error message>]
    : {see QPATHS for remainder}
This command is identical in operation to the combination of a DBPATH command
and a QPATHS command as described above. This command combines the parameters
and return information to provide a stateless version of the command. The
connection with the client is closed following completion of the command. The
return data is as described above.
REQ:  Q LINES <database path length> <database path> <keyword length>
      <keyword>
      <case_sensitive: 0 or 1>
      <outside_of_{ }: 0 or 1>
      <match_leading_underscore: 0 or 1>
RESP: 1 | 0[: <error message>]
    : B <max line number> <number of lines to follow>
    : {see QLINES for remainder}
This command is essentially identical in operation to the combination of a
DBPATH command and a QLINES command as described above. This command combines
the parameters and return information to provide a stateless version of the
command. The connection with the client is closed following completion of the
command. The "B" return value includes the number of lines to be returned so
that the user may as early as possible determine whether the results are worth
viewing. The return data is as described above.
REQ: Q SYMS <database path length> <database path> <keyword length>
<keyword>
<case_sensitive: 0 or 1>
RESP:1 | 0[: <error message>]
: {see QSYMS for remainder}
This command is essentially identical in operation to the combination of a
DBPATH command and a QSYMS command as described above. This command
combines the parameters and return information to provide a stateless
version of the command. The connection with the client is closed following
completion of the command. The return data is as described above.
REQ: Q FILE <database path length> <database path> <size of full path>
<full_path: common_path + relative_path>
<tokenized: 0 or 1>
RESP:1 | 0[: <error message>]
: {see QFILE for remainder}
This command is essentially identical in operation to the combination of a
DBPATH command and a QFILE command as described above. This command
combines the parameters and return information to provide a stateless
version of the command. The connection with the client is closed following
completion of the command. The return data is as described above.
__________________________________________________________________________
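As an illustration of the stateless request format tabulated above, the following C sketch formats a Q LINES request and reads back the "B" header line of the response. It is a sketch under stated assumptions, not the implementation of the invention: the function names build_qlines_request and parse_b_header are hypothetical, and the table does not specify field separators or a terminator, so single spaces and a trailing newline are assumed here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a stateless Q LINES request into buf.
 * ASSUMPTION: fields are ASCII, separated by single spaces, and the
 * request ends with a newline; the source does not specify this.
 * Returns the number of characters written, or -1 if buf is too small. */
int build_qlines_request(char *buf, size_t bufsize,
                         const char *db_path, const char *keyword,
                         int case_sensitive, int outside_braces,
                         int match_leading_underscore)
{
    int n = snprintf(buf, bufsize, "Q LINES %zu %s %zu %s %d %d %d\n",
                     strlen(db_path), db_path,   /* length, then path    */
                     strlen(keyword), keyword,   /* length, then keyword */
                     case_sensitive ? 1 : 0,
                     outside_braces ? 1 : 0,
                     match_leading_underscore ? 1 : 0);
    if (n < 0 || (size_t)n >= bufsize)
        return -1;                               /* error or truncated */
    return n;
}

/* Hypothetical helper: parses "B <max line number> <number of lines>".
 * ASSUMPTION: the header arrives as one text line with decimal fields.
 * Returns 1 if both numbers were read, 0 otherwise. */
int parse_b_header(const char *line,
                   unsigned long *max_line, unsigned long *num_lines)
{
    return sscanf(line, " B %lu %lu", max_line, num_lines) == 2;
}
```

Because each path and keyword is preceded by its length, a receiver can read exactly that many bytes and so tolerate embedded spaces. A client could inspect the line count from the "B" header before deciding whether to read and format the remaining lines.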
Exemplary Screen Displays
FIGS. 4 through 7 are exemplary screen displays on a web browser which typify the operation of the present invention. In particular, FIG. 4 is a screen display exemplifying the query and response for a Q LINES query (as described above). Checkboxes 400-406 select the type of query operation desired, as marked on the textual label associated with each checkbox. These operations correspond to the four operations supported in the web browser to database client process interface. As shown in FIG. 4, a Symbol in Files query is requested by virtue of checkbox 400 being marked. This query request is transformed by the database client process into a Q LINES server request.
Checkboxes 408-414 are used to select search options appropriate for the type of search requested. The parameters correspond to the textual label associated with each checkbox on FIG. 4 and as described herein above. As shown in FIG. 4, the user has requested that the Symbol in Files query request match the case of the supplied keyword with the tokens in the database file search. Query box 416 permits the user to enter a keyword which is to be searched by the query request. In particular, the query specified by the user as exemplified in FIG. 4 is to search for the symbol "abort" in all files and to match the lower case specified by the user.
Buttons 418 and 420 are used to control operation of the browser. In particular, the user clicks button 418 to evaluate (perform) the search specified in the query and checkboxes. The user clicks button 420 to clear the search parameters and query keywords.
Any common portion of the path names of all files in the collection of text documents is shown at label 450. List 452 displays the results of the Symbol in Files query request. All lines in the collection of text documents which contain the keyword "abort" (in lower case) are displayed, including the file name (the relative portion of the path name devoid of the common portion of the path name shown at 450), the line number and the line of text from the corresponding file. Each file path name in the result list 452 is a hypertext link to generate a File Contents query for the corresponding file. By clicking on the link, the user navigates to a listing of the file contents of the corresponding file.
FIG. 5 shows a similar query to that shown in FIG. 4 but with the Outside of { } checkbox 410 marked. As can be seen in FIG. 5, the results list 454 shows only the subset of lines listed in FIG. 4 at 452 in which "abort" appears outside curly braces (i.e., in global declaration contexts).
FIG. 6 is an exemplary screen display showing a Substring in Symbols query (as indicated by the mark in checkbox 402). The query in box 416 requests that the server locate and display all symbols having "frame_of" as a substring. The result list 456 shows all such symbols which contain the substring "frame_of." Each symbol in the results list is a hypertext link to generate a Symbol in Files query request for the corresponding symbol.
FIG. 7 is another exemplary screen display typifying a Substring in Paths query request and results (as indicated by the marked checkbox 404). The query specifically requests a list of all file names (paths) which contain the substring "lib" as entered in box 416. Results list 458 shows the relative portion of all paths known in the database which contain the specified string as a substring. Each listing in the results list is a hypertext link to generate a File Contents query request for the corresponding file.
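The file listings reached through these links are built from the tokenized byte stream of a file contents response, in which each token becomes a hypertext link for a Symbol in Files query. The C sketch below shows one way a client might perform that transformation. Several details are assumptions rather than statements of the source: the marker byte values for TOKEN_START and TOKEN_END, the link prefix /cgi-bin/search?lines=, and the function name tokens_to_html are all hypothetical, and tokens are assumed to be plain identifiers that need no URL encoding.

```c
#include <stdio.h>
#include <string.h>

/* ASSUMPTION: the source names these markers but gives no byte values. */
#define TOKEN_START 0x01
#define TOKEN_END   0x02
/* ASSUMPTION: hypothetical CGI URL prefix for a Symbol in Files query. */
#define LINK_PREFIX "/cgi-bin/search?lines="

typedef struct {
    char  *buf;       /* destination buffer, kept NUL-terminated */
    size_t cap;       /* total capacity of buf                   */
    size_t len;       /* characters written so far               */
    int    overflow;  /* set once a write does not fit           */
} OutBuf;

/* Appends s to the output, or records an overflow if it does not fit. */
static void out_str(OutBuf *o, const char *s)
{
    size_t n = strlen(s);
    if (o->overflow || o->len + n + 1 > o->cap) {
        o->overflow = 1;
        return;
    }
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    o->buf[o->len] = '\0';
}

/* Appends one character, escaping the HTML markup characters. */
static void out_escaped(OutBuf *o, char c)
{
    char one[2] = { c, '\0' };
    switch (c) {
    case '&': out_str(o, "&amp;"); break;
    case '<': out_str(o, "&lt;");  break;
    case '>': out_str(o, "&gt;");  break;
    default:  out_str(o, one);     break;
    }
}

/* Converts <data><TOKEN_START><token_data><TOKEN_END><data>... to HTML.
 * Returns 0 on success, -1 if out is too small or a token never ends. */
int tokens_to_html(const unsigned char *in, size_t in_len,
                   char *out, size_t out_cap)
{
    OutBuf o = { out, out_cap, 0, 0 };
    size_t i = 0, j;

    if (out_cap == 0)
        return -1;
    out[0] = '\0';
    while (i < in_len) {
        if (in[i] == TOKEN_START) {
            size_t start = ++i;                 /* first byte of token */
            while (i < in_len && in[i] != TOKEN_END)
                i++;
            if (i == in_len)
                return -1;                      /* unterminated token */
            out_str(&o, "<a href=\"" LINK_PREFIX);
            for (j = start; j < i; j++)
                out_escaped(&o, (char)in[j]);
            out_str(&o, "\">");
            for (j = start; j < i; j++)
                out_escaped(&o, (char)in[j]);
            out_str(&o, "</a>");
            i++;                                /* skip TOKEN_END */
        } else {
            out_escaped(&o, (char)in[i++]);     /* ordinary file text */
        }
    }
    return o.overflow ? -1 : 0;
}
```

Text outside the markers is copied through with only markup characters escaped, so the displayed listing matches the original file apart from the added links.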
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description is to be considered as exemplary and not restrictive in character, it being understood that only the preferred embodiment and minor variants thereof have been shown and described and that all changes and modifications that come within the spirit of the invention are desired to be protected.
