Method and apparatus for representing and interrogating an index in a digital memory5293616Abstract A method apparatus for retrieval of data from an index database is disclosed. Each address in an index memory corresponds to an attribute that may be possessed by one or more records in the index database. Index memory datawords located at each index memory address include a number of binary bits equal to the number of records in the index database. If a record possesses an attribute, the value of the bit at a position corresponding to the record's address in that attribute's index memory dataword will be a binary "1". Priority encoder circuitry is provided to locate the positions of each of the 1's in the index memory datawords so that all of the records in the index database that possess an attribute can be determined. Logic circuitry is provided to combine index memory datawords logically to form new datawords that can be used to identify records that possess either all of a plurality of selected attributes, or one or more of a plurality of selected attributes. This retrieval system employs parallel hardware circuitry to increase data retrieval speed. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
__________________________________________________________________________
ATTRIBUTE A1
A2
A3
A4
B1
B2
B3
B4
C1
C2
C3
C4
D1
D2
D3
D4
__________________________________________________________________________
INDEX MEMORY
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
ADDRESS 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
__________________________________________________________________________
INDEX DATA-
INDEX DATA-
BASE ADDRESS
BASE RECORD
INDEX MEMORY DATAWORDS
__________________________________________________________________________
1 AADC 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0
2 ABDC 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0
3 BDAC 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 0
4 CCCA 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0
5 DABC 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0
6 DDDD 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
__________________________________________________________________________
If the entry ABDC is selected as input for retrieval, the addresses corresponding to the attributes A in the first position, B in the second position, D in the third position and C in the fourth position will be produced. From the table, these are 0000, 0101, 1110 and 1011. The four index memory datawords at these addresses are listed below, along with the result of a logical ANDing of the four datawords.
______________________________________
1 1 0 0 0 0 0000 address
0 1 0 0 0 0 0101 address
1 1 0 0 0 1 1110 address
1 1 1 0 1 0 1011 address
0 1 0 0 0 0 result
______________________________________
The "1" in the second bit position of the result, which can be termed an index database address dataword, indicates that the input matches the second record in the index database. That is, the second record contains an A in the first position, a B in the second position, a D in the third position and a C in the fourth position. The above example is simple in nature to provide a clear understanding of the present invention's operation. It will be understood that if the invention is employed to retrieve records, such as articles, using keyword searching techniques, each of the records will contain a large number of possible keywords, and thus will be identified by many different datawords in the index memory, one for each attribute which a record possesses. A microprocessor circuit 200 which can be employed to implement the method of the present invention is illustrated in FIG. 2. Specifically, the microprocessor circuit 200 includes a conventional microprocessor 202, a microprocessor program and table memory 204 and a match determination circuit 205. The match determination circuit 205 includes an index memory 206, a parallel binary accumulator 208, a priority encoder 210, a decoder-demultiplexer 212 and a tri-state latch 214. Microprocessor 202 inputs and outputs data, such as entries, records and keywords, over an I/O databus 218, and receives instructions or commands from a host computer 219 over an instruction bus 220. Microprocessor memory 204 contains data in look up tables which maps inputted attributes to addresses in the index memory 206, and also data which identifies which records of the index database stored in the host computer are to be retrieved based upon the inputted attributes. It should be noted that hard wired circuitry can also be employed to map inputted attributes to addresses in the index memory 206. For example, the address for an attribute can be specified by assigning a character position (e.g. "A" in the first position) code to the higher order bits of the address register and the character code (e.g. "A") in the lower order bits of the address register through hard wired buses. The index memory 206 in the match determination circuit 205 communicates with the microprocessor 202 via an address bus 222 and a memory read/write select line 224 as is conventional. Index memory 206 contains all of the addresses and the corresponding index memory datawords which are employed to identify those records in the index database which possess the specified attributes, as will be discussed in greater detail below. Each of the bits in the index memory datawords corresponds to a record in the index database. The index memory datawords are outputted from index memory 206 through a databus 226 to the parallel binary accumulator 208, which serves to combine the datawords logically using any of the Boolean functions AND, OR or NOT. The parallel binary accumulator 208, by way of example, can be one or more Texas Instruments SN74S281 4 bit parallel binary accumulators, and is controlled by microprocessor 202 via an instruction bus 228. As an example, in the case where it is desired to determine which records in the index database possess more than one specified attribute, accumulator 208 performs the logical AND operation of each of the index memory datawords corresponding to those attributes to generate a new dataword that contains binary 1's in the bit positions corresponding to the index database addresses of any such records. It should be noted that in order to carry this function out, the microprocessor 202 must first set all of the bits in the accumulator 208 to "1". The index database address dataword that is formed by the accumulator 208 is next inputted to priority encoder circuit 210 which serves to identify the bit position of the highest order "1" in the dataword. Priority encoder 210 can be one or more Texas Instruments SN74HC147s, for example. Alternatively, VLSI circuits can be employed for the priority encoder 210. If all of the bits in the dataword are 0, a signal indicating this is sent to the microprocessor 202 via an output line 230. This situation will occur if none of the records of the index database possess the specified attribute combinations. The output from priority encoder 210 is fed via a databus 234 back to microprocessor 202, and also to decoder/demultiplexer 212, which can be one or more Texas Instrument SN74HC4515s, for example. As with the priority encoder 210, VLSI circuits can also be employed for the encoder/demultiplexer 212. The operation of this circuit is controlled by microprocessor 202 via a select line 236. The output from the decoder/demultiplexer circuit 212 is fed via databus 238 to the input of accumulator 208. The purpose of the decoder/demultiplexer circuit 212 is to remove the highest order 1 from the dataword generated by the accumulator 208 once priority encoder 210 has identified that highest order 1. This circuit acts to generate a dataword that contains all 1's except for the bit position specified by priority encoder 210, which is set to 0. This dataword is then sent via databus 238 to accumulator 208 which logically ANDs this word with the previously generated dataword. The result of this operation is that the highest order 1 is removed from the dataword which is once again sent to the priority encoder 210 so that it will determine the location of the next highest order 1 in the index database address. 1' 1' This procedure is repeated until the location of all of the 1's in the index database address dataword are determined and fed to microprocessor 202. Also, once all of the 1's have been detected by priority encoder 210, a signal will be sent via line 230 to the microprocessor 202 indicating that the retrieval process is complete. All of the outputs from the priority encoder 210 indicating the bit positions of the 1's in the index database address dataword are fed to microprocessor 202 which utilizes them as addresses for a second look up table contained in the microprocessor memory 204, and provides the identity of all of the records that possess the selected attributes. The identities of these records are then sent by microprocessor 202 to the host computer 219 via I/O bus 218 which will display the identities to a user and/or retrieve the actual records from a separate memory for display. The tri-state latch 214 has an input connected to the output of the accumulator 208, and an output connected to the index memory 206 and the input of the accumulator 208. The tri-state latch 214 acts as a buffer which holds the output of the accumulator 208 so that it can be written back into the index memory 206. This writing back occurs only when a record is added to, or deleted from, the index database. Specifically, an index memory dataword is read out, modified in the accumulator 208 by adding or deleting a "1" in one or more of the bit locations, and then transferred through the tri-state latch 214 back to the index memory 206. For example, if a new record is to be added to the index database, it is assigned to an empty location in the record identity look up table in the microprocessor memory 204 and the bit corresponding to that location in each of the datawords contained in the index memory which represent attributes possessed by the new record, will be set from "0" to "1". It will be understood that since each bit of the datawords stored in the index memory 206 corresponds to a record in the index database in the host computer 219, a plurality of the match determining circuits 205 connected in parallel will be needed to implement a large index as illustrated in FIG. 3. For example, if the index memory 206 is implemented using memory chips having 16 bit datawords, and the index database contains 16 X n entries, then n match determining circuits 205 will be needed. Multiple indexes can also be cascaded in series with one another as illustrated diagrammatically in FIG. 4 to provide a multiple level retrieval system 300. The multiple level retrieval system can use two or more of the microprocessor circuits 200 of FIG. 2 interfaced to the host computer 219 as shown. For example, a first of the circuits 200 can be used to locate records in the index database containing particular keywords, where the ordering of the characters in each of the key words must be considered. Any matches located during the first level of retrieval can then be used by a second of the microprocessor circuits 200 during a second level of retrieval to identify records in the index containing all of the keywords, regardless of order. Thus, during the first level of retrieval, both the character identification and position attributes are employed to locate records containing particular words. Then during the second level of retrieval, word identification attributes, but not position attributes, are employed to locate records containing a combination of words in any order. A pseudocode which can form the basis of software to control the operation of the microprocessor circuit 200 is provided below. In this pseudocode, the terms "record identifier", "entry", "pointer" and "element" are used. The definitions of these terms are as follows: A "record identifier" is a couple containing two parts: an entry and a pointer. An "entry" is the first part of an index record identifier, and is a pattern, such as an arrangement of letters, pixels or other entities, that can be represented using binary bits. A "pointer" is the second part of an index record identifier and represents a memory location, disk address, array position or any other number associated with an entry of an index record. During a search and retrieval operation, a presented input entry is compared with record entries throughout the index database. A detected match causes the associated pointer or pointers to be returned as output to the host computer. An "element" is a piece of an entry. For example, if the entry is a language word spelled out with familiar alphabetic letters, an element would be one or more letters of the word. In general, an element is a subset of the large set that represents an entry. The following is the pseudocode which generally describes the various operations carried out by the microprocessor circuit 200. These operations include Initialization, Waiting for a Command and Executing a Command. The various commands that can be executed include adding a record, index look up, deleting a record, index backup and index statistics. A) Initialization Upon reset, the microprocessor jumps to a known point and starts executing code for initialization. This code initializes ancillary parts of the index circuitry, 0's volatile working memory and establishes tables. B) Waiting for a Command The microprocessor loops waiting for interrupt or repeatedly polls an input looking for a change of state. When one of these events occurs, the microprocessor reads in the bus or input port contents and analyzes those contents to determine which host generated command to execute. Handshaking may occur to acknowledge receipt of a command or to request re-transmission if the data is uninterpretable. The commands could be in the form of decimal digits, e.g. "1" could mean "add a record", "2" could mean "look up", etc., and these digits could indicate the number of a pointer in a pointer table where the pointer indicates an entry point to a section of code for executing the specified command. C) Executing a Command 1) Add a Record Read "number of record identifier slots available" variable. If 0, return "index full" message and exit, else decrement above variable and continue. Locate an unused record identifier slot (perhaps pop an "unused record identifier slot" stack in memory which contains all unused record slot numbers). Read input entry and input pointer of the record identifier to be added. Store entry and pointer in index table at location determined from above mentioned "unused record identifier slot" number. Convert input entry into a list of index memory addresses. Send "unused record identifier slot" number, SN to the decoder/demultiplexer (D/D) causing it to produce an output consisting of "0's" in all bit positions except the SN'th bit position. The D/D output is read into the ALU. Read out from the index memory the first memory word, using the first address in the index memory address list. Logically OR it with the D/D output in the ALU. The ALU now contains the index memory word with its SN'th bit changed from a 0 to a 1. Write the ALU output back to the same address in the index memory. Repeat this process for each address in the index memory address list. Signal the host computer that the "add a record" operation is complete. Return to "loop and wait for next command". 2) Index Look Up Read and store the input entry. Prepare an index memory address list as follows. When the order of the elements of the input entry is unimportant, one may use the distinct binary coding of each element as a memory address. Or, if more convenient, convert the element's code to a new unique number; for example, if the elements are ASCII characters with code values from 32 to 95, subtracting 32 from each would make the address space go from 0 to 63. If the order of the elements of an input entry is important, then use an approach suggested by the following: assume 16 element positions in an input entry and assign memory addresses using the following formula: Memory Address =(Element Number X 16) +Position Number This formula yields a unique address for any element in any position. Alternatively, the formula: Memory Address =(Position No. X Number of Possible Elements) +Element Number yields a unique address for each element/position combination. Clearly, other operations on position and character number will yield a unique number for each combination of them. The list of addresses, usually one for each element in the input entry, is referred to here as the index memory address list. The input entry could be only part of a pattern. For instance, "year 19" is only part of entry "year 1990". Once the index memory address list is prepared, readout all the index memory datawords addressed by the index memory address list and logically AND them together in the ALU. The result is put on the index memory databus. The priority encoder, connected to the index memory database, outputs the bit number of the highest order bit position with a 1 in it. If the "all 0's" line is set meaning the priority encoder finds no 1's, return a "no match" message to the host then exit. When a match occurs, the output of the priority encoder indicates the bit number which is the slot number (SN) of a match. Readout the pointer stored at the SN'th location of the index record table and return this pointer to the host. The D/D has the SN as its input and an output of all 0's except for the SN'th bit position which is a "1". In the ALU, complement this D/D output and logically AND in the previous output which has been stored in the latch, thus resetting the highest order "1". If there are multiple matches, the output of the priority encoder will now shift to the number of the next highest order "1", and the above procedure is repeated. Once all "1" bits have been located using the above procedure, the "all 0's" line is set, indicating that all matches have been recorded, then exit. 3) Delete a Record This operation is the same as "add a record" except, reset the SN'th bit of each index memory dataword addressed by the entries in the index memory address list from 1 to 0 by logically AND'ing them one at a time to the complement of the D/D output and writing them back to the index memory. Push the SN onto the "unused record identifier slot" stack and increment the "number of unused record identifier slots available" variable by 1. 4) Index Backup Readout to the host the contents of the index record table. A new index can be reconstructed from this table. 5) Index Statistics Report to the host the number of available slots and any usage counts or other statistics desired. Although the invention has been disclosed in terms of preferred embodiments, it will be understood that numerous variations and modifications could be made thereto by those of ordinary skill in the art without departing from the true spirit and scope of the inventive concept as set forth in the following claims.
|
Same subclass Same class Consider this |
||||||||||
