Fast search processor4760523Abstract A special-purpose search processor, and a related method, for performing a variety of logically complex searches of a serial data stream in a highly concurrent fashion. The processor comprises a sequence of serially connected cells of identical construction, and the data stream is passed through the sequence of cells, each cell performing a logical operation based only on the data provided to it from the previous cell in the sequence. Each cell has a character register for data storage and a pattern register for storage of part of a search pattern. The contents of the two registers are compared in each cell, at each cycle of a clock used to propagate the data through the processor. Match indicators or match tolerance values are propagated through the processor on a match line, and match results emerge in synchronism with the data stream. Multiple match lines are employed in one preferred embodiment, to temporarily save, retrieve and exchange match tolerance values, in order to effect logically complex searches in a highly concurrent manner. Types of searches that may be performed include logical OR and AND searches, common-prefix OR searches, and searches involving variable-length and fixed-length don't-care strings, variable-length care strings, and negate strings. Claims We claim: Description BACKGROUND OF THE INVENTION
______________________________________
c.sub.1 c.sub.2
c.sub.3
______________________________________
Pattern C A T
Mask U U U
Tolerance 1 0 0
Last Flag 0 0 1
______________________________________
The mask flag has a certain bit set to force a match regardless of whether upper or lower case characters appear in the date. This is indicated by the letter U. The key to the matching process is in the role played by the L1.sub.f logic element 83' and the delay register 91'. If there is no match between the character register 214 and the pattern register 218', input (1) is chosen in this logic element, and the tolerance value is decremented to zero in the decrementing circuit 92s. (The decrementing circuit 92' is designed not to decrement the tolerance value below zero). Thus, a non-match in any cell will result in zeroing of the tolerance value in the M1 match line. When a match is found, however, input (2) is selected in the M1.sub.f logic element 83', and the tolerance value is not decremented. If a letter C is input to the first cell c.sub.1, a match will be found and a tolerance value of "1" will be passed to the delay register 91'. The purpose of the delay register 91' is to synchronize the propagation rate of the tolerance value on the M1 match line with that of the data on the character line. For a search pattern of n characters, it will take 2n clock cycles for an n-character sequence within the data stream to pass completely across the search pattern. Therefore, to provide a match result out of the processor when the last data stream character is emerging requires that the M1 match line values progress along the line at half the character clocking rate. The delay register at each cell position takes care of this timing difference. Another way to appreciate the need for the delay is to consider the number of clock cycles that must occur between the matching of two adjacent characters. After the matching of the C's in cell c.sub.1, shown in line (b) of FIG. 4, two clock cycles must occur before the A's are aligned for match detection cell c.sub.2, as shown in line (d). In line (a) of FIG. 4, the letters CATX are shown as approaching the search pattern. The two numerals in each cell represent the tolerance values at the M1 register and the delay register, respectively. These are initially all zero. In line (b), the letter C has advanced to the first cell c.sub.1 and a "1" has been introduced into the M1 register. In line (c), the letter C has advanced to the second cell c.sub.2 and the letter A as in the first cell c.sub.1. Since there was a match in the previous line in the first cell c.sub.1, a "1" tolerance value will advance to the delay register in this cell. On the next clock cycle, as shown on line (d), the "1" from the delay register of the first cell is shifted into the M1 register of the second cell c.sub.2, where the data character A is aligned with the A of the search pattern. On the next cycle, on line (e), the "1" is advanced to the delay register of the second cell c.sub.2, since there was previously a match in that cell. In the next cycle, on line (f), the "1" is propagated to the M1 register of the third cell c.sub.3, where the T characters now match. In line (g), the "1" moves to the delay register of the third cell c.sub.3, because of the previous match in that cell. The final step is shown in line (h), in which the tolerance value of "1" emerges from the search pattern with the letter X, which immediately follows the pattern located in the data stream. It will be seen that the tolerance value will propagate across the search pattern of cells only f a match has been detected in each successive cell of the pattern. If a tolerance value higher than "1" were introduced in the first character of the search pattern, one or more errors could then be tolerated in the data stream. For example, if a tolerance value of "3" were used, CAT would produce a result of "3", COT a result of "2", and C a result of "1". Each error decrements the tolerance value by "1". In a three-letter pattern, all three letters would have to be in error to reduce the tolerance value to zero. Although this simple search can be performed as described, using a single match line, a more powerful search processor results when multiple match lines are employed. The following section describes the cell structure of such a processor. Multiple Match Lines - Overview: Although the simple search described operates at high speed as desired, it is somewhat limited in terms of the types of searches that can be made. For example, a simple OR search, such as CAT or DOG, would require two passes of the data stream if the simple search technique were to be employed. In accordance with an important aspect of the invention, multiple match lines are employed to provide the search processor with extended capabilities. Second and third search lines are used principally as registers for the temporary storage of match results. For this usage of multiple match lines, a number of manipulative functions are needed for the match lines, to enable splitting of a line, exchanging positions of lines, combining lines, and so forth. These basic functions are controlled by flags stored in a flag register in each cell, as explained in the following sub-sections. Cell Structure with Multiple Match Lines: Each cell, as shown in FIG. 3, includes seven logic blocks indicated by numerals 80-86, the function of which will be explained as the description proceeds, four match registers 87-90, a delay register 91, and two decrementing circuits 92 and 93. As in the description of FIG. 3a, the subscript "i" refers to an input signal, as in M1.sub.i, and the subscript "o" refers to an output signal, as in M1.sub.o. the subscripts "a" and "b" refer to intermediate signals, between input and output. Each of the logic elements 80-86 is a priority multiplexer, having multiple inputs, designated by numbers in parentheses, and a single output. The convention for each logic element is that the output is chosen from the input whose associated input logic condition is true. The logic conditions are set forth in abbreviated form in the logic boxes and are further elaborated on in Part B of this description. If more than one input logic condition is true at the same time, the uppermost input, i.e. the one with the lowest input line number, is chosen as the output. The operation of these logic elements will shortly become clear. The first match input line M1.sub.i is connected to input (3) of logic element 80, which is also designated L1.sub.i. The output of this element passes to the M1 match register 87, from which two possible inputs to logic element 83 are derived. Input (1) to logic element 83 is derived by decrementing the M1 register value, in circuit 92, and input (2) is derived directly from the M1 register 87. The output of logic element 83, also designated L1.sub.f, passes through the delay register 91 and thence to logic element 85 (L1.sub.o) as input (4). The output of logic element 85 is the first match line output M1.sub.o. Alternate paths for the first match line are a feedback path from the output of logic element 83 to input (4) of the logic element L1.sub.i 80, and a path that bypasses the delay 91, extending from the output of logic element 83 to input (3) of logic element L1.sub.o 85. The input for the second match line M2.sub.i is connected to inputs (1) and (3) of logic element 81, and as input (2) of logic element 80. The output of logic element 81, also designated L2.sub.i, is connected to the M2 match register 88, and thence to input (3) of logic element 84 (directly) and to iput (2) of the same element (through decrementing circuit 93). The output of logic element 84, also known as L2.sub.o, is the second match line output signal M2.sub.o, which is also fed back to input (4) of the input logic element L2.sub.i 81. Input (2) of element L2.sub.i is a forced zero value, and input (1) of logic element L2.sub.o 84 is derived from a value M1.sub.a output from the M1 match register 87. By way of example, FIG. 3b shows the L2.sub.i logic element 81 in more detail. Bascially, the logic element includes a priority encoder 250 and a multiplexer 252. The encoder 250 has four inputs, on lines 254-257, which derive their binary inputs from the logical expressions shown in the respective blocks 258-261. When an expression in one of the blocks is true, a "1" input signal is generated on the corresponding input line. If only one of the inputs is a "1", its position is converted by the encoder 250 to an address signal output on lines 266 to the multiplexer 252. If more than one input is a "1," the priority encoder 250 selects the input line nearest the top of the block, i.e, the line with lowest reference numeral. The multiplexer 252 operates in a conventional manner and converts the address on lines 266 to a 1-in-4 internal selection signal, which is used to select one of four input lines 270-273 for output from the multiplexer. The other logic elements in FIG. 3 operate in substantially the same way. The third match line input M3.sub.i is connected to input (2) of logic element 82, also known as L3.sub.i, input (1) being derived from the M1.sub.i signal. The output of logic element M3.sub.i is connected to the M3 match register 89, and thence to the output line M3.sub.o. The fourth match line input M4.sub.i is connected as the only input to the M4 match register 90, the output of which is connected as input (2) to the logic element 86, also knonw as L4.sub.o. Input (1) to the logic element L4.sub.o is derived from the first match line output M1.sub.o, and the output is the fourth match line output M4.sub.o. The cell structure also includes first and second initialization registers 210 and 211. An input initialization line INIT.sub.i is coupled to the first initialization register 210, the output of which is coupled to the second register 211, from which an output initialization line INIT.sub.o is derived. An initialization state accumulator 212 is connected to the line between the registers 210 and 211. The character input line, designated CHAR.sub.i, provides the input to a character register 214. A number of other storage registers are shown as connected to the character line, since their values are initialized through the character line. These are the length register 215, the flag register 216, the tolerance register 217, the pattern register 218 and the mask register 219. A comparator 220 receives data from the pattern register 218, the character line and the mask register 219. Bascially, the comparator 220 compares the data in the character register 214 with the data in the pattern register 218, in conjunction with a mask stored in the mask register 219. The only element of the cell yet to be discussed is the counter logic 222. This operates in conjunction with the length register 215 and certain flags in the flag register 216, to control an internal counter used in searching functions. All of the searching functions and conditions can be understood by reference to the basic cell diagram of FIG. 3. The specific flags in the flag register 216 will be introduced was particular functions are described. It will be understood from the foregoing description that the simple search function described with reference to FIG. 3a can be performed in the same manner using the cell stucture of FIG. 3. The only difference is that the processor using multiple match lines makes use of the M4 match line as a result line. When the last flag is set in a cell, usually the last cell in a search pattern, the tolerance value on the M1 line is transferred to the M4 line. This is shown in FIG. 5 which is a match line diagram showing in diagrammatic form how tolerance values are propagated through the interconnected cells. As shown, the tolerance value propagates along the M1 match line until cell c.sub.3 is reached. Then the presence of the last flag causes transfer of the tolerance value to the M4 match line. In FIG. 3, this is the path through input (1) of the M4.sub.o logic 86. It should also be understood that one can enjoy the same cell structure without the use of tolerance values. In this special case, the tolerance values are always initially unity. The first decrementing action on a tolerance value effectively clears it to zero, indicating a non-match. The Flags: The following is a list of the flags, which are one-bit fields within the flag register 216 of each cell: P: Pass flag B: Bracket flag O: OR flag C: Choose flag R: Right flag N: Negate flag I: Infinite flag L: Last flag The flags used in a simple OR search will be explained first. The remaining flags will be discussed later. The Bracket Flag: The bracket flag effects a splitting or branching of the M1 match line. A dedicated cell is needed to perform this function. In other words, no pattern character can be stored in a cell with the bracket flag set. Its effect is to transfer the output of the M1 match register (the M1.sub.a output) to the M2 output line of M2.sub.o of the same cell. On the M2 match line, any value loaded into the M2 register from the M2.sub.i input lines will be ignored when the bracket flag is set. The effect of the bracket flag can be appreciated in FIG. 3, which shows the M1.sub.a output as providing input (1) to the L2.sub.o logic element 84. The Pass Flag: The pass flag is set in a cell in which there is a need to bypass the delay register 91 in the M1 match line. Normally, it is desired that the tolerance value in the M1 match line should propagate from cell to cell at half the rate of character propagation. However, there are exceptions that require the delay to be bypassed in some cells. One exception is in a special-purpose cell, such as one in which the bracket flag is set and no pattern-matching function is performed. Another use of the bypass flag is in the last character of a pattern. If the bypass flag is set in this cell, the result will emerge with the last character of the pattern, rather than with the next subsequent character, and this may be more convenient for some applications. Yet another use of the pass flag is in conjunction with variable-length "don't care" operations, to be discussed. The Tolerance Register: The tolerance register 217 is not a flag; it is another register separate from the flag resistor. It is initialized with a positive integer indicating the maximum mismatch that will be tolerated in a pattern search. The tolerance value is typically set only for the first character of a pattern of interest. The tolerance register value, if non-zero, is loaded into the M1 match register 87, as indicated by input (1) of L1.sub.i logic element 80. If the tolerance register contains zero, as will be the case for most cells of the pattern, the input for the M1 match register is usually derived from the M1 input match line M1.sub.i. The Last Flag: The last flag is set in the last character of a pattern to be searched for. In a cell in which the last flag is set, the values in the M1 delay register 91 and the M4 match register 90 are compared, and the larger one is placed on the M4 output line M4.sub.o. This is apparent from FIG. 3, in which the L4.sub.o logic element 86 has its first input (1) derived from M1.sub.o if the last flag is set and M1.sub.o >M4.sub.a, M4.sub.a being the output from the M4 match register 90. As will now be discussed, the function of the last flag is also used in logical OR operations. The Basic OR Search: The basic OR search has as its goal the location of two or more alternative patterns in the data stream being searched. For example, one may wish to locate the occurrences of the words CAT or DOG or MAN. A simple search using only one match line would require three passes through the entire data stream to perform the OR function. In the present invention, the OR search is made using match line M4 in conjunction with the tolerance register 217 and the last flag. As shown in FIG. 6, the first pattern, CAT appears in the cells in a normal manner. The C character sets the tolerance register to a desired value, and the last character, T, has the last flag set. Thus, the result of the CAT match, if any, is transferred to the M4 match line. It will be recalled that a match result emerges from the last cell of a pattern with the character following the last matching character of the data stream being searched. If the letter following CAT is assumed to be X, for example, then the match resul will emerge from the T cell as the X character emerges from the character register of the T cell. Since the M4 match line has only one register, the match value will propagate along the M4 match line in synchronism with the X character of the data stream. Since a logical OR function is required, it is important that any match result from the CAT pattern not be associated with the search for the DOG pattern. This requires that the first letter of the DOG pattern should reset the tolerance value to a selected value, regardless of whether or not there was a match in the first pattern. Therefore, the D character of DOG and the M character of MAN have their tolerance registers initialized at a desired value. The last cell, G, of the DOG pattern also has its last flag set, causing the result of the DOG search to be transferred to the M4 match line. In most OR searches, this will not cause any conflict in results on the M4 match line, since the result from the CAT search will have propagated out of the processor by the time the result of the DOG search reaches the M4 line. In other words, the results of the CAT and DOG searches will emerge from the processor at different times. To allow for the possibility that some search configurations would cause a conflict between a search result on the M4 line and a search result about to be transferred to the M4 line, the last-flag logic selects the larger match value for the output on the M4 line. This is shown in the input logic for input (1) of logic element 86 (L4.sub.o). Such a conflict between results on the M4 line would occur only when two ORed search patterns were of equal length and were almost identical, within the tolerance selected for the search. For example, a search for DOG or LOG with a tolerance of two or more would result in both OR paths detecting a match when DOG appeared in the data stream. DOG would match exactly and LOG would match with a lower tolerance but both match results would compete for the M4.sub.o line and the larger tolerance would be selected. The addition of the third search pattern, MAN, is treated in identical fashion. The M character of the pattern resets the tolerance value for the search, and the N character of the pattern has its last flag set, to transfer any match result to the M4 line. The OR Flag: The OR flag affects three match lines: M1, M2, and M3. When a clock pulse occurs in a cell with the OR flag set, the M2 input value, on line M2.sub.i, is loaded into the M1 match register 87, and the M1 input value, on line M1.sub.i, is loaded into the M3 match register 89. These signal paths may be readily observed in FIG. 3, in which the (1) input to the L3.sub.i logic element 82 is derived from M1.sub.i when the OR flag is set, and the M2.sub.i value is selected for input to the M1 match register when the OR flag is set and certain other logical conditions are true. When used subsequent to a bracket flag, the OR flag permtis retrieval of a previous result saved in the M2 match line, and at the same times saves the current M1 match result in the M3 match line. THe practical importance of the OR flag will become apparent from examples to be described after the choose flag has been introduced. The Choose Flag: The choose flag affects the output of the first match line, on line M1.sub.o. On each clock pulse in a cell in which the choose flag is set, the values in the M3 match register 88 and the M1 delay register 91 are compared, and the larger value is output on the M1.sub.o output line. This choise is made by input (2) of the L1.sub.o logic element 85. If the pass flag is set as well as the OR flag, the M3 delay register 91 is bypassed and the comparison is made between M3 and the output from the M1 match register 87, on line M1.sub.f. Common-Prefix OR Search: The common-prefix OR search is an OR search in which each of the alternative patterns has a common prefix pattern. For example, suppose one wished to locate the occurrences of OLD CAT or OLD DOG or OLD MAN. While this may seen trivial, for longer search patterns it would be desirable not to have to repeat the prefix several times in the search pattern. The common-prefix search solves this problem by means of the bracket, OR and choose flags, as shown in FIG. 7. The common-prefix pattern, OLD, appears first in the search pattern, and the result of the prefix search appears on the M1 match line in the same manner as in a simple search. The next cell after the prefix is a bracket cell, having both the bracket flag and the pass flag set. As will be recalled, this splits the M1 tolerance value and places it on the M2 line as well as on the M1 line of subsequent cells. In the next segment of the search pattern, the first suffix pattern, CAT, is searched for, so that the last cell (T) in this segment will produce a result indicative of the OLD CAT search. The first cell of the next segment of the search pattern, the D cell, has the OR flag set, and this results in saving the OLD CAT result in the M3 match line and retrieving the prefix search result (OLD) from the M2 match line. The third segment of the search effects a search for OLD DOG, while propagating the OLD search result through the M2 match line and propagating the OLD CAT search result through the M3 search line. The last letter of the third segment, the G cell, has its choose flag set. In the general case, this effects a choice of the larger of the M3 value and the M1 value. In most practical situations, however, there will be not conflict between the M3 and M1 values. If a match had been found for OLD CAT, by the time a subsequent match has been found for OLD DOG the OLD CAT match value would have been propagated out of the processor. The choose function either takes the OLD CAT match value from m3 or the OLD DOG match value already on M1 and outputs it on the M1.sub.o line. The fourth segment of the search pattern functions in a similar way to the third. The first cell, the M cell, has its OR flag set, to save the M1 match value in M3 again, and to retrieve the prefix match value from M2. In the fourth segment, matching proceeds for the OLD MAN pattern, and in the N cell the choose flag again effects a choice between the OLD MAN match value on the M1 line and a possible OLD CAT or OLD DOG match on line M3. Again there is little possibility of two simultaneous matches of the alternative patterns, unless the patterns are of equal length and nearly similar content. In this discussion, the space between the prefix and possible suffixes of the search patterns has been ignored. One simple way to handle this is to consider the space as part of the common prefix. Another solution is to invoke a "don't care" function, to be discussed, and to ignore the imbedded space in the search pattern. "Don't Care" Character Strings: There is a common search requirement to ignore strings of characters imbedded in a search pattern. The characters to be ignored are frequently referred to as "don't care" characters. Two cases of interest are the fixed-length don't care and the variable-length don't care strings. Fixed-length don't care situations are easy to handle using the mask register 219 of selected cells. For example, if one wished to search for a particular date, but the day of the month was not critical, the search pattern might be: MARCH XX, 1972, where XX denote don't-care characters. The search could be implemented by setting all bits of the mask register in the two cells of the search pattern corresponding to the day of the month. Bit positions of the character register corresponding to the set positions of the mask register are ignored in the comparison process. If all bits of the mask register are set, a match in that cell is assured, regardless of the content of the character register. The variable-length don't care function is effected by means of a special cell with the maks register bits all set, operating in conjunction with the counter and length register 215 in the same cell. The "don't-care" cell is placed in the search pattern at the position at which a variable-length don't care string is permitted. Suppose, for example, one wishes to search for "MARCH" within ten characters preceding "1972." The search pattern will be MARCH*1972, where the asterisk represents the "don't care" cell. The * cell has its length counter initialized to a value corresponding to the don't-care count of ten. After the pattern for MARCH has been matched in the data stream, a non-zero tolerance value on the M1 match line will have the effect of loading the counter with the value stored in the length register, in this case ten. This is apparent from the second load condition for the counter set forth in FIG. 3. Subsequent characters, up to ten, passing through the "don't care" cell will result in decrementing the counter. Meanwhile, the M1 match result from the match of the pattern MARCH will be recirculated within the cell itself. The mechanism for this recirculation is a feedback signal M1.sub.f from the output of logic unit 83 to input (4) of the L1.sub.i logic element. The action of the * cell is, therefore, to transmit this recirculated match value on the M1.sub.o line up to ten times. If the next following character does not match the "1" in the second part of the pattern (1972), the match value will be decremented in the usual way and a match result may not emerge from the end of the pattern. If the string 1972 follows the string MARCH within the designated ten characters, then one of ten match values output by the * cell will be propagated all the way through the search pattern, indicating a match within the don't-care range that was specified. A variation of the variable-length don't-care search is obtained by setting the pass flag in the don't-care cell. This permits a zero-length don't-care between the two search patterns. That is to say, the number of don't care characters may be from zero to a selected count, rather than from one to the selected count. A related search condition is the "variable-length care" condition, in which a specific character may occur repeatedly in a character string. For example, one may wish to find the words FAT and CAT separated by up to five spaces. In this case, the search pattern is FAT CAT, with the fourth cell containing a space character and not having its pass flag set. The length register in this cell is initialized to a value of five, and this value is loaded into the counter when a match is found in the first part of the pattern (FAT). The match value from the first part of the pattern is recirculated five times while the counter is being decremented, so long as spaced appear between the words FAT and CAT. If the second part of the pattern (CAT) appears within five spaces of the first part, one of these circulated match values will be propagated completely through the pattern if the second part of the pattern is detected. This type of search, i.e. variable-length care, cannot be implemented down to a zero-length. Zero-length would imply a "don't-care" situation, whereas the "care" search requires a specific character to be located. Accordingly, the pass flag cannot be set for this type of search. The Negate Flag: The negate flag has the effect of: (1) repeatedly outputting the value in the M2 match register on the M1 output line M1.sub.o, and (2) if the incoming value on the M2 input line M2.sub.i is greater than the previous value M2.sub.o, loading the incoming M2.sub.i value in the M2 match register. If the incoming M2.sub.i value is not greater than the previous value, M2 will either be retained as it was or will be zeroed, depending on whether M1.sub.i is zero or non-zero. The implications of these alternative actions will become clear from the example depicted in FIG. 8. The negate flag is used to create patterns that will be considered to match if certain strings are not present in the incoming data stream. For example, if one wishes to find the words FAT CAT but without the word BLACK between them. In other words FAT BLACK CAT would not be a match, but FAT WHITE CAT would be. The search pattern is FAT[BLACKnCAT, where [is a cell with its bracket flag set and n is a cell with its negate flag set. If the first pattern segment (FAT) is located in the data stream, the match value is passed to the M2 match line by action of the bracket flag, and is also retained in the M1 match line. However, the tolerance value is reset in the first cell of second segment (BLACK). Therefore, at the end of the second pattern segment (BLACK), the M1 match line will carry an indication of whether or not the pattern BLACK was found in the data stream. The M2.sub.i incoming match line will indicate whether a match of the first pattern segment (FAT) was found. When M2.sub.i is greater than its previous value, a match is indicated and M2 match register is loaded. This loading step is through input (1) of the L2.sub.i logic element 81. So long as BLACK has not been found, the negate cell continues to recirculate the value in its M2 match register, and to transfer this value back to the M1 match line. Transfer back to the M1 match line is made through input (1) of the L1.sub.o logic unit 85. Recirculation of the M2 value is made through input (4) of the L2.sub.i logic element 81. If a match is then found for the third segment (CAT), a non-zero match value will emerge from the processor in the usual manner. If a match is found for BLACK, a non-zero match value is presented to the negate cell on its M1.sub.i line, and this results in a zero value being placed on the M2 match register. This operation takes place as a result of the zero connected to input (2) of the L2.sub.i logic element 81. The zero match value is also transferred to the M1 match line in the negate cell, and no match can then be found for the entire pattern, regardless of whether or not the last segment matches. In summary, then, the negate logic functions principally as a result of the configuration of the L2.sub.i logic element 81. Input (1) is selected when the M2 register is first loaded in the negate cell, input (2) is selected when a match of the unwanted pattern segment is located, and input (4) is selected when the M2 value is recirculated. It will be understood that the last cell in any search pattern should have its last flag set, to transfer the final result of the search to the M4 result line. The Right Flag: The right flag works in conjunction with the length counter and performs a function similar in some respects to that of the negate flag. If the incoming value M2.sub.i on the M2 match line is non-zero, it is loaded into the M1 match register. This is effected through input (2) of the L2.sub.i logic element 80. If the value in the M1 match register is zero, or if the length counter has reached zero, the value loaded into the M2 match register is decremented by one and is output on the M1.sub.o output line. Otherwise, i.e. if M1 is not zero and the counter is not zero, then the value loaded into the M2 match register is output directly onto the M1.sub.o output line. The value output from M2 to the M1.sub.o output line is also recirculated to the M2 match register, through input (4) of the L2.sub.i logic element 81. The incoming value on the M2.sub.i line is loaded into the M2 match register only f it is greater than the previous M2 value (M2.sub.o). This is accomplished by input (1) of the L2.sub.i logic element 81, and is similar in operation to the negate flag. The right flag provides a method for extending the concept of a variable-length care string in the search pattern, to allow a variable mix of several different specific characters, as the example shown in FIG. 9 illustrates. Suppose that the search pattern includes the name JEAN-PAUL. It is concluded that the hyphen may be replaced by a space, or a tab character, or some combination of these. Accordingly, the ideal search pattern would find a match if the data stream included any combination, up to a specified count, of several specified characters; for example, any five characters including hyphens, spaces and tabs, but no other characters. The right flag permits this function to be performed. The search pattern stored in JEAN[-t]PAUL, where [is a cell with the bracket flag set, t is a tab character, and ] is a cell with the right flag set. The first segment of the pattern (JEAN) generates a match value in the M1 match line in the normal manner. Then the bracket cell copies this value in the M2 match line. For the next cells, following the bracket, the tolerance register is set to one, so the search for the characters following the bracket requires an exact match of one of the selected characters between the bracket and the cell with its right flag set. The cell containing the hyphen has both the OR flag and the choose flag set. The OR flag results in the M1.sub.i value being transferred to M3 and the M2.sub.i value being copied to M1. Then the choose flag in the same cell chooses the larger of the M3 and M1 values. The next cell, contaiing the tab characters, also has its OR and choose flags set and operates in the same manner as the preceding cell. Up to this point, the search pattern is quite similar to the one used in a common-prefix logical OR search. A non-zero output will be generated on the M1 match line if any one of the three charactes is detected. The next cell, with the right flag set, may be though of as a "closing bracket." If the input of the M2.sub.i line is non-zero, indicating a prefix match, the counter is loaded with the length register value. If the M1.sub.i line also indicates a match, meaning that one of the three designated characters followed the prefix, then the M2 match value is recirculated and is also transferred to the M1 match line for use in matching the suffix pattern segment. If the M1.sub.i line indicates no match, i.e. that a character other than one of the specified three followed the prefix, then the M2 value is decremented by one, recirculated, and also transferred to the M1 match line. So long as each character following the first pattern segment is one of the designated characters, and the number of such following characters does not exceed the designated count, then the cell with the right flag will continue to generate a match indicator. If a different character is interposed in the stream, or if the count is decremented to zero, the tolerance value will be decremented and the cell output may indicate a non-match. Operations involving the right flag may also be better understood by tracing the relevant portions of logic in FIG. 3. Counter loading is initiated by the first load condition in the counter logic. The transfer of non-zero M2 tolerance value back to the M1 match line in a right-flag cell, is effected by input (2) of the L1.sub.i logic element, which is the selected input when M2.sub.i is non-zero. After the counter has been loaded, the M1.sub.i line should normally return to zero, and the counter will be decremented on each cycle. The M1 register will then derive its input from the M1.sub.i line rather than the M2.sub.i line, as indicated by input (3) of the L1.sub.i logic element. In the right-flag cell, the M2 match output, on line M2.sub.o is always transferred to the M1.sub.o line, through input (1) of the L1.sub.o logic element 85. The value of the output transferred is determined by the condition of the M1 match register 87 (M1.sub.a) and by the condition of the counter. If M1.sub.a or the counter is zero, this indicates that either there is no current match within the brackets, or the match result in M2 has been output to M1 more than a selected number of times. In either case, the M2 match value is decremented prior to its next output on the M1.sub.o line. If M1.sub.a is non-zero, meaning that a matching character was found between the brackets, and the counter is non-zero, then the M2 match value is recirculated without change in the M2 line. The recirculation of M2 values in a right-flag cell is effected through input (4) of the L2.sub.i logic element 81. Decrementing the M2 value is accomplished by the decrementing circuit 93, which is selected by input (2) of the L2.sub.o logic element 84. Initialization and Diagnostic Mode: It has been assumed in the foregoing examples that a mechanism exists for loading a number of serially connected cells with a desired search pattern, together with the associated flags and registers. The initialization scheme is discussed in Part B of this specification. PART B Details of Match Control Logic: The logical functions for the logic 80, 81, 82, 83, 84, 85, and 86 are described in the following diagrammatic charts, designated CHART-1 through CHART-10. The following symbols are used in the charts:
______________________________________
TOL = tolerance value,
O = OR flag,
R = right flag,
N = negate flag,
C = choose flag,
P = pass flag,
L = last flag,
B = bracket flag
I = infinity flag,
K = counter,
Char = character register,
Patt = pattern register,
. = logical AND,
+ = logical OR.
______________________________________
The first three charts concern control of the counter in each cell. The counter is conventional in design, and operates in one of three modes: clear, load and decrement. As CHART-1 shows, the clear mode is entered only when a cmode signal is not asserted, which is when the cell is not in a search mode. CHART-1 (CLEAR) cmode Whenever "cmode" is not negated, that is, "cmode" is asserted, the control logic 222 (FIG. 3) controls the counter to either be in the LOAD mode or in the DECREMENT mode. The initial value K in the counter is loaded from the length register 215. The loading of the count from register 215 is under control of the counter control 222, which receives a 2-bit TOL value from the tolerance register 217. The tolerance can have a value between zero and three indicated by the binary values of 00, 01, 10, and 11. The counter control 222 also receives the flags from the flag register 216. The counter control 222 also receives the M1.sub.i and the M2.sub.i match lines. Each of the match lines M1.sub.i, M2.sub.i, M3.sub.i, and M4.sub.i, which are collectively referenced as (M1-M4).sub.i, are 2-bit lines. Each 2-bit line can represent the four different values from zero to three, which in binary notations are 00, 01, 10, and 11. The logical combination of the inputs to the counter control 222 that cause the LOAD mode to be asserted are presented in the following CHART-2. CHART-2 (LOAD) ((R+N+O).(M1.sub.i .noteq.0)+(R+N+O).(M1.sub.i .noteq.0)+(TOL.noteq.)).(cmode) Note from CHART-2 that the LOAD mode is asserted only when cmode is asserted. If the LOAD mode is not asserted, then the counter is decremented under the conditions set forth in the following CHART-3. CHART-3 (DECREMENT) (K.noteq.0).(LOAD).(I).(cmode) The counter, under control of the counter control 222 enables use of a character count during various search operations. In FIG. 3, the pattern register 218 stores the character pattern that is to be searched for by the FIG. 3 cell. The mask register 219 stores a mask, which permits any combination of the bits in the pattern register 218 to be ignored. The contents of the pattern register 218 and the mask register 219 are logically ORed together in the comparator 220, before comparson with the contents of the character register 214. Whenever the contents of character register 214 are the same as the contents of the pattern register 218, ignoring any bits masked by the contents of mask register 219, the comparator 220 provides a comparison signal, which provides input (1) to the M1.sub.f logic element 83. The logical relationships governing the logic elements shown in FIG. 3 are given in CHART-4 through CHART-10 below.
______________________________________
CHART-4 (L1.sub.i)
No. Name Logic
______________________________________
(1) TOL TOL .noteq. 0
(2) M2.sub.i
(O).(K .ltoreq. 1 + M2.sub.i > M1.sub.f) + (R).(M2.sub.i
.noteq. 0)
(3) M1.sub.i
(--O).(K .ltoreq. 1 + M1.sub.i > M1.sub.f) + R + N + B
(4) M1.sub.f
else
______________________________________
The logic in each of the logic blocks 80 through 86 is conventional in implementation and is defined in the following manner. Each input to each logic block has a priority determined by its input number. Specifically, the input (1) has the highest priority, the input (2) has the next highest priority and so forth until the input (4) has the lowest priority. If the logical statement for the highest priority input is satisfied, then the value of the input listed in the NAME column has its value selected as the output from the logic block. If the (1) input is not satisfied for any logic block, then the next highest order input, namely the (2) input is examined and if the logic specified is satisfied, then the corresponding value of the input (identified in the NAME column) is provided at the output. This process continues until the highest order input is satisfied and, at the very least, the lowest order input is selected for the output. By way of a specific example, reference is made to the L1.sub.i logic block 80. For the (1) input, the value of TOL is examined and if it is not equal to zero, then the 2-bit value of the TOL signal is stored into the M1 register 87. However, if the value of TOL is equal to zero, then the logical expression for the (2) input is examined. The logical expression for the (2) input requires the logical AND of the OR flag, with the condition that K be less than or equal to 1 or that M2.sub.i be greater than M1.sub.f or the logical AND of the Right flag, and the condition that M2.sub.i not equal to zero. If this condition is satisfied, then the 2-bit value of M2.sub.i is stored into the M1 register 87. If neither the (1) or (2) conditions are satisfied, then the third logical condition is examined as set forth in CHART-4 above. If this condition is satisfied, then the 2-bit value of M1.sub.i is stored into the M1 register 87. If none of the logical conditions for the (1), (2), or (3) inputs are satisfied, then the value of M1.sub.f output from the logic block 83 is stored into the M1 register 87. Similar to logic block 80, each of the other logic blocks 81 through 86 in FIG. 5 processes the inputs in the priority fashion indicated to provide 2-bit output. The Fast Data Finder (FDF) processor is designed to do pattern mathcing particularly in a text search application. A simple pattern matching task might be: Fing the string "RED" within the text "THE SCHOOLS WERE RED . . . " The FDF processor utilizes a pipeline or systolic approach consisting of N "cells" serially connected output to input. Typically, each cell in the pipeline is identical to all other cells in the pipeline. Each cell is programmed to search for a specific character. For example, to find the string "RED" three cells are used with the first one programmed with the character "R", the second one with the character "E", and the third one with the character "D". Communication between cells in very simple. A text character enters each cell on the character line, CHAR. In addition, there are four lines that contain match information, namely M1, M2, M3 and M4. There is also a mode line, INIT, to indicate "initialization" or "compare" mode. Each cell receives information from the cell before it and passes information to the cell after it. During the initialization mode, each cell is programmed with the specific pattern character in each cell. In addition to the pattern field, each cell has a length, eight flags, a mask and a tolerance field, which are also programmed during the initialization mode. These fields are used in the more complicated pattern matching cases. After the initialization mode, the character comparison mode is entered. The text is streamed through the pipeline of cells. Each character of the text is compared with the first cell's pattern to form match information and this match information is passed to the second cell. The second cell's pattern is also compared with the next character of the text and the match information passed to the third cell. At the end of the pipeline, the output match information is checked to determine whether all N cells of the pattern have matched. The text search functions handled by the FDF processor include: 1.0 SIMPLE search: example - "RED" 2.0 FIXED LENGTH DON'T CARES (FLDC): example -"FIRST" followed by exactly 5 characters followed by "WEEK" 3.0 VARIABLE LENGTH DON'T CARES (FLDC): example - "WAY" within 0 to 20 characters of "BLUE" 4.0 VARIABLE LENGTH CARES (VLC): example - "YIPPEE"and "YAHOO" separated by up to 3 "-"s 5.0 OR FUNCTION: example - "DOUG" or "KWANG-I" 6.0 RANGES OF STRINGS: example - "WR" followed by ")" or "I" followed by "TE" 7.0 CHOICE OF VARIABLE LENGTH CARE CHARACTERS: example - "YIPPEE" and "YAHOO" separated by up to 3 "-"s or "e"s 8.0 AND FUNCTION: example - "HAWAII" and "MAUI" before "MOLO" 9.0 NUMERIC RANGING: example - Numbers between 1.08 and 1200.43 Each of these search functions is described in detail including information needed to initialize the cells as well as an explanation of the matching. 1.0 SIMPLE SEARCH In the example mentioned above, a search is made for the string "RED" in the text string, "THE SCHOOL IS RED . . . ". Initialization Mode. In the Simple search, the processor is initialized as follows: 1. Pattern -- character to match 2. Mask -- bits to ignore 3. Tolerance -- initial match value, decremented when mismatches occur 4. Last flag -- indicates last cell of pattern In the example the first cell, c.sub.1, is initialized to contain the pattern "R", the second cell, c.sub.2, to contain the pattern "E" and the third cell, c.sub.3, to contain the pattern "D". Since the cells compare on a bit-by-bit basis, a mask in each cell allows for specified bits to be ignored. In our example, we want to look for an upper or lower case "R", so we set the mask accordingly. The tolerance is set to 0 in every cell except the first, which is set with a tolerance of 1. Since the third cell, c.sub.3, is the last cell of our pattern, it must be programmed with the last flag turned on. The results of initialization appear in the following TABLE 1--1.
TABLE 1-1
______________________________________
SIMPLE INITIALIZATION
c.sub.1
c.sub.2 c.sub.3 c.sub.4
. . .
c.sub.N
______________________________________
PATTERN R E D
MASK U U U A . . .
A
TOLERANCE 1 0 0 0 . . .
0
LAST FLAG OFF OFF ON OFF . . .
OFF
______________________________________
The symbol U in the mask designation means that the mask has one bit set so as to ignore the difference between upper-case and lower-case characters. The symbol A means that all bits of the mask are set. Matching Mode. After initialization, the text string is fed into the pipeline. The M1 value of cell c is set to the cell's tolerance if c's tolerance is nonzero. Otherwise, M1 is set to the incoming M1 value. As each test character enters the pipeline, it is compared with the pattern in the cell. If the incoming character and stored pattern do NOT compare, the match is decremented by one. At the next stage, this M1 value is left in the delay register, while the delay value is carried by M1 to the next cell. Thus, the delay register is nonzero only if the previous character matched this cell's pattern. For cells with the last flag set, the delay register value is placed in M4. M4 otherwise passes directly to the next cell. In this example, the c.sub.1 tolerance value is a 1, and at c.sub.1 the M1 value will, therefore, always be set to a 1. The first character "T", of the text string, "THE SCHOOLS WERE RED . . . " does not match c.sub.1 's pattern "R", causing the M1 value to be decremented to 0. Subsequently, the delay register becomes zero and a zero is passed to c.sub.2. The second character "H", of the text string also causes the same thing to happen and in a similar manner M1 and delay remain 0 until the "R" in "RED" enters the processor. The status of the character, match, and delay registers is given in the following TABLE 1-2.
TABLE 1-2
______________________________________
STAGE 0
______________________________________
c.sub.1 c.sub.2 c.sub.3 c.sub.4
______________________________________
character character character character
E R E
______________________________________
M1 delay M1 delay M1 delay M1 delay
______________________________________
0 0 0 0 0 0 0 0
______________________________________
Now the "R" enters the FDF. As always, M1 is set to the tolerance, but since the "R" DOES match c.sub.1 's pattern, the M1 value of 1 is unmodified. The delay register picks up the M1 value of 0 of the previous stage.
TABLE 1-3
______________________________________
STAGE 1
______________________________________
c.sub.1 c.sub.2 c.sub.3 c.sub.4
______________________________________
character character character character
P E R
______________________________________
M1 delay M1 delay M1 delay M1 delay
______________________________________
1 0 0 0 0 0 0 0
______________________________________
The "E" now enters the pipeline. Again, M1 is set to the tolerance, but the E does not match the pattern "R", so the M1 value is decremented to 0. The M1 is left in the delay register while the second cell receives the delay value of c.sub.1.
TABLE 1-4
______________________________________
STAGE 2
______________________________________
c.sub.1 c.sub.2 c.sub.3 c.sub.4
______________________________________
character character character character
E R E
______________________________________
M1 delay M1 delay M1 delay M1 delay
______________________________________
0 1 0 0 0 0 0 0
______________________________________
The "D" now enters the FDF. Again, c.sub.1 's M1 value is decremented to 0. The "E" moves into c.sub.2 's character register, while c.sub.2 's match register is set to the 1 in c.sub.1 's delay register. Since the "E" matches c.sub.2 's pattern of "E", this 1 remains unmodified. The "R" moves into c.sub.3 's character register and the other registers are set to 0.
TABLE 1-5
______________________________________
STAGE 3
______________________________________
c.sub.1 c.sub.2 c.sub.3 c.sub.4
______________________________________
character character character character
D E R
______________________________________
M1 delay M1 delay M1 delay M1 delay
______________________________________
0 0 1 0 0 0 0 0
______________________________________
At the next step, the "." enters the pipeline. None of the text string characters match the pattern of the cell they are occupying. The M1 values move into the delay registers without modification.
TABLE 1-6
______________________________________
STAGE 4
______________________________________
c.sub.1 c.sub.2 c.sub.3 c.sub.4
______________________________________
character character character character
. D E R
______________________________________
M1 delay M1 delay M1 delay M1 delay
______________________________________
0 0 0 1 0 0 0 0
______________________________________
The "D" now moves into the third cell, where it matches the pattern. The M1 value is not decremented.
TABLE 1-7
______________________________________
STAGE 5
______________________________________
c.sub.1 c.sub.2 c.sub.3 c.sub.4
______________________________________
character character character character
. . D E
M1 delay M1 delay M1 delay M1 delay
______________________________________
0 0 0 0 1 0 0 0
______________________________________
Finally, the first "." moves into the third cell, where c.sub.3 's delay gets the M1 value of 1. Since the last flag is set, the M4 value is set to the delay value of 1. Thus, the final match value of 1 is carried on the M4 line and moves through the pipeline with the first ".", the character following "RED".
TABLE 1-8
______________________________________
STAGE 6
______________________________________
c.sub.1 c.sub.2 c.sub.3 c.sub.4
______________________________________
character character character character
. . . D
______________________________________
M1 delay M1 delay M1 delay M1 delay
______________________________________
0 0 0 0 0 1 0 0
M4 M4 M4 M4
0 0 0 0
______________________________________
MATCH FOUND!!!
______________________________________
The flow of information is summarized in the following table. Each O or X represents one register in the cell. At the first cell, M1 is set to the tolerance value. M1 passes through the delay register at each character of the pattern. At the third cell, with the last flag set, M1 moves down to the M4 RESULT line.
TABLE 1-10
______________________________________
R E D
______________________________________
M1 X X X X X X 0 0 0 0 0 0 0 0 0 0
M2 0 0 0 0 0 0 0 0
M3 0 0 0 0 0 0 0 0
M4 0 0 0 X X X X X
______________________________________
2.0 FIXED LENGTH DON'T CARE (FLDC) SEARCH The mask allows for specified bits of the pattern to be ignored in the comparison. If all of the bits are "masked out", every text character will "match" the pattern. This allows us to find, for example, "FIRST" followed by exactly five characters followed by "WEEK". Initialization Mode. The initialization is identical to the SIMPLE Search, except for the mask, where all bits masked are represented by an "A" as shown in the following TABLE 2-1.
TABLE 2-1
______________________________________
RESULTS OF INITIALIZATION
c.sub.1 c.sub.2
c.sub.3
c.sub.4
c.sub.5
c.sub.6
c.sub.7
c.sub.8
c.sub.9
c.sub.10
c.sub.11
c.sub.12
c.sub.13
c.sub.14
______________________________________
PAT. FI R S T * * * * * W E
E K
MASK
U U U U U A A A A A U U U U
TOL.
10 0 0 0 0 0 0 0 0 0 0 0 0
LAST
00 0 0 0 0 0 0 0 0 0 0 0 0
______________________________________
Matching Mode. Identical to SIMPLE CASE. 3.0 VARIABLE LENGTH DON'T CARE (VLDC) Suppose we want to search for "WAY" within 0 to 20 characters of "BLUE". We use one cell to handle the VLDC with the length set to 21, which s placed between "WAY" and "BLUE". Initialization Mode. In the VLDC case, we must initialize the following. 1. Pattern 2. Mask 3. Tolerance 4. Last flag 5. Length -- initial counter value, decremented with each character, reset with incoming match 6. Pass -- bypasses the delay when set. The length is set to 1 on all cells except the fourth, which is set to 21. Cell c.sub.4 also has the pass flag set, since we also want to be able to find 0 characters between "WAY" and "BLUE". Notice that cell c.sub.4 also has all bits masked.
TABLE 3-1
______________________________________
RESULTS OF INITIALIZATION
c.sub.1
c.sub.2
c.sub.3
c.sub.4
c.sub.5
c.sub.6
c.sub.7
c.sub.8
______________________________________
PATTERN W A Y * B L U E
MASK U U U A U U U U
TOLERANCE 1 0 0 0 0 0 0 0
LENGTH 1 1 1 21 1 1 1 1
PASS 0 0 0 1 0 0 0 0
LAST FLAG 0 0 0 0 0 0 0 1
______________________________________
Matching Mode. Table 3--3 illustrates the information flow. Tolerance moves into M1 at the first cell. The tolerance value in M1 passes through the delay to the subsequent cells, and then down to M4 at the last cell. The exception to this is at the fourth VLDC cell. Since the pass flag is set here, the delay is skipped and the M1 tolerance value passes directly to the subsequent cells. The other new concept here is the length. When a nonzero M1 value enters the fourth cell, the length is loaded into the counter. The counter counts down to 0 (indicating that the selected number of characters have occurred since "WAY" was found) before the M1 value is decremented.
TABLE 3-2
______________________________________
Variable Length Don't Care (VLDC)
EXAMPLE: WAY within 20 characters of BLUE
PATTERN SETUP
______________________________________
PATTERN W A Y * B L U E
MASK U U U A U U U U
TOL 1 1 1 21 1 1 1 1
LENGTH 0 0 0 1 0 0 0 0
LAST 0 0 0 0 0 0 0 1
______________________________________
4.0 VARIABLE LENGTH CARE (VLC) One may want to specify that only a specified character can be between our two strings. For example, "YIPPEE" and "YAHOO" separated by up to 3 "-"s. Initialization Mode. Identical to the VARIABLE LENGTH DON'T CARE case except for the pattern and mask on the VLC cell. In this case, the seventh cell has a definite pattern of "-" and is not masked.
TABLE 4-1
______________________________________
RESULTS OF INITIALIZATION
c.sub.1
c.sub.2
c.sub.3
c.sub.4
c.sub.5
c.sub.6
c.sub.7
c.sub.8
c.sub.9
c.sub.10
c.sub.11
c.sub.12
______________________________________
PATTERN Y I P P E E ! Y A H O
O
MASK U U U U U U N U U U U U
| ||||||
