High throughput system for encryption and other data operations6870929Abstract According to one embodiment, an encryption system (500) includes an input buffer (504) that can provide data blocks from different contexts (522-1 to 522-n) to a selected encryption circuit (524-1 to 524-m) according to a scheduling section (502). A scheduling section (502) can include a register array (510) having rows that each correspond to a context and columns that correspond to an encryption circuit. Each register array (510) row can store one "hot" bit that designates a context with a particular encryption circuit. A column can be selected by a multiplexer (514) and its values prioritized and encoded by a priority encoder (518) to generate an address that results in the selection of a data block from a particular context. Priority may be varied by shifting a column value with a rotate circuit (516) according to an offset value (OFFSET). Claims What is claimed is: Description TECHNICAL FIELD
for (yy=1, aa=A, e!=0) {
if(e&1) yy=(yy*aa) mod n
aa=(aa*aa) mod n
e=e>>1
}.
The step e&1 examines a particular bit of the value e. The step e=e>>1 moves to the next bit of e. The last value yy will be the desired result. In this arrangement, the two operations yy=(yy*aa) mod n and aa=(aa*aa) mod n are computations that (apart from the first iteration) utilize the previously computed yy and aa results from the previous loop iteration. If such a computation is implemented in a pipelined circuit, and the latency of the circuit is greater than the rate at which values are applied to the circuit, each operation must "wait" until the previous result has fully propagated through the pipeline. This can result in delays and/or times at which various pipeline segments are idle. SUMMARY OF THE INVENTION According to one embodiment, an encryption system can include an input buffer that receives data blocks. The data blocks can be organized into a number of contexts. According to a scheduler, data blocks from different contexts can be applied to an encryption circuit having pipelined cipher stages. According to one aspect of the embodiment, a scheduler can include a column of storage locations, each corresponding to a context. The values of a column are prioritized to designate one particular context with the encryption circuit. According to another aspect of the embodiment, a system can include more than one encryption circuit, and a scheduler can include a storage array having rows and columns. Rows can correspond to a particular context and columns can correspond to a particular encryption circuit. Columns can be selected and the values therein prioritized to designate a context with a particular encryption circuit. According to another aspect of the embodiment, an encryption circuit can include a feedback loop. By applying data blocks from different contexts data blocks can be processed by the encryption circuit in a pipelined fashion. While the term "encryption" is used throughout this description, it is understood that "encryption" can include both encryption and decryption. BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a block diagram of a first embodiment. FIGS. 2A to 2C are tables illustrating the operation of a scheduler according to one embodiment. FIG. 3 is a timing diagram illustrating the operation according to the scheduling shown in FIGS. 2A and 2C. FIG. 4A is a timing diagram illustrating the operation of an embodiment. FIG. 4B is a diagram illustrating one buffer arrangement that can correspond to FIG. 4A. FIG. 5 is a block diagram of a second embodiment. FIG. 6 is a block diagram of a conventional encryption circuit. FIG. 7 is a block diagram of another conventional pipelined encryption circuit. FIG. 8 is a block diagram of a conventional pipelined encryption circuit. FIG. 9 is a block diagram of a conventional CBC mode DES circuit. FIG. 10 is a block diagram of a conventional encryption circuit having a reduced number of stages. DETAILED DESCRIPTION OF THE EMBODIMENTS Various embodiments of the present invention will now be described in conjunction with a number of diagrams. The various embodiments include an encryption system that can provide higher throughput than other conventional approaches. In particular embodiments, multiple data blocks can be pipelined across one or more encryption circuits. Such an arrangement can allow a new encrypted block to be generated on each operational cycle, where a cycle can be as small as one clocked cipher stage within an encryption circuit. Referring now to FIG. 1, a block diagram is set forth illustrating a first embodiment. The first embodiment is designated by the general reference character 100, and is shown to include an encryption circuit 102, an input buffer/working store 104, an output buffer 108, and a scheduler 106. An encryption circuit 102 can include a number of cipher stages that enable pipelined operation. The encryption circuit 102 can process a given input data block with a latency L, where L=nT. The value n can be the number of cipher stages, and the value T is the clock period of the system, which will be no smaller than the delay introduced by the slowest cipher stage. The input buffer/working store 104 can include various storage circuits that store data blocks from multiple data streams. Each data stream can include one data block, or a sequence of data blocks having a particular order. Each such data block and/or sequence of data blocks will be referred to herein as a "context." As just one example, each context can represent data from a particular network packet. An input buffer/working store 104 can be implemented in a variety of forms. As but two of the many possible examples, and input buffer can include first-in-first-out (FIFO) memory device(s) or random access memory (RAM) device(s). The output buffer 108 can include storage circuits corresponding to those in the input buffer/working store 104. In particular, the output buffer 108 can store out-going (encrypted) blocks in the same general fashion as incoming (non-encrypted) blocks. An output buffer 108 may be constructed in the same fashion as the input buffer/working store 104, or be formed from different types of storage circuits than the input buffer/working store 104. The scheduler 106 determines the order in which the data stored in the buffer 104 will be processed. The scheduler may take a variety of forms. In one particular embodiment, the scheduler 106 can include an array of bits, with one bit corresponding to each context. If a context has data that can be processed, its corresponding bit can have one value ("1" for example). If a context does not have data to be processed, its corresponding bit can have another value ("0" for example). Referring now to FIGS. 2A-2C, a sample of a scheduler operation is illustrated in a series of tables. Each table includes a column identifying a particular context (CTXT) and a valid column (VALID) corresponding to each context. A particular context can be selected by prioritizing VALID bits on a rotating basis. FIGS. 2A to 2C illustrate three consecutive system cycles, where prioritization begins with context zero. In FIG. 2A contexts 0, 1, 3 and 4 are active. Because prioritization begins with context zero, a data block can be accessed from context zero and applied to an encryption circuit. In a subsequent cycle, as shown by FIG. 2B, because context 0 has been accessed, its corresponding valid bit is changed to zero. In addition, the priority changes on a "round robin" basis. As a result, context 1 has priority. Consequently, the data block from context 1 is applied to an encryption circuit. In FIG. 2C, because context 1 has been accessed, its corresponding valid bit is changed to zero. In addition, the priority changes once again. Because the valid bit of context 2 is zero, it is not accessed. The next context that is valid, context 3 in the example of FIG. 2C, can provide a corresponding data block to an encryption circuit. It is noted that priority may be established in a variety of ways within a scheduler. As one example, priority can be given to the "oldest" context request. As another example, priority may rotate by one context each system cycle. As yet another of the many possible examples, priority may rotate by more than one context by encoding any "invalid" contexts between a selected context and the next valid context, and then rotating so that the next priority begins at the next valid context. A timing diagram illustrating an operation corresponding to that shown in FIGS. 2A to 2C is shown in FIG. 3. FIG. 3 is a timing diagram illustrating the inputs to an encryption circuit (B.sub.i) and the outputs of an encryption circuit (C.sub.i). The variable i indicates a particular context. As shown by FIG. 3, once an input data block is applied, it will be provided in encrypted form a latency L later (where L is determined by the encryption circuit). Referring now to FIG. 3 in conjunction with FIG. 1, in one particular arrangement, in each cycle, the scheduler 106 can provide an address BUFF_ADD for the first buffer 104 on consecutive cycles. A corresponding buffer read signal (RD) can be activated if a valid block of data is available. A data block can then be read into encryption circuit 102. A latency L later, an address can be applied to output buffer 108 along with a write signal (WR). The address applied to output buffer 108 can ensure that an encrypted data block is placed into an output buffer context that corresponds the input buffer context from which the original data block originated. In the particular arrangement of FIG. 1, the address applied to the output buffer 108 can be the same address that is applied to the input buffer 106, but delayed by a latency L. Similarly, the write signal (WR) can be the read signal (RD) delayed by the latency L. Of course, one skilled in the art would recognize that alternate embodiments can generate buffer addresses, read commands and write commands according to a series of instructions executed by a special or general purpose processor. It is noted that in the event accesses to an input buffer and/or to an output buffer take longer than the delay of longest pipeline stage (T) in encryption circuit 102, data blocks can be applied at appropriate multiples of T. That is, if a buffer access time is less than 4T but greater than 3T, data blocks can be read at periods of 4T. Still further, in the event accesses to an input buffer and/or to an output buffer take longer than the delay of longest pipeline stage (T), a buffer can include individually accessible storage devices accessed by staggered clocks generated by a phased-lock-loop (PLL) or delay-locked-loop (DLL) circuit or the like. In this way, pipelining can occur with a period T even when accesses to a buffer are greater than T. In this way, a scheduler can apply data blocks from different contexts in a pipelined fashion to an encryption circuit. This is in contrast to a conventional approach that may process one series of data blocks at a time. The present invention may thus more evenly distribute encryption resources over multiple data block sequences, preventing particular one sequence of data blocks from being "stuck" behind another data block sequence as it is encrypted. An encryption circuit 102 can also be capable of "feedback"-type encryption functions, such as cipher block chaining (CBC), cipher feedback (CFB) or output feedback (OFB) modes of the data encryption algorithms such as DES and Triple DES, or any of various secure hash algorithms. In such an arrangement, consecutive blocks from a context can be applied a predetermined latency from one another, where the predetermined latency is that of the encryption circuit. For example, in a DES CBC mode, a latency can be one pass through an encryption pipeline. Referring now to FIG. 4A, a timing diagram is set forth illustrating an encryption operation according to one embodiment where multiple contexts are encrypted with a feedback-type method in a pipelined fashion. FIG. 4B illustrates contexts that correspond to the operation of FIG. 4A. In FIG. 4A, data blocks from different contexts are given a particular letter designation and number designation. The letter designation indicates a context of origin, the number designation indicates how many blocks have previously been processed for the context in question. Thus, a first context can provide data block A1, followed by data block A2, followed by data block A3, etc. Further, if it is assumed that CBC is employed, the encrypted form of data block A1 (designated as E[A1]) is an input that is used together with data block A2 to encrypt data block A2. In general, each context will have its own encryption/decryption key (or, in the case of Triple-DES and similar algorithms, set of encryption/decryption keys). The keys for all active contexts are stored and retrieved at appropriate times as seen below. In FIG. 4B, five contexts are illustrated as items 400-1 to 400-5. Five data block sequences are shown as A1 to A4, B1 to B3, C1 to C5, D1 to D3 and E2 to E2, corresponding to contexts 400-1 to 400-5, respectively. Context 400-5 includes three blank leading blocks that are "blank." This can indicate that while data blocks A1, B1, C1 and D1 are immediately available, data block E1 will not be available for another three cycles. The example of FIGS. 4A and 4B assumes that an encryption circuit (such as 102) has a pipeline of four stages. Further, the encryption type is a CBC mode DES. Referring now to FIG. 4A, initial priority can be determined in the same general fashion as the example described in FIGS. 2A, 2B and 3. Consequently, in the first four cycles, data blocks originating from the first four contexts are read from an input buffer/working store. This is shown as data blocks A1, B1, C1 and D1 being input at times t1 to t4. Simultaneously with block A1 input to the encryption circuit, key KA is retrieved from a key store and input to the encryption circuit. Similarly, simultaneously with block B1 input to the encryption circuit, key KB is retrieved from a key store and input to the encryption circuit; and so forth. In the particular case of a CBC mode, a "seed" data block value (or "initial vector") can be combined with the initial data block of a sequence prior to being applied to the encryption pipeline. Such an operation may be accomplished with a multiplexer or the like. Initial vectors are shown as IVA to IVD in FIG. 4A. Once four values are read into an encryption pipeline, corresponding second data blocks must be read in a predetermined order to ensure proper feedback-type encryption. Because more data blocks are present in the sequences corresponding to contexts 400-1 to 400-4, data blocks A2, B2, C2 and D2 are input at times t5 to t8. At the same time, encrypted data values E.sub.KA [A1,IVA], E.sub.KB [B1,IVB], E.sub.KC [C1,IVC] and E.sub.KD [D1,IVD] are provided as output values and, internal to the encryption circuit, as feedback values for combination with data blocks A2, B2, C2 and D2, respectively. A scheduler 106 can be programmed to provide appropriate priority to ensure feedback-type encryption operations. In particular, the active contexts can be stored, and on consecutive cycles, priority can be shifted to give the desired context priority. As shown in FIG. 4A, at time t14, priority can be shifted to give data block E1 priority. Further, one skilled in the art would recognize that the feedback loop in an encryption circuit would be disabled on this cycle to prevent the E.sub.KB [B3] value from being combined with the E1 value. In an alternate embodiment, a system may include as many contexts as there are pipeline stages. Each context can be accessed sequentially. In the event a context does not include a data block, a read from the input buffer and write to the output buffer can be suppressed. In this way, an encryption system can provide an encrypted data block in each system cycle for feedback-type encryption. This is in contrast to a conventional approach that may supply a first data block of a sequence to an encryption circuit and then supply the second block a predetermined time later, limited by the latency of the encryption process on the first data block. Thus, the present invention can process a data block on each system cycle (provided sufficient contexts are active) even when the encryption function includes a feedback loop. While the above description has described the particularly useful application of the invention to encryption, the described embodiments could also be utilized in other computations, such as modular exponentiation, as but one example. As one very particular example, if the method described in the background above is employed to compute y=(A.sup.e)mod n, a modular multiply computation circuit (in place of the encryption circuit 102) could provide the yy=(yy*aa) mod n operation and/or the aa=(aa*aa) mod n operation. Of course, the scheduler operation could be adjusted to ensure that the yy=(yy*aa) mod n operation is performed only for iterations corresponding to an "e" bit value equal to one. Various other methods for computing modular exponentiation could be employed, including Montgomery's method and Barrett's algorithm, to name but two examples. Modular exponentiation can be particularly useful in public key algorithms such as "Diffie-Hellman" and RSA, to name but two examples. However, one skilled in the art would recognize that modular exponentiation represents but one of many other possible computations that could be performed with high throughput, multi-context pipelining according to the present invention. Referring now to FIG. 5, a second embodiment is set forth in a block diagram. The second embodiment is designated by the general reference character 500 and is shown to include a scheduling section 502, an input buffer/working store 504, an encryption section 506 and an output buffer 508. A scheduling section 502 can include a register array 510 having n rows and m columns. The variable n can be the number of contexts in an input buffer/working store 504. As but one example, n rows can correspond to n FIFO pipelines storing data blocks for encryption. The variable m can be the number of parallel encryption circuits within encryption section 506, where each encryption circuit is capable of processing one data block per m cycles, such that encryption section 506 can process in aggregate one data block per cycle. The various rows of the register array 510 can be loaded on a row-by-row basis by operation of load circuit 512. The load circuit 512 may load a row of the register array 510 according to a current address (ADD_CURR) and current data (DATA_CURR) or alternatively, according to a next address (ADD_NXT) and next data (DATA_NXT). In the particular arrangement of FIG. 5, each row of register array 510 can include m bits, and either all bits will be "0," or one or more bit will be a "1." If row i stores 0s, this can indicate that context i in input buffer/working store 504 does not hold valid data that is ready to be launched into the encryption pipeline. This may be because context i is idle with no data to encrypt, or because a data block for context i is "in flight" in one of the encryption pipelines and its feedback information is not yet available to start the next data block for context i. If the bit in column j of row i stores a 1, the 1 value designates that context i is ready to launch a data block into a encryption pipeline j. Note that more than one bit in a row i may be set to "1" at the same time. This state would indicate that the next data block to be encrypted for context i can be encrypted by any of several encryption pipelines. If all m encryption pipelines in 506 are the same, then in general all bits in a row would be set to "1" at the same time. On the other hand if the encryption circuits 524-1, 524-2, . . . , 524-m implement heterogenous functions, then the group of bits set to "1" simultaneously in a particular row will correspond to the subset of the m pipelines that implement the function required by the context in question. As one skilled in the art would recognize, any of the m pipelines might also retain state from one data block to the next. In this scenario a context would need to continue using the same encryption pipeline for all data blocks in a sequence of blocks for which state information is kept. Then the context in question, when ready to launch a data block, would activate only the "1" bit for the encryption pipeline holding the associated state. As shown in FIG. 5, each column of the register array 510 can be applied in parallel to a multiplexer 514. Each particular set of column bits can be conceptualized as a list of all current context "requests" for a particular resource (a particular encryption circuit in this case). One set of column bits is selected by a selection value (SEL_RES). The SEL_RES values can correspond to an available encryption pipeline. The selected column of n bits includes one bit for each context, indicating whether or not that context is ready to use the selected resource on this clock cycle. A selected column of bits can then be applied to a rotate circuit 516. A rotate circuit 516 may also receive an offset value OFFSET. The rotate circuit 516 can rotate the n-bit vector that represents the selected column. The rotate operation selects which context will be considered highest-priority. The output of rotate circuit 516 can be provided to a priority encoder 518. The priority encoder 518 can examine its n-bit rotated input and select the lowest-numbered (or highest-numbered) "1" bit from its input, ignoring the "0" bits in its input. The encoded value is then added, in adder 520, to the original offset value to generate an input buffer address ADD_IBUFF. The input buffer/working store 504 can include the same general constituents and variations as input buffer/working store 104 of FIG. 1. Input buffer/working store 504 includes diagrammatic representations of input contexts 522-1 to 522-n which can correspond to the rows of register array 510. The encryption section 506 is shown to include encryption circuits 524-1 to 524-m, which can correspond to the columns of register array 510. Each encryption circuit (524-1 to 524-m) can provide the same type of encryption, or alternatively, one or more encryption circuits (524-1 to 524-m) can provide a different type of encryption or other pipelined operation on its input data block. In this way, even higher throughput and/or different encryption or processing streams can be provided. The output buffer 508 can include output contexts 526-1 to 526-n, which can correspond to input contexts 522-1 to 522-n. Thus, a data block originating from input context 522-1 can be stored as an encrypted data block in output context 526-1, and so on for each context. In this way the encryption of data blocks from multiple contexts can be distributed in a pipelined fashion across multiple encryption circuits. Scheduling, such as that described in conjunction with FIGS. 4A and 4B can be used to accomplish pipelined feedback-type encryption operations. One particular feedback structure is set forth in FIG. 5. A feedback bus 528 can couple output buffer 508 data to a combining circuit 530 (in this particular example an XOR circuit). In the case of a feedback-type algorithm, a read operation can be performed on the output buffer 508. The read data (which can be a previously encrypted data block) can then be combined with an "incoming" data block in combining circuit 530. The load circuit 512 can have two functions. First, when a previously idle context becomes active and has data that it wishes to encrypt, a circuit external to encryption system 500 will write a row of bits for that context into the scheduler 510 using the ADD_CURR and DATA_CURR inputs to load circuit 512. Additionally a WRITE_CURR signal will be activated to instruct the load circuit to load the DATA_CURR bits into the row specified by ADD_CURR. When an active context launches a data block into an encryption circuit, all the bits in its scheduler row will be reset to "0" during the period that the data block is "in flight." Once the encryption of the data block is complete then one or more bits of the row are set to "1". This may be accomplished by writing to the load circuit 512, at the time of launching a data block into an encryption circuit, the vector of bits that is to be loaded to the context's row after the data block has completed. The vector of bits is written to the DATA_NXT input and the context number is applied to the ADD_NXT port. When the load circuit 512 receives a signal WRITE_NXT, it can reset the bits of context number ADD_NXT to zero, and store in a holding register for that context the bits applied on DATA_NXT. Later, when the data block for that context has completed, the encryption pipeline can drive the context number of the completed block on a TRIGGER input to load circuit 512, instructing the load circuit to move the vector from the holding register to the scheduler row. In one particular arrangement, a TRIGGER input can be activated in synchronism with a system clock signal. It is understood that while particular types of encryption pipelines have been described (i.e., various DES modes) other types of computation stages could be employed. The present invention may be particularly advantageous when utilized with "feedback" type computations (i.e., computations that perform an operation on a previously calculated result and a value being input into the computation stage). Among the many possible computations are the Secure Hash Algorithm (SHA) and Message Digest 5 (MD5). Thus, it is thus understood that while the preferred embodiments set forth herein have been described in detail, the present invention could be subject various changes, substitutions, and alterations without departing from the spirit and scope of the invention. Accordingly, the present invention is intended to be limited only as defined by the appended claims.
|
Same subclass Same class Consider this |
||||||||||
