Batch or transaction processing
Bus arrangements for interconnection of discrete and/or integrated modules in a digital system and associated method
6321285
Abstract
Bus arrangements for interconnecting a number of discrete and/or integrated modules in a digital system are described herein. Implementations of the bus arrangements are contemplated at chip level, forming part of an overall integrated circuit, and are also contemplated as interconnecting discrete modules within an overall processing system. These bus arrangements and associated method provide for high speed, efficient digital data transfer between the modules through optimizing bus utilization by eliminating the need for maintaining a fixed time relationship between the address and data portions of transactions which are executed by the system. In this manner, the bus arrangement is capable of supporting more active transactions than the number of individual buses which make up the bus arrangement. Systems disclosed may include any number of individual buses within their bus arrangements. In one implementation, a system includes a single address bus and two or more data buses such that different data transfers may be executed simultaneously on each data bus.
Claims
What is claimed is:
1. In a digital system including a bus arrangement having a number of separate buses which interconnect at least three modules such that one of said buses serves at least as an address bus and any remaining buses serve as data buses, a method of executing a plurality of transactions involving said modules on said bus arrangement wherein each transaction includes an address period which defines an associated data transfer between said modules, said method comprising the steps of:
a) initiating the address periods for said transactions on said address bus such that each transaction is active in the system until such time that its associated data transfer is completed; and
b) executing said data transfers on said bus arrangement such that all of the initiated data transactions are active at one point in time and so that the number of active transactions is greater than the number of said separate buses.
2. The method according to claim 1 wherein the associated data transfer for a first one of said transactions, which first transaction is initiated prior to a second one of said transactions, is completed on said bus arrangement after the associated data transfer for said second transaction has been completed on said bus arrangement.
3. The method according to claim 1 wherein said step for executing said data transfers includes the steps of (i) separating each data transfer into a series of discrete data packets and (ii) transferring the data packets over said bus arrangement such that data packets from a first transaction alternate in a controlled way with data packets from a second transaction.
4. The method according to claim 3 wherein said step for transferring said data packets includes the step of alternating the packets in said controlled way based upon certain criteria.
5. The method according to claim 1 wherein said one bus serves as a multiplexed bus which carries said address periods and said data transfers in a multiplexed manner.
6. The method according to claim 1 wherein said bus arrangement includes one data bus in addition to said one bus which serves solely as an address bus and wherein said address periods are performed on said address bus and said data transfers are performed on said data bus such that a total of two buses are present and wherein at least three transactions are active at said one point in time on the two buses.
7. The method according to claim 1 wherein said bus arrangement includes at least two data buses in addition to said one bus which serves solely as an address bus and said executing step includes the step of executing a first data transfer associated with a first transaction on one data bus while simultaneously executing a second data transfer associated with a second transaction on the other data bus.
8. In a digital system including a bus arrangement having one or more buses which interconnect at least three modules, a method of executing at least two transactions involving all of said modules on said bus arrangement wherein each transaction includes an address period which defines an associated data transfer between said modules, said method comprising the steps of:
a) performing the address periods for said transactions on one of said buses; and
b) executing said data transfers in corresponding data intervals on a particular one of said buses such that a first data interval associated with a first data transfer includes at least one idle period during which no data associated with the first data transaction is transferred, and said first data interval further includes at least one data period following said idle period during which data is transferred over that particular bus, and so that a second data interval associated with a second data transfer includes at least one data period during which data is transferred over that particular bus, said first and second data transfers being executed in timed relation so that said idle period of the first data interval occurs for the duration of said data period of the second data interval so as to perform the first and second data transfers in an interleaved manner on said particular bus.
9. The method according to claim 8 wherein said bus arrangement includes only one bus which serves as a multiplexed address and data bus and wherein said address periods and said data transfers are executed on said multiplexed bus.
10. The method according to claim 8 wherein said system includes an address bus and at least one data bus and wherein said address periods are performed on said address bus and said data transfers are executed on said data bus.
11. A digital system comprising:
a) at least three modules;
b) a bus arrangement having a number of separate buses which interconnect said modules such that one of said buses serves at least as an address bus and any remaining buses serve as data buses;
c) an arrangement for defining a plurality of transactions involving all of said modules on said bus arrangement such that each transaction includes an address period which is initiated by one module addressing another module and which defines an associated data transfer between the two modules, each transaction being active from its initiation until such time that its associated data transfer is completed; and
d) a control arrangement for executing said data transfers on said bus arrangement such that all of the initiated data transactions are active at one point in time wherein the number of active transactions is greater than the number of said separate buses.
12. The system of claim 11 wherein said control arrangement includes an arrangement for separating each data transfer into a series of discrete data packets and for transferring the data packets over said bus arrangement such that data packets from a first transaction alternate in a controlled way with data packets from a second transaction.
13. The system of claim 12 wherein said control arrangement transfers said data packets in said controlled way based upon certain criteria.
14. The system of claim 11 wherein the data transfer for each transaction is performed during a respective data interval and wherein said control arrangement is configured for executing data intervals corresponding to at least two active transactions on a particular one of said buses such that a first data interval associated with a first data transfer includes at least one idle period during which no data associated with the first data transaction is transferred, and said first data interval further includes at least one data period following said idle period during which data is transferred over that particular bus, and so that a second data interval associated with a second data transfer includes at least one data period during which data is transferred over that particular bus, said first and second data transfers being executed in timed relation so that said idle period of the first data interval occurs for the duration of said data period of the second data interval so as to perform the first and second data transfers in an interleaved manner on said particular bus.
15. In a digital system including a bus arrangement which interconnects at least two modules, a method of executing a transaction involving said modules on said bus arrangement wherein said transaction includes an address period which defines an associated data transfer between the two modules, said method comprising the steps of:
a) specifying a first one of said modules as a source module and a second one of said modules as a destination module such that said data transfer will pass from the source module to the destination module over said bus arrangement;
b) initiating said address period on said bus arrangement using one of said modules; and
c) controlling said data transfer on said bus arrangement using said source module so as to execute the data transfer irrespective of which module initiated said address period.
16. The method according to claim 15 including the step of using said source module to determine a specific time at which to execute said data transfer which specific time is subsequent to performing said address period.
17. The method according to claim 16 wherein said step for determining said specific time includes the step of establishing the availability to the source module of the data which makes up said data transfer such that said data transfer is initiated after said data is established as available.
18. A digital system comprising:
a) at least two modules;
b) a bus arrangement interconnecting said modules;
c) an arrangement for defining a transaction involving said modules on said bus arrangement such that said transaction includes an address period in which a first module addresses a second module and which defines an associated data transfer between the two modules;
d) an arrangement for specifying one of said modules as a source module and the other one of said modules as a destination module based on said transaction such that said data transfer will pass from the source module to the destination module over said bus arrangement; and
e) a control arrangement forming part of said source module for controlling the execution of said data transfer on said bus arrangement irrespective of which module initiated said address period.
19. The system of claim 18 wherein said control arrangement includes an arrangement for determining a specific time at which to execute said data transfer which specific time is subsequent to performing said address period.
20. The system of claim 19 wherein said arrangement for determining said specific time includes an arrangement for establishing the availability of the data which makes up said data transfer such that said data transfer is initiated only after said data has been established as available.
21. A method of operating a digital system including a plurality of components which are interconnected in a predetermined way by a bus arrangement, said method comprising the steps of:
a) storing certain information associated with each component at a predetermined location;
b) using said certain information in controlling a series of transactions between said components on said bus arrangement, each transaction including an address portion defining an associated data portion, said control arrangement controlling the execution of said data portions on said bus arrangement based on said certain information.
22. The method of claim 21 wherein: (i) said bus arrangement includes a bus which serves at least as a data bus, (ii) each data portion includes a plurality of data periods, and (iii) said step for using said certain information in controlling the execution of said data portions includes the step of alternating data periods associated with a first data transaction with data periods from a second data transaction on said bus.
23. In a digital system including a plurality of modules, a method comprising the steps of:
a) providing at least two separate, independently controllable memory arrangements for storing digital information each of which is connected to one of two memory controllers for controlling each of the memory arrangements;
b) interconnecting at least one processing module with the two memory controllers using a bus arrangement including an address bus interconnecting said processing module and each said memory controller, and at least two, separate data buses each of which is connected with both of said memory controllers and both of which are connected with said processing module; and
c) controlling the interaction of said processing module and said memory arrangements such that a data transaction using either one of the memory arrangements may be performed using either of the data buses.
24. A digital system comprising:
a) at least first, second, third and fourth components; and
b) a bus arrangement having at least one address bus and at least first and second data buses for (i) interconnecting the components in a predetermined way configured for performing on said bus arrangement at least a first address transaction between said first and second components and at least a second address transaction between said third and fourth components such that said first and second address transactions define respective first and second data transfers and (ii) simultaneously executing, at least for a duration of time, said first data transfer between said first and second components on said first data bus and said second data transfer between said third and fourth components on said second data bus.
Description
BACKGROUND OF THE INVENTION
The present invention relates generally to a bus arrangement which interconnects a number of modules or components in a digital system and more particularly to bus arrangements and associated methods for providing high speed, efficient digital data transfer between the modules. Implementations of the bus arrangements are contemplated at chip level, forming part of an overall integrated circuit, and are also contemplated as interconnecting discrete modules within an overall processing system.
Many bus structures and associated operating methods typically employ an address bus and a data bus wherein transactions executed on the bus structure include an address operation and an associated data operation. Normally, the address operations are transacted on the address bus and the associated data operations are transacted on the data bus in such a way that a fixed time relationship is maintained between the address and data operations of a particular transaction. In most instances, the bus structure operates in such a way that the data operation places data on the data bus during the associated address operation. As will be seen hereinafter, this fixed relationship requirement introduces inherent limitations with regard to system performance. Some prior art bus structures use a single bus for both address and data operations, in which case the data operation immediately follows the address operation, i.e., another fixed time relationship.
The above described requirement for maintaining a fixed time relationship between the address and data operations of a particular transaction, in and of itself, reduces the efficiency of bus utilization, particularly with regard to the data bus. As one example of a read transaction, a CPU addresses a particular peripheral device on the address bus, thereby requesting data. In this instance, the CPU typically holds the data bus while the peripheral fetches the requested data for delivery via the data bus. During this "bus hold time" the data bus is not utilized efficiently since no data is transmitted. Moreover, the addressing operation is itself extended in duration by the length of the bus hold time while waiting for data to appear on the data bus. Significant bus hold times may be encountered, for example, in read transactions involving peripheral devices that access dynamic random access memory (hereinafter DRAM). As one example of a bus hold delay involving DRAM, it is well known that DRAM must be refreshed periodically in order to maintain the data stored therein. Where a peripheral such as, for example, a CPU attempts to perform a read (or, for that matter, a write) during the refresh cycle, a bus hold delay is produced until the refresh cycle ends. As another example of a bus hold delay involving DRAM, the CPU may attempt to access the DRAM while another peripheral is actually using the DRAM. The CPU must then wait, which introduces a bus hold delay. Read transactions, in general, introduce bus hold delays since essentially no device is capable of instantaneous response to a read request. One of skill in the art will appreciate that system performance is directly dependent upon the efficiency of bus utilization. Other types of transactions also introduce bus hold delays, with consequent adverse effects on system performance, as will be described.
Write transactions performed by a CPU may also introduce bus hold delays which are similar in nature to those which are introduced by read transactions. As a specific example, the head of a fixed disk must be moved to the appropriate location at which data is to be written. This write access time constitutes a bus hold delay.
"Slaving" operations serve as still another example of non-optimum bus utilization by causing bus hold delays. In particular, a master module which requests data from a slave module typically holds the address bus and the data bus at least until the slave transmits the requested data. Unfortunately, the slave module may not have the requested data immediately at hand for any number of different reasons such as, for example, its need to prepare the requested data by performing certain processing steps. For purposes herein, the term "master module" refers to the module making a request (i.e., read or write) and the term "slave module" refers to the module that is the recipient of that request.
It should be appreciated that the discussion above is not intended to cover every instance in which system performance is adversely affected by non-optimum bus utilization, but rather to give a few examples so as to clearly point out the mechanism by which the problem occurs.
In the past, digital system designers have tolerated non-optimum bus utilization by simply accepting its reduced efficiency and consequent lower data throughput. More recently, certain arrangements have emerged which provide improvement in some aspects of bus utilization. One such arrangement is the Peripheral Component Interconnect (hereinafter PCI) Bus. One of skill in the art, however, will recognize that the PCI bus does not offer a sweeping solution to the bus utilization problem. More specifically, the PCI Bus maintains the aforedescribed fixed relationship between a transaction's address and data portions such that bus hold delays continue to be encountered.
Another arrangement which is referred to as "pipelining" offers bus utilization improvement in certain situations. These situations require that data be transferred in a system from a particular source module to a particular destination module by way of a fixed number of physical elements which make up the "pipe". The data passing through the "pipe" is processed in precisely the same manner between the two modules so as to perform a particular operation. Unfortunately, pipelining has limited value in improving bus utilization and efficiency since improvements are only realized for the particular operation which is performed by the pipeline. Improving bus utilization and efficiency in the remainder of the system therefore remains a concern.
As processing applications continue to increase in complexity and required levels of data throughput continue to increase, future digital systems in the form of individual integrated circuits and bus interconnected discrete components will be pushed to correspondingly higher levels of performance. As will be seen hereinafter, the present invention provides bus arrangements and associated methods which contemplate heretofore unattainable performance levels through improved bus utilization efficiency within individual integrated circuits and within bus interconnected discrete module digital systems.
SUMMARY OF THE INVENTION
As will be described in more detail hereinafter, there are disclosed herein digital systems and an associated method.
In accordance with one aspect of the method of the present invention, a series of address transactions may be performed between modules which are interconnected on a bus arrangement such that each address transaction defines an associated data transaction. The data transactions are thereafter performed on the bus arrangement such that the data transactions are completed in a sequence which is different than the order in which the series of address transactions were performed.
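To make the out-of-order completion concrete, the following Python sketch models a single data bus on which each data transfer starts as soon as its data is ready, rather than at a fixed offset from its address period. The `Transaction` fields, the cycle-based timing, and the ready-first scheduling rule are illustrative assumptions, not the scheduling performed by the disclosed bus controller.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    name: str
    issue_cycle: int   # cycle in which the address period is performed
    latency: int       # hypothetical: cycles until the source has the data ready
    data_cycles: int   # cycles the data transfer occupies the data bus


def completion_order(transactions):
    """Return (name, completion_cycle) pairs in the order data transfers finish.

    Assumes one data bus and no fixed address-to-data timing: transfers are
    started one at a time, in order of the cycle at which their data becomes
    ready.
    """
    by_ready = sorted(transactions, key=lambda t: t.issue_cycle + t.latency)
    bus_free = 0
    finished = []
    for t in by_ready:
        start = max(bus_free, t.issue_cycle + t.latency)
        bus_free = start + t.data_cycles
        finished.append((t.name, bus_free))
    return finished


# Address periods are issued in the order A, B, C ...
txs = [Transaction("A", 0, 10, 2), Transaction("B", 1, 2, 2), Transaction("C", 2, 1, 2)]
print(completion_order(txs))  # [('B', 5), ('C', 7), ('A', 12)]
```

Because transaction A's slave is slow, its data transfer completes last even though its address period was performed first.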
In accordance with another aspect of the method of the present invention, disclosed systems may include a bus arrangement having a number of separate buses which interconnect at least three modules such that one of the buses serves at least as an address bus and any remaining buses serve as data buses. In executing a plurality of transactions involving the modules on the bus arrangement wherein each transaction includes an address period which defines an associated data transfer between said modules, the systems operate by initiating the address periods of the transactions on the address bus such that each transaction is active in the system until such time that its associated data transfer is completed. Subsequent to completion of each address period, the associated data transfer of each transaction is executed on the bus arrangement such that all of the initiated data transactions are active at one point in time and so that the number of active transactions is greater than the number of separate buses.
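The counting argument can be checked with a small sketch. Here, as an assumption for illustration, each transaction is represented by the half-open cycle interval from the initiation of its address period to the completion of its data transfer; with one address bus and one data bus (compare claim 6), three such intervals can overlap. The timings are hypothetical.

```python
def active_at(cycle, transactions):
    """Count transactions that are active at `cycle`.

    Each transaction is an (initiated, completed) pair of cycle numbers:
    it becomes active when its address period is initiated and stays
    active until its data transfer completes (half-open interval).
    """
    return sum(1 for initiated, completed in transactions if initiated <= cycle < completed)


NUM_BUSES = 2  # one address bus plus one data bus
txs = [(0, 12), (1, 5), (2, 7)]  # hypothetical (initiated, completed) cycles
active = active_at(3, txs)
print(active, active > NUM_BUSES)  # 3 True
```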
In accordance with still another aspect of the method of the present invention, disclosed systems may include a bus arrangement having one or more buses which interconnect at least three modules. During operation, at least two transactions are executed involving all of the modules on the bus arrangement wherein each transaction includes an address period which defines an associated data transfer between the modules. The address periods for the transactions are performed on one of the buses. Subsequent to completion of each address period, the associated data transfers are executed in corresponding data intervals on a particular one of the buses such that a first data interval associated with a first data transfer includes at least one idle period during which no data associated with the first data transaction is transferred. The first data interval further includes at least one data period, following its idle period, during which data is transferred over that particular bus. Furthermore, a second data interval associated with a second data transfer includes at least one data period during which data is transferred over that particular bus. In accordance with the present invention, the first and second data transfers are executed in timed relation so that the idle period of the first data interval occurs for the duration of the data period of the second data interval so as to perform the first and second data transfers in an interleaved manner on that particular bus.
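The timed relation between the two data intervals can be pictured as a cycle-by-cycle merge. In this Python sketch, which is an illustrative assumption rather than the disclosed control logic, each data interval is written as a string in which `D` marks a data period and `.` an idle period; the merge succeeds only if the second transfer's data periods fall within the first transfer's idle periods.

```python
def interleave(first, second):
    """Merge two data intervals onto one bus and return the bus timeline.

    In the result, '1' and '2' mark cycles driven by the first or second
    transfer and '-' marks a cycle in which the bus is unused. Raises
    ValueError if both transfers would drive the bus in the same cycle.
    """
    length = max(len(first), len(second))
    first = first.ljust(length, ".")
    second = second.ljust(length, ".")
    timeline = []
    for a, b in zip(first, second):
        if a == "D" and b == "D":
            raise ValueError("bus conflict: both transfers drive the bus")
        timeline.append("1" if a == "D" else "2" if b == "D" else "-")
    return "".join(timeline)


# The first transfer idles for two cycles (e.g., waiting for its data);
# the second transfer uses exactly those two cycles.
print(interleave("..DD", "DD"))  # 2211
```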
In accordance with yet another aspect of the method of the present invention, its use is equally applicable in a system which uses a single address bus and a single data bus.
In accordance with a further aspect of the method of the present invention, disclosed systems may include a bus arrangement which interconnects at least two modules. During operation of the system, a transaction involving the modules is executed on the bus arrangement wherein the transaction includes an address period which defines an associated data transfer between the two modules. The address period is performed on the bus arrangement such that a first module addresses a second module. Following the performance of the address period, the data transfer is controlled using the second module so as to execute the data transfer on the bus arrangement such that data is transferred from the second module to the first module. Within the context of the present invention, the first module is considered as the destination module of the data transfer while the second module is considered as the source module of the data transfer. Accordingly, the data transfer of a transaction is controlled on the bus arrangement using the source module of the data transfer irrespective of which module initiated that transaction's address period.
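The role assignment can be expressed as a short sketch. The `Op` enumeration, the module names, and the cycle-based timing model are hypothetical; the point illustrated is only that the source of the data, not the initiator of the address period, decides when the transfer runs, and that it waits until its data is available (compare claims 16 and 17).

```python
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE = "write"


def data_roles(initiator, target, op):
    """Return (source, destination) for the data transfer of one transaction."""
    if op is Op.READ:
        return target, initiator   # the addressed module supplies the data
    return initiator, target       # the initiating module supplies the data


def transfer_start(address_end, data_ready):
    """Earliest cycle at which the source module starts the data transfer.

    `address_end` is the first cycle after the address period; the source
    also waits until its data is available at cycle `data_ready`.
    """
    return max(address_end, data_ready)


source, dest = data_roles("cpu", "dram", Op.READ)
print(source, dest, transfer_start(address_end=3, data_ready=10))  # dram cpu 10
```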
The bus arrangements of the systems disclosed herein may be implemented in a number of highly advantageous ways. In a first implementation, the bus arrangement may include a single, multiplexed bus which serves for the transfer of both address and data information.
In accordance with a second implementation, a digital system includes at least one processing module, memory means for storing digital information and a bus arrangement. The bus arrangement includes an address bus interconnecting the processing module with the memory means and at least two, separate data busses which are arranged so as to interconnect the processing module and the memory means in a predetermined way.
In accordance with one aspect of this multi-data bus implementation, the system performs transactions each of which includes an address portion that defines a data transfer. In one feature, the system is configured for selecting one of the data buses on which to perform each data transfer. Data bus selection may be based on certain criteria relating to the operation of the system and may be dynamic such that bus selection is based on current operational status of the system so as to optimize bus utilization. In another feature, the system may be configured for permitting simultaneous execution of different data transfers on the respective data busses. Systems having more than two data buses may simultaneously execute different data transfers on each data bus.
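One possible dynamic selection criterion is sketched below: among the data buses to which both the source and destination modules are connected, pick the one that becomes free earliest. The criterion, the bus names, and the `bus_free_at` bookkeeping are assumptions made for illustration; the text above leaves the actual criteria open.

```python
def select_data_bus(bus_free_at, source_buses, dest_buses):
    """Choose a data bus for a transfer.

    bus_free_at maps each data bus name to the cycle at which it next
    becomes idle. Only buses connected to both modules are eligible;
    ties are broken by bus name so the choice is deterministic.
    """
    candidates = sorted(set(source_buses) & set(dest_buses))
    if not candidates:
        raise ValueError("no data bus connects the source and destination")
    return min(candidates, key=lambda bus: bus_free_at[bus])


free = {"A": 5, "B": 2}
print(select_data_bus(free, ["A", "B"], ["A", "B"]))  # B (idle sooner)
print(select_data_bus(free, ["A", "B"], ["A"]))       # A (only shared bus)
```

Because a transfer is bound to a bus only when it is scheduled, two transfers assigned to different buses can proceed simultaneously.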
In accordance with a third implementation of the present invention, a digital system includes at least one processing module and memory means. The memory means includes first and second separate, independently controllable memory storage arrangements for storing digital information. The system further includes a bus arrangement interconnecting the processing module and the memory storage arrangements in a predetermined way.
In one aspect of this multi-memory implementation of the invention, the memory means includes memory control means for automatically and retrievably storing a stream of data received from the bus arrangement into at least two of the memory storage arrangements such that portions of the stream are stored in different ones of the memory storage arrangements in an interleaved manner. In implementations which include more than two memory storage arrangements, a particular data stream may be stored in this interleaved manner amongst all of the memory storage arrangements.
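A minimal sketch of the interleaved storage, assuming fixed-size segments distributed round-robin over the memory storage arrangements (consistent with the predetermined segment size of FIG. 2); the segment size and the function names are illustrative.

```python
def split_stream(data, segment_size, num_banks=2):
    """Distribute a byte stream over memory banks in round-robin segments."""
    banks = [bytearray() for _ in range(num_banks)]
    for i in range(0, len(data), segment_size):
        banks[(i // segment_size) % num_banks] += data[i:i + segment_size]
    return banks


def locate(address, segment_size, num_banks=2):
    """Map a stream offset to (bank, offset within that bank)."""
    segment = address // segment_size
    bank = segment % num_banks
    offset = (segment // num_banks) * segment_size + address % segment_size
    return bank, offset


banks = split_stream(b"AAAABBBBCCCCDD", segment_size=4)
print(banks)         # [bytearray(b'AAAACCCC'), bytearray(b'BBBBDD')]
print(locate(9, 4))  # (0, 5): byte 9 lies in the third segment, stored in bank 0
```

`locate` is the inverse of `split_stream`, so a retrieval can find any byte of the original stream without scanning either bank.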
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention may be understood by reference to the following detailed description taken in conjunction with the drawings briefly described below.
FIG. 1 is a block diagram illustrating a digital system including a bus arrangement which is implemented in accordance with the present invention.
FIG. 2 is a diagrammatic illustration which shows two, separate memories which are interleaved with a predetermined segment size and the way in which a data stream is distributed between the memories in accordance with the present invention.
FIG. 3 is a flow diagram which illustrates the steps of the transaction execution technique of the present invention.
FIG. 4a is a graphic representation illustrating the execution of four transactions by the system of FIG. 1 and in accordance with the method of the present invention.
FIG. 4b is a diagrammatic representation illustrating the operation of a bus controller configured in accordance with the present invention.
FIG. 4c is a graphic representation of the four transactions shown originally in FIG. 4a illustrating an alternate way of executing the transactions in accordance with the method of the present invention.
FIG. 5 is a block diagram illustrating another digital system including another bus arrangement which is implemented in accordance with the present invention.
FIG. 6 is a graphic representation illustrating the execution of four transactions by the system of FIG. 5 in accordance with the method of the present invention.
FIG. 7 is a block diagram illustrating a digital system including another embodiment of a bus arrangement which is implemented in accordance with the present invention.
FIG. 8 is a graphic representation illustrating the execution of three transactions by the system of FIG. 7 in accordance with the method of the present invention.
FIG. 9 is a block diagram illustrating a system manufactured in accordance with the teachings herein which is referred to as a "FusionBus" system.
FIG. 10 is a block diagram illustrating a flip flop module interconnection interface designed in accordance with the present invention.
FIG. 11 is a block diagram illustrating a complex system configuration which is manufactured in accordance with the present invention and is shown here to illustrate certain configuration and bussing issues.
FIG. 12 is a block diagram illustrating the way in which a FusionBus to PCI Bus Bridge bridges the FusionBus to an external PCI bus.
FIG. 13 is a block diagram illustrating the way in which multiple PCI/PCI bridges can be attached to the primary PCI bus to create secondary busses and subordinate buses.
FIG. 14 is a block diagram illustrating further features of a multiple PCI Bridge to FusionBus topology.
FIG. 15 is a block diagram illustrating a generalized physical layer for interconnecting the link layer of a module with the bus arrangement of the present invention.
FIG. 16 is a diagrammatic representation of an address state machine which controls Address Phases in FusionBus operations.
FIG. 17 is a diagrammatic representation of a Data State Machine (DSM), which controls all data transfers across the FusionBus. Specifically, that portion of the DSM is shown which is used for a Source Data Transfer operation.
FIG. 18 is a diagrammatic representation of another portion of the DSM which is used for a Destination Data Transfer operation.
DETAILED DESCRIPTION OF THE INVENTION
Attention is immediately directed to FIG. 1 which illustrates one embodiment of a digital system manufactured in accordance with the present invention and generally indicated by the reference numeral 10. System 10 includes a host processor 12, a memory bank A indicated by the reference number 14 and a memory bank B indicated by the reference number 16. Host processor 12 is connected with a host interface module 18. Memory bank A is connected with a memory A control module 20 while memory bank B is connected with a memory B control module 22. It should be appreciated that host interface modules, memory control modules and other modules which are used herein should be designed in view of interface considerations which will be described once the reader has been made aware of relevant details. Memory banks A and B may comprise standard RAM banks having a combined capacity which is suited to the intended system application(s). It is to be understood that substantially any CPU either currently available or to be developed may serve as host processor 12 based upon considerations to be described below and in view of overall performance requirements. Moreover, system 10 accommodates the use of multiple host processors with relative ease, as will be discussed later. System 10 further includes a plurality of additional modules to be described below which are selected so as to fulfill specific functional needs based upon processing requirements of the intended application. For illustrative purposes, these modules will be chosen in a way which serves to best illustrate the advantages which are achieved through the teachings of the present invention.
Continuing to refer to FIG. 1, selected modules which form part of system 10 include a fixed disk interface module 24 which is connected with an external fixed disk 26, a PCI bus interface module 28 which is connected with a PCI bus 30, and a hardware accelerator module 32. PCI bus 30 may extend to any number of PCI bus configured peripherals such as, for example, a network interface (not shown). Hardware accelerator 32 may be configured to serve any one of a number of functions within the context of the present invention. For example, hardware accelerator module 32 may comprise an inverse discrete cosine transform module (hereinafter IDCT module), which is useful in multimedia image processing. Since a hardware accelerator module is dedicated to a particular task, its design may be optimized so as to achieve a very high processing speed in performing that task. Hardware accelerator modules will be described in further detail at appropriate points below.
System 10 further includes a bus arrangement implemented in accordance with the present invention and generally indicated by the reference number 40. Bus arrangement 40 includes a module interface arrangement 41 which is comprised of a link layer portion 42 that interfaces directly with a physical layer portion 44. Link layer portion 42 provides the individual modules in the system with an interface to the overall bus arrangement in the form of individual link layers 46a-f. Physical layer portion 44 includes a plurality of individual physical layers 48a-f which are associated with respective link layers 46a-f. Physical layers 48a-f, in turn, are each connected with an address bus 50 and are selectively connected with a data bus A indicated by reference number 52 and a data bus B indicated by the reference number 54. Selective connection of individual module physical layers with data buses A and B will be discussed at appropriate points below. Bus arrangement 40 is completed by a bus controller module 60 which is designed in accordance with the present invention and which is connected with address bus 50 and both data buses. Bus controller 60 handles all bus arbitration and allocation needs, as will be further described below. At this point, it is worth mentioning that applicants have not previously seen a multiple data bus arrangement of this kind, in which the data buses share one address bus, and that its attendant advantages are significant, as will be described in detail hereinafter.
Having generally described the structure of system 10 including bus arrangement 40 and appreciating that this system represents a relatively complex digital system, a discussion will now be provided which serves to bring into view relatively broad considerations and concepts with regard to the design, operation and many advantages of system 10. Specific operational details, designs and clock cycle diagrams will be provided within the context of a later discussion.
In system 10, typical modules such as, for example, fixed disk interface 24, PCI bus interface 28 and hardware accelerator 32 are capable of operating as both "masters" and "slaves" with respect to one another and with respect to the host processor, and they connect to both of the data buses. The terms "master" and "slave" are used in their generally known senses wherein a master requests a data read or write and the slave presents or receives the requested data, as stated previously. The primary exceptions to dual master/slave capability in this system are memory controller modules 20 and 22, which possess only slave functionality. That is, the memory modules are subject to read or write requests which are always initiated by another module. In another respect in which they differ from most other modules, memory controllers 20 and 22 are each connected to only one data bus by module interface arrangement 41. Specifically, memory A controller module 20 is connected with data bus A via link layer module 46b and physical layer module 48b, while memory B controller module 22 is connected with data bus B via link layer module 46c and physical layer module 48c. This data bus/memory arrangement achieves certain advantages in conjunction with the specific way in which address space is allocated between the respective memories in accordance with an overall address allocation scheme which will be described below. It should be noted that memory controller modules 20 and 22 may each be connected (not shown) with both data buses A and B by their respective physical layers, as indicated by dashed lines 62. As will be appreciated at an appropriate point below, connecting each memory controller with both data buses is highly advantageous in facilitating dynamic data bus selection based, for example, on the current availability of a particular data bus or on overall data bus utilization monitoring statistics.
As noted, bus controller module 60 performs all bus arbitration and allocation functions within the system so as to function as a sort of "traffic controller." For this reason, it represents another exception to dual master/slave capability. Further details regarding the role and capabilities of bus controller module 60 will be mentioned at appropriate points in the remaining discussions. For the moment, it is sufficient to note that many of the control tasks performed by bus controller 60 are relegated to the CPU in prior art systems. Therefore, the bus controller of the present invention provides a significant advantage in relieving host processor 12 of such burdensome control tasks.
Host interface module 18 represents somewhat of an exception to dual master/slave capability in that it possesses few, if any, slave functions. The host interface module also differs from typical modules in other respects. For example, it is assigned a higher priority than other modules so as to allow fast, low latency accesses to the bus system. These accesses are typically performed as bursts to memory banks A and/or B. Specific priority schemes and bus access by the host processor and remaining modules will be discussed below. It should be noted that the host interface module logic may contain a level 2 cache controller (not shown), in which case it possesses a greater number of slave capabilities. As another example of the way in which the host interface differs from typical modules, the host interface also possesses logic which implements a configuration mode. The configuration mode employs a special high speed access technique which gives the host processor direct access to individual modules without using the standard module to module protocol, as will be described below. For purposes of the present discussion, it is noted that the configuration mode is used to initialize the system, to read module identifications in determining the system configuration, to set up the address space for each module and to implement error recovery from hardware or software hang conditions. These and other aspects of the configuration mode will be covered further in the course of the following discussions.
With regard to addressing, it is noted that address bus 50 of bus arrangement 40 may be configured with any suitable number of address lines. For example, if address bus 50 is made up of 32 address lines, system 10 has an addressable space of 4 gigabytes (2^32 byte addresses). The slave logic, if any, within each module (including the memory banks) responds to a portion of this address space. Master logic within respective modules, of course, does not respond to any portion of the address space, since master logic initiates operations rather than responding to them. The size of the address space associated with a particular module is determined by the needs of the module, while the address range itself is programmable. The present invention provides a highly advantageous feature in that the master logic of any module can initiate a transfer with the slave logic of any other module, allowing peer to peer transfers between modules without intervention by host processor 12. This feature and associated features will be further described hereinafter in conjunction with a discussion of the specific way in which transactions are configured and executed by system 10. It is noted that PCI bus 30 and the associated bus bridge hardware (i.e., interface 28) represent one portion of system 10 which may be addressed using a non-response subtractive decoding technique implemented in accordance with the present invention, for reasons that will be described. Other types of buses such as, for example, the ISA bus may be operated in system 10, or in other systems to be disclosed below, using appropriately configured bus bridges.
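The decoding just described can be modeled in software as a minimal sketch. It assumes byte addressing, a single programmable base/size window per slave, and that the PCI bridge claims any address no other slave claims (the subtractive default). The module names and window values are hypothetical and do not come from the disclosure; in system 10 this decoding is done in hardware.

```python
ADDRESS_LINES = 32
ADDRESS_SPACE = 1 << ADDRESS_LINES  # 2**32 byte addresses = 4 gigabytes

# Hypothetical programmable windows: slave module -> (base address, size in bytes).
SLAVE_WINDOWS = {
    "memory": (0x0000_0000, 0x1000_0000),    # interleaved memory region, 256 MiB
    "disk_if": (0xE000_0000, 0x0000_1000),
    "hw_accel": (0xF000_0000, 0x0001_0000),
}
SUBTRACTIVE_DEFAULT = "pci_bridge"  # claims any address no other slave claims


def decode(address: int) -> str:
    """Return the module whose slave logic responds to `address`."""
    if not 0 <= address < ADDRESS_SPACE:
        raise ValueError(f"address {address:#x} is outside the {ADDRESS_LINES}-bit space")
    claimants = [name for name, (base, size) in SLAVE_WINDOWS.items()
                 if base <= address < base + size]
    if len(claimants) > 1:
        raise RuntimeError(f"overlapping windows claim {address:#x}: {claimants}")
    # Positive decode when exactly one slave responds; otherwise no slave
    # responds and the bridge picks up the cycle by subtractive decoding.
    return claimants[0] if claimants else SUBTRACTIVE_DEFAULT


print(decode(0x0000_1000))  # memory
print(decode(0x9000_0000))  # pci_bridge
```

Subtractive decoding keeps the bridge's window implicit. Everything not programmed into another slave falls through to it, so the bridge does not need a window covering its whole downstream address range.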
Continuing with a discussion of memory allocation considerations and referring to FIGS. 1 and 2, memory A and B controllers can be configured in several different ways. In one highly advantageous configuration, memories A and B are interleaved with a predetermined segment size 100. FIG. 2 graphically illustrates memories A and B gridded with memory segments, showing this interleaving scheme in a very simplified way for purposes of understanding. Segment addresses alternate in a consecutive fashion between the memories, with each segment 100 itself including contiguous addresses (not shown) over its own range. Odd numbered segments from 101 to 131 are located in memory A, while even numbered segments from 102 to 132 are located in memory B. It should be understood that the interleaved addressing is accomplished at the hardware level by physical layers 48b and 48c of memory controllers 20 and 22, respectively. In this manner, memory interleaving is very fast and is essentially invisible to the other modules in system 10. That is, other modules in the system need have no specific knowledge of the interleaved addresses when reading from or writing to memory, since all associated functions are performed at the hardware level by physical layer portion 44 of module interface arrangement 41, as will be seen in the example immediately hereinafter.
Still referring to FIGS. 1 and 2, an incoming data stream 140, which is moving in the direction indicated by an arrow 142, is illustrated as being stored in the respective odd and even segments of the two memories upon passing through memory controllers 20 and 22. For exemplary purposes, stream 140 is considered to be a relatively large stream (i.e., larger than one segment 100) which flows from fixed disk 26 along a bus 144 to fixed disk interface 24. As best seen in FIG. 1, stream 140 passes from fixed disk interface 24 to link layer 46d and is then handed to physical layer 48d. The stream is segmented by physical layer 48d and travels along paths A and B, wherein path A (indicated by an arrow) follows data bus A to physical layer 48b of memory bank A and path B (indicated by another arrow) follows data bus B to physical layer 48c of memory bank B. The memory controllers then store the data in the corresponding memory locations, as best seen in FIG. 2. It should be appreciated that the hardware interleaved memory capability of the present invention is achieved by the cooperation of the memory controllers, physical layers 48b-d and the bus controller, since any operation which utilizes the bus arrangement is given access thereto by the bus controller in its bus arbitration role.
While the foregoing discussion represents a somewhat simplified description of the hardware interleaved memory configuration of the present invention, it serves to illustrate the specific way in which a data transfer which is larger than segment size 100 moves through system 10. Further details with regard to this operation will be made evident below, particularly with regard to the bus controller and module interactions which cause the stream to be segmented in the illustrated manner. While data stream 140 represents a write operation, it should be appreciated that read operations simply flow in a direction which is opposite that shown by arrow 142 such that a stream of outgoing data (not shown) is assembled by the cooperation of the memory controllers. Stream reassembly is a relatively simple task since the location of the stream's data may be specified with an initial address and a stream length. As an example, if the stream is 1000 bytes long and each segment holds 200 bytes, the stream could be stored in the five consecutive segments 101, 102, 103, 104 and 105. The location of this exemplary stream may be specified simply using the address of segment 101. Where more than two memory banks and associated controllers are provided, interleaving can be provided amongst all of the memory banks.
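The interleaving of FIG. 2 can be modeled as a minimal sketch under several assumptions. Segments are counted from zero at the base of the interleaved region. The first segment (reference numeral 101) resides in memory A. Banks are assigned round-robin. The function names are hypothetical; in system 10 this mapping is performed in hardware by the memory controllers' physical layers. The sketch reproduces the 1000-byte stream with 200-byte segments described above.

```python
SEGMENT_SIZE = 200   # bytes per interleaving segment, as in the example above
BANKS = ("A", "B")   # independently controlled memories; add names for more banks


def locate(address: int) -> tuple[str, int]:
    """Map an interleaved address to (bank, address local to that bank)."""
    segment, within = divmod(address, SEGMENT_SIZE)
    bank = BANKS[segment % len(BANKS)]
    local = (segment // len(BANKS)) * SEGMENT_SIZE + within
    return bank, local


def split_stream(start: int, length: int) -> list[tuple[str, int, int]]:
    """Cut a stream into per-segment pieces of (bank, local address, byte count)."""
    pieces = []
    address, end = start, start + length
    while address < end:
        segment_end = (address // SEGMENT_SIZE + 1) * SEGMENT_SIZE
        count = min(segment_end, end) - address
        bank, local = locate(address)
        pieces.append((bank, local, count))
        address += count
    return pieces


for piece in split_stream(0, 1000):
    print(piece)
```

The five pieces alternate A, B, A, B, A, which corresponds to segments 101 through 105. A long stream therefore lands on each bank about half the time. Extending `BANKS` spreads the same stream across more memories without changing either function.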
The segment interleaved memories of the present invention are highly advantageous for at least one particular reason. Specifically, segment interleaving allows memory accesses over the range of addresses specified in the interleaved address space to be distributed between two or more memory controllers. In so doing, accesses in this space, which might be used heavily by different modules, exercise both memory controllers evenly and take advantage of the full memory bandwidth available to the system. This is done automatically in hardware, so no software scheduling of memory is required to obtain full memory bandwidth utilization (which, in some operating systems, would not be possible to do in software). Stated another way, a long stream (such as stream 140) involving both memory controllers does not tie up the entire system memory. That is, since a particular stream, either outgoing or incoming, utilizes each memory only approximately fifty percent of the time, other data operations have significant access opportunities during the remaining approximately fifty percent of the time. This advantage is enhanced by the use of two or more data buses, such that the memories may be involved in two simultaneously occurring data operations which may even alternate in a non-interfering manner between the two or more memories.
In a practical system, the interleaving segment size may be based on a number of factors which vary depending on particular applications and implementations such as, for example, the expected average length of data transfers and the overhead required to access an interleaved segment. For the reader's benefit, it is mentioned that the memory interleaving segment size is not to be confused with the data packet/period size to be described below. It should be noted that the present invention contemplates the use of more than two memories, as stated above. In fact, any number of memories greater than two may be used, with the aforedescribed advantages being further enhanced so long as each memory is independently controllable. It is further noted that the separate memory controllers of FIG. 1 may be configured as a single unit having equivalent functionality.
Having provided a basic understanding of the structure of system 10 and certain operational concepts embodied therein, a discussion of the specific way in which transactions are performed by the system will now be provided with initial reference to FIGS. 1-3. As in prior art systems, transactions in system 10 are typically requested by application software running on the system and are performed between modules. The process of performing a requested transaction begins with step 160, in which host processor 12 builds or sets up a transaction instruction set, for example, in its cache. Once the transaction instruction set has been completed, step 162 is performed, in which the host processor may transfer the transaction instruction set directly to locations within the system memory (i.e., memory banks A and/or B) over the bus arrangement, in the manner of a typical transfer performed in accordance with the method of the present invention. For example, the transaction instruction set might be stored in memory segment 107. Following movement of the transaction instruction set to memory, the host processor notifies whichever module serves as the master for the transaction of the location of the transaction instruction set. At this point, the responsibility of host processor 12 for execution of the transaction terminates until all or at least a portion of the transaction is executed between the modules. It is noted that certain commands which re-involve the host processor, described in context below, may be embedded within the transaction instruction set.
Step 164 is performed next, wherein the master module reads the stored instructions directly from memory and then addresses the slave module over address bus 50. In step 166, the slave sends data on the address bus to the master which includes all of the necessary parameters for executing the data transaction, so as to appropriately configure the transfer between the two modules. The specific format used will be described below. Data which is sent to the master includes items such as, for example: which data bus(es) the slave is connected with; the time at which the data transfer can begin (which prevents the master from requesting the data bus too early); a throttle control (i.e., the speed of the transfer), such that modules having different data transfer rate capabilities can be interconnected within the same system; and a width control, which allows modules with different data widths (for example, 16 bits vs. 32 bits) to be interconnected in the same system. As an example of a particular transaction, if one module has a data width of 16 bits and is capable of transferring at normal bus speed, and the other involved module has a data width of 32 bits and is capable of transferring at one-half of normal bus speed, the transaction parameters will be selected to use a 16 bit bus width, and data will be transferred no more frequently than on every other data cycle for this particular data transfer.
In addition to the parameters described immediately above, the source and destination modules of the transfer are identified. On a write transaction, the master module serves as the source while the slave module serves as the destination. Conversely, on a read transaction, the master module serves as the destination while the slave module serves as the source. It should be appreciated that this sort of highly advantageous transaction parameter setup technique has not been seen heretofore and serves to completely set up the data transfer using the typically underutilized address bus. In this manner, the efficiency of data bus utilization is further improved. Further attention will be devoted to the transaction parameter setup technique at appropriate points below.
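The setup described in the two preceding paragraphs can be sketched as follows. The sketch assumes that the negotiated width is the narrower of the two modules' widths, that the throttle is the slower of the two rates, and that the candidate buses are those both modules reach. These rules are consistent with the 16-bit/32-bit example above but are stated here as assumptions. The class, function and module names are hypothetical. The start-time parameter is omitted, and the sketch does not model which module computes the values (in the text, the slave sends them to the master over the address bus).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    width_bits: int          # data path width supported by the module
    rate_divisor: int        # 1 = every data cycle, 2 = every other cycle, ...
    data_buses: frozenset    # data buses the module's physical layer reaches


@dataclass(frozen=True)
class TransferSetup:
    source: str
    destination: str
    width_bits: int
    rate_divisor: int
    candidate_buses: frozenset


def set_up_transfer(master: str, slave: str, is_write: bool,
                    master_cap: Capability, slave_cap: Capability) -> TransferSetup:
    """Derive the parameters both modules must honor for one data transfer."""
    # Write: the master supplies the data. Read: the slave supplies it.
    # Whichever module is the source later initiates and controls the data portion.
    source, destination = (master, slave) if is_write else (slave, master)
    shared = master_cap.data_buses & slave_cap.data_buses
    if not shared:
        raise ValueError("the modules share no data bus")
    return TransferSetup(
        source=source,
        destination=destination,
        width_bits=min(master_cap.width_bits, slave_cap.width_bits),         # narrower side wins
        rate_divisor=max(master_cap.rate_divisor, slave_cap.rate_divisor),  # slower side wins
        candidate_buses=shared,
    )


# The 16-bit/full-speed and 32-bit/half-speed example from the text, as a read.
setup = set_up_transfer(
    master="module_16", slave="module_32", is_write=False,
    master_cap=Capability(16, 1, frozenset({"A", "B"})),
    slave_cap=Capability(32, 2, frozenset({"A"})),
)
print(setup)
```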
Still referring to FIGS. 1-3 and in accordance with the present invention, the source module of any transaction initiates and controls the data transfer portion of any transaction in step 168. In this way, the source module need not initiate the data transfer until such time that the data is available. The subject data may not be immediately available to the source module for reasons including, but not limited to, the need to perform further processing, access delays causing wait states such as hard disk "spin up", refresh cycles in DRAM or contention for the data resource. Therefore, wait states which are commonly encountered by system 10 do not tie up the bus arrangement with inefficient bus hold delays. In other words, the bus arrangement remains available to the modules of the system irrespective of wait states. This implementation is highly advantageous as compared with the prior art convention wherein the master controls the entire transaction (whether the master represents the source or destination of the transaction data) so as to impose a bus hold delay for each wait state delay such that all buses associated with the transfer (typically the address and one data bus) are unavailable.
While the method illustrated by FIG. 3 represents a simplification of the overall transaction execution process employed by system 10, one of skill in the art will recognize that a highly advantageous and powerful technique has been disclosed in which, unlike in the prior art, no fixed time relationship is required between the addressing and data portions of a transaction. It is in this spirit that all transactions are performed by system 10. More specifically, a transaction instruction set is set up by the host processor such that its address and data portions can be performed independently of the host processor and separately in time from one another. Physical layer portion 44 then facilitates performing the address portion and the data portion at different times. Allowing the source module to initiate and control the data portion of any transaction is only one of the advantages realized by this technique. Another advantage resides in relieving the host processor of duties in supervising the execution of either the addressing or data transfer portions of the transaction. Still further advantages of the transaction processing approach of the present invention will be seen immediately hereinafter.
Referring now to FIG. 4a in conjunction with FIG. 1, a series of transactions to be performed on system 10 are graphically represented as transactions 1, 2, 3 and 4 (hereinafter T1, T2, T3 and T4). The transactions include address portions which are individually labeled as ADDR T1-T4 (hereinafter ATx, where x is the transaction number) and data portions which are individually labeled as DATA T1-T4 (hereinafter DTx, where x is the transaction number). For descriptive purposes, a time line 180 is shown below the transactions. Time line 180 begins at time t0, concludes at time tC and is divided into a series of intervals I1-I17. It is to be understood that the specific configurations of T1-T4, in combination, are chosen in a way which serves to illustrate a number of highly advantageous features of the present invention. Transaction address portions AT1 through AT3 are sequentially initiated on address bus 50 such that the address bus is completely utilized from the beginning of I1 to the conclusion of I6. It should be appreciated that the ability of system 10 to utilize the entire bandwidth of its address bus is a highly advantageous feature of the present invention. In initial implementations, it is contemplated that address bus utilization will be relatively low when only a few data buses are provided, since data portions of transactions are typically much longer than the corresponding address portions. For this reason, it is submitted that, in subsequent implementations, a single address bus operating at full utilization may service as many as 64 data buses. It is in these subsequent implementations that efficient address bus utilization is of the utmost importance in accordance with the teachings herein.
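A simple throughput model can motivate the 64-bus figure. It is offered only as a rough sketch and is not part of the disclosure. Assume each transaction occupies the address bus for an average of a cycles and carries an average of d data periods, and that all N data buses are kept busy. Transactions then complete at roughly N/d per cycle, and each one consumes a address-bus cycles, so the address bus utilization U_addr is approximately:

```latex
U_{\text{addr}} \approx \frac{N\,a}{d},
\qquad
U_{\text{addr}} \le 1 \;\Longrightarrow\; N \le \frac{d}{a}
```

Under this model, the address bus is lightly loaded with only two data buses whenever d is much larger than a. A single address bus could keep 64 data buses busy only if data portions averaged at least 64 times the length of address portions.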
Still referring to FIGS. 1 and 4a, T1 represents a first data transfer from host processor 12 (source) to memory bank A (destination), T2 represents a second data transfer from fixed disk interface 24 (source) to hardware accelerator 32 (destination), T3 represents a third data transfer from memory bank B (source) to PCI bus 30 (destination) and T4 represents a fourth data transfer from host processor 12 (source) to PCI bus 30 (destination). Note that there is no need to describe these transactions in terms of read or write operations, since the source module is identified in each case.
In accordance with the present invention, the data portion of each transaction is made up of idle periods and data periods. During the idle periods of a particular transaction, no data for that transaction is present on bus arrangement 40, while during its data periods a corresponding "packet" of data is transferred over data bus A or B. For example, DT1 includes two idle periods and data periods d1-d7, such that T1 concludes at the end of d7 (I17). Each transaction is considered to be active from the initiation of its address portion to the end of its last data period. Therefore, T1 is active over I1 through I17, T2 is active over I3 through I15 (from the beginning of its address portion to the end of its data portion), T3 is active over I5 through I10 and T4 is active over I9 through I15. As described above, bus selection is one function of the transaction parameter setup technique of the present invention. A number of factors may contribute to the data bus selection process. These factors include criteria such as, for example, duty cycle statistics maintained by the bus controller, current or predicted traffic levels on the buses, the speed with which a transaction can be completed, or other considerations including, but not limited to, minimum bandwidth requirements. Evaluation of these criteria may be dynamic. That is, bus selection is based on the immediate values of the aforementioned criteria, which change over time as system operations progress.
It is noted that the objective of slaves such as memory controllers is to complete their transactions as soon as possible so that subsequent transactions can be accepted, thereby avoiding retries, as will be described. Another criterion relates to the actual bus interconnections made within system 10. For example, memory bank A is not connected to data bus B, while memory bank B is not connected to data bus A. Therefore, bus controller 60 must select data bus A for T1, since memory A is the destination module of that transaction. For T3, bus controller 60 must select data bus B, since memory B is the source module of that transaction. Selection of a data bus on which to transact T2 may be performed based upon the aforementioned criteria, since the source and destination modules of T2 are each connected with both data buses. In the present example, DT2 is performed on data bus A.
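Bus selection can be sketched with a hard connectivity constraint followed by one dynamic criterion. The module-to-bus connections below follow FIG. 1 as described in the text. Current utilization is a hypothetical stand-in for the several criteria the bus controller may weigh, and the utilization figures are invented for illustration.

```python
# Data-bus connections of FIG. 1 (each memory controller on one bus only).
CONNECTIONS = {
    "host_if": {"A", "B"},
    "memory_A": {"A"},
    "memory_B": {"B"},
    "disk_if": {"A", "B"},
    "pci_if": {"A", "B"},
    "hw_accel": {"A", "B"},
}


def select_data_bus(source: str, destination: str, utilization: dict) -> str:
    """Choose a data bus reachable by both modules, least utilized first."""
    shared = CONNECTIONS[source] & CONNECTIONS[destination]
    if not shared:
        raise ValueError(f"{source} and {destination} share no data bus")
    # Hypothetical single criterion: current utilization (0.0-1.0). The bus
    # name breaks ties so that the choice is deterministic.
    return min(shared, key=lambda bus: (utilization.get(bus, 0.0), bus))


load = {"A": 0.4, "B": 0.7}  # hypothetical utilization figures
print(select_data_bus("host_if", "memory_A", load))  # T1: forced onto bus A
print(select_data_bus("memory_B", "pci_if", load))   # T3: forced onto bus B
print(select_data_bus("disk_if", "hw_accel", load))  # T2: free choice
```

T1 and T3 have only one possible bus, so the criterion never comes into play for them. T2 is the only transfer where dynamic values can change the outcome.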
In comparing T1 and T2, it is apparent that the data periods of one transaction occur during the idle period or periods of the other transaction, since DT1 and DT2 are both transacted on data bus A. In this regard, it should be appreciated that data bus A is used in a highly advantageous way. That is, from the beginning of DT1 at I3 to the conclusion of T1 at the end of I17, data bus A experiences 100 percent utilization. The way in which the data periods of two transactions, such as T1 and T2, are interleaved on one data bus is determined by bus controller 60 based, for example, on a readily providable fairness algorithm which optimizes access to the data buses for the various modules within the system. In this manner, short data transfers may be given higher priority on the bus arrangement than long data transfers, such that no particular module is denied access to the data buses for an extended period of time. T3 and T4 are representative of such short transfers. It is also important to note that system 10 is configured such that handshaking on bus arrangement 40 is minimized. Specifically, consecutive data periods which constitute at least a portion of one data transfer on one of the data buses may be executed with no intervening handshaking operations. System recovery, in the event that an error occurs during such handshake-free transfers, will be discussed at an appropriate point below. In and of itself, this feature is highly advantageous in optimizing the use of the available bus bandwidth.
As another advantage of bus arrangement 40, it should be appreciated that data buses A and B provide the capability of simultaneous execution of unrelated data transfers. A number of examples of this advantage are illustrated in FIG. 4a. Specifically, d2-d3 of T2 are executed on data bus A simultaneously with the execution of d1-d2 of T3 on data bus B. As another example, d4-d5 of T1 are executed on data bus A simultaneously with the execution of d3-d4 of T3 on data bus B. As still another example, d4-d7 of T2 are executed on data bus A simultaneously with the execution of d1-d4 of T4 on data bus B.
Continuing to refer to FIGS. 1 and 4a, transaction T2 represents a transfer which would produce a bus hold delay in a typical prior art system. More specifically, T2 is a read operation wherein the master is hardware accelerator 32 and the slave is fixed disk interface 24. In accordance with the present invention, fixed disk interface 24 is the source module for data transfer DT2 and, therefore, its physical layer 48d initiates and controls DT2 in cooperation with physical layer 48f of hardware accelerator 32. It should be noted that DT2 begins with an idle 1 period during interval I5, immediately following AT2. This idle 1 period represents the access time of fixed disk 26 in the read operation. In a typical prior art system, data bus A would itself be idle during I5, thereby reducing the utilization of available bus bandwidth. In contrast, the present invention utilizes the available bus bandwidth effectively by executing data period d3 of T1 on bus A during the idle 1 period of T2. Stated another way, T1 and T2 are simultaneously active on data bus A over intervals I5 through I15. As will be further described immediately hereinafter, applicants have not seen this latter feature heretofore.
It has been seen above that system 10 permits more than one transaction to be active on a single data bus, keeping in mind that a transaction is active from the start of its address portion to the end of its data transfer, as defined above. FIG. 4a serves to illustrate the importance of this feature more fully. In particular, all four transactions (T1-T4) are simultaneously active in system 10 over intervals I9 and I10. While at first this may not seem significant, it should be appreciated that more transactions are simultaneously active (four) than there are individual address and data buses within the system (three). Such capability is not present in the prior art. That is, a prior art system with one address bus and one data bus (for a total of two buses) is typically capable of sustaining, at most, only two active transactions simultaneously. In accordance with the present invention, system 10 is capable of supporting many more active transactions on its bus arrangement than the number of individual buses.
Continuing to refer to FIG. 4a, it should be appreciated that system 10 provides still another highly advantageous feature relating to transactions. This feature concerns the initiation times of different transactions in relation to the times at which they are completed. As an example, T1 is initiated at t0. Subsequently, T2, T3 and T4 are initiated at the beginnings of intervals I3, I5 and I9, respectively, such that the transactions are initiated in numerical order. However, it can clearly be seen that the transactions are completed in the order T3 first, then T2 and T4 (simultaneously, on different data buses) and, lastly, T1. Thus, the completion order is different from the initiation order. This feature has not been seen in the prior art and is highly advantageous in a number of ways. As one example, this capability at least in part provides for the implementation of complex priority schemes using bus controller 60. In this regard, it is noted that idle periods may be either the result of bus allocation (i.e., the one bus with which a particular module is connected is currently in use) or, as in the example of T2, the result of a source module initiating a data transfer only after the requested data is ready for transfer. Further details regarding both bus allocation and priority scheme implementations will be provided at appropriate points below.
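The interval bookkeeping behind the last two paragraphs can be checked with a short sketch. The active spans are taken directly from the text for FIG. 4a. The code itself is illustrative bookkeeping only and is not part of the bus controller.

```python
# Active spans of FIG. 4a as stated in the text: (first interval, last interval).
ACTIVE_SPANS = {"T1": (1, 17), "T2": (3, 15), "T3": (5, 10), "T4": (9, 15)}
BUS_COUNT = 3              # address bus 50 plus data buses A and B
INTERVALS = range(1, 18)   # I1 through I17


def active_in(interval: int) -> list:
    """Transactions active during the given interval."""
    return [t for t, (first, last) in ACTIVE_SPANS.items() if first <= interval <= last]


peak = max(len(active_in(i)) for i in INTERVALS)
peak_intervals = [i for i in INTERVALS if len(active_in(i)) == peak]

# sorted() is stable, so T2 stays ahead of T4 when both finish at I15.
initiation_order = sorted(ACTIVE_SPANS, key=lambda t: ACTIVE_SPANS[t][0])
completion_order = sorted(ACTIVE_SPANS, key=lambda t: ACTIVE_SPANS[t][1])

print(f"peak of {peak} active transactions on {BUS_COUNT} buses at intervals {peak_intervals}")
print("initiated:", initiation_order)
print("completed:", completion_order)
```

The peak of four active transactions over I9 and I10 exceeds the three buses. The completion order T3, then T2 and T4, then T1 differs from the initiation order.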
Turning now to FIGS. 1, 4a and 4b while still considering the bus selection approach of the present invention, bus controller 60 includes a priority criteria segment 200 which contains information relating to each module of system 10. Priority criteria segment 200 may, for example, be loaded into bus controller 60 at system startup and may be updated periodically based upon the needs of individual modules or upon any other relevant system parameter. It should be appreciated that every cycle on every data bus within a bus arrangement operating in accordance with the method of the present invention is arbitrated through bus controller 60. That is, each cycle on each bus is proactively granted to one module out of the group of modules which are requesting the use of the bus arrangement. For a particular cycle, the bus controller considers the request of each module based on, for example, that module's individual data rate and priority to determine which module is granted the use of the bus for that cycle. FIG. 4b graphically illustrates transactions T.sub.1-T.sub.3 each requesting the use of a data bus at the inception of I.sub.7, such that these three transactions are active and in contention for the use of data buses A and B. T.sub.4 is not shown in contention with the remaining transactions since its data transfer, DT.sub.4, begins later at I.sub.11. Another input to bus controller 60 is a master clock signal 202 which defines the data cycles (and, for that matter, the address cycles) throughout bus arrangement 40. Master clock signal (MCLK) 202 will be described in detail at appropriate points to follow. It should be understood that FIG. 4a represents one way in which data periods are allocated to transactions T.sub.1-T.sub.3 by the bus controller based upon criteria 200 using a particular fairness algorithm. As will be seen, the data periods may be allocated in other ways depending upon either the criteria themselves or a different fairness algorithm.
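Per-cycle arbitration can be sketched as a loop that re-arbitrates every cycle among the modules currently requesting the bus. This Python sketch rests on several assumptions that are not taken from the patent. Priority criteria 200 are reduced to a single integer per transaction (lower wins), data rates are ignored, and one data bus is modeled. The function name `arbitrate` is hypothetical, and the policy shown is plain fixed priority rather than any particular fairness algorithm.

```python
def arbitrate(needs, first_request, priority):
    """needs: name -> data cycles required; first_request: name -> cycle in
    which the module starts requesting; priority: name -> int (lower wins).
    Every cycle is granted afresh.  Returns (cycle, winner or None) pairs."""
    left = dict(needs)
    grants = []
    cycle = min(first_request.values())
    while any(left.values()):
        requesters = [n for n in left if left[n] > 0 and first_request[n] <= cycle]
        if requesters:
            winner = min(requesters, key=lambda n: (priority[n], n))
            left[winner] -= 1
        else:
            winner = None  # no requester: the bus is idle this cycle
        grants.append((cycle, winner))
        cycle += 1
    return grants

grants = arbitrate(needs={"T1": 3, "T2": 2},
                   first_request={"T1": 7, "T2": 8},
                   priority={"T1": 1, "T2": 0})
print(grants)
# [(7, 'T1'), (8, 'T2'), (9, 'T2'), (10, 'T1'), (11, 'T1')]
```

Because the decision is repeated every cycle, the higher-priority request arriving at cycle 8 takes the bus immediately, and T1 resumes afterwards without any renegotiation.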
Referring to FIG. 4c in conjunction with FIG. 4a, MCLK 202 is shown in one particular form wherein a leading edge of the MCLK signal initiates each of intervals I.sub.1-I.sub.18. Transactions T.sub.1-T.sub.4 are shown once again; however, a different allocation of the data periods for T.sub.1 and T.sub.2 is illustrated, based upon criteria 200 using, for example, a modified fairness algorithm as compared with the one which produced the allocation shown in FIG. 4a. It should be noted that the data portions of T.sub.3 and T.sub.4 have not changed, since both of these transactions use data bus B and there is no contention between them for this data bus. However, comparison of FIG. 4c with FIG. 4a reveals that the allocation of data periods for T.sub.2 and T.sub.1 is quite different. Specifically, the data periods for these two transactions now alternate on data bus A without requiring handshaking, for example, over the address bus. In this manner, T.sub.1 is completed at the end of I.sub.13 while T.sub.2 is completed at the end of I.sub.18. Such an allocation of the data periods of T.sub.1 and T.sub.2 clearly demonstrates that each and every cycle on data bus A is arbitrated through bus controller 60. It should be appreciated that this feature, in and of itself, is highly advantageous and has not been seen heretofore. Moreover, data bus A experiences 100% utilization over the duration of T.sub.1 and T.sub.2.
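The alternating allocation on data bus A can be sketched as a round-robin grant of one data period per cycle. The data-period counts below (4 for T.sub.1, 8 for T.sub.2, starting at I.sub.7) are hypothetical. They were chosen only so that the completions land at the ends of I.sub.13 and I.sub.18 as stated, and they are not read off FIG. 4c. The function name `round_robin` is illustrative.

```python
def round_robin(periods, first_cycle):
    """periods: list of (name, data periods needed), in rotation order.
    Grants one data period per cycle, alternating between transactions and
    skipping any that have finished.  Returns (grant sequence, completions)."""
    left = dict(periods)
    order = [name for name, _ in periods]
    sequence, done = [], {}
    cycle, turn = first_cycle, 0
    while any(left.values()):
        name = order[turn % len(order)]
        turn += 1
        if left[name] == 0:
            continue  # a finished transaction gives up its turn
        sequence.append(name)
        left[name] -= 1
        if left[name] == 0:
            done[name] = cycle
        cycle += 1
    return sequence, done

sequence, done = round_robin([("T1", 4), ("T2", 8)], first_cycle=7)
print(done)           # {'T1': 13, 'T2': 18}
print(len(sequence))  # 12 grants in cycles 7..18, so no idle cycle on bus A
```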
At this juncture, it should be appreciated that many of the aforedescribed advantages (as well as those yet to be described) are achieved, at least in part, by the heretofore unseen way in which the present invention carries out transactions. Specifically, addressing and data portions of each transaction are performed autonomously from one another. Of course, the address portion must be performed prior to the data portion. However, there are very few, if any, other restrictions introduced by this extremely flexible, adaptive technique. For example, there can be a delay between the end of a transaction's address portion and the beginning of its data portion. As another example, data periods for particular transactions can be intermingled with idle periods such that a large number of different transactions may be active simultaneously in the system.
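The single ordering rule described above (the address portion first, with gaps of any length allowed afterwards) can be expressed as a small validity check. In this Python sketch, the `Transaction` record and its method names are hypothetical, and the address portion is modeled as a single interval for simplicity.

```python
from dataclasses import dataclass, field

@dataclass
class Transaction:
    """Hypothetical record: one address interval plus the intervals that carry its data periods."""
    name: str
    address_interval: int
    data_intervals: list = field(default_factory=list)

    def is_well_formed(self):
        # The only ordering rule modeled: every data period follows the
        # address portion.  Delays and idle gaps between data periods are allowed.
        return all(i > self.address_interval for i in self.data_intervals)

    def active_span(self):
        # Active from the address portion through the last data period.
        return (self.address_interval, max(self.data_intervals, default=self.address_interval))

t = Transaction("T2", address_interval=3, data_intervals=[8, 9, 11, 12])
print(t.is_well_formed(), t.active_span())          # True (3, 12)
print(Transaction("bad", 5, [4]).is_well_formed())  # False
```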
Attention is now directed to FIG. 5, which illustrates another embodiment of a digital system manufactured in accordance with the present invention and generally indicated by the reference numeral 300. Because system 300 includes many of the same modules as previously described system 10, descriptions of these modules and their bus interfaces will not be repeated for purposes of brevity, and like reference numbers are applied. Further, system 300 is implemented with a bus arrangement 306 having a single multiplexed bus 304 which interfaces with the physical layer of each module. Bus 304 is similar to prior art multiplexed buses to the extent that it carries all of the address and data information between the modules within the system. However, as will be seen below, system 300 provides many of the advantages described above with regard to system 10. It should be noted that a multiplexed bus is considered to be a practical arrangement for interconnecting discrete modules, for example, on a printed circuit board, since it decreases the number of electrical interconnections between the modules and thereby reduces the overall cost of the printed circuit board. However, the present invention also contemplates a multiplexed bus arrangement at the chip level. Generally, the present invention is well suited to a multiplexed bus arrangement since significant performance improvements are gained through optimized bus utilization. In fact, multiplexed bus arrangements may enjoy an increase in popularity due to performance enhancements which are attributable to the teachings disclosed herein.
Referring to FIG. 6 in conjunction with FIG. 5, a series of transactions to be performed on system 300 are graphically represented as transactions A, B and C (hereinafter T.sub.A, T.sub.B and T.sub.C). Like previously described transactions T.sub.1-T.sub.4, transactions T.sub.A, T.sub.B and T.sub.C include address portions and data portions. The address portions are individually labeled as ADDR T.sub.A-T.sub.C (hereinafter AT.sub.A-C), while the data portions are individually labeled as DATA T.sub.A-T.sub.C (hereinafter DT.sub.A-C). Time line 180 is repeated below the transactions. For illustrative purposes, transaction address portions AT.sub.A through AT.sub.C are sequentially initiated on bus 304 from interval I.sub.1 through interval I.sub.6.
Continuing to refer to FIGS. 5 and 6, T.sub.A represents a first data transfer from host processor 12 (source) to system memory (destination), T.sub.B represents a second data transfer from fixed disk 24 (source) to hardware accelerator 32 (destination), and T.sub.C represents a third data transfer from system memory (source) to PCI bus 30 (destination). As in system 10, the address portion of each transaction is performed on bus 304 without interruption. Further like system 10, the data portion of each transaction is made up of idle periods, in which no data is transferred for that transaction, and data periods, in which data packets are transferred over bus 304. For example, DT.sub.A includes two idle periods and data periods d1-d4 such that T.sub.A is active over I.sub.1 through I.sub.17. Transactions T.sub.B and T.sub.C are active over I.sub.3 through I.sub.15 and over I.sub.5 through I.sub.8, respectively. Since bus arrangement 306 includes a single bus, bus controller 60 performs all arbitration and allocation functions for the use of this bus. Interleaving of the three transactions on bus 304 may be based, for example, on the aforementioned fairness algorithm by considering both the addressing and data portions of transactions. The objective of the fairness algorithm is to optimize bus access in a way which completes all transactions as soon as possible while still providing bus access to each module, even in the presence of long data transfers. For example, short data transfers such as T.sub.C may be given higher priority on the bus arrangement than long data transfers.
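One way to interleave transactions on a single multiplexed bus is to let each cycle carry either one address portion or one data period. This Python sketch uses a shortest-remaining-first rule as a stand-in for the fairness algorithm, which the patent does not specify. Several other details are also assumptions: pending address portions are served ahead of data, oldest request first; each address portion and each data period takes one cycle; and the name `schedule_single_bus`, the workload, and the six-cycle access delay given to TB (standing in for the fixed disk) are hypothetical. The result is not a reproduction of FIG. 6.

```python
def schedule_single_bus(transactions, n_cycles):
    """transactions: name -> (request_cycle, ready_delay, data_periods), where
    ready_delay is how many cycles after its address portion the source needs
    before data is available.  Returns one slot label per cycle:
    'A:<name>' (address), 'D:<name>' (data) or None (idle)."""
    addr_at = {}  # name -> cycle in which its address portion ran
    left = {n: d for n, (_, _, d) in transactions.items()}
    slots = []
    for cycle in range(1, n_cycles + 1):
        waiting = [n for n, (req, _, _) in transactions.items()
                   if req <= cycle and n not in addr_at]
        ready = [n for n, (_, delay, _) in transactions.items()
                 if n in addr_at and cycle > addr_at[n] + delay and left[n] > 0]
        if waiting:  # address portions first, oldest request first
            n = min(waiting, key=lambda k: (transactions[k][0], k))
            addr_at[n] = cycle
            slots.append("A:" + n)
        elif ready:  # otherwise the shortest remaining transfer gets the cycle
            n = min(ready, key=lambda k: (left[k], k))
            left[n] -= 1
            slots.append("D:" + n)
        else:
            slots.append(None)
    return slots

slots = schedule_single_bus(
    {"TA": (1, 0, 5), "TB": (3, 6, 3), "TC": (5, 0, 2)}, n_cycles=13)
print(slots)
# ['A:TA', 'D:TA', 'A:TB', 'D:TA', 'A:TC', 'D:TC', 'D:TC', 'D:TA', 'D:TA', 'D:TA', 'D:TB', 'D:TB', 'D:TB']
```

The short transfer TC jumps ahead of TA once its address portion has run, and TB's access delay is filled by the other two transactions, so no cycle is left idle.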
In accordance with the use of a single bus, comparison of T.sub.A, T.sub.B and T.sub.C reveals that, when transactions are simultaneously active, the address and data periods of one transaction occur during the idle period or periods of the other transactions. In this regard, it should be appreciated that bus 304 experiences 100 percent utilization from the beginning of I.sub.1 through the end of I.sub.17. For this reason alone, system 300 may achieve data throughput levels exceeding those of prior art systems having separate address and data buses. Like system 10, system 300 is configured such that handshaking on bus arrangement 306 is not used between consecutive data periods/packets, so as to further enhance data throughput.
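The 100 percent utilization claim can be checked by confirming that no interval is claimed twice and no interval is left empty. The occupancy below is a hypothetical reading, not taken from FIG. 6. It assumes two-interval address portions, places d1-d4 of DT.sub.A at I.sub.9, I.sub.10, I.sub.16 and I.sub.17, and gives T.sub.B five data periods. These choices are consistent only with the active spans stated in the text. The function name is illustrative.

```python
def single_bus_utilization(occupancy, first, last):
    """occupancy: name -> set of intervals in which that transaction uses the
    bus (address or data).  Raises if two transactions claim one interval;
    otherwise returns the used fraction of intervals first..last."""
    owner = {}
    for name, intervals in occupancy.items():
        for i in intervals:
            if i in owner:
                raise ValueError(f"I{i} claimed by both {owner[i]} and {name}")
            owner[i] = name
    window = range(first, last + 1)
    return sum(1 for i in window if i in owner) / len(window)

# Hypothetical occupancy consistent with T_A: I1-I17, T_B: I3-I15, T_C: I5-I8.
occupancy = {
    "TA": {1, 2, 9, 10, 16, 17},
    "TB": {3, 4, 11, 12, 13, 14, 15},
    "TC": {5, 6, 7, 8},
}
print(single_bus_utilization(occupancy, 1, 17))        # 1.0
print({n: (min(s), max(s)) for n, s in occupancy.items()})
# {'TA': (1, 17), 'TB': (3, 15), 'TC': (5, 8)}
```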
At first appearance, it may seem that separately controllable memory banks A and B represent redundant resources within system 300, since only one memory may be accessed at a time by bus arrangement 306. However, the use of these separate memories is highly advantageous for reasons beyond providing for simultaneous accesses to two or more separate memories. For example, memory banks A and B may be different types of memory such as RAM and ROM. As another example, bank A may comprise static RAM while bank B comprises dynamic RAM, in which case different interface types are required. As still another example, if memory banks A and B both comprise dynamic RAM which requires a refresh cycle, the refresh cycles may alternate between the two memory banks so that access may be gained to one or the other of the memory banks in spite of the refresh cycle. Thus, it should be apparent that the use of separately controllable memories is advantageous for these reasons irrespective of the bus arrangement configuration (i.e., the number of separate address and data buses) which is employed.
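The refresh-alternation point can be illustrated by offsetting bank B's refresh by half a refresh period. The timing values (a 10-cycle refresh every 100 cycles) and the function name `bank_free` are hypothetical. The guarantee shown holds whenever the refresh length is at most half the period.

```python
def bank_free(cycle, period, refresh_len, offset):
    """True if a bank that refreshes for `refresh_len` cycles every `period`
    cycles, starting at `offset`, is not refreshing during `cycle`."""
    return (cycle - offset) % period >= refresh_len

# Hypothetical timing: bank A refreshes at offset 0, bank B half a period later.
PERIOD, REFRESH = 100, 10
always_one_free = all(
    bank_free(c, PERIOD, REFRESH, 0) or bank_free(c, PERIOD, REFRESH, PERIOD // 2)
    for c in range(10_000)
)
print(always_one_free)  # True
```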
Still describing the features of system 300, transaction T.sub.B (like previously described transaction T.sub.2 of FIG. 4a) illustrates a read operation from fixed disk 24 which entails an access delay that is represented as an initial idle time. In this regard, system 300 shares the advantages of system 10, since bus 304 may be used by a different transaction during the idle time of T.sub.B. In the present example, T.sub.C is completed in its entirety during the idle time of T.sub.B and, thereafter, a portion of data transfer DT.sub.A is completed. As still another advantage shared with system 10, system 300 permits more than one transaction to be active on single bus 304. Specifically, all three transactions (T.sub.A-C) are simultaneously active in system 300 over intervals I.sub.5 through I.sub.8. Thus, two or more transactions can simultaneously be active even though the system possesses only one bus. Yet another similar advantage resides in the ability of system 300 to complete transactions in an order which is different from the order in which they were initiated. For example, the transactions are initiated in the order T.sub.A, T.sub.B, T.sub.C but are completed in the reverse order T.sub.C, T.sub.B, T.sub.A. As in system 10, the advantages of system 300 are in many cases associated directly with the transaction processing technique of the present invention, which provides for performing the address and data portions of a transaction without the need to maintain a fixed time relationship between them.
Turning now to FIG. 7, still another embodiment of a digital system is illustrated as manufactured in accordance with the present invention and generally indicated by the reference numeral 400. Because system 400 includes many of the same modules as previously described systems 10 and 300, descriptions of these modules and their bus interfaces will not be repeated for purposes of brevity, and like reference numbers are applied. In this example, the memory A and B arrangement of system 10 has been retained, with appropriate changes made within physical layers 48b and 48c of the memory A and memory B controllers, respectively, such that simultaneous access to the memories is facilitated, as will become evident below. In addition, system 400 is implemented with a bus arrangement 402 including previously described address bus 50 and a single data bus 404, such that each bus interfaces with the physical layer of each module. As will be seen, system 400 provides many of the advantages described above with regard to system 10 and all of the advantages associated with system 300.
Referring to FIGS. 6-8, it should be noted that FIG. 8 is similar to previously described FIG. 6 and illustrates its transactions T.sub.A, T.sub.B and T.sub.C as performed in one possible way by system 400. For this reason, the reader's understanding may be further enhanced by direct comparison of FIGS. 6 and 8. It is also noted that these transactions may be performed in a number of different ways by system 400, and that the present example has been selected as effectively illustrating the advantages of system 400 in light of the advantages of previously described systems. Because the transaction execution technique of the present invention remains unchanged as embodied by system 400 and has been described in detail above, the present discussion will focus on the advantages of system 400 over system 300. To that end, it is evident that the presence of data bus 404 in system 400 allows the system to advantageously execute a data transfer on data bus 404 while simultaneously executing the address portion of another transaction on address bus 50. Two examples of this advantage are evident, as will be seen.
In a first example, the address portion of T.sub.B is performed on address bus 50 while d1 and d2 of DT.sub.A are performed on data bus 404. In a second example, the address portion of T.sub.C is performed on address bus 50 while d3 and d4 of DT.sub.A are performed on data bus 404. Thus, T.sub.A is completed by the end of interval I.sub.6 in system 400, as compared with the end of I.sub.17 in system 300, so that T.sub.A is the first transaction completed by system 400. It should be noted that the appearance of T.sub.B and T.sub.C is unchanged in FIG. 8 as compared with FIG. 6. At first, it may seem as though d1 of DT.sub.B should proceed on data bus 404 immediately following the completion of d2 of DT.sub.C on the data bus. This could be possible depending upon the nature of T.sub.B; however, it will be recalled that the initial idle period in DT.sub.B is imposed by the access time of fixed disk 24.
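The gain from a separate address bus can be shown with a toy Python model in which each unit of work takes one interval. With separate buses, an address unit and a data unit may share an interval. With one multiplexed bus, only one unit may run per interval. The workload and its two-interval address portions are hypothetical, as are the fixed grant priority (list order) and the name `simulate`. Because the toy omits the data of T.sub.B and T.sub.C, its single-bus result differs from the I.sub.17 of FIG. 6.

```python
def simulate(items, separate_buses):
    """items: (label, kind, length, depends_on) in grant-priority order;
    kind is "A" (address portion) or "D" (data periods), and each unit of
    length takes one interval.  Returns label -> interval it finished in."""
    left = {label: length for label, _, length, _ in items}
    finished = {}
    interval = 0
    while len(finished) < len(items):
        interval += 1
        busy = set()
        for label, kind, _, dep in items:
            bus = kind if separate_buses else "shared"
            # Data may start only in an interval after its address portion ends.
            dep_done = dep is None or (dep in finished and finished[dep] < interval)
            if left[label] > 0 and dep_done and bus not in busy:
                busy.add(bus)
                left[label] -= 1
                if left[label] == 0:
                    finished[label] = interval
    return finished

items = [("AT_A", "A", 2, None), ("AT_B", "A", 2, None),
         ("AT_C", "A", 2, None), ("DT_A", "D", 4, "AT_A")]
print(simulate(items, separate_buses=True)["DT_A"])   # 6
print(simulate(items, separate_buses=False)["DT_A"])  # 10
```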
Continuing with the discussion of system 400, it is to be understood that all of the inventive features that have been described in conjunction with previous embodiments, that is, the multiple data bus arrangement of system 10 and the multiplexed bus arrangement of system 300, are equally applicable as features of system 400 except, of course, those features which specifically require two or more data buses. Once again, it is emphasized that system 400, like previously described systems, utilizes the transaction processing technique of the present invention so as to achieve its attendant advantages.
Referring again to FIG. 1, it is to be understood that the present invention is intended to provide a very high performance synchronous module interconnection system at either the chip level or between discrete modules. At the chip level, independently designed and verified integrated circuit modules may easily be assembled into a single IC so as to permit reliable, modular design of complex chips. For example, a dashed line 420 surrounds components of system 10 which may readily be implemented as a single IC in accordance with the concepts of the present invention. Therefore, large IC designs can be created by simply assembling pre-designed and pre-tested modules, without the lengthy system debug and test cycles required in the prior art. Since integrated circuit fabrication capabilities are advancing faster than the corresponding design capability, this plug-in methodology will become more and more important over time. As will be seen, the design protocols disclosed herein provide an interconnection mechanism which differs from standard processor and system busses in several areas which are critical to integrated circuit implementations. In particular, the bus arrangement of the present invention is scaleable, to allow for flexibility in chip implementations; synchronous, to allow for reliable operation in a variety of large circuits; and specifically optimized for the high burst bandwidths required by sophisticated multimedia applications. As a result, the bus arrangement of the present invention is useful as a standard interconnection platform not only for current digital products but also for a wide variety of future digital products.
With regard to the modular approach taken by the present invention, it should be noted that standardized access to the bus is provided for each module through module interface arrangement 41. Thus, link layer portion 42 and physical layer portion 44 of the module interface arrangement isolate the complexity of the interface design considerations of the present invention from the design of the modules themselves. In this way, module design is simplified, and the possibility of inadvertent inferior design of new modules, which may be designed by third party vendors, is dramatically reduced. Moreover, module designers need only be concerned with verifying proper module response in a relatively straightforward module interface environment, as provided by the present invention. Such considerations are important in that the bus arrangement of the present invention represents a high performance resource which may not provide all of the advantages disclosed herein if it is not accessed in an optimized manner.
Using the specification to this point and FIGS. 1-8, it is considered that one of ordinary skill in the art may readily practice the present invention in view of the teachings herein. However, for further explanatory purposes, the bus arrangements and method disclosed thus far will be described in more detail in conjunction with FIGS. 9-18. It is noted that the term FusionBus, as used hereinafter, refers to a bus arrangement manufactured in accordance with the present invention and is a trademark of Fusion MicroMedia Corporation, Longmont, Colo.
1. Introduction
FusionBus™ is the standard integrated circuit interconnection platform developed by Fusion MicroMedia to facilitate the creation of a broad variety of complex products. With FusionBus, large IC designs can be created by simply assembling independently pre-designed and pre-tested modules, without the typical lengthy system debug and test cycles required today. Since integrated circuit fabrication capabilities are advancing faster than the corresponding design capability, this plug-in methodology will become more and more important over time. FusionBus has been architected specifically to adapt to the future design requirements of ever increasing bandwidth, complexity, and integration. It is also designed to allow easy adaptation of existing functions, and straightforward creation of FusionBus compatible modules in the future.
FusionBus is a very high performance synchronous module interconnection system designed to allow independently designed and verified integrated circuit modules to be easily assembled into a complex IC. This provides a unique interconnection mechanism which differs from standard processor and system busses in several areas which are critical to integrated circuit implementations. In particular, FusionBus is scaleable, to allow for flexibility in chip implementations; synchronous, to allow for reliable operation in a variety of large circuits; and specifically optimized for the high burst bandwidths required by sophisticated multimedia applications. As a result, FusionBus can and will be used as a standard interconnection platform for a wide variety of future products.
This section provides the detailed specification of the FusionBus, and also details the implementation of the Physical Layer, which is the hardware realization of the FusionBus.
2. Bus Structure
FIG. 9 shows the various elements of a FusionBus system. The FusionBus itself consists of an Address Bus and one or more Data Busses. Each bus includes not only the Data or Address signals but also all of the arbitration and handshaking signals. Each element of the system performs a particular set of functions, and each is designed to maintain as much commonality as possible between modules within a system and across different systems.
In general, all modules on the FusionBus are similar except for the Bus Controller, which performs all of the global arbitration and bus allocation functions. Typical modules contain the capability of being both Masters and Slaves on the Bus, and connect to all of the Data Busses. The primary exceptions to this structure are the Memory Controllers, which contain only Slave functionality and connect to only one of the Data Busses.
The FusionBus in its first implementation has a 32 bit Address Bus, so there is a 4 GB address space. Each slave responds to a portion of this address space, whose size is determined by the module but whose address is programmable. Any Master can initiate a transfer with any slave, allowing peer to peer transfers between modules.
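Address decoding for one Slave can be sketched as a window check against a programmable base. A 32-bit byte address spans 2^32 bytes, or 4 GB. The power-of-two size and size-aligned base that this sketch enforces are assumptions, not FusionBus rules. The function name `make_slave_decoder` and the 16 MB window are hypothetical.

```python
ADDRESS_BITS = 32
print(2 ** ADDRESS_BITS == 4 * 1024 ** 3)  # True: a 32-bit byte address spans 4 GB

def make_slave_decoder(base, size):
    """Hypothetical Slave decoder: `size` is fixed by the module and `base` is
    programmed at configuration time.  Power-of-two size and size-aligned
    base are assumptions made for this sketch."""
    if size <= 0 or size & (size - 1) or base % size:
        raise ValueError("size must be a power of two and base aligned to it")
    if base + size > 2 ** ADDRESS_BITS:
        raise ValueError("window exceeds the 32-bit address space")

    def selects(addr):
        # The Slave responds only to addresses inside its window.
        return base <= addr < base + size

    return selects

slave = make_slave_decoder(base=0x8000_0000, size=0x0100_0000)  # 16 MB window
print(slave(0x80FF_FFFF), slave(0x8100_0000))  # True False
```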
Each module consists of three components: the FusionBus Physical Layer, the FusionBus Link Layer, and the module logic. The Physical Layer implements the FusionBus handshaking, address decoding, and transfer counting for both Master and Slave functions. The Link Layer implements the MultiStream protocol processing and the linking of multiple transfers into the MultiStream operation. The module logic implements the actual module function. Modules may be completely self-contained, such as the 3D accelerator or the MPEG-2 decoder, or they may contain connections external to the chip, such as the PCI Interface or the Display Controller.
The Host Interface is somewhat different in that it has few, if any, Slave functions. It also has a higher priority than other modules to allow for fast, low latency accesses which are bursts. The Host Interface module logic may contain a Level 2 Cache Controller, in which case it does have more Slave capabilities. The Host Interface also contains the logic which implements Configuration Mode, which is a special access method allowing direct processor access to individual modules without using the FusionBus protocol. Configuration Mode is used to initialize the system, to read module IDs to determine the system configuration, to set up the FusionBus address space on each module, and to implement error recovery from hardware or software hang conditions. Other than these differences, a Host Interface includes the Physical and Link Layers just as other modules do.
3. Physical Bus Connection
Since modules connecting to the FusionBus can be located on any part of a large integrated circuit, there can be significant wiring delays on the bus signals themselves. To minimize the effect of these delays on overall performance (i.e., the frequency of bus transactions), every signal on the FusionBus is registered at both ends. In addition, a master chip clock, FCLK, clocks all of the register flip flops and must be distributed so as to minimize clock skew. A typical signal circuit is shown in FIG. 10.
This implementation optimizes performance (in terms of the frequency of FCLK) for the FusionBus, and significantly simplifies the integration of large and complex integrated circuits.
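The effect of registering a signal at both ends can be modeled as two flip-flops clocked by the same FCLK edge. This simplified Python sketch ignores setup and hold timing, and the name `registered_path` is illustrative. A value presented to the source register before edge k reaches the destination register on edge k+1, so the wire between the two registers has a full FCLK period to propagate.

```python
def registered_path(values):
    """Model one FusionBus signal registered at both ends.  `values` are what
    the source logic presents before successive FCLK edges; the result is what
    the destination register holds after each edge (None = not yet loaded)."""
    src_reg = dst_reg = None
    seen = []
    for v in values:
        # On each FCLK edge both flip-flops load at once: the destination
        # captures the source flip-flop's previous output, which has had a
        # full clock period to cross the chip.
        src_reg, dst_reg = v, src_reg
        seen.append(dst_reg)
    return seen

print(registered_path([1, 2, 3, 4]))  # [None, 1, 2, 3]
```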
4. FusionBus Signals
The FusionBus is made up of two sets of physical signals. The single Address Bus makes up one of the sets, and the second set consists of one or more Data Busses, all of which are identical.
4.1 Address Bus
Table 1 lists all of the signals in the FusionBus Address Bus. Type indicates the signal's connection: All means that the signal connects to all modules, Ind. means that each signal in the group connects one module to a central module such as the Bus Controller or Host Interface, and Sgl means that the signal connects from a single module to all other modules. Connections describes the modules to which the signal connects and the direction of the signal. Level indicates the physical signal type: T/S is a tristate signal, Std is a standard CMOS driver, WOR is a signal driven low by the source and pulled high otherwise, and CBuf is a clock buffer driver.
TABLE 1
Signal            Type  Connections       Level  Description
FB_ADDR[31:00]    All   Master -> Slave   T/S    Address Bus
FB_CFGREQ         Ind.  Host -> Ctrl.     Std    Configuration Request
FB_AREQ[31:00]    Ind.  Master -> Ctrl.   Std    Address Request
FB_AGNT[4:0]      Sgl   Ctrl. -> Master   Std    Address Grant
FB_ARDY           All   Master -> Slave   WOR    Address Ready Strobe
FB_AACK           All   Slave -> Master   WOR    Address Acknowledge
FB_ARETRY         All   Slave -> Master   WOR    Address Retry
FB_SRETRY         All   Snoop -> Master   Std    Snoop Address Retry
FB_SSPEED[4:0]    All   Slave -> Master   T/S    Slave Mode Speed
FB_MSPEED[4:0]    All   Master -> Slave   T/S    Master Mode Speed
FB_ADATA[2:0]     All   Slave -> Master   T/S    Data Bus Selector
FB_READ           All   Master -> Slave   T/S    Read/not Write
FB_MEMIO          All   Master -> Slave   T/S    Memory/not I/O
FB_COUNT[9:0]     All   Master -> Slave   T/S    Byte Count
FB_IRQ[31:00]     Ind.  Module -> Host    Std    Interrupt Request
FB_MSTRID[4:0]    All   Master -> Slave   T/S    Master ID
FB_AWAIT[2:0]     All   Slave -> Master   T/S    Address Wait Value
FB_LOCK           All   Master -> Slave   T/S    Resource Lock Request
FB_CONFIG         Sgl   Host -> All       Std    Initialization Selection
FB_INTACK         Sgl   Host -> PCI       Std    Interrupt Acknowledge Cycle
FB_NOBE           All   Master -> Cache   WOR    Not all byte enables asserted
FB_TESTMODE       Sgl   Host -> All       Std    FusionBus Test Mode
FRST              Sgl   Host -> All       Std    Module RESET
FCLK              Sgl   Host -> All       CBuf   Main System Clock
4.1.1 Signal Definitions
FB_ADDR[31:00]--the Address Bus, which carries the starting address of a burst from the Master to the Slave. Note that this is a Byte Address even though the Data Bus will transfer Word64 values (64 bits). The byte addressing is necessary to allow a Cache Controller to correctly snoop the address range of a data transfer.
FB_AREQ[31:00]--the Address Request lines, one from each module to the Bus Controller. These lines are used to request use of the Address Bus by a Master.
FB_CFGREQ--Configuration Request, from the Host Interface to the Bus Controller. This indicates that the Host Interface is requesting a Configuration operation; the Bus Controller will immediately grant the Address Bus and all Data Busses to the Host Interface and continue to do so until FB_CFGREQ is removed.
FB_AGNT[4:0]--the Address Grant bus, which indicates in an encoded form the Master which has been granted the Address Bus by the Bus Controller.
FB_ARDY--Address Ready, which is driven by the Master to indicate to slaves that the Address Bus contains an address to be decoded, and that the FB_COUNT, FB_MSTRID, FB_MSPEED, FB_MEMIO and FB_READ signals are also valid.
FB_AACK--Address Acknowledge, driven by the addressed Slave to indicate that the address request has been accepted. This indicates that the FB_SSPEED, FB_ARETRY, FB_SRETRY, FB_AWAIT and FB_ADATA signals are valid.
FB_ARETRY--Address Retry, driven by an addressed Slave (along with FB_AACK) to indicate that the address was decoded but not accepted by the Slave.
FB_SRETRY--Snoop Address Retry, driven by the Snooping Cache Controller to indicate that the address was decoded as being cacheable and a cache snoop operation must occur. If there is no Snoop Controller in the system, this signal need not be implemented by any Modules.
FB_SSPEED[4:0]--The Slave speed indicator. This is a five bit value which informs the Bus Controller of the minimum number of cycles to be inserted between grants in a transfer.
FB_MSPEED[4:0]--The Master speed indicator. This is a five bit value which informs the Bus Controller of the minimum number of cycles to be inserted between grants in a transfer.
FB_ADATA[2:0]--The Data Bus indicator, which the addressed Slave drives with the ID of the Data Bus to which it is connected. For systems with a single Data Bus, these signals are not used.
FB_READ--The Read/Write signal, which indicates whether the burst is a read (if 1) or a write (if 0).
FB_MEMIO--The Memory I/O signal, which indicates whether the reference is to memory (if 1) or I/O (if 0) space.
FB_COUNT[10:0]--Byte Count, indicating the length of the requested burst in bytes.
FB_IRQ[31:00]--the Interrupt Request lines, one from each module to the Bus Controller. These lines are asserted by a module when its internal interrupt function is activated.
FB_MSTRID[4:0]--Master ID, which indicates which Master has initiated the address transfer. The addressed Slave captures this data for comparison with the DGNT Bus during a write data transfer cycle.
FB_AWAIT[2:0]--The wait period suggested when an address retry is signaled, with the time defined in the following table.
FB_AWAIT[2:0] FCLK cycles to wait
000 8
001 16
010 32
011 64
100 128
101 256
110 512
111 No suggestion
FB_LOCK--Lock, which is driven by a Master along with FB_ARDY to indicate that the Slave should not accept accesses from any other Master.
FB_CONFIG--Configuration Selection, which indicates that Modules must decode the Configuration Address during Plug and Play system initialization. Configuration references are used for reading the Module's System ID, loading its address space registers, and loading other static parameters.
FB_INTACK--Interrupt Acknowledge Cycle, which indicates to the FusionBus to PCI Bridge that the Host Interface is requesting an Interrupt Acknowledge Cycle to be performed on the PCI Bus.
FB_NOBE--Not all Byte Enables, which is asserted if the Master cannot guarantee that all byte enable signals will be asserted for all words in the current transfer. Modules which always drive all byte enables do not need to connect to this signal.
FB_TESTMODE--Test Mode, which indicates that the Host Interface is executing a special test access. Test Mode is only used for chip testing and this signal should never be asserted during normal operation.
FRST--Initialize the module. This signal forces all modules into a known state.
FCLK--Main Clock, the system clock for all modules.
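Two of the encoded Address Bus fields defined above can be illustrated with a short Python sketch. The FB_AWAIT mapping follows the table given with that signal. In `encode_grant`, the lowest-numbered requester wins; this is a placeholder assumption standing in for the Bus Controller's actual arbitration, and `encode_grant`, `is_granted` and `await_cycles` are hypothetical names.

```python
def encode_grant(areq):
    """areq: 32-bit int, bit i set when module i drives FB_AREQ[i].
    Returns the 5-bit FB_AGNT code of the granted Master, or None if no
    module is requesting.  Lowest-numbered-requester-wins is a placeholder."""
    if not 0 <= areq < 1 << 32:
        raise ValueError("FB_AREQ is 32 bits wide")
    if areq == 0:
        return None
    return (areq & -areq).bit_length() - 1  # index of the lowest set bit, 0..31

def is_granted(agnt, my_id):
    # Each Master compares the encoded grant with its own module number.
    return agnt == my_id

def await_cycles(code):
    """Suggested retry wait in FCLK cycles for a 3-bit FB_AWAIT value; None means no suggestion."""
    if not 0 <= code <= 0b111:
        raise ValueError("FB_AWAIT is a 3-bit field")
    return None if code == 0b111 else 8 << code

agnt = encode_grant(0b1010_0000)  # modules 5 and 7 requesting
print(agnt, is_granted(agnt, 5), is_granted(agnt, 7))  # 5 True False
print([await_cycles(c) for c in range(8)])  # [8, 16, 32, 64, 128, 256, 512, None]
```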
4.2 Data Bus (0 through 7)--Example of Data Bus x
Table 2 lists all of the signals in a FusionBus Data Bus. Type indicates the signal's connection, with All meaning that the signal connects to all modules. Connections describes the modules to which the signal connects and the direction of the signal. Level indicates the physical signal type: T/S is a tristate signal, Std is a standard CMOS driver, and WOR (Wire-OR) is a signal driven low by the source and pulled high otherwise. If there are multiple Data Busses, there is one such signal group for each of them, and each signal name in a group is prefixed with Dx, where x is the bus number. If there is only a single Data Bus, the Dx prefix is not used.
TABLE 2
Signal              Type  Connections        Level  Description
FB_DxDATA[63:00]    All   Source <-> Dest.   T/S    Data Bus
FB_DxDREQ[31:00]    All   Source -> Ctrl.    WOR    Data Request
FB_DxDGNT[4:0]      All   Ctrl. -> Source    Std    Data Grant
FB_DxDRDY           All   Source -> Dest.    WOR    Data Ready Strobe
FB_DxBE[7:0]        All   Source -> Dest.    T/S    Byte Enables
FB_DxDACK           All   Dest. -> Source    WOR    Data Acknowledge
FB_DxDISC           All   Any -> Any         WOR    Disconnect
FB_DxABORT          All   Any -> Any         WOR    Transfer Abort
4.2.1 Signal Definitions
FB_DxDATA[63:00]--the Data Bus, used to transfer data between the Source and Destination.
FB_DxDREQ[31:00]--the Data Request lines, one from each module to the Bus Controller. These lines are used to request use of the Data Bus by a Source.
FB_DxDGNT[4:0]--the Data Grant bus, which indicates in encoded form the Source which has been granted the Data Bus by the Bus Controller. The encoded value is the Module ID of the Master of the granted Connection.
FB_DxDRDY--Data Ready, which is driven by the Source to indicate to Destinations that the Data Bus contains write data, or that read data is expected to be driven from the Destination.
FB_DxBE[7:0]--Byte Enables, which are driven by the Source to indicate to Destinations which bytes of the Data Bus contain valid write data or read data.
FB_DxDACK--Data Acknowledge driven by the Destination to indicate that write data has been accepted from the Data Bus.
FB_DxDISC--Disconnect, driven by either the Source or Destination to indicate that the current transfer must be interrupted but must be restarted by the Master at some later time.
FB_DxABORT--Transfer Abort, driven by either the Source or Destination during a cycle to cause the other Module to end the current transfer even if the count has not been reached.
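The byte-enable signals above, together with FB_NOBE from the Address Bus group, can be illustrated with a short Python sketch. The lane mapping is an assumption not stated in the text: FB_DxBE[i] is taken to qualify FB_DxDATA[8i+7:8i], and the byte at address a is taken to travel on lane a mod 8. The helper names are hypothetical.

```python
def byte_enables(addr: int, nbytes: int) -> int:
    """8-bit FB_DxBE mask for `nbytes` bytes starting at `addr` within one 64-bit word.

    Hypothetical lane mapping: FB_DxBE[i] qualifies FB_DxDATA[8*i+7:8*i],
    and the byte at address a travels on lane a % 8.
    """
    lane = addr % 8
    if nbytes < 1 or lane + nbytes > 8:
        raise ValueError("the bytes must fit within one 8-byte bus word")
    return ((1 << nbytes) - 1) << lane


def needs_nobe(word_masks: list[int]) -> bool:
    """True if any word of a transfer lacks a full set of byte enables, in which
    case the Master could not guarantee all byte enables and would assert FB_NOBE."""
    return any(mask != 0xFF for mask in word_masks)


# A 16-byte transfer starting 2 bytes into a word, split into 64-bit bus words
masks = [byte_enables(2, 6), byte_enables(8, 8), byte_enables(16, 2)]
print([f"{m:08b}" for m in masks])  # lanes 2-7, all lanes, lanes 0-1
print(needs_nobe(masks))
```

A Module that only ever issues whole, aligned 64-bit words would always produce 0xFF masks, which is the case in which the text allows FB_NOBE to be left unconnected.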
For any particular implementation, only those signals which are required are used. For example, if the system contains only seven modules with Source capability, then only FB_AREQ[06:00], FB_IRQ[06:00] and FB_DxDREQ[06:00] will be implemented.
5. Bus Protocol
5.1 Bus Protocol Overview
The FusionBus is designed around the basic concept of point to point transfers between modules connected to the bus. A transfer is initiated by a Master module, which uses the FusionBus Address Bus to connect to another module referred to as the Slave module. The Master connects with the Slave through an Address Transaction. If the Slave responds positively during the Address Transaction, a Connection is created between the Master and Slave. The Master has indicated the direction of the transfer during the Address Transaction, and thus the Source and Destination modules of the Connection are defined. For a write transfer, the Master is the Source, and the Slave is the Destination. Conversely, the Master is the Destination of a read transfer, and the Slave is the Source.
Once a Connection is made, the Source module then manages the actual data transfer on one of the FusionBus Data Busses, through a Data Transaction. Thus for a read transfer, the Master manages the Address Transaction but the Slave (which is the Source) manages the Data Transaction. The Source attempts to transfer one piece of data on each cycle in which it is granted ownership of the Data Bus being used for the transfer. One of the key features of the FusionBus is that ownership of the Data Bus is determined independently on each cycle by the Bus Controller. This means that a number of data transfers may occur between different pairs of connected modules in an interleaved fashion. The Destination must acknowledge receipt of each piece of data, and if such acknowledgment is not received by the Source, the Source will retry the data transfer until it is acknowledged or too many attempts have been made.
Since the Bus Controller allocates bus ownership on a cycle by cycle basis, multiple transactions of the same priority can proceed in an interleaved fashion, allowing extremely high utilization of the Data Busses.
The pipelining shown in the previous section simplifies the physical interconnection, but complicates the flow of arbitration and data transfers. Fortunately, there is a relatively small number of transaction types necessary to provide the required functions. These transactions will be described, and will be illustrated in a set of standard diagrams. In these diagrams, signals designated MST_* are signals on the Master, signals designated SLV_* are the equivalent (pipelined) signals on the Slave, signals designated BUS_* are signals on the actual interconnection bus, and signals designated ARB_* are in the Bus Controller. Each rectangle indicates a single FCLK cycle, numbered at the top of each diagram for reference. For Data Transactions, SRC_* indicates signals at the Source module, and DST_* indicates signals at the Destination.
5.2 Address Arbitration Sequence
The first operation in a transfer, shown in Table 3, is the Address Transaction. In this operation a Master that wishes to initiate the transfer asserts its FB_AREQn line in cycle 1. This cycle is referred to as the Address Request Phase. In this example Master 2 has asserted its FB_AREQ (MST_AREQ). Two cycles later (cycle 3) this is visible at the Bus Controller (ARB_AREQ). On each cycle the Bus Controller observes all of the FB_AREQn lines and determines the next grant. It communicates this by driving the selected Master's ID on the FB_AGNT Bus (ARB_AGNT), in what is known as the Address Grant Phase. All Masters then see this grant two cycles later, in cycle 5. Each Master compares its ID with the FB_AGNT Bus, and removes its FB_AREQn signal when it detects its ID on FB_AGNT.
TABLE 3
Basic Address Arbitration
CYCLE         1  2  3  4  5  6  7  8  9
MST_AREQ      2  2  2  2  .  .  .  .  .
BUS_AREQ      .  2  2  2  2  .  .  .  .    Address Request Phase
ARB_AREQ      .  .  2  2  2  2  .  .  .
ARB_AGNT      .  .  2  .  .  .  .  .  .
BUS_AGNT      .  .  .  2  .  .  .  .  .    Address Grant Phase
MST_AGNT      .  .  .  .  2  .  .  .  .
MST_ARDY      .  .  .  .  2  .  .  .  .
BUS_ARDY      .  .  .  .  .  2  .  .  .    Address Phase
SLV_ARDY      .  .  .  .  .  .  2  .  .
SLV_AACK      .  .  .  .  .  .  2  .  .
BUS_AACK      .  .  .  .  .  .  .  2  .    Address Acknowledge Phase
MST_AACK      .  .  .  .  .  .  .  .  2
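The fixed two-cycle flight time makes the phase timing easy to compute. The Python sketch below is a hypothetical helper, not part of the FusionBus definition; it assumes every hop (Master to Bus Controller, Bus Controller to Master, Master to Slave, Slave to Master) takes exactly two cycles as in Table 3, and it ignores contention from other Masters.

```python
PIPE = 2  # cycles for a signal to travel from its driver to its receivers


def address_timeline(request_cycle: int, grant_delay: int = 0) -> dict[str, int]:
    """Cycle in which each step of a Basic Address Arbitration occurs.

    grant_delay is how many extra cycles the Bus Controller waits before
    granting (0 means it grants as soon as it sees the request).
    """
    arb_areq = request_cycle + PIPE     # Bus Controller sees FB_AREQn
    arb_agnt = arb_areq + grant_delay   # Bus Controller drives the Master's ID on FB_AGNT
    mst_agnt = arb_agnt + PIPE          # Master sees its ID and drives the Address Phase
    slv_ardy = mst_agnt + PIPE          # Slaves see FB_ARDY and reply with FB_AACK
    mst_aack = slv_ardy + PIPE          # Master sees the Address Acknowledge
    return {
        "ARB_AREQ": arb_areq,
        "ARB_AGNT": arb_agnt,
        "MST_AGNT": mst_agnt,
        "SLV_ARDY": slv_ardy,
        "MST_AACK": mst_aack,
        # The Master drops FB_AREQn only once it sees its grant
        "AREQ_CYCLES_HELD": mst_agnt - request_cycle,
    }


print(address_timeline(1))
```

With a request in cycle 1 this reproduces Table 3: the grant is driven in cycle 3 and seen in cycle 5, the Slave decodes and acknowledges in cycle 7, the Master sees the acknowledge in cycle 9, and FB_AREQn is held for four cycles.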
The Master who has a request pending and detects its ID on FB_AGNT must, on the cycle it sees FB_AGNT (cycle 5), drive FB_ARDY, its ID on FB_MSTRID, its FB_ADDR and FB_COUNT fields, the cycle type (read/write/memory or I/O) on FB_READ and FB_MEMIO, and its speed on FB_MSPEED. This is referred to as the Address Phase. This information (MST_ARDY) is seen by all Slaves two cycles later (cycle 7) as SLV_ARDY. Each Slave is continuously comparing the FB_ADDR and FB_MEMIO information when FB_ARDY is asserted to see if the reference is to its address space. When it detects an address in its space, the slave then replies with either Normal (it can accept the transaction request) or Explicit Retry (it cannot accept the request). This is indicated by asserting FB_AACK along with the indicator of which Data Bus the Slave is connected to (FB_ADATA), and asserting FB_ARETRY if there must be a retry. This is known as the Address Acknowledge Phase. If no slaves assert FB_AACK, an Implicit Retry occurs. This is handled by the Masters just as if a regular Retry cycle occurred. In either retry case the Master will assert its FB_AREQn signal and begin the arbitration again. Each Master contains a Retry Counter, and if 256 retries occur on an attempted transaction, an error occurs and is signaled through an interrupt by the Master. The Slave also drives its speed on FB_SSPEED, and its wait suggestion on FB_AWAIT if a retry is signaled.
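The Master's handling of the Address Acknowledge Phase can be summarized as a small state machine. This Python sketch assumes that the error is raised on the 256th retry and that explicit and implicit retries are counted alike; FB_AWAIT, FB_SSPEED and the interrupt mechanism are not modeled, and the class and outcome names are illustrative.

```python
from enum import Enum

MAX_RETRIES = 256  # per the text, 256 retries of one transaction is an error


class AddressResponse(Enum):
    NORMAL = "normal"                  # FB_AACK asserted, FB_ARETRY not asserted
    EXPLICIT_RETRY = "explicit retry"  # FB_AACK and FB_ARETRY both asserted
    IMPLICIT_RETRY = "implicit retry"  # no Slave asserted FB_AACK


def classify(aack: bool, aretry: bool) -> AddressResponse:
    if not aack:
        return AddressResponse.IMPLICIT_RETRY
    return AddressResponse.EXPLICIT_RETRY if aretry else AddressResponse.NORMAL


class AddressAttempt:
    """Master-side handling of the Address Acknowledge Phase for one transaction."""

    def __init__(self) -> None:
        self.retries = 0

    def on_acknowledge(self, aack: bool, aretry: bool) -> str:
        if classify(aack, aretry) is AddressResponse.NORMAL:
            return "connected"       # Connection created; the data transfer can follow
        self.retries += 1            # both kinds of retry are handled the same way
        if self.retries >= MAX_RETRIES:
            return "error"           # the Master would signal this through an interrupt
        return "rearbitrate"         # reassert FB_AREQn and arbitrate again


attempt = AddressAttempt()
print(attempt.on_acknowledge(True, True), attempt.on_acknowledge(True, False))
```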
Similar eight cycle sequences can be pipelined every clock cycle, with the Bus Controller providing new FB_AGNT information every cycle. Note that a Master with a single request will hold its FB_AREQn line for four cycles even if it is granted immediately, so the Bus Controller will mask this request after granting it. A Master must remove its FB_AREQ signal for at least one cycle before reasserting it to request another transfer.
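The masking rule can be sketched as follows. Clearing the mask when the Bus Controller next sees the request line deasserted is an assumption that fits the requirement that a Master drop FB_AREQ for at least one cycle; choosing the lowest eligible ID is only a placeholder, not the FusionBus priority scheme.

```python
class AddressRequestMask:
    """Bus Controller masking of an already-granted FB_AREQn (illustrative sketch).

    After a grant, the Master's request line stays asserted at the Bus Controller
    for a few more cycles; those stale copies must not produce a second grant.
    The mask is cleared once the Bus Controller sees the line deasserted.
    """

    def __init__(self, n_masters: int) -> None:
        self.masked = [False] * n_masters

    def eligible(self, areq: list[bool]) -> list[int]:
        for master, asserted in enumerate(areq):
            if not asserted:
                self.masked[master] = False
        return [m for m, asserted in enumerate(areq) if asserted and not self.masked[m]]

    def grant(self, master: int) -> None:
        self.masked[master] = True


# ARB_AREQ for Master 2 as in Table 3 (cycles 3-6), one idle cycle, then a new request
idle, req2 = [False] * 4, [False, False, True, False]
arbiter = AddressRequestMask(4)
grants = []
for areq in [req2] * 4 + [idle] + [req2]:
    candidates = arbiter.eligible(areq)
    winner = candidates[0] if candidates else None  # lowest ID first, for simplicity
    if winner is not None:
        arbiter.grant(winner)
    grants.append(winner)
print(grants)  # [2, None, None, None, None, 2]
```

Without the mask, the stale copies of the request still visible at the Bus Controller in cycles 4 through 6 of Table 3 would each produce another grant to Master 2.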
5.3 Data Bus Transfer Sequence
Each Data Bus is arbitrated completely independently from the Address Bus, with the only restriction being that the data must be sent after the corresponding address transaction. Arbitration is similar to the Address Bus. For the Data Bus cycle there will be a Source Module which is the source of the data, and a Destination Module which is the destination. The Module which requests the Data Bus is always the Source Module of the transfer, and thus is determined by the direction of the transfer. On a write, the Master requests the Data Bus, while on a read transaction the Slave will request the Data Bus when it has data available to transfer. The result of this dichotomy is that all transfers on the Data Bus are controlled by the source of the data, and there is no distinction in the Data Bus operation due to the original Master. A typical Data Bus Transfer cycle is shown in Table 4.
TABLE 4
Basic Data Transfer
CYCLE         1  2  3  4  5  6  7  8  9  10
SRC_DREQ      2  2  2  2  .  .  .  .  .  .
BUS_DREQ      .  2  2  2  2  .  .  .  .  .    Data Request Phase
ARB_DREQ      .  .  2  2  2  2  .  .  .  .
ARB_DGNT      .  .  2  2  2  2  .  .  .  .
BUS_DGNT      .  .  .  2  2  2  2  .  .  .    Data Grant Phase
SRC+DST_DGNT  .  .  .  .  2  2  2  2  .  .
SRC_DRDY      .  .  .  .  2  2  2  2  .  .
BUS_DRDY      .  .  .  .  .  2  2  2  2  .    Data Phase
DST_DRDY      .  .  .  .  .  .  2  2  2  2
SRC_DATA      .  .  .  .  2  2  2  2  .  .
BUS_DATA      .  .  .  .  .  2  2  2  2  .
DST_DATA      .  .  .  .  .  .  2  2  2  2
DST_DACK      .  .  .  .  2  2  2  2  .  .
BUS_DACK      .  .  .  .  .  2  2  2  2  .    Data Acknowledge Phase
SRC_DACK      .  .  .  .  .  .  2  2  2  2
Each Source Module is connected to all of the Data Busses, each of which has its own arbitration logic within the Bus Controller. The Source Module asserts the FB_DxDREQn line which corresponds to the Module ID of the Master. On a write, this is the Source Module's ID, since it is the Master. On a read, it is the value captured from the FB_MSTRID signals during the Address Phase, since the Source is the Slave. This cycle is referred to as the Data Request Phase. The Bus Controller can assert FB_DxDGNT as soon as it receives FB_DxDREQn in cycle 3, which is called the Data Grant Phase. The Source Module will see FB_DxDGNT in cycle 5. The Destination will also see FB_DxDGNT in cycle 5, and since the value on FB_DxDGNT is the Module ID of the Connection's Master, the addressed Destination Module will know that this data cycle is for its pending operation. The default value of each FB_DxDGNT Bus is to grant the bus to the Processor (Module 0).
When the Source Module detects its ID on the FB_DxDGNT Bus, it drives FB_DxDRDY along with the data to be transferred on the FB_DxDATA bus. This cycle is known as the Data Phase. At the same time, the Destination detects the Module ID of the Master on FB_DxDGNT. If it will be able to accept the data two cycles later, it drives FB_DxDACK. This is referred to as the Data Acknowledge Phase. Two cycles later, the Source sees FB_DxDACK and the Destination sees FB_DxDRDY. If both of them are asserted, the data has been transferred.
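The per-cycle behavior of a module taking part in a Connection can be sketched in Python. The sketch assumes a Connection is identified only by its Master's Module ID, which is the value carried by both the FB_DxDREQn line and FB_DxDGNT; the function and outcome names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    master_id: int  # Master's Module ID: selects the FB_DxDREQn line and the FB_DxDGNT value
    is_write: bool  # write: the Master is the Source; read: the Slave is the Source

    def source(self) -> str:
        return "Master" if self.is_write else "Slave"


def grant_cycle_outcome(dgnt: int, conn: Connection,
                        source_has_data: bool, dest_can_accept: bool) -> str:
    """Result of one FB_DxDGNT cycle, which both ends learn two cycles later."""
    if dgnt != conn.master_id:
        return "other connection"   # this grant belongs to some other Connection
    drdy = source_has_data          # Source drives FB_DxDRDY together with FB_DxDATA
    dack = dest_can_accept          # Destination drives FB_DxDACK in the same grant cycle
    if drdy and dack:
        return "transferred"
    if drdy:
        return "retry"              # no acknowledge: the Source must send this item again
    return "unused grant"           # no FB_DxDRDY: the cycle carries no data


read = Connection(master_id=3, is_write=False)
print(read.source(), grant_cycle_outcome(3, read, True, True))
```

Keying everything on the Master's ID is what lets a read (Slave as Source) and a write (Master as Source) share the same request and grant logic.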
Since both the Master (in the Address Phase) and the Slave (in the Address Acknowledge Phase) have driven their respective speed values, the Bus Controller is able to determine the slower of the two speeds. It then grants the Data Bus to the Master ID which corresponds to this connection no more often than the speeds indicate. For example, if the Master Speed (on FB_MSPEED) is 1 and the Slave Speed (on FB_SSPEED) is 2, the slower speed is 2 and the Bus Controller will leave at least two clocks between every two assertions of that Connection's FB_DxDGNT. The Bus Controller will assert the various FB_DxDGNT values as a function of the pending requests, priority values, and Connection speed values. Each Source and Destination Module which is currently participating in a Connection will constantly monitor the FB_DxDGNT bus, and will respond with a FB_DxDRDY or FB_DxDACK signal whenever the Module ID of the Connection master appears. The following sections will describe different cycle types in more detail.
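A minimal sketch of the speed rule, assuming a speed value of s means at least s idle clocks between grants to the same Connection (this matches the example above, and the later description of speed zero as "no delay between FB_DxDGNT cycles"). It also assumes the Connection wins every grant it is allowed, so it shows the fastest possible grant pattern rather than a full arbiter.

```python
from typing import Optional


def connection_speed(mspeed: int, sspeed: int) -> int:
    """The slower of the two speeds; a larger value means slower."""
    return max(mspeed, sspeed)


def may_grant(cycle: int, last_grant: Optional[int], speed: int) -> bool:
    """A speed of s leaves at least s idle clocks between grants to one Connection."""
    return last_grant is None or cycle - last_grant > speed


def grant_cycles(first: int, count: int, speed: int) -> list[int]:
    """Earliest grant cycles for a Connection that always wins arbitration."""
    grants: list[int] = []
    last: Optional[int] = None
    cycle = first
    while len(grants) < count:
        if may_grant(cycle, last, speed):
            grants.append(cycle)
            last = cycle
        cycle += 1
    return grants


speed = connection_speed(1, 2)            # the example in the text: MSPEED 1, SSPEED 2
print(speed, grant_cycles(9, 4, speed))   # 2 [9, 12, 15, 18]
```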
5.4 Burst Write Cycle
Table 5 shows a single burst write. For this type of transfer, the Master Module is the Source and the Slave Module is the Destination. The first eight cycles are a Basic Address Arbitration, followed by a Basic Data Transfer operation with the Master as the Source Module and the Slave as the Destination Module. The Master (as the Source) can assert its FB_DxDREQ line no earlier than cycle 9 to start the Data Transfer, since it must receive FB_AACK denoting the successful creation of the Connection before initiating the Data Transfer. To generate FB_DxDREQ earlier and improve performance, the Slave asserts FB_DxDREQ at the same time as it asserts FB_AACK (in cycle 7), indicated by 2S in Table 5. It drives FB_DxDREQ for two cycles and then removes it. At that point the Master has seen FB_AACK and can begin driving FB_DxDREQ. Since all requests are on the Master's FB_DxDREQ line, the Bus Controller does not see any difference between the two requests.
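TABLE 5
Write Cycle
CYCLE         1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
Address Bus
MST_AREQ      2  2  2  2  .  .  .  .  .  .  .  .  .  .  .  .
BUS_AREQ      .  2  2  2  2  .  .  .  .  .  .  .  .  .  .  .
ARB_AREQ      .  .  2  2  2  2  .  .  .  .  .  .  .  .  .  .
ARB_AGNT      .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .
BUS_AGNT      .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .
MST_AGNT      .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .
MST_ARDY      .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .
BUS_ARDY      .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .
SLV_ARDY      .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .
SLV_AACK      .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .
BUS_AACK      .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .
MST_AACK      .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .
Data Bus
MST_DREQ      .  .  .  .  .  .  2S 2S 2  2  .  .  .  .  .  .
BUS_DREQ      .  .  .  .  .  .  .  2S 2S 2  2  .  .  .  .  .
ARB_DREQ      .  .  .  .  .  .  .  .  2S 2S 2  2  .  .  .  .
ARB_DGNT      .  .  .  .  .  .  .  .  2  2  2  2  .  .  .  .
BUS_DGNT      .  .  .  .  .  .  .  .  .  2  2  2  2  .  .  .
MST+SLV_DGNT  .  .  .  .  .  .  .  .  .  .  2  2  2  2  .  .
MST_DRDY      .  .  .  .  .  .  .  .  .  .  2  2  2  2  .  .
BUS_DRDY      .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  .
SLV_DRDY      .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2
MST_DATA      .  .  .  .  .  .  .  .  .  .  2  2  2  2  .  .
BUS_DATA      .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  .
SLV_DATA      .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2
SLV_DACK      .  .  .  .  .  .  .  .  .  .  2  2  2  2  .  .
BUS_DACK      .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  .
MST_DACK      .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2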
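The hand-off of the Master's FB_DxDREQn line between the Slave and the Master on a burst write can be tabulated with a short hypothetical helper. It assumes the Slave's early request lasts exactly two cycles, as stated above, and that the Master then holds the line for the rest of the burst (up to a caller-supplied last cycle).

```python
def write_dreq_driver(slave_aack_cycle: int, last_cycle: int) -> dict[int, str]:
    """Which module drives the Master's FB_DxDREQn line on a burst write (sketch).

    The Slave drives the line for two cycles starting when it asserts FB_AACK;
    the Master takes over once it has seen FB_AACK two cycles later and holds
    the line through `last_cycle` (hypothetically, until its last data item).
    """
    master_sees_aack = slave_aack_cycle + 2
    driver = {}
    for cycle in range(slave_aack_cycle, last_cycle + 1):
        driver[cycle] = "Slave (2S)" if cycle < master_sees_aack else "Master"
    return driver


# Table 5: the Slave asserts FB_AACK in cycle 7
print(write_dreq_driver(7, 10))
```

Because the Bus Controller only observes the request line, not which module drives it, the hand-off appears at ARB_DREQ as one continuous request starting in cycle 9, as in Table 5.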
5.5 Burst Read Cycle
This cycle, shown in Table 6, is similar to the write burst. The Basic Address Arbitration cycle is identical. The main difference is that in the Data Transfer the Slave is the Source Module and the Master is the Destination Module. The Slave detects that it has been addressed in cycle 7 when it sees ARDY and its address. If the Slave will be able to supply data by cycle 11, it can assert FB_DxDREQn (on the Master's request line) in cycle 7, at the same time it has asserted AACK. This allows the data transfer to start two cycles earlier than on a write. In general, the Slave will assert FB_DxDREQn four cycles before it will have data available for transfer.
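TABLE 6
Burst Read Cycle
CYCLE         1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
Address Bus
MST_AREQ      2  2  2  2  .  .  .  .  .  .  .  .  .  .  .  .
BUS_AREQ      .  2  2  2  2  .  .  .  .  .  .  .  .  .  .  .
ARB_AREQ      .  .  2  2  2  2  .  .  .  .  .  .  .  .  .  .
ARB_AGNT      .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .
BUS_AGNT      .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .
MST_AGNT      .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .
MST_ARDY      .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .
BUS_ARDY      .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .
SLV_ARDY      .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .
SLV_AACK      .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .
BUS_AACK      .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .
MST_AACK      .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .
Data Bus
SLV_DREQ      .  .  .  .  .  .  2  2  2  2  .  .  .  .  .  .
BUS_DREQ      .  .  .  .  .  .  .  2  2  2  2  .  .  .  .  .
ARB_DREQ      .  .  .  .  .  .  .  .  2  2  2  2  .  .  .  .
ARB_DGNT      .  .  .  .  .  .  .  .  2  2  2  2  .  .  .  .
BUS_DGNT      .  .  .  .  .  .  .  .  .  2  2  2  2  .  .  .
MST+SLV_DGNT  .  .  .  .  .  .  .  .  .  .  2  2  2  2  .  .
SLV_DRDY      .  .  .  .  .  .  .  .  .  .  2  2  2  2  .  .
BUS_DRDY      .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  .
MST_DRDY      .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2
SLV_DATA      .  .  .  .  .  .  .  .  .  .  2  2  2  2  .  .
BUS_DATA      .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  .
MST_DATA      .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2
MST_DACK      .  .  .  .  .  .  .  .  .  .  2  2  2  2  .  .
BUS_DACK      .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  .
SLV_DACK      .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2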
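The rule that a Slave asserts FB_DxDREQn four cycles before its data is ready follows from the two pipeline hops between a request and the grant seen at the Source. A sketch, assuming the Bus Controller grants immediately (no competing Connections, speed zero); the function names are hypothetical.

```python
PIPE = 2  # cycles for a signal to reach its receivers


def data_timeline(dreq_cycle: int) -> dict[str, int]:
    """Earliest cycles for the first data item, if the Bus Controller grants at once."""
    arb_dgnt = dreq_cycle + PIPE    # Bus Controller sees FB_DxDREQn and grants
    src_drdy = arb_dgnt + PIPE      # Source and Destination see FB_DxDGNT; the Source
                                    # drives FB_DxDRDY/FB_DxDATA, the Destination FB_DxDACK
    dst_drdy = src_drdy + PIPE      # Destination sees the data; Source sees FB_DxDACK
    return {"ARB_DGNT": arb_dgnt, "SRC_DRDY": src_drdy, "DST_DRDY": dst_drdy}


def dreq_cycle_for(data_available_cycle: int) -> int:
    """Cycle in which to assert FB_DxDREQn so the earliest grant arrives as data is ready."""
    return data_available_cycle - 2 * PIPE


print(data_timeline(7))    # Table 6: request in cycle 7
print(dreq_cycle_for(11))
```

For Table 6, a request in cycle 7 lets the Slave drive data in cycle 11, which the Master receives in cycle 13.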
5.6 Concurrent Transfers
The previous examples showed a single Connection transfer. Table 7 shows concurrent transfers between multiple Master/Slave pairs. The assumption is that all three Masters (2, 3 and 4) have equal priority, and thus the Bus Controller will interleave their FB_DxDGNTs. In this example, Masters 2 and 4 have initiated write transfers, but Master 3 has initiated a read transfer. Since its Slave can assert its FB_DxDREQn earlier, Connection 3 is able to perform its first data transfer before Connection 2.
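TABLE 7
Concurrent Cycles
CYCLE         1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19
Address Bus
MST_AREQ      2   234 234 234 34  4   .   .   .   .   .   .   .   .   .   .   .   .   .
BUS_AREQ      .   2   234 234 234 34  4   .   .   .   .   .   .   .   .   .   .   .   .
ARB_AREQ      .   .   2   234 234 234 34  4   .   .   .   .   .   .   .   .   .   .   .
ARB_AGNT      .   .   2   3   4   .   .   .   .   .   .   .   .   .   .   .   .   .   .
BUS_AGNT      .   .   .   2   3   4   .   .   .   .   .   .   .   .   .   .   .   .   .
MST_AGNT      .   .   .   .   2   3   4   .   .   .   .   .   .   .   .   .   .   .   .
MST_ARDY      .   .   .   .   2   3   4   .   .   .   .   .   .   .   .   .   .   .   .
BUS_ARDY      .   .   .   .   .   2   3   4   .   .   .   .   .   .   .   .   .   .   .
SLV_ARDY      .   .   .   .   .   .   2   3   4   .   .   .   .   .   .   .   .   .   .
SLV_AACK      .   .   .   .   .   .   2   3   4   .   .   .   .   .   .   .   .   .   .
BUS_AACK      .   .   .   .   .   .   .   2   3   4   .   .   .   .   .   .   .   .   .
MST_AACK      .   .   .   .   .   .   .   .   2   3   4   .   .   .   .   .   .   .   .
Data Bus
SRC_DREQ      .   .   .   .   .   .   .   3   23  23  234 234 234 234 234 234 234 234 234
BUS_DREQ      .   .   .   .   .   .   .   .   3   23  23  234 234 234 234 234 234 234 234
ARB_DREQ      .   .   .   .   .   .   .   .   .   3   23  23  234 234 234 234 234 234 234
ARB_DGNT      .   .   .   .   .   .   .   .   .   3   2   3   4   2   3   4   2   3   4
BUS_DGNT      .   .   .   .   .   .   .   .   .   .   3   2   3   4   2   3   4   2   3
SRC+DST_DGNT  .   .   .   .   .   .   .   .   .   .   .   3   2   3   4   2   3   4   2
SRC_DRDY      .   .   .   .   .   .   .   .   .   .   .   3   2   3   4   2   3   4   2
BUS_DRDY      .   .   .   .   .   .   .   .   .   .   .   .   3   2   3   4   2   3   4
DST_DRDY      .   .   .   .   .   .   .   .   .   .   .   .   .   3   2   3   4   2   3
SRC_DATA      .   .   .   .   .   .   .   .   .   .   .   3   2   3   4   2   3   4   2
BUS_DATA      .   .   .   .   .   .   .   .   .   .   .   .   3   2   3   4   2   3   4
DST_DATA      .   .   .   .   .   .   .   .   .   .   .   .   .   3   2   3   4   2   3
DST_DACK      .   .   .   .   .   .   .   .   .   .   .   3   2   3   4   2   3   4   2
BUS_DACK      .   .   .   .   .   .   .   .   .   .   .   .   3   2   3   4   2   3   4
SRC_DACK      .   .   .   .   .   .   .   .   .   .   .   .   .   3   2   3   4   2   3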
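Equal-priority interleaving can be illustrated with a round-robin arbiter. Round robin is an assumption here (the text only states that equal-priority Connections are interleaved), and the sketch ignores priority levels and speed values. Fed the ARB_DREQ row of Table 7, it reproduces the ARB_DGNT sequence shown there.

```python
from typing import Optional


def round_robin(requests: set[int], last: Optional[int], n_ids: int = 32) -> Optional[int]:
    """Next requester after `last` in circular Module ID order (hypothetical policy).

    Returns None when nobody requests; the text notes that the idle FB_DxDGNT
    value defaults to the Processor (Module 0), which this sketch does not model.
    """
    start = 0 if last is None else last + 1
    for offset in range(n_ids):
        candidate = (start + offset) % n_ids
        if candidate in requests:
            return candidate
    return None


def arbitrate(arb_dreq: list[set[int]]) -> list[Optional[int]]:
    """Grant one Connection per cycle among equal-priority, speed-zero requesters."""
    grants: list[Optional[int]] = []
    last: Optional[int] = None
    for requests in arb_dreq:
        winner = round_robin(requests, last)
        grants.append(winner)
        if winner is not None:
            last = winner
    return grants


# ARB_DREQ from Table 7, cycles 10 through 19
table7 = [{3}, {2, 3}, {2, 3}] + [{2, 3, 4}] * 7
print(arbitrate(table7))  # [3, 2, 3, 4, 2, 3, 4, 2, 3, 4], matching ARB_DGNT
```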
5.7 Address Retry
In each of the previous cases, the addressed Slave was always ready to respond with FB_AACK when it detected its address on the Address Bus. Table 8 shows an example of address retry, in which the addressed Slave responds with a Retry on the first address arbitration. "R" indicates a cycle with FB_AACK and FB_ARETRY, thus causing a Retry. The Master reasserts its FB_AREQ signal in cycle 9 to start another request cycle. The second address arbitration attempt is successful, with FB_DxDRDY asserted by the Master in cycle 21 instead of 13. A read cycle proceeds similarly. Note that the Slave can supply a retry suggestion on the FB_AWAIT Bus along with FB_ARETRY, which could delay the Master's reassertion of FB_AREQ.
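TABLE 8
Burst Write Cycle with Address Retry
CYCLE         1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23
Address Bus
MST_AREQ      2  2  2  2  .  .  .  .  2  2  2  2  .  .  .  .  .  .  .  .  .  .  .
BUS_AREQ      .  2  2  2  2  .  .  .  .  2  2  2  2  .  .  .  .  .  .  .  .  .  .
ARB_AREQ      .  .  2  2  2  2  .  .  .  .  2  2  2  2  .  .  .  .  .  .  .  .  .
ARB_AGNT      .  .  2  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .
BUS_AGNT      .  .  .  2  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .
MST_AGNT      .  .  .  .  2  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .
MST_ARDY      .  .  .  .  2  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .
BUS_ARDY      .  .  .  .  .  2  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .
SLV_ARDY      .  .  .  .  .  .  2  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .
SLV_AACK      .  .  .  .  .  .  R  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .
BUS_AACK      .  .  .  .  .  .  .  R  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .
MST_AACK      .  .  .  .  .  .  .  .  R  .  .  .  .  .  .  .  2  .  .  .  .  .  .
Data Bus
MST_DREQ      .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  .  .  .
BUS_DREQ      .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  .  .
ARB_DREQ      .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  .
ARB_DGNT      .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  .
BUS_DGNT      .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2
MST+SLV_DGNT  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  2  2
MST_DRDY      .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  2  2
BUS_DRDY      .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  2
SLV_DRDY      .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  2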
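Each address retry delays the data by one full address attempt, because the Master reasserts FB_AREQn only in the cycle in which it sees the retry (cycle 9 for an attempt begun in cycle 1). A sketch of that arithmetic, using the cycle-13 baseline quoted above and a hypothetical extra per-retry wait, since the text does not define the units of the FB_AWAIT suggestion:

```python
ATTEMPT_CYCLES = 8  # from FB_AREQn assertion until the Master sees FB_AACK (Table 3)


def first_write_data_cycle(retries: int, extra_wait: int = 0, no_retry_cycle: int = 13) -> int:
    """Cycle of the Master's first FB_DxDRDY on a write after `retries` address retries.

    no_retry_cycle is the value quoted in the text for the retry-free case; extra_wait
    is a hypothetical per-retry delay, such as one a Master might take from FB_AWAIT.
    """
    return no_retry_cycle + retries * (ATTEMPT_CYCLES + extra_wait)


print(first_write_data_cycle(0), first_write_data_cycle(1))  # 13 21
```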
5.8 Data Retry
A Destination Module may not be able to respond to the data cycle. Table 9 shows such a case on a write, where the Slave is not ready to accept write data. It signals this by not asserting FB_DxDACK (indicated by "X") on the first Data Grant cycle, in cycle 13. The Source sees the lack of FB_DxDACK in cycle 15, and retransmits the first data item "2A". If the Destination fails to signal FB_DxDACK, it must also not send FB_DxDACK in the subsequent cycle (cycle 14 in the example) even if it can accept data, because the Source will retransmit that data item as well.
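TABLE 9
Burst Write Cycle with Data Retry
CYCLE         1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19
Address Bus
MST_AREQ      2  2  2  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
BUS_AREQ      .  2  2  2  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .
ARB_AREQ      .  .  2  2  2  2  .  .  .  .  .  .  .  .  .  .  .  .  .
ARB_AGNT      .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
BUS_AGNT      .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
MST_AGNT      .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .
MST_ARDY      .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .  .
BUS_ARDY      .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .  .
SLV_ARDY      .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .
SLV_AACK      .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .  .
BUS_AACK      .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .  .
MST_AACK      .  .  .  .  .  .  .  .  2  .  .  .  .  .  .  .  .  .  .
Data Bus
MST_DREQ      .  .  .  .  .  .  .  .  2  2  2  2  2  2  2  .  .  .  .
BUS_DREQ      .  .  .  .  .  .  .  .  .  2  2  2  2  2  2  2  .  .  .
ARB_DREQ      .  .  .  .  .  .  .  .  .  .  2  2  2  2  2  2  2  .  .
ARB_DGNT      .  .  .  .  .  .  .  .  .  .  2  2  2  2  2  2  2  .  .
BUS_DGNT      .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  2  2  2  .
MST+SLV_DGNT  .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  2  2  2
MST_DRDY      .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  2  2  .
BUS_DRDY      .  .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  2  2
SLV_DRDY      .  .  .  .  .  .  .  .  .  .  .  .  .  .  2  2  2  2  2
MST_DATA      .  .  .  .  .  .  .  .  .  .  .  .  2A 2B 2A 2B 2C 2D .
BUS_DATA      .  .  .  .  .  .  .  .  .  .  .  .  .  2A 2B 2A 2B 2C 2D
SLV_DATA      .  .  .  .  .  .  .  .  .  .  .  .  .  .  2A 2B 2A 2B 2C
SLV_DACK      .  .  .  .  .  .  .  .  .  .  .  .  X  X  2  2  2  2  .
BUS_DACK      .  .  .  .  .  .  .  .  .  .  .  .  .  X  X  2  2  2  2
MST_DACK      .  .  .  .  .  .  .  .  .  .  .  .  .  .  X  X  2  2  2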
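The go-back behavior in Table 9 can be simulated. This Python sketch assumes the Connection is granted on every cycle (speed zero), so each item's acknowledge status reaches the Source exactly two cycles after the item was driven; the function names, and the max_cycles guard standing in for the "too many attempts" limit, are hypothetical.

```python
from typing import Callable


def destination_dack(ready: Callable[[int], bool]) -> Callable[[int], bool]:
    """FB_DxDACK per grant cycle: withheld when the Destination is not ready, and
    also withheld on the cycle after a withheld one, as the text requires."""
    return lambda cycle: ready(cycle) and ready(cycle - 1)


def source_data_sequence(items: list[str], dack: Callable[[int], bool],
                         start: int, max_cycles: int = 1000) -> list[str]:
    """Data items the Source drives, assuming it is granted on every cycle from `start`.

    The Source learns two cycles later whether an item was acknowledged; on a missing
    FB_DxDACK it goes back and resends from the unacknowledged item onward.
    """
    sent: dict[int, int] = {}   # cycle -> index of the item driven in that cycle
    acked: set[int] = set()
    next_item, cycle, driven = 0, start, []
    while len(acked) < len(items):
        if cycle - start > max_cycles:
            raise RuntimeError("too many attempts")
        earlier = cycle - 2
        if earlier in sent:
            if dack(earlier):
                acked.add(sent[earlier])
            else:
                next_item = min(next_item, sent[earlier])
        if next_item < len(items):
            sent[cycle] = next_item
            driven.append(items[next_item])
            next_item += 1
        cycle += 1
    return driven


# Table 9: the Slave cannot accept data on its first grant cycle (13)
dack = destination_dack(lambda cycle: cycle != 13)
print(source_data_sequence(["2A", "2B", "2C", "2D"], dack, start=13))
# ['2A', '2B', '2A', '2B', '2C', '2D']
```

Withholding FB_DxDACK for "2B" in cycle 14 keeps the Destination in step with the Source, which resends "2B" in cycle 16 after going back to "2A" in cycle 15.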
Unused Data Grant Cycles
On any cycle, the Source of the Connection may detect its Master ID on the FB_DxDGNT Bus but be unable to transfer data. In this case, the Source simply fails to assert FB_DxDRDY during that cycle and no data transfer occurs, and a FusionBus cycle has been wasted. This condition will typically arise for one of two reasons. The first is that the Source Coprocessor has not supplied data to be transferred. This may occur because the Source speed value is too small, or simply because of unpredictable delays in the Coprocessor. The second reason is that the last data has been transferred, but additional FB_DxDGNT cycles occur because the Source was not able to remove its FB_DxDREQn signal quickly enough.
Since the Source cannot predict when it will receive FB_DxDGNT, it must hold FB_DxDREQn until the last data has been transferred. At this point unused Data Grant cycles may occur, particularly if the Data Transfer has been configured at speed zero (no delay between FB_DxDGNT cycles) and the Master Module is the highest priority requester. The worst case of this would be a transfer with a COUNT of one transfer and a speed of zero, potentially resulting in three wasted cycles for one data transfer. In this case, the Bus Controller detects the COUNT of one and forces the speed to be three, which will allow the Source to remove FB_DxDREQ before any unused cycles occur. For longer transfers, software can balance lost cycles with transfer rate using the speed controls for transfers. A speed of one or two ensures at most one lost cycle, while a speed greater than two guarantees no cycles will be lost for that transfer.
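The lost-cycle figures above follow from the two-cycle pipeline. A sketch of that derivation, assuming the Connection is granted as often as its speed allows (speed s meaning s idle clocks between grants) and that the Source drops FB_DxDREQn in the same cycle it sees its final grant, so the Bus Controller cannot see the drop until four cycles after it issued that grant:

```python
def wasted_grants(speed: int, pipeline: int = 2) -> int:
    """Worst-case unused FB_DxDGNT cycles after a Connection's final grant (sketch).

    Assumptions: the Connection wins every grant its speed allows, and the Source
    drops FB_DxDREQn in the cycle it sees its final grant. The Bus Controller sees
    that drop `pipeline` cycles later, i.e. 2 * pipeline cycles after the final grant.
    """
    spacing = speed + 1
    blind_window = 2 * pipeline
    # Further grants land at spacing, 2*spacing, ... cycles after the final one;
    # count those that fall before the drop becomes visible.
    return (blind_window - 1) // spacing


print([wasted_grants(s) for s in range(5)])  # [3, 1, 1, 0, 0]
```

Speed zero gives three wasted grants, matching the worst case described for a COUNT of one; speeds one and two give one, and three or more give none, which is why forcing the speed to three removes the loss.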
It is possible that a Source Module will remove FB_DxDREQn on the last data transfer, but that data transfer must be retried because the Destination failed to respond with FB_DxDACK. In this case the Source must reassert FB_DxDREQn and wait for another FB_DxDGNT cycle to complete the Data Transfer.
5.9 Processor Accesses
The FusionBus is specifically optimized for stream oriented functions, where bandwidth is the critical design parameter. However, systems will typically have at least one central processing unit interface, and the latency of access from |