Design apparatus and a method for generating an implementable description of a digital system7006960
Abstract
The present invention is a design apparatus compiled on a computer environment for generating from a behavioral description of a system comprising at least one digital system part, an implementable description for said system, said behavioral description being represented on said computer environment as a first set of objects with a first set of relations therebetween, said implementable description being represented on said computer environment as a second set of objects with a second set of relations therebetween, said first and second set of objects being part of a design environment.
Claims
What is claimed is:
1. A design apparatus compiled on a computer environment for generating from a behavioral description of a system comprising at least one digital system part, an implementable description for said system, said behavioral description being represented on said computer environment as a first set of objects with a first set of relations therebetween, said implementable description being represented on said computer environment as a second set of objects with a second set of relations therebetween, said first and second set of objects being part of a design environment, and wherein said first and second set of objects are part of a single design environment.
2. The design apparatus of claim 1 wherein said first and second set of objects are part of a single design environment.
3. The design apparatus of claim 1, wherein said design environment comprises an Object Oriented Programming Language.
4. The design apparatus of claim 3, wherein said Object Oriented Programming Language is C++.
5. The design apparatus of claim 1, wherein said design environment is an open environment wherein new objects can be created.
6. The design apparatus of claim 1, wherein at least part of the input signals and output signals of said first set of objects are at least part of the input signals and output signals of said second set of objects.
7. The design apparatus of claim 1, wherein at least part of the input signals and output signals of said behavioral description are at least part of the input signals and output signals of said implementable description.
8. The design apparatus of claim 1, wherein said first set of objects has first semantics and said second set of objects has second semantics.
9. The design apparatus of claim 8, wherein said first semantics is a data-vector model and/or a data-flow model.
10. The design apparatus of claim 1, wherein the behavioral description includes a structure-free description.
11. A hardware circuit or a software simulation of a hardware circuit designed with the design apparatus of claim 1.
12. A design apparatus compiled on a computer environment for generating from a behavioral description of a system comprising at least one digital system part, an implementable description for said system, said behavioral description being represented on said computer environment as a first set of objects with a first set of relations therebetween, said implementable description being represented on said computer environment as a second set of objects with a second set of relations therebetween, said first and second set of objects being part of a design environment, wherein said first and second set of objects are part of a single design environment, wherein said first set of objects has first semantics and said second set of objects has second semantics, and wherein said second semantics is a signal flow graph (SFG) data structure.
13. The design apparatus of claim 12, wherein the impact in said implementable description of at least a part of the objects of said second set of objects is essentially the same as the impact in said behavioral description of at least a part of the objects of said first set of objects.
14. The design apparatus of claim 12, further comprising means for simulating the behavior of said system, said means simulating the behavior of said behavioral description, said implementable description or any intermediate description therebetween.
15. The design apparatus of claim 12, wherein at least part of said second set of objects is derived from objects belonging to said first set of objects.
16. The design apparatus of claim 12, wherein said implementable description is at least partly obtained by refining said behavioral description.
17. The design apparatus of claim 12, wherein said implementable description is an architecture description of said system.
18. The design apparatus of claim 17, further comprising means for translating said architecture description into a synthesizable description of said system, said synthesizable description being directly implementable in hardware.
19. The design apparatus of claim 18, wherein said hardware is a semiconductor chip.
20. The design apparatus of claim 12, further comprising means to derive said first set of objects from a vector description describing said system as a set of operations on data vectors.
21. The design apparatus of claim 20, wherein said vector description is a MATLAB description.
22. The design apparatus of claim 12, further comprising means for simulating statically or demand-driven scheduled dataflow on said dataflow description.
23. The design apparatus of claim 12, further comprising means for clock-cycle true simulating said digital system using said dataflow description and/or one or more of said SFG data structures using an expectation-based simulation.
24. The design apparatus of claim 12, wherein the behavioral description includes a structure-free description.
25. A method of designing a system comprising at least one digital part, comprising refining, wherein a behavioral description of said system is transformed into an implementable description of said system, said behavioral description being represented as a first set of objects with a first set of relations therebetween and said implementable description being represented as a second set of objects with a second set of relations therebetween, and wherein said refining comprises translating behavioral characteristics at least partly into structural characteristics.
26. The method of claim 25, wherein the behavioral description includes a structure-free description.
27. The method of claim 25, further comprising simulating in which the behavior of said behavioral description, said implementable description and/or any intermediate description therebetween is simulated.
28. The method of claim 25, wherein said refining comprises the addition of new objects, permitting interaction with existing objects, and adjustments to said existing objects allowing said interaction.
29. The method of claim 25, wherein said refining is performed in an open environment and comprises expansion of existing objects.
30. The method of claim 25, wherein said refining comprises first refining, said first refining comprising:
determining the input vector lengths of input, output and intermediate signals;
determining the amount of parallelism of operations that process input signals to output signals;
determining actors, edges and tokens of said data-flow model; and
determining the wordlength of said tokens.
31. The method of claim 25, wherein said second set of objects with said second set of relations therebetween are at least partly derived from said first set of objects with said first set of relations therebetween.
32. The method of claim 25, wherein objects belonging to said second set of objects are new objects, identical with and/or derived by inheritance from objects from said first set of objects, or a combination thereof.
33. The method of claim 25, further comprising combining several SFG models with a finite state machine description resulting in an implementable description.
34. The method of claim 33, further comprising transforming said implementable description to synthesizable code.
35. The method of claim 34, wherein said synthesizable code is VHDL code.
36. A hardware circuit or a software simulation of a hardware circuit designed with the method of claim 25.
37. A method of simulating a system, wherein a description of a system is transformed into compilable C++ code, wherein said description is an SFG data structure and said compilable C++ code is used to perform clock cycle true simulations.
38. A method of simulating a system, wherein a description of a system is transformed into compilable C++ code, wherein the description includes a structure-free description.
39. A design apparatus compiled on a computer environment for generating from a behavioral description of a system comprising at least one digital system part, an implementable description for said system, said behavioral description being represented on said computer environment as a first set of objects with a first set of relations therebetween, said implementable description being represented on said computer environment as a second set of objects with a second set of relations therebetween, said first and second set of objects being part of a design environment, wherein said first set of objects has first semantics and said second set of objects has second semantics, and wherein said first semantics is a data-vector model and/or a data-flow model, wherein means for clock-cycle true simulating said digital system using said dataflow description and/or one or more of said SFG data structures using an expectation-based simulation.
40. A method of designing a system comprising at least one digital part, comprising refining, wherein a behavioral description of said system is transformed into an implementable description of said system, said behavioral description being represented as a first set of objects with a first set of relations therebetween and said implementable description being represented as a second set of objects with a second set of relations therebetween, wherein said refining comprises first refining wherein said behavioral description is a data-vector model and is at least partly transformed into a data-flow model, and wherein said data-flow model is an untimed floating point data-flow model.
41. The method of claim 40, wherein said refining further comprises second refining wherein said data-flow model is at least partly transformed into an SFG model.
42. The method of claim 40, wherein said second set of objects with said second set of relations therebetween are at least partly derived from said first set of objects with said first set of relations therebetween.
43. The method of claim 40, wherein objects belonging to said second set of objects are new objects, identical with and/or derived by inheritance from objects from said first set of objects, or a combination thereof.
44. A method of designing a system comprising at least one digital part, comprising refining wherein a behavioral description of said system is transformed into an implementable description of said system, said behavioral description being represented as a first set of objects with a first set of relations therebetween and said implementable description being represented as a second set of objects with a second set of relations therebetween, wherein said refining comprises:
determining the input vector lengths of input, output and intermediate signals;
determining the amount of parallelism of operations that process input signals to output signals;
determining actors, edges and tokens of said data-flow model; and
determining the wordlength of said tokens.
45. A method of designing a system comprising at least one digital part, comprising:
refining, wherein a behavioral description of said system is transformed into an implementable description of said system, said behavioral description being represented as a first set of objects with a first set of relations therebetween and said implementable description being represented as a second set of objects with a second set of relations therebetween,
wherein said refining comprises first refining, wherein said behavioral description is a data-vector model and is at least partly transformed into a data-flow model,
wherein said refining further comprises second refining, wherein said data-flow model is at least partly transformed into an SFG model, and
wherein said SFG model is a timed fixed point SFG model.
46. The method of claim 45, wherein said second set of objects with said second set of relations therebetween are at least partly derived from said first set of objects with said first set of relations therebetween.
47. The method of claim 45, wherein objects belonging to said second set of objects are new objects, identical with and/or derived by inheritance from objects from said first set of objects, or a combination thereof.
48. A method of designing a system comprising at least one digital part, comprising:
refining, wherein a behavioral description of said system is transformed into an implementable description of said system, said behavioral description being represented as a first set of objects with a first set of relations therebetween and said implementable description being represented as a second set of objects with a second set of relations therebetween,
wherein said refining comprises first refining wherein said behavioral description is a data-vector model and is at least partly transformed into a data-flow model,
wherein said refining further comprises second refining wherein said data-flow model is at least partly transformed into an SFG model, and
combining several of said SFG models with a finite state machine description resulting in an implementable description.
49. The method of claim 48, further comprising the step of transforming said implementable description to synthesizable code.
50. The method of claim 48, wherein said synthesizable code is VHDL code.
51. A method of simulating a system, wherein a description of a system is transformed into compilable C++ code, wherein said description comprises the combination of several SFG data structures with a finite state machine description resulting in an implementable description, said implementable description being said compilable C++ code suitable for simulating said system as software.
52. A method of simulating a system, wherein a description of a system is transformed into compilable C++ code, wherein said simulating comprises a clock-cycle true simulation of said system being an expectation-based simulation using one or more SFG data structures, said expectation-based simulation comprising:
annotating a token age to every token;
annotating a queue age to every queue;
increasing token age according to the token aging rules and with the travel delay for every queue that has transported the token;
increasing queue age with the iteration time of the actor steering the queue; and
checking whether token age is never smaller than queue age throughout the simulation.
Description
FIELD OF THE INVENTION
The present invention is situated in the field of design of systems. More specifically, the present invention is related to a design apparatus for digital systems, generating implementable descriptions of said systems.
The present invention is also related to a method for generating implementable descriptions of said systems.
STATE OF THE ART
The current need for digital systems forces contemporary system designers with ever increasing design complexities in most applications where dedicated processors and other digital hardware are used, demand for new systems is rising and development time is shortening. As an example, currently there is a high interest in digital communication equipment for public access networks. Examples are modems for Asymmetric Digital Subscriber Loop (ADSL) applications, and up- and downstream Hybrid Fiber-Coax (HFC) communication. These modems are preferably implemented in all-digital hardware using digital signal processing (DSP) techniques. This is because of the complexity of the data processing that they require. Besides this, these systems also need short development cycles. This calls for a design methodology that starts at high level and that provides for design automation as much as possible.
One frequently used modeling description language is VHDL (VHSIC Hardware Description Language), which has been accepted as an IEEE standard since 1987. VHDL is a programming environment that produces a description of a piece of hardware. Additions to standard VHDL can be to implement features of Object Oriented Programming Languages into VHDL. This was described in the paper OO-VHDL (Computer, October 1995, pages 18-26). Another frequently used modeling description language is VERILOG.
A number of commercially available system environments support the design of complex DSP systems.
MATLAB of Mathworks Inc offers the possibility of exploration at the algorithmic level. It uses the data-vector as the basic semantical feature. However, the developed MATLAB description has no relationship to a digital hardware implementation, nor does MATLAB support the synthesis of digital circuits.
SPW of Alta Group offers a toolkit for the simulation of these kind of systems. SPW is typically used to simulate data-flow semantics. Data-flow semantics define explicit algorithmic iteration, whereas data-vector semantics do not. SPW relies on an extensive library and toolkit to develop systems. Unlike MATLAB, the initial description is a block-based description. Each block used in the systems appears in two different formats, (a simulatable and a synthesizable version) which results in possible inconsistency.
COSSAP of Synopsys performs the same kind of system exploration as SPW.
DC and BC are products of Synopsys that support system synthesis. These products do not provide sufficient algorithm exploration functions.
Because all of these tools support only part of the desired functionality, contemporary digital systems are designed typically with a mix of these environments. For example, a designer might do algorithmic exploration in MATLAB, then do architecture definition with SPW, and finally map the architecture definition to an implementation in DC.
AIMS OF THE INVENTION
It is an aim of the present invention to disclose a design apparatus that allows to generate from a behavioral description of a digital system, an implementable description for said system.
It is another aim of the present invention to disclose a the design apparatus that allows for design, digital systems starting from a data vector or data flow description and generating an implementable level such as VHDL. A further aim is to perform such design tasks within one object oriented environment.
Another aim is to provide a means comprised in said design apparatus for simulating the behavior of the system at any level of the design stage or trajectory.
SUMMARY OF THE INVENTION
A first aspect of the present invention concerns a design apparatus compiled on a computer environment for generating from a behavioral description of a system comprising at least one digital system part, an implementable description for said system, said behavioral description being represented on said computer environment as a first set of objects with a first set of relations therebetween, said implementable description being represented on said computer environment as a second set of objects with a second set of relations therebetween, said first and second set of objects being part of a design environment.
A behavioral description is a description which substantiates the desired behavior of a system in a formal way. In general, a behavioral description is not readily implementable since it is a high-level description, and it only describes an abstract version of the system that can be simulated. An implementable description is a more concrete description that is, in contrast to a behavioral description, detailed enough to be implemented in software to provide an approximative simulation of real-life behavior or in hardware to provide a working semiconductor circuit.
With design environment is meant an environment in which algorithms can be produced and run by interpretion or compilation.
With objects is meant a data structure which shows all the characteristics of an object from an object oriented programming language, such as described in "Object Oriented Design" (G. Booch, Benjamin/Cummings Publishing, Redwood City, Calif., 1991).
Said first and second set of objects are preferably part of a single design environment.
Said design environment comprises preferably an Object Oriented Programming Language (OOPL). Said OOPL can be C++.
Said design environment is preferably an open environment wherein new objects can be created. A closed environment will not provide the flexibility that can be obtained with an open environment and will limit the possibilities of the user.
Preferably, at least part of the input signals and output signals of said first set of objects are at least part of the input signals and output signals of said second set of objects. Essentially all of the input signals and output signals of said first set of objects can be essentially all of the input signals and output signals of said second set of objects.
At least part of the input signals and output signals of said behavioral description are preferably at least part of the input signals and output signals of said implementable description. Essentially all of the input signals and output signals of said behavioral description can be essentially all of the input signals and output signals of said implementable description.
Said first set of objects has preferably first semantics and said second set of objects has preferably second semantics. With semantics is meant the model of computation. Said first semantics is preferably a data-vector model and/or a data-flow model. Said second semantics is preferably a Finite State Machine Data Path (FSMD) data structure, comprising a control part and a data processing part, the data processing part being modeled by a signal flow graph (SFG) data structure and the control part being modeled by a FSM data structure. The terms FSMD and SFr are used interchangeably throughout the text.
Preferably, the impact in said implementable description of at least a part of the objects of said second set of objects is essentially the same as the impact in said behavioral description of at least a part of the objects of said first set of objects.
Preferably, the impact in said implementable description of essentially all of the objects of said second set of objects is essentially the same as the impact in said behavioral description of essentially all of the objects of said first set of objects.
With impact is meant not only the function, but also the way the object interacts with its environment from an external point of view. A way of rephrasing this is that the same interface for providing input and collecting output is present. This does not mean that the actual implementation of the data-processing between input and output is the same. The implementation is embodied by objects, which can be completely different but perform a same function. In an OOPL, the use of methods of an object without knowing its actual implementation is referred to as information hiding.
The design apparatus preferably further comprises means for simulating the behavior of said system said means simulating the behavior of said behavioral description, said implementable description or any intermediate description therebetween. Said intermediate description can be obtained after one or several refining steps from said behavioral description.
Preferably, at least part of said second set of objects is derived from objects belonging to said first set of objects. This can be done by using the inheritance functionalities provided in an OOPL. Essentially all of said second set of objects can be derived from objects belonging to said first set of objects.
Said implementable description can be at least partly obtained by refining said behavioral description. Said implementable description can be essentially obtained by refining said behavioral description. Preferably, said refining comprises the refining of objects.
The design apparatus can further comprise means to derive said first set of objects from a vector description, preferably a MATLAB description, describing said system as a set of operations on data vectors, means for simulating statically or demand-driven scheduled dataflow on said dataflow description and/or means for clock-cycle true simulating said digital system using said dataflow description and/or one or more of said SFG data structures.
In a preferred embodiment, said implementable description is an architecture description of said system, said system advantageously further comprising means for translating said architecture description into a synthesizable description of said system, said synthesizable description being directly implementable in hardware. Said synthesizable description is preferably a netlist of hardware building blocks. Said hardware is preferably a semiconductor chip or a electronic circuit comprising semiconductor chips.
A synthesizable description is a description of the architecture of a semiconductor that can be synthesized without further processing of the description. An example is a VHDL description.
Said means for translating said architecture description into a synthesizable description can be Cathedral-3 or Synopsys DC.
A second aspect of the present invention is a method for designing a system comprising at least one digital part, comprising a refining step wherein a behavioral description of said system is transformed into an implementable description of said system, said behavioral description being represented as a first set of objects with a first set of relations therebetween and said implementable description being represented as a second set of objects with a second set of relations therebetween.
Said refining step preferably comprises translating behavioral characteristics at least partly into structural characteristics. Said refining step can comprise translating behavioral characteristics completely into structural characteristics.
Said method can further comprise a simulation step in which the behavior of said behavioral description, said implementable description and/or any intermediate description therebetween is simulated.
Said refining step can comprises the addition of new objects, permitting interaction with existing objects, and adjustments to said existing objects allowing said interaction.
Preferably, said refining step is performed in an open environment and comprises expansion of existing objects. Expansion of existing objects can include the addition to an object of methods that create new objects. Said object is said to be expanded with the new objects. The use of expandable objects allows to use meta-code generation: creating expandable objects implies an indirect creation of the new objects.
Said behavioral description and said implementable description are preferably represented in a single design environment, said single design environment advantageously being an Object Oriented Programming Language, preferably C++.
Preferably, said first set of objects has first semantics and said second set of objects has second semantics. Said first semantics is preferably a data-vector model and/or a data-flow model. Said second semantics is preferably an SFG data structure.
The refining step comprises preferably a first refining step wherein said behavioral description being a data-vector model is at least partly transformed into a data-flow model. Advantageously, said data-flow model is an untimed floating point data-flow model.
Said refining step preferably further comprises a second refining step wherein said data-flow model is at least partly transformed into an SFG model. Said data-flow model can be completely transformed into an SFG model.
In a preferred embodiment, said first refining step comprises the steps of determining the input vector lengths of input, output and intermediate signals, determining the amount of parallelism of operations that process input signals under the form of a vector to output signals, determination of objects, connections between objects and signals between objects of said data-flow model, and determining the wordlength of said signals between objects. In the sequel of this application, the term "actors" is also used to denote objects. Connections between objects are denoted as "edges" and signals between objects are denoted as "tokens". Said step of determining the amount of parallelism can preferably comprise determining the amount of parallelism for every data vector and reducing the unspecified communication bandwidth of said data-vector model to a fixed number of communication buses in said data-flow model. Said step of determination of actors, edges and tokens of said data-flow model preferably comprises defining one or a group of data vectors in said first data-vector model as actors; defining data precedences crossing actor bounds, as edges, said edges behaving like queues and transporting tokens between actors; construct a system schedule and run a simulation on a computer environment. Said second refining step comprises preferably transforming said tokens from floating point to fixed point. Preferably, said SFG model is a timed fixed point SFG model.
Said second set of objects with said second set of relations therebetween are preferably at least partly derived from said first set of objects with said first set of relations therebetween. Objects belonging to said second set of objects are preferably new objects, identical with and/or derived by inheritance from objects from said first set of objects, or a combination thereof.
Several of said SFG models can be combined with a finite state machine description resulting in an implementable description. Said implementable description can be transformed to synthesizable code, said synthesizable code preferably being VHDL code.
Another aspect of the present invention is a method for simulating a system, wherein a description of a system is transformed into compilable C++ code.
Preferably, said description is an SFG data structure and said compilable C++ code is used to perform clock cycle true simulations.
Several SFG data structures can be combined with a finite state machine description resulting in an implementable description, said implementable description being said compilable C++ code suitable for simulating said system as software.
A clock-cycle true simulation of a system uses one or more SFG data structures.
Said clock-cycle true simulation can be an expectation-based simulation, said expectation-based simulation comprising the steps of: annotating a token age to every token; annotating a queue age to every queue; increasing token age according to the token aging rules and with the travel delay for every queue that has transported the token; increasing queue age with the iteration time of the actor steering the queue, and; checking whether token age is never smaller than queue age throughout the simulation.
Another aspect of the present invention is a hardware circuit or a software simulation of a hardware circuit designed with the design apparatus as recited higher.
Another aspect of the present invention is a hardware circuit or a software simulation of a hardware circuit designed with the method as recited higher.
DETAILED DESCRIPTION OF THE INVENTION
The present invention will be further explained by means of examples, which does not limit the scope of the invention as claimed.
SHORT DESCRIPTION OF THE DRAWINGS
In FIGS. 1A, 1B, 1C and 1D, the overall design methodology according to an embodiment of the invention is described.
In FIG. 2, a targeted architecture of a system that is to be designed according to the invention is described.
In FIG. 3, the C++ modeling levels of target architecture are depicted.
In FIG. 4, an SDF model of the PN correlator of the target architecture of FIG. 2 is shown.
In FIG. 5, a CSDF model of the PN correlator is described.
In FIG. 6, a MATLAB Dataflow model of the PN correlator is shown.
In FIG. 7, the SFG modeling concepts are depicted.
In FIG. 8, the implied description of the max actor is described.
In FIG. 9, example implementations for different expectations are given.
In FIG. 10, an overview of expectation based simulation is shown.
In FIG. 11, the code in OCAPI, or design environment of the invention, for a correlator processor is given.
In FIG. 12, the resulting circuit for datapath and controller is hierarchically drawn.
FIG. 13 describes a DECT Base station setup.
FIG. 14 shows the front-end processing of the DECT transceiver.
In FIG. 15, a part of the central VLIW controller description for the DECT transceiver ASIC is shown.
In FIG. 16, the use of overloading to construct the signal flowgraph data structure is shown.
In FIG. 17, an example C++ code fragment and its corresponding data structure is described.
In FIG. 18, a graphical and C++-textual description of the same FSM is shown.
In FIG. 19, the final system architecture of the DECT transceiver is shown.
In FIG. 20, a data-flow target architecture is shown.
In FIG. 21, the simulation of one cycle in a system with three components is shown.
In FIG. 22, the implementation and simulation strategy is depicted.
In FIG. 23, an end-to-end model of a QAM transmission system is shown.
In FIG. 24, the system contents for the QAM transmission system is described.
The present invention can be described as a design environment for performing subsequent gradual refinement of descriptions of digital systems within one and the same object oriented programming language environment. The lowest level is semantically equivalent to a behavioral description at the register transfer (RT) level.
A preferred embodiment of the invention comprising the design method according to the invention is called OCAPI. OCAPI is part of a global design methodology concept SOC++. OCAPI includes both a design environment in an object oriented programming language and a design method. OCAPI differentiates from current systems that support architecture definition (SPW, COSSAP) in the way that a designer is guided from the MATLAB level to the register transfer level. This way, combined semantic and syntactic translations in the design flow are avoided. - The designer is offered a single coding framework in an object oriented programming language, such as C++, to express refinements to the behavior. An open environment is used, rather than the usual interface-and-module approach.
- The coding framework is a container of design concepts, used in traditional design practice. Some example design concepts currently supported are simulation queues, finite state machines, signal flowgraphs, hybrid floating/fixed point data types, operation profiling and signal range statistics. The concepts take the form of object oriented programming language objects (referred to as object in the remainder of this text), that can be instantiated and related to each other.
- With this set of objects, a gradual refinement design route is offered: more abstract design concepts can be replaced with more detailed ones in a gradual way. Also, design concepts are combined in an orthogonal way: quantization effects and clock cycles (operation/operator mapping) for instance are two architecture features that can be investigated separately. Next, the different design hierarchies can be freely intermixed because of this object-oriented approach. For instance, it is possible to simulate half of the description at fixed point level, while the other half is still in floating point.
- The use of a single object oriented programming language framework in OCAPI allows fast design iteration, which is not possible in the typical nowadays hybrid approach.
Comparing to existing data-flow-based systems like SPW and COSSAP we see that the algorithm iterations can be freely chosen. Comparing to existing hardware design environments like DC or BC, we see that we can start from a specification level that is more abstract than the connection of blocks.
Two concepts of scaleable parallelism and expectation based simulation are introduced. The designer is given an environment to check the feasibility of what the designer thinks that can be done. In the development process, the designer creates his library of Signal FlowGraph (SFG) versions of abstract MATLAB operations.
DESCRIPTION OF OCAPI, A PREFERRED EMBODIMENT OF THE PRESENT INVENTION
OCAPI is a C++ library intended for the design of digital systems. It provides a short path from a system design description to implementation in hardware. The library is suited for a variety of design tasks, including: - Fixed Point Simulations
- System Performance Estimation
- System Profiling
- Algorithm-to-Architecture Mapping
- System Design according to a Dataflow Paradigm
- Verification and Testbench Development
Development Flow
The Flow Layout
The design flow according to an embodiment of the present invention, as shown in FIG. 1D, starts off with an untimed, floating point C++ system description 101. Since data-processing intensive applications such as all-digital transceivers are targeted, this description uses data-flow semantics. The system is described as a network of communicating components.
At first, the design is refined, and in each component, features expressing hardware implementation are introduced, including time (clock cycles) and bittrue rounding effects. The use of C++ allows to express this in an elegant way. Also, all refinement is done in a single environment, which greatly speedups the design effort.
Next, the timed, bittrue C++ description 103 is translated into an equivalent HDL description by code generation. For each component, a controller description 105 and a datapath description 107 can be generated. Also for each component a single HDL description can be generated, this description preferably jointly representing the control processing and data processing of the component. This is done because OCAPI relies on separate synthesis tools for both parts, each one optimized towards controller or else datapath synthesis tasks. Through the use of an appropriate object modeling hierarchy the generation of datapath and controller HDL can be done fully automatic.
For datapath synthesis 109, OCAPI relies on the Cathedral-3 datapath synthesis tools, that allow to obtain a bitparallel hardware implementation starting from a set of signal flowgraphs. Controller synthesis 111 on the other hand is done by the logic synthesis of Synopsys DC. This divide and conquer strategy towards synthesis allows each tool to be applied at the right place.
During system simulation, the system stimuli 113 are also translated into testbenches that allow to verify the synthesis result of each component. After interconnecting all synthesized components into the system netlist, the final implementation can also be verified using a generated system testbench 115.
The System Model
The system machine model that is used is a set of concurrent processes. Each process translates to one component in the final system implementation.
At the system level, processes execute using data flow simulation semantics. That is, a process is described as an iterative behavior, where inputs are read in at the start of an iteration, and outputs are produced at the end. Process execution can start as soon as the required input values are available.
Inside of each process, two types of description are possible. The first one is an untimed description, and can be expressed using any C++ constructs available. A firing rule is also added to allow dataflow simulation. Untimed processes are not subject to hardware implementation but are needed to express the overal system behavior. A typical example is a channel model used to simulate a digital transceiver.
The second flavor of processes is timed. These processes operate synchronously to the system clock. One iteration of such a process corresponds to one clock cycle of processing. Such a process falls apart in two pieces: a control description and a data processing description.
The control description is done by means of a finite state machine, while the data description is a set of instructions. Each instruction consists of a series of signal assignments, and can also define process in- and outputs. Upon execution, the control description is evaluated to select one or more instructions for execution. Next, the selected instructions are executed. Each instruction thus corresponds to one clock cycle of RT behavior.
For system simulation, two schedulers are available. A dataflow scheduler is used to simulate a system that contains only untimed blocks. This scheduler repeatedly checks process firing rules, selecting processes for execution as their inputs are available. When the system also contains timed blocks however, a cycle scheduler is used. The cycle scheduler manages to interleave execution of multi-cycle descriptions, but can incorporate untimed blocks as well.
The Standard Program
The library of OCAPI has been developed with the g++ C++ GNU compiler. The best mode embodiment uses the g++ 2.8.1 compiler, and has been successfully compiled and run under the HPUX 10 (HPUX10) operating system platform. It is also possible to use a g++ 2.7.2 compiler, allowing for compilation and run under operating system platforms such as HPUX-9 (HPRISC), HPUX-10 (HPUX10), SunOS (SUN4), Solaris (SUN5) and Linux 2.0.0 (LINUX).
The layout of the 'standard' g++ OCAPI program will be explained, including compilation and linking of this program.
First of all, g++ is a preferred standard compilation environment. On Linux, this is already the case after installation. Other operating system vendors however usually have their own proprietary C++ compiler. In such cases, the g++ compiler should be installed on the operating system, and the PATH variable adapted such that the shell can access the compiler.
The OCAPI library comes as a set of include files and a binary lib. All of these are put into one directory, which is called the BASE directory.
The 'standard program' is the minimal contents of an OCAPI program. It has the following layout.
| | | | | include ''qlib.h'' | | | int main ( ) | | | { | | | //your program goes here | | | } | | | |
The include "qlib.h" includes everything you need to access all classes within OCAPI.
If this program is called "standard.cxx", then the following makefile will transform the source code into an executable for you:
| | | | | HOSTTYPE = HPUX10 | | | BASE = /imec/vsdm/OCAPI/release/v0.9 | | | CC = g++ | | | QFLAGS = -c -g -Wall -I${BASE} | | | LIBS = -lm | | | %.o: %.cxx | | | TARGET = standard | | | all: $(TARGET) | | | define lnkqlib | | | $(CC) ${circumflex over ( )} -o $@ $(LIBS) | | | endef | | | OBJS = standard.o | | | standard:${OBJS} $(BASE)/lib$(HOSTTYPE) qlib.a | | | clean: | | | rm -f *.o $(TARGET) | | | |
This is a makefile for GNU's "make"; other "make" programs can have a slightly different syntax, especially for the definition of the "lnkqlib" macro. It is not the shortest possible solution for a makefile, but it is one that works on different platforms without making assumptions about standard compilation rules.
The compilation flags "QFLAGS" mean the following: "-c" selects compilation-only, "-g" turns on debugging information, and "-Wall" is the warning flag. The debugging flag allows you to debug your program with "gdb", the GNU debugger.
Even if you don't like a debugger and prefer "printf( )" debugging, "gdb" can at least be of great help in the case the program core dumps. Start the program under "gdb" (type "gdb standard" at the shell prompt), type "run" to let "standard" crash again, and then type "bt". One now see the call trace.
Calculation
OCAPI processes both floating point and fixed point values. In contrast to the standard C++ data types like "int" and "double", a "hybrid" data type class is used, that simulates both fixed point and floating point behavior.
The dfix Class
This class is called "dfix". The particular floating/fixed point behavior is selected by the class constructor. The standard format of this constructor is
| | | | | dfix a; // a floating point value | | | dfix a (0.5);// a floating point value with initial value | | | dfix a (0.5, 10, 8); | | | // a fixed point value with initial value, | | | // 10 bits total word-length, 8 fractional bits | | | |
A fixed point value has a maximal precision of the mantissa precision of a C++ "double". On most machines, this is 53 bits.
A fixed point value can also select a representation, an overflow behavior, and a rounding behavior. These flags are, in this order, optional parameters to the "dfix" constructor. They can have the following values. - Representation flag: "dfix::tc" for two's complement signed representation, "dfix::ns" for unsigned representation.
- Overflow flag: "dfix::wp" for wrap-around overflow, "dfix::st" for saturation.
- Rounding flag: "dfix::fl" for truncation (floor), "dfix::rd" forrounding behavior.
Some examples are
| | // the default is two's complement, wrap-around, | | | // truncated quantisation | | | dfix a(0.5, 10, 8, dfix::tc, dfix::st, dfix::rd); | | | // two's complement, saturation, rounding quantisation | | | dfix a(0.5, 10, 8, dfix::ns); | | | // unsigned, wrap-around, truncated quantisation | | | |
When working with fixed point "dfix"es, it is important to keep the following rule in mind: "quantisation occurs only when a value is defined or assigned". This means that a large expression with several intermediate results will never have these intermediate values quantised. Especially when writing code for hardware implementation, this should be kept in mind. Also intermediate results are stored in finite hardware and therefore will have some quantisation behavior. There is however a a "cast" operator that will come at help here.
The dfix Operators
The operators on "dfix" are shown below - +, -, *, /
- Standard addition, subtraction (including unary minus), multiplication and division.
- +=, -=, *=, /=
- In-place versions of previous operators.
- abs
- <<, >>
- <<=, >>=
- In place left and right shifts.
- msbpos
- Most-significant bit position.
- &, |, ^, ˜
- Bitwise and, or, exor, and not operators.
- frac( ) (member call)
- ==, !=, <=, >=, <, >
- Relational operators: equal, different, smaller then or equal to, greater then or equal to, smaller then, greater then. These return an "int" instead of a "dfix".
All operators with exception of the bitwise operators work on the maximal fixed point precision (53 points). The bitwise operators have a precision of 32 bits (a C++ "long"). Also, they assume the fixed point representation contains no fractional bits.
In addition to the arithmetic operators, several utility methods are available for the "dfix" class.
| | | | | dfix a,b; | | | // cast a to another type | | | b = cast(dfix(0, 12, 10), a); | | | // assign b to a, retaining the quantisation of a | | | a = b; | | | // assign b to a, including the quantisation | | | a.duplicate(b); | | | // return the integer part of b | | | int c = (int) b; | | | // retrieve the value of b as a double | | | double d,e: | | | d = b.Val( ); | | | e = Val (b); | | | // return quantisation characteristics of a | | | a.TypeW( ); | // returns the number of bits | | | a.TypeL( ); | // returns the number of fractional bits | | | a.TypeSign( ); | // returns dfix::tc or dfix::ns | | | a.TypeOverflow( ); | // returns dfix::wp or dfix::st | | | a.TypeRound( ); | // returns dfix::fl or dfix::rd | | | // check if two dfixes are identical in value and | | | quantisation | | | identical(a,b); | | | // see wether a is floating or fixed point | | | a.TypeMode( ); // returns dfix::fixpoint or dfix::floatpoint | | | a.isDouble( ); | | | a.isFix( ); | | | // write a to cout | | | cout << a; | | | // write a to stdout, in float format, | | | // on a field of 10 characters | | | write (cout, a, 'f', 10); | | | // now use a fixed-format | | | write (cout, a, 'g', 10); | | | // next assume a is a fixed point number, and write out an | | | // integer representation (considering the decimal point at | | | // the lsb of a) use a hexadecimal format | | | write (cout, a, 'x', 10); | | | // use a binary format | | | write (cout, a, 'b', 10); | | | // use a decimal format | | | write (cout, a, 'd', 10); | | | // read a from stdin | | | cin >> a; | | | | Communication
Apart from values, OCAPI is concerned with the communication of values in between blocks of behavior. The high level method of communication in OCAPI is a FIFO queue, of type "dfbfix". This queue is conceptually infinite in length. In practice it is bounded by a sysop phonecall telling that you have wasted up all the swap space of the system.
The dfbfix Class
A queue is declared as
dfbfix a("a");
This creates a queue with name a. The queue is intented to pass value objects of the type "dfix". There is also an alias type of "dfbfix", known as "FB" (flow buffer). So you can also write
The dfbfix Operations
The basic operations on a queue allow to store and retrieve "dfix" objects. The operations are
| | | | | dfix k; | | | dfix j(0.5); | | | dfbfix a(''a''); | | | // insert j at the front of a | | | a.put(j); | | | // operator format for an insert | | | a << j; | | | // insert j at position 5, with position 0 corresponding to | | | // the front of a. | | | a.putIndex(j,5); | | | // read one element from the back of a | | | k = a.get( ); | | | // operator format for a read | | | a >> j; | | | // peek one element at position 1 of a | | | k = a.getIndex(1); | | | // operator format for peek | | | k = a[1]; | | | // retrieve one element from a and throw it | | | a.pop( ); | | | // throw all elements, if any, from a | | | a.clear ( ); | | | // return the number of elements in a as an int | | | int n = a.getSize( ); | | | // return the name of the queue | | | char *p = a.name( ); | | | |
Whenever you perform an access operation that reads past the end of a FIFO, a runtime error results, showing Queue Underflow @ get in queue a
Utility Calls for dfbfix
Besides the basic operations on queues, there are some additional utiliy operations that modify a queue behavior
| | | | | // make a queue of length 20. The default length of a queue | | | // is 16. Whenever this length is exceeded by a put, the | | | // storage in the queue is dynamically expanded by a factor | | | // of 2. | | | dfbfix a(''a'', 20); | | | // After the asType( ) call, the queue will have an input | | | // ''quantizer'' that will quantize each element inserted | | | // into the queue to that of the quantizer type | | | dfix q(0, 10, 8); | | | a.asType(q); | | | // After an asDebug( ) call, the queue is associated with a | | | // file, that will collect every value written into the | | | // queue. The file is opened as the queue is initialized | | | // and closed when the queue object is destroyed. | | | a.asDebug (''thisfile.dat''); | | | // Next makes a duplicate queue of a, called b. Every write | | | // into a will also be done on b. Each queue is allowed to | | | // have at most ONE duplicate queue. | | | dfbfix b(''b''); | | | a.asDup(b); | | | // Thus, when another duplicate is needed, you write is as | | | dfbfix c(''c''); | | | b.asDup(c); | | | |
During the communication of "dfix" objects, the queues keep track of some statistics on the values that are passed through it. You can use the "<<" operator and the member function "stattitle( )" to make these statistics visible.
The next program demonstrates these statistics
| | | | | #include "qlib.h" | | | void main( ) | | | { | | | dfbfix a ("a"); | | | a << dfix(2); | | | a << dfix(l); | | | a << dfix(3); | | | a.stattitle(cout); | | | cout << a; | | | } | | | |
When running this program, the following appears on screen
| | Name | put | get | MinVal | @idx | MaxVal | @idx | Max# | ®idx | | | A | 3 | 0 | l.0000e+00 | 2 | 3.0000e+00 | 3 | 3 | 3 | |
The first line is printed by the "stattitle( )" call as a mnemonic for the fields printed below. The next line is the result of passing the queue to the standard output stream object. The fields mean the following:
| | | | | Name | The name of the queue | | | put | The total number of elements "put ( )" into | | | | the queue | | | get | The total number of elements "get ( )" from | | | | the queue | | | MinVal | The lowest element put onto the queue | | | @idx | The put sequential number that passed | | | | this lowest element | | | MaxVal | The highest element put onto the queue | | | @idx | The put sequential number that passed | | | | this highest element | | | Max# | The maximal queue length that occurred | | | @idx | The put sequential number that resulted ion | | | | this maximal queue length | | | |
Globals and Derivatives for dfbfix
There are two special derivates of "dfbfix". Both are derived classes such that you can use them wherever you would use a "dfbfix". Only the first will be discussed here, the other one is related to cycle-true simulation and is discussed in section "Faster Communications".
The "dfbfix_nil" object is like a "/dev/null" drain. Every "dfix" written into this queue is thrown. A read operation from such a queue results in a runtime error.
There are two global variables related to queues. The "listOfFB" is a pointer to a list of queues, containing every queue object you have declared in your program. The member function call "nextFB( )" will return the successor of the queue in the global list. For example, the code snippet
| | | | | dfbfix *r; | | | for ( r = listOfFB ; r ; r = r->nextFB ( ) ) | | | { | will walk trough all the queues present in the OCAPI program.
The other global variable is "nilFB", which is of the type "dfbfix_nil". It is intended to be used as a global trashcan.
The Basic Block
OCAPI supports the dataflow simulation paradigm. In order to define the actors to the system, one "base" class is used, from which all actors will inherit. In order to do untimed simulations, one should follow a standard template to which new actor classes must conform. In this section, the standard template will be introduced, and the writing style is documented.
Basic Block Include and Code File
Each new actor in the system is defined with one header file and one source code C++ file. We define a standard block, "add", which performs an addition.
The include file, "add.h", looks like
| | | | | #ifndef ADD_H | | | #define ADD_H | | | #include ''qlib.h'' | | | class add : public base | | | { | | | add (char *name, FB & _in1, FB & _in2, FB & _o1); | | | int run( ); | | | FB *in1; | | | FB *in2; | | | FB *o1; |
This defines a class "add", that inherits from "base". The "base" object is the one that OCAPI likes to work with, so you must inherit from it in order to obtain an OCAPI basic block.
The private members in the block are pointers to communication queues. Optionally, the private members should also contain state, for example the tap values in a filter. The management of state for untimed blocks is entirely the responsibility of the user; as far as OCAPI is concerned, it does not care what you use as extra variables.
The public members include a constructor and an execution call "run". The constructor must at least contain a name, and a list of the queues that are used for communication. Optionally, some parameters can be passed, for instance in case of parametrized blocks (filters with a variable number of taps and the like).
The contents of the adder block will be described in "add.cxx".
| | | | | #include ''add.cxx'' | | | add::add(char *name, FB & _in1, FB & _in2, FB & _o1): | | | base(name) | | | { | | | in1 = _in1.asSource(this); | | | in2 = _in2.asSource(this); | | | o1 = _o1.asSink (this); | | | // firing rule | | | if (in1->getSize( ) < 1) | | | return 0; | | | if (in2->getSize( ) < 1) | | | return 0 ; | | | o1->put (in1->get( ) + in2->get( )); | | | return 1; |
The constructor passes the name of the object to the "base" class it inherits from. In addition, it initializes private members with the other parameters. In this example, the communication queue pointers are initialized. This is not done through simple pointer assignment, but through function calls "asSource" and "asSink". This is not obligatory, but allows OCAPI to analyze the connectity in between the basic blocks. Since a queue is intended for point-to-point communication, it is an error to use a queue as input or ouput more then once. The function calls "asSource" and "asSink" keep track of which blocks source/sink which queues. They will return a runtime error in case a queue is sourced or sinked more then once. The constructor can optionally also be used to perform initialization of other private data (state for instance). The "run( )" method contains the operations to be performed when the block is invoked. The behavior is described in an iterative way. The "run" function must return an integer value, 1 if the block succeeded in performing the operation, and 0 if this has failed.
This behavior consists of two parts: a firing rule and an operative part. The firing rule must check for the availability of data on the input queues. When no sufficient data is present (checked with the "getSize( )" member call), it stops execution and returns 0. When sufficient data is present, execution can start. Execution of an untimed behavior can use the different C++ control constructs available. In this example, the contents of the two input queues is read, the result is added and put into the ouput queue. After execution, the value 1 is returned to signal the behavior has completed.
Predefined Standard Blocks: File Sources and Sinks
The OCAPI library contains three predefined standard blocks, which is a file source "src", a file sink "snk", and a ram storage block "ram".
The file sources and sinks define operating system interfaces and allow you to bring file data into an OCAPI simulation, and to write out resulting data to a file. The examples below show various declarations of these blocks. Data in these files is formatted as floating point numbers separated by white space. For output, newlines are used as whitespace.
| | | | | // define a file source block, with name a, that will read | | | // data from the file ''in.dat'' and put it into the queue k | | | dfbfix k(''k''); | | | src a(''a'', k, ''in.dat''); | | | // an alternative definition is | | | dfbfix k(''k''); | | | src a (''a'', k); | | | a.setAttr(src::FILENAME, ''in.dat''); | | | // which also gives you a complex version | | | dfbfix k1(''k1''); | | | dfbfix k2(''k2''); | | | src a (''a'', k1, k2); | | | a.setAttr (src::FILENAME, ''in.dat''); | | | // define a sink block b, that will put data from queue o | | | // into a file ''out.dat''. | | | dfbfix o(''o''); | | | snk b(''b'', o, ''out.dat''); | | | // an alternative definition is | | | dfbfix o(''o''); | | | snk b(''b'', o); | | | b.setAttr (snk::FILENAME, ''out.dat''); | | | // which gives one also a complex version | | | dfbfix o1(''o1''); | | | dfbfix o2 (''o2''); | | | snk b(''b", o1, o2); | | | b.setAttr (snk::FILENAME, ''out.dat''); | | | // the snk mode has also a matlab-goodie which will format | | | // output data into a matrix A that can be read in directly | | | // by Matlab. | | | dfbfix o(''o''); | | | snk b(''b'', o, ''out.m''); | | | b.setAttr (snk::FILENAME, ''out.m''); | | | b.setAttr(snk::MATLABMODE, 1); | | | |
Predefined Standard Blocks: RAM
The ram untimed block is intended to simulate single-port storage blocks at high level. By necessity, some interconnect assumptions had to be made on this block. On the other hand, it is supported all the way through code generation.
OCAPI does not generate RAM cells. However, it will generate appropriate connections in the resulting system netlist, onto which a RAM cell can be connected.
The declaration of a ram block is as follows.
| | | | | // make a ram a, with an address bus, a data input bus, a | | | // data output bus, a read command line, a write command | | | // line, with 64 locations | | | dfbfix address(''address''); | | | dfbfix data_in(''data_in''); | | | dfbfix data_out(''data_out''); | | | dfbfix read_c(''read_c''); | | | dfbfix write_c(''write_c''); | | | ram a (''a'',address,data_in,data_out,write_c,read_c,64); | | | // clear the ram | | | a.clear( ); | | | // fill the ram with the linear sequence data = k1+address | | | // * k2; | | | a.fil(k1, k2); | | | // dump the contents of a to cout | | | a.show( ); | | | |
The execution semantics of the ram are as follows. For each read or write, an address, a read command and a write command must be presented. If the read command equals "dfix(1)", a read will be performed, and the value stored at the location presented through "address" will be put on "data_out". If the read command equals any other value, a dummy byte will be presented at "data_out". If no read command was presented, no data will be presented on "data_out". For writes, an identical story holds for reads on the "data_in" input: whenever a write command is presented, the data input will be consumed. When the write command equals 1, then the data input will be stored in the location provided through "address". When a read and write command are given at the same time, then the read will be performed before the write. The ram also includes an online "purifier" that will generate a warning message whenever data from an unwritten location is read.
Untimed Simulations
Given the descriptions of one or more untimed blocks, a simulation can be done. The description of a simulation requires the following to be included in a standard C++ "main( )" procedure: - The instantiation of one or more basic blocks.
- The instantiation of one or more communication queues that interconnect the blocks
- The setup of stimuli. Either these can be included at runtime by means of the standard file source blocks, or else dedicated C++ code can be written that fills up a queue with stimuli.
- A schedule that drives the execution methods of the basic blocks.
A schedule, in general, is the specification of the sequence in which block firing rules must be tested (and fired if necessary) in order to run a simulation. There has been quite some research in determining how such a schedule can be constructed automatically from the interconnection network and knowledge of the block behavior. Up to now, an automatic mechanism for a general network with arbitrary blocks has not been found. Therefore, OCAPI relies on the designer to construct such a schedule.
Layout of an Untimed Simulation
In this section, the template of the standard simulation program will be given, along with a description of the "scheduler" class that will drive the simulation. A configuration with the "adder" block (described in the section on basic blocks) is used as an example.
| | | | | #include ''qlib.h" | | | #include ''add.h" | | | void main( ) | | | { | | | dfbfix i1("i1"); | | | dfbfix i2("i2"); | | | dfbfix o1("o1") ; | | | src SRC1("SRC1", i1,"SRC1"); | | | src SRC2("SRC2", i2, "SRC2"); | | | add ADD ("ADD", i1, 12, o1); | | | snk SNK1("SNK1", o1, "SNK1"); | | | schedule S1 ("S1"); | | | S1.next(SRC1); | | | S1.next(SRC2); | | | S1.next(ADD ); | | | S1.next(SNK1); | | | while (S1.run( )) ; | | | i1.stattitle(cout); | | | cout << i1; | | | cout << i2; | | | cout << o1; | | | } | | | |
The simulation above instantiates three communication buffers, that interconnect four basic blocks. The instantiation defines at the same time the interconnection network of the simulation. Three of the untimed blocks are standard file sources and sinks, provided with OCAPI. The "add" block is a user defined one.
After the definition of the interconnection network, a schedule must be defined. A simulation schedule is constructed using "schedule" objects. In the example, one schedule object is defined, and the four blocks are assigned to it by means of a "next( )" member call.
The order in which "next( )" calls are done determines the order in which firing rules will be tested. For each execution of the schedule object "S1" the "run( )" methods of "SRC1", "SRC2", "ADD" and "SNK1" are called, in that order. The execution method of a scheduler object is called "run( )". This function returns an integer, equal to one when at least on block in the current iteration has executed (i.e. the "run( )" of the block has returned one). When no block has executed, it returns zero.
The while loop in the program therefore is an execution of the simulation. Let us assume that the directory of the simulator executable contains the two required stimuli files, "SRC1" and "SRC2". Their contents is as follows
| | | | | SRC1 | SRC2 | -- not present in the file | | | ---- | ---- | -- not present in the file | | | 1 | 4 | | | 2 | 5 | | | 3 | 6 | | | |
When compiling and running this program, the simulator responds: *** INFO: Defining block SRC1 *** INFO: Defining block SRC2 *** INFO: Defining block ADD *** INFO: Defining block SNK1
| | Name | put | get | MinVal | @idx | MaxVal | @idx | Max# | @idx | | | i1 | 3 | 3 | l.0000e+00 | 1 | 3.0000e+00 | 3 | 1 | 1 | | i2 | 3 | 3 | 4.0000e+00 | 1 | 6.0000e+00 | 3 | 1 | 1 | | o1 | 3 | 3 | 5.0000e+00 | 1 | 9.0000e+00 | 3 | 1 | 1 | | and in addition has created a file "SNK1", containing SNK1—not present in the file ----—not present in the file 5.000000e+00 7.000000e+00 9.000000e+00
The "INFO" message appearing on standard output are a side effect of creating a basic block. The table at the end is produced by the print statements at the end of the program.
More on Schedules
If you would examine closely which blocks are fired in which iteration, (for instance with a debugger) then you would find
| | run SRC1 => i1 contains 1.0 | | | run SRC2 => i2 contains 4.0 | | | run ADD => o1 contains 5.0 | | | run SNK1 => write out o1 | | | schedule.run( ) returns 1 | | | iteration 2 | | | run SRC1 => i1 contains 2.0 | | | run SRC2 => i2 contains 5.0 | | | run ADD => o1 contains 7.0 | | | run SNK1 => write out o1 | | | schedule.run( ) returns 1 | | | iteration 3 | | | run SRC1 => i1 contains 3.0 | | | run SRC2 => i2 contains 6.0 | | | run ADD => o1 contains 9.0 | | | run SNK1 => write out o1 | | | schedule.run( ) returns 1 | | | iteration 4 | | | run SRC1 => at end-of-file, fails | | | run SRC2 => at end-of-file, fails | | | run ADD => no input tokens, fails | | | run SNK1 => no input tokens, fails | | | schedule.run( ) returns 0 => end simulation | | | |
There are two schedule member functions, "traceOn( )" and "traceOff( )", that will produce similar information for you. If you insert S.traceOn( ); just before the while loop, then you see *** INFO: Defining block SRC1 *** INFO: Defining block SRC2 *** INFO: Defining block ADD *** INFO: Defining block SNK1 S1 [SRC1 SRC2 ADD SNK1] S1 [SRC1 SRC2 ADD SNK1] S1 [SRC1 SRC2 ADD SNK1] S1 [ ]
| | Name | put | get | MinVal | @idx | MaxVal | @idx | Max# | @idx | | | i1 | 3 | 3 | l.0000e+00 | 1 | 3.0000e+00 | 3 | 1 | 1 | | i2 | 3 | 3 | 4.0000e+00 | 1 | 6.0000e+00 | 3 | 1 | 1 | | o1 | 3 | 3 | 5.0000e+00 | 1 | 9.0000e+00 | 3 | 1 | 1 | | appearing on the screen. This trace feature is convenient during schedule debugging.
In the simulation ouput, you can also notice that the maximum number of tokens in the queues never exceeds one. When you had entered another schedule sequence, for example
| | S1.next(ADD ); | | | S1.next(SRC2); | | | S1.next(SRC1); | | | S1.next(SNK1); | | | | then you would notice that the maximum number of tokens on the queues would result in different figures. On the other hand, the resulting data file, "SNK1", will contain exactly the same results. This demonstrates one important property of dataflow simulations: any arbitrary but consistent schedule yields the same results. only the required amount of storage will change from schedule to schedule.
In multirate systems, it is convenient to have different schedule objects and group all blocks working on the same rate in one schedule.
Profiling in Untimed Simulations
Untimed simulations are not targeted to circuit implementation. Rather, they have an explorative character. Besides the queue statistics, OCAPI also enables you to do precise profiling of operations. The requirement for this feature is that - You use "schedule" objects to construct the simulation
- You describe block behavior with "dfix" objects
Profiling is by default enabled. To view profiling results, you send the schedule object under consideration to the standard output stream. In the "main" example program given above, you can modify this as
| | | | | include ''qlib.h'' | | | include ''add.h'' | | | void main( ) | | | { | | | . . . | | | schedule S1("S1"); | | | . . . | | | cout << S1; |
When running the simulation, you will see the following appearing on stdout: *** INFO: Defining block SRC1 *** INFO: Defining block SRC2 *** INFO: Defining block ADD *** INFO: Defining block SNK1
| | Name | put | get | MinVal | @idx | MaxVal | @idx | Max# | @idx | | | i1 | 3 | 3 | 1.0000e+00 | 1 | 3.0000e+00 | 3 | 1 | 1 | | i2 | 3 | 3 | 4.0000e+00 | 1 | 6.0000e+00 | 3 | 1 | 1 | | o1 | 3 | 3 | 5.0000e+00 | 1 | 9.0000e+00 | 3 | 1 | 1 | |
Schedule S1 ran 4 times:
| | | | | SRC1 | 3 | | | SRC2 | 3 | | | ADD | 3 | | | + | 3 | | | SNK1 | 3 | | | |
For each schedule, it is reported how many times it was run. Inside each schedule, a firing count of each block is given. Inside each block, an operation execution count is given. The simple "add" block gives the rather trivial result that there were three additions done during the simulation.
The gain in using operation profiling is to estimate the computational requirement for each block. For instance, if you find that you need to do 23 multiplications in a block that was fired 5 times, then you would need at least five multipliers to guarantee the block implementation will need only one cycle to execute.
Finally, if you want to suppress operation profiling for some blocks, then you can use the member function call "noOpsCnt( )" for each block. For instance, writing ADD.noOpsCnt( ); suppresses operation profiling in the ADD block. Implementation
The features presented in the previous sections contain everything you need to do untimed, high level simulations. These kind of simulations are useful for initial development. For real implementation, more detail has to be added to the descriptions.
OCAPI makes few assumptions on the target architecture of your system. One is that you target bitparallel and synchronous hardware. Synchronicity is not a basic requirement for OCAPI. The current version however constructs single-thread simulations, and also assumes that all hardware runs at the same clock. If different clocks need to be implemented, then a change to the clock-cycle true simulation algorithm will have to be made. Also, it is assumed that one basic block will eventually be implemented into one processor.
One question that comes to mind is how hardware sharing between different basic blocks can be expressed. The answer is that you will have to construct a basic block that merges the two behaviors of two other blocks. Some designers might feel reluctant to do this. On the other hand, if you have to write down merged behavior, you will also have to think about the control problems that are induced from doing this merging. OCAPI will not solve this problem for you, though it will provide you with the means to express it.
Before code generation will translate a description to an HDL, one will have to take care of the following tasks: - One will have to specify wordlengths. The target hardware is capable of doing bitparallel, fixed point operations, but not of doing floating point operations. One of the design tasks is to perform the quantisation on floating point numbers. The "dfix" class discussed earlier contains the mechanisms for expressing fixed point behavior.
- One will have to construct a clock-cycle true description. In constructing this description, one will not have to allocate actual hardware, but rather express which operations one expects to be performed in which clock cycle. The semantical model for describing this is clock cycle true behavior consists of a finite state machine, and a set of signal flow graphs. Each signal flow graph expresses one cycle of implemented behavior. This style of description splits the control operations from data operations in your program. In contrast, the untimed description you have used before has a common representation of control and data.
OCAPI does not force an ordening on these tasks. For instance, one might first develop a clock cycle true description on floating point numbers, and afterwards tackle the quantization issues. This eases verification of the clock-cycle true circuit to the untimed high level simulation.
The final implementation also assumes that all communication queues will be implemented as wiring. They will contain no storage, nor they will be subject to buffer synthesis. In a dataflow simulation, initial buffering values can however be necessary (for instance in the presence of feedback loops). In OCAPI, such a buffer must be implemented as an additional processor that incorporates the required storage. The resulting system dataflow will become deadlocked because of this. The cycle scheduler however, that simulates timed descriptions, is clever enough to look for these 'initial tokens' inside of the descriptions.
In the next sections, the classes that allow you to express clock cycle true behavior are introduced.
Signals and Signal Flowgraphs
Some initial considerations on signals are introduced first.
Hardware Versus Software
Software programs always use memory to store variables. In contrast, hardware programs work with signals, which might or might not be stored into a register. This feature can be expressed in OCAPI by using the "_sig" class. Simply speaking, a "_sig" is a "dfix" for which one has indicated whether is needs storage or not.
In implementation, a signal with storage is mapped to a net driven by a register, while an immediate signal is mapped to a net driven by an operator.
Besides the storage issue, a signal also departs from the concept of "scope" one uses in a program. For instance, in a function one can use local variables, which are destroyed (i.e. for which the storage is reclaimed) after one has executed the function. In hardware however, one controls the signal-to-net mapping by means of the clock signal.
Therefore one have to manage the scope of signals. The signal scope is expressed by using a signal flowgraph object, "sfg". A signal flowgraph marks a boundary on hardware behavior, and will allow subsequent synthesis tools to find out operator allocation, hardware sharing and signal-to-net mapping.
The _sig Class and Related Operations
Hardware signals can expressed in three flavors. They can be plain signals, constant signals, or registered signals. The following example shows how these three can be defined.
| | | | | // define a plain signal a, with a floating point dfix | | | // inside of it. | | | _sig a(''a''); | | | // define a plain signal b, with a fixed point dfix inside | | | // of it. | | | _sig b(''b'', dfix(0,10,8)); | | | // define a registered signal c, with an initial value k | | | // and attached to a clock ck. | | | dfix k(0.5); | | | clk ck; | | | _sig c(''c'', ck, k); | | | // define a constant signal d, equal to the value k | | | _sig d(k); | | | |
The registered signals, and more in particular the clock object, are explained more into detail when signal flowgraphs and finite state machines are discussed. This section concentrates on operations that are available for signals.
Using signals and signal operations, one can construct expressions. The signal operations are a subset of the operations on "dfix". This is because there is a hardware operator implementation behind each of these operations. - +, -, *
- Standard addition, subtraction (including unary minus), multiplication
- &, |, ^, ˜
- Bitwise and, or, exor, and not operators
- ==, !=, <=, >=, <, >
- <<, >>
- s.cassign(s1,s2)
- Conditional assignment with s1 or s2 depending on s
- cast(T,s)
- Convert the type of s to the type expressed in "dfix" T
- lu(L,s)
- Use s as in index into lookuptable L and retrieve
- msbpos(s)
- Return the position of the msb in s
Precision considerations are the same as for "dfix". That is, precision is at most the mantissa precision of a double (53 bits). For the bitwise operations, 32 bits are assumed (a long). "cast", "lu" and "msbpos" are not member but friend functions. In addition, "msbpos" expects fixed-point signals.
| | | | | _sig a(''a''); | | | _sig b(''b''); | | | _sig c(''c''); | | | // some simple operations | | | c = a + b; | | | c = a - b; | | | c = a * b; | | | // bitwise operations works only on fixed point signals | | | _sig e(dfix(0xff, 10, 0)); | | | _sig d(''d'',dfix(0,10,0)); | | | _sig f(''f'' ,dfix(0,10,0)); | | | f = d & e; | | | f = d | e; | | | f = ˜d; | | | f = d {circumflex over ( )} _sig(dfix(3,10,0)); | | | // shifting | | | // a dfix is automatically promoted to a constant _sig | | | f = d << dfix (3,8,0); | | | // conditional assignment | | | f = (d < dfix(2,10,0)).cassign(e,d); | | | // type conversion is done with cast | | | _sig g(''g'',dfix(0,3,0)); | | | g = cast (dfix(0,3,0), d); | | | // a lookup table is an array of unsigned long | | | unsigned long j = {1, 2, 3, 4, 5}; | | | // a lookuptable with 5 elements, 3 bits wide | | | lookupTable j_lookup (''j_lookup'', 5, dfix (0,3,0)) = j; | | | // find element 2 | | | g = lu(j_lookup, dfix (2,3,0)); | | | |
If one is interested in simulation only, then one should not worry too much about type casting and the like. However, if one intends implementation, then some rules are at hand. These rules are induced by the hardware synthesis tools. If one fails to obey them, then one will get a runtime error during hardware synthesis. - All operators, apart from multiplication, return a signal with the same wordlength as the input signal.
- Multiplication returns a wordlength that is the sum of the input wordlengths.
- Addition, subtraction, bitwise operations, comparisons and conditional assignment require the two input operands to have the same wordlength.
Some common pitfalls that result of this restriction are the following. - Intermediate results will, by default, not expand wordlength. In contrast, operations on dfix do not loose precision on intermediate results. For example, shifting an 8 bit signal up 8 positions will return you the value of zero, on 8 bits. If you want too keep up the precision, then you must first cast the operation to the desired output wordlength, before doing the shift.
- The multiplication operator increases the wordlength, which is not automatically reduced when you assign the result to a signal of smaller with. If you want to reduce wordlength, then you must do this by using a cast operation.
For complex expressions, these type promotion rules look a bit tedious. They are however used because they allow you to express behavior precisely downto the bit level. For example, the following piece of code extracts each of the bits of a three bit signal: _sig threebits(dfix(6,3,0)); dfix bit(0,1,0); _sig bit2("bit2"), bit1("bit1"), bit0("bit0"); bit2=cast(bit, threebits>>dfix(2)); bit1=cast(bit, threebits>>dfix(1)); bit0=cast(bit, threebits);
These bit manipulations were not possible without the given type promotion rules.
For hardware implementation, the following operators are present. - Addition and subtraction are implemented on ripple-carry adder/subtractors.
- Multiplication is implemented with a booth multiplier block.
- Casts are hardwired.
- Shifts are either hardwired in case of constant shifts, or else a barrel shifter is used in case of variable shifts.
- Comparisons are implemented with dedicated comparators (in case of constant comparisons), or subtractions (in case of variable comparisons).
- Bitwise operators are implemented by their direct gate equivalent at the bit level.
- Lookup tables are implemented as PLA blocks that are mapped using two-level or multi-level random logic.
- Conditional assignment is done using multiplexers.
- Msbit detection is done using a dedicated msbit-detector.
Globals and Utility Functions for Signals
There are a number of global variables that directly relate to the "_sig" class, as well as the embedded "sig" class. In normal circumstances, you do not need to use these functions.
The variables "glbNumberOf_Sig" and "glbNumberOfSig" contain the number of "_sig" and "sig" that your program has defined. The variable "glbNumberOfReg" contains the number of "sig" that are of the register type. This represents the word-level register count of your design. The "glbSigHashConflicts" contain the number of hash conflicts that are present in the internal signal data structure organization. If this number is more then, say 5% of "glbNumberOf_Sig", then you might consider knocking at OCAPIs complaint counter. The simulation is not bad if you exceed this bound, only it will go slower.
The variable "glbListOfSig" contains a global list of signals in your system. You can go through it by means of
| | | | | sig *run; | | | for (run = glbListOfSig; run; run = run->nextsig( )) | | | { |
For each such a "sig", you can access a number of utility member functions. - "isregister( )" returns 1 when a signal is a register.
- "isconstant( )" returns 1 when a signal is a constant value.
- "isterm( )" returns 1 when you have defined this signal yourself. These are signals which are introduced through "_sig( )" class constructors. OCAPI however also adds signals of its own.
- "getname( )" returns the "char *" name you have used to define the signal.
- "get_showname( )" returns the "char *" name of the signal that is used for code generation. This is equal to the original name, but with a unique suffix appended to it.
The sfg Class
In order to construct a timed (clocked) simulation, signals and signals expressions must be assigned to a signal flowgraph. A signal flowgraph (in the context of OCAPI) is a container that collects all behavior that must be executed during one clock cycle.
The sfg behavior contains - A set of expressions using signals
- A set of inputs and outputs that relate signals to output and input queues
Thus, a signal flowgraph object connects local behavior (the signals) to the system through communications queues. In hardware, the indication of input and output signals also results in ports on your resulting circuit.
A signal flowgraph can be a marker of hardware scope. This is also demonstrated by the following example.
| | | | | _sig a(''a''); | | | _sig b(''b''); | | | _sig c(dfix(2)); | | | dfbfix A(''A''); | | | dfbfix B(''B''); | | | // a signal flowgraph object is created | | | sfg add_two, add_three; | | | // from now on, every signal expression written down will | | | // be included in the signal flowgraph add_two | | | add_two.starts( ); | | | a = b + c; | | | // You must also give a name to add_two, for code | | | // generation | | | add_two << ''add_two''; | | | // also, inputs and ouputs have to be indicated. | | | // you use the input and ouput objects ip and op for this | | | add_two << ip(b, B); | | | add_two << op(a, A); | | | // next expression will be part of add three | | | add_three.starts( ); | | | a = b + dfix(3); | | | add_three << ''add_three''; | | | add_three << ip(b,B); | | | add_three << op(a,A); | | | // you can also to semantical checks on signal flowgraphs | | | add_two.check( ); | | | add_three.check( ); | | | |
The semantical check warns you for the following specification errors: - Your signal flowgraph contains a signal which is not declared as a signal flowgraph input and at the same time, it is not a constant or a register. In other words, your signal flowgraph has a dangling input.
- You have written down a combinatorial loop in your signal flowgraph. Each signal must be ultimately dependent on registered signals, constants, or signal flowgraph inputs. If any other dependency exists, you have written down a combinatorial loop for which hardware synthesis is not possible.
Execution of a Signal Flowgraph
A signal flowgraph defines one clock cycle of behavior. The semantics of a signal flowgraph execution are well defined. - At the start of an execution, all input signals are defined with data fetched from input queues.
- The signal flowgraph output signals are evaluated in a demand driven way. That is, if they are defined by an expression that has signal operands with known values, then the ouput signal is evaluated. Otherwise, the unknown values of the operands are determined first. It is easily seen that this is a recursive process. Signals with known values are: registered signals, constant signals, and signals that have already been calculated in the current execution.
- The execution ends by writing the calculated output values to the output queues.
Signal flowgraph semantics are somewhat related to untimed blocks with firing rules. A signal flowgraph needs one token to be present on each input queue. Only, the firing rule on a signal flowgraph is not implemented. If the token is missing, then the simulation crashes. This is a crude way of warning you that you are about to let your hardware evaluate a nonsense result.
The relation with untimed block firing rules will allow to do a timed simulation which consist partly of signal flowgraph descriptions and partly of untimed basic blocks. The section "Timed simulations will treat this more into detail.
Running a Signal Flowgraph by Hand
A signal flowgraph is only part of a timed description. The control component (an FSM) still needs to be introduced. There can however be situations in which you would like to run a signal flowgraph directly. For instance, in case you have no control component, or if you have not yet developed a control description for it.
The "sfg" member function "run( )" performs the execution of the signal flowgraph as described above. An example is used to demonstrate this.
| | | | | #include "qlib.h" | | | void main( ) | | | { | | | _sig a("a"); | | | _sig b("b"); | | | _sig c(dfix(2)); | | | dfbfix A("A"); | | | dfbfix B("B"); | | | sfg add_two; | | | add two.starts( ); | | | a = b + c; | | | add_two << "add_two"; | | | add_two << ip(b, B); | | | add_two << op(a, A); | | | add_two.check( ); | | | B << dfix(1) << dfix(2); | | | // running silently | | | add_two.eval( ); | | | cout << A.get( ) << "\n"; | | | // running with debug information | | | add_two.eval(cout); | | | cout << A.get( ) << "\n"; | | | add_two.eval(cout); | | | } | | | |
When running this simulation, the following appears on the screen.
| | 4.000000e+00 | | | add_two(Queue Underflow @ get in queue B | | | |
The first line shows the result in the first "eval( )" call. When this call is given an output stream as argument, some additional information is printed during evaluation. For each signal flowgraph, a list of input values is printed. Intermediate signal values are printed after the ":" at the beginning of the line. The output values as they are entered in the ouput queues are printed after the "=>". Finally, the last line shows what happens when "eval( )" is called when no inputs are available on the input queue "B".
For signal flowgraphs with registered signals, you must also control the clock of these signals. An example of an accumulator is given next.
| | | | | #include "qlib.h" | | | void main( ) | | | { | | | clk ck; | | | _sig a("a",ck,dfix(0)); | | | _sig b("b"); | | | dfbfix A("A"); | | | dfbfix B("B"); | | | sfg accu; | | | accu.starts( ); | | | a = a + b; | | | accu << "accu"; | | | accu << ip(b, B); | | | accu << op(a, A); | | | accu.check( ); | | | B << dfix(1) << dfix(2) << dfix(3); | | | while(B.getSize( )) | | | { | | | accu.eval(cout); | | | accu.tick(ck); | | | } |
The simulation is controlled in a while loop that will consume all input values in queue "B". After each run, the clock attached to registered signal "a" is triggered. This is done indirectly through the "sfg" member call "tick( )", that updates all registered signals that have been assigned within the scope of this "sfg". Running this simulation results in the following screen ouput
| | | | | accu | ( | b | 1) | | | | | : | a | 0/ | 1 | | | | => | a | 0/ | 1 | | | accu | ( | b | 2) | | | | : | a | 1/ | 3 | | | | => | a | 1/ | 3 | | | accu | ( | b | 3) | | | | : | a | 3/ | 6 | | | | => | a | 3/ | 6 | | | |
The registered signal "a" has two values: a present value (shown left of "/"), and a next value (shown right of "/"). When the clock ticks, the next value is copied to the present value. At the end of the simulation, registered signal "a" will contain 6 as its present value. The ouput queue "A" however will contain the 3, the "present value" of "a" during the last iteration.
Finally, if you want to include a signal flowgraph in an untimed simulation, you must make shure that you implement a firing rule that guards the sfg evaluation.
An example that incorporates the accumulator into an untimed basic block is the following.
| | | | | #include "qlib.h" | | | class accu : public base | | | { | | | public: | | | accu(char *name, dfbfix &i, dfbfix &o); | | | int run( ); | | | private: | | | dfbfix *ipq; | | | dfbfix *opq; | | | sfg _accu; | | | clk ck; | | | } | | | accu::accu(char *name, dfbfix &i, dfbfix &o) : base(name) | | | { | | | ipq = i.asSource(this); | | | opq = o.asSink(this); | | | _sig a("a",ck,dfix(0)); | | | _sig b("b"); | | | _accu.starts( ); | | | a = a + b; | | | _accu << "accu"; | | | _accu << ip(b, *ipq); | | | _accu << op(a, *opq); | | | _accu.check( ); | | | if (ipq->getSize( ) < 1) | | | return 0; | | | _accu.eval( ); | | | accu.tick(ck); |
In this example, the signal flowgraph _accu is included into the private members of class _accu.
Globals and Utility Functions for Signal Flowgraphs
The global variable "glbNumberOfSfg" contains the number of "sfg" objects that you have constructed in your present OCAPI program. Given an "sfg( )" object, you have also a number of utility member function calls. - "getname( )" returns the "char *" name of the signal flowgraph.
- "merge( )" joins two signal flowgraphs.
- "getisig(int n)" returns a "sig *" that indicates which signal corresponds to input number "i" of the signal flowgraph. If 0 is returned, this input does not exist.
- "getiqueue(int n)" returns the queue ("dfbfix *") assigned to input number "i" of the signal flowgraph. If 0 is returned, then this input does not exist.
- "getosig(int n)" returns a "sig *" that indicates which signal corresponds to output number "i" of the signal flowgraph. If 0 is returned, this output does not exist.
- "getoqueue(int n)" returns the queue ("dfbfix *") assigned to output number "i" of the signal flowgraph. If 0 is returned, then this output does not exist.
You should keep in mind that a signal flowgraph is a data structure. The source code that you have written helps to build this data structure. However, a signal flowgraph is not executed by running your source code. Rather, it is interpreted by OCAPI. You can print this data structure by means of the "cg(ostream)" member call. For example, if you appended accu.cg(cout);
to the "running-an-sfg-by-hand" example, then the following output would be produced:
| | inputs | { b_2 } | | | outputs | { a_1 } | | | code { | Finite State Machines
With the aid of signals and signal flowgraphs, you are able to construct clock-cycle true data processing behavior. On top of this data processing, a control sequencing component can be added. Such a controller allows to execute signal flowgraphs conditionally. The controller is also the anchoring point for true timed system simulation, and for hardware code generation. A signal flowgraph embedded in an untimed block cannot be translated to a hardware processor: you have to describe the control component explicitly.
The ctlfsm and State Classes
The controller model currently embedded in OCAPI is a Mealy-type finite state machine. This type of FSM selects the transition to the next state based on the internal state and the previous output value.
In an OCAPI description, you use a "ctlfsm" object to create such a controller. In addition, you make use of "state" objects to model controller states. The following example shows the use of these objects.
| | | | | #include ''qlib.h'' | | | void main( ) | | | { | | | sfg dummy; | | | dummy << ''dummy''; | | | // create a finite state machine | | | ctlfsm f; | | | // give it a name | | | f << ''theFSM''; | | | // create 2 states for it | | | state rst; | | | state active; | | | // give them a name | | | rst << ''rst''; | | | active << ''active''; | | | // identify rst as the initial state of | | | // ctlfsm f | | | f << deflt(rst); | | | // identify active as a plain state of ctlfsm | | | // f | | | f << active; | | | // create an unconditional transition from | | | // rst to active | | | rst << allways << active; | | | // allways' is a historical typo and will be | | | // replaced by "always" in the future | | | // create an unconditional transition from | | | // active to active, executing the dummy sfg. | | | active << allways << dummy << active; | | | // show what's inside f | | | cout << f; |
There are two states in this fsm, "rst" and "active". Both are inserted in the fsm by means of the "<<" operator. In addition, the "rst" state is identified as the default state of the fsm, by embedding it into the "deflt" object. An fsm is allowed to have one default state. When the fsm is simulated, then the state at the start of the first clock cycle will be "rst". In the hardware implementation, a "reset" pin will be added to the processor that is used to initialize the fsm's state register with this state.
Two transitions are defined. A transition is written according to the template: starting state, conditions, actions, target state, all of this separated by the "<<" operator. The condition "allways" is a default condition that evaluates to true. It is used to model unconditional transitions.
The last line of the example shows a simple operation you can do with an fsm. By relating it to the output stream, the following will appear on the screen when you compile and execute the example.
| | rst [shape=box]; | | | rst->active; | | | active->active; |
This output represent a textual format of the state transition diagram. The format is that of the "dotty" tool, which produces a graphical layout of your state transition diagram.
"dotty" is commercial software available from AT&T.
You cannot simulate a "ctlfsm" object on itself. You must do this indirectly through the "sysgen" object, which is introduced in the section "Timed Simulations".
The cnd Class
Besides the default condition "allways", you can use also boolean expressions of registered signals. The signals need to be registered because we are describing a Mealy-type fsm. You construct conditions through the "cnd" object, as shown in the next example.
| | | | | #include "qlib.h" | | | void main( ) | | | { | | | clk ck; | | | _sig a("a",ck, dfix(0)); | | | _sig b("b",ck, dfix(0)); | | | _sig a_input("a"); | | | _sig b_input("a"); | | | dfbfix A("A"); | | | dfbfix B("B"); | | | sfg some_operation; | | | // some operations go here . . . | | | sfg readcond; | | | readcond.starts( ); | | | a = a_input; | | | b = b_input; | | | readcond << "readcond"; | | | readcond << ip(a_input,A); | | | readcond << ip(binput,B); | | | readcond.check( ); | | | // create a finite state machine | | | ctlfsm f; | | | f << "theFSM"; | | | state rst; | | | state active; | | | state wait; | | | rst | << "rst"; | | | active | << "active"; | | | wait | << "wait"; | | | f << deflt(rst); | | | f << active; | | | f << wait; | | | rst | << allways << readcond << active; | | | active | << _cnd(a) << readcond << some_operation | | | wait | << (_cnd(a) && _cnd(b)) << readcond | | | wait | <<(!_cnd(a) ||!_cnd (b)) <<readcond<< active; | |