User level scheduling of inter-communicating real-time tasks6490611Abstract In a multi-tasking system a writer task generates real-time and non real-time messages having multiple priority levels in an ascending order arranged in a plurality of priority bands. The real-time messages have predetermined timing deadlines. A plurality of queues indexing the messages by pointers, there is one queue for each priority band. A dispatcher moves pointers from a lower priority queue to a higher priority queue in response to a time-out signal dependent on the timing deadline. A data push agent transmits the messages indexed by the pointers according to the multiple priority levels. Claims We claim: Description FIELD OF THE INVENTION
TABLE A
Combined Data Type Deadline Task
SB Sporadic G Issue Command
AN Periodic G Trend Graph
SB Sporadic G Sensor Data
AN Periodic G Video
SB Sporadic NG Background
In the Table A, SB = synchronous blocking, AN = asynchronous non-blocking,
G = guaranteed, NG = no guarantee.
Server-based User Level Scheduling In the following sections, we describe our invention in greater detail. Aspects of our invention include mapping of task requirements into schedulable entities, end-to-end scheduling of tasks with limited number of priorities, and accommodating both real-time and non-real-time tasks in the same system configured with standard off-the-shelf hardware and software components. Mapping of Task Requirements Our mapping of task requirements accommodates the precedence constraints imposed by the communicating tasks. End-to-End Maximum Delay Guarantees FIG. 2 shows the components involved in end-to-end computations and communications. The end-to-end computation and communication entities include the tasks 112 and 122 that write and read data, agents 113 and 123 that send and receive data, and other network interfaces and network transfer protocols 130. Therefore, the specific schedulable items are the write and read tasks, and the DPA and DRA. In general, the user has no control over the network 130. Two end-to-end spans can be identified. The first span 201 is task-to-task (T-to-T), and the second span 202 is memory-to-memory (M-to-M). Associated with the spans are maximum permissible delays, or deadline guarantees. Some of the task activities, such as command-and-control, require T-to-T delay (i.e., deadline) guarantees, whereas others, such as trend graphs, only require M-to-M delay guarantees. To support T-to-T maximum delay guarantees, we schedule all the tasks from task writes to task reads as a precedence constrained task graph. On the other hand, to support M-to-M maximum delays, we schedule task writes and reads independently, while treating all the rest of the tasks as one precedence constrained task set. By definition, synchronous data write operations, as well as blocking read operations impose precedence constraints between the tasks involved, while asynchronous data write, and non-blocking read operations support independent task representation naturally. In particular, a precedence constraint exists between a task write operation and the DPA 113 for synchronous operations, and between the DRA 123 and task read operations for blocking mode. Model Precedence Constrained Tasks with Release Jitter To enable the end-to-end scheduling, for both the T-to-T and M-to-M spans, on any node using only priority-based scheduling support, we model precedence constrained tasks as independent periodic tasks with release jitter as follows. If J.sub.y is the jitter of task entity y, and C.sub.y is the compute time, then in the case of the span T-to-T: J.sub.write =0, J.sub.DPA =C.sub.write, J.sub.DRA =J.sub.DPA +C.sub.DPA +C.sub.ntwk, and J.sub.read =J.sub.DRA +C.sub.DRA. In the case of the span M-to-M: J.sub.write =0, J.sub.read =0, J.sub.DPA =0, J.sub.DRA =C.sub.DPA +C.sub.ntwk. To map task timing requirements into system schedulable components, we take the following approach: Writer and reader tasks specify their maximum permissible delay, synchronous or asynchronous data reception, as well as blocking or non-blocking data retrieval requirements. With reference to Table A, timing requirements are mapped as follows: (1) SB tasks: The period and delay for the writer task are also used for all other tasks over the span T-to-T, and the release jitters of tasks are calculated according to the T-to-T case, as above. (2) AN tasks: Writer and reader tasks have independent periods and delays. Agent tasks DPA and DRA use the period of the reader task, while the delay is determined as: D.sub.DRA =D.sub.read -C.sub.read D.sub.DPA =D.sub.read -C.sub.read -C.sub.DRA -C.sub.ntwk where C is the time delay due to computations. The respective release jitters are determined according to the M-to-M case, as described above. Actual timing metrics can be collected by a profiler (P1 and P2 in FIG. 1) described in greater detail below. Communication Server We use a "communication server" to handle the problems associated with using off-the-shelf components, i.e., processing messages at the priority of the sending/receiving task priority, minimizing priority inversion, and dealing with limited operating system priority levels. A priority inversion can occur when the transmission of a high priority message is blocked while transmitting a very long low priority message. All the priority assignments, according to our invention, fall into four priority bands that are related as follows: P.sup.one >P.sup.two >P.sup.three >P.sup.four, where .sup."one" is the highest priority band, and .sup."four" is the lowest. The manner in which these four priority bands are mapped to specific operating system priority levels depends on the number of priority levels available. The use of these priority bands is described in greater detail below. FIG. 3 shows an asynchronous server 300 according to a preferred embodiment of the invention. The server 300 can be implemented as software programs and data structures on the writer node 110. The server 300 includes a dispatcher 310, a manager 330, and a callback facility 340. The server 300 also includes multiple queues, one for each priority band for example, four queues 311-314. The data queues are allocated in the writer's local reflective memory 111. Entries in the queues can be pointers to the actual messages to be transmitted. In other words, the pointers in the queue entries index the actual data to be transmitted as messages. The entries can be sorted in the queues in an order of priorities associated with the data. This accommodates the problem where the number of priority levels available in the operating system is smaller than the number of priorities required by the applications. Effectively, the communication server 300 acts as a rate controller. The DPA pushes out data via the queues in the order of the priorities. For example, no messages will be pushed via queue 314 as long as there messages in any of the queues 311-313. A flag (F) is associated with each of the queues 311-314 to determine if the queue is empty or not. During operation of the server 300, reader and writer tasks "register" with the server using the manager 330 and requests 301. In other words, tasks "attach" and "detach" with the server 300 using the manager 330. The manager maintains a database 331 storing task specific parameters, such as timing information, and task identities (IDs). The registration process calculates appropriate timer parameters 302 associated with the requests to define maximum permissible delays, as described above. The callback facility uses the timer parameters to activate system level timers that generate periodic promotion time-out signals 303 when a message needs to be pushed out at a high priority. Communication between the callback facility 340 and the dispatcher 310 can use standard operating system messaging primitives. Writer tasks store data in ring buffers 304. The ring buffers are allocated in the reflective memory 111 by the manager 330 in response to requests by the tasks. As described below, data are stored in the ring buffers as fixed size messages 350. Pointers to the messages are stored in the queues 311-314 according to the corresponding priorities P.sup.one -P.sup.four. The DPA normally pushes out real-time data in priority band P.sup.four via queue 314. The system schedules the DPA to push data only when there are no tasks executing at higher priority bands. Non real-time data can be pushed out via queue 313 using standard static round robin scheduling at priority P.sup.three. Real-time data in the highest priority band P.sup.one is always pushed out via queue 311. When a low priority real-time data needs to be "promoted" to a high priority because of an imminent timing deadline, the dispatcher moves (305) the pointer from queue 314 to queue 312. This is done in response to the callback activated promotion time-out signal 303. The separation of the operations of the dispatcher 310 and the DPA 113, minimizes priority inversion at the socket or communication channel level by ensuring that high priority data push requests suffer blocking time due to priority inversion for only the duration of pushing one message of size S.sup.0. Message sizes S.sup.0 are described below. Our main focus is the scheduling of the writer task, the reader task, and the DPA 113. The DRA 123 is scheduled using a simple event driven model, that is, we do not explicitly account for the priority of the received messages. In order to support tasks with end-to-end timing constraints, we ensure that the task priorities, and the message sending priorities are consistent. This way, messages of a high priority task will not be blocked by the transmission of messages of a low priority task for an unpredictable amount of time. As stated above, general purpose operating systems do not support such priority tracking. To alleviate this problem, we have developed our server-based approach for communication over sockets. The server 300 is a message transmission manager. All network communications are handled by the server. The server knows the priorities of the tasks that are requesting message transmission and schedules the DPAs that push data onto the network in such a way that priority tracking is achieved. It is worth noting that many real-time schedulers only schedule with respect to an abstract notion of time, thereby simply acting as "capacity" managers. In contrast, our communication server not only schedules the data transfers, but also actually carries out the communications on behalf of the tasks. This allows our system to alleviate the priority inversion problem for data transfer. Specifically, we provide user level interfaces to specify the timing and data message size requirements of a task. The server handles the socket establishment, and schedules the data transmission according to the specification. Allowing users to specify the requirements of task components in terms of timing, data, and relationships among tasks, such as periods, rates, data size, delay, response bounds, and precedence relations is a much more intuitive and effective proposition than requiring task designers to assign relative priorities to their tasks. The user specified real-time requirements are translated into unique priority levels to be used within the server. This approach also alleviates the problem of limited priority levels supported by a typical operating system. Thus, we assign and manage priorities of tasks explicitly in our server, while providing the users with the ability to specify task timing, message size and precedence, and synchronization constraints. In summary, our server minimizes the effect of priority inversion, handles priority tracking with limited operating system priority levels, and permits user specified timing requirements. Integrating Real-Time and Non-Real-Time Task Activities Our server 300 uses a modified dual priority scheduling procedure to accommodate different real-time and non-real-time task activities. Typical activities to be supported include the following. With command and control activities, users sporadically issue commands and need immediate data transmission, and delay bounded data reception. Video/audio applications require data transmissions with low jitter display. With device and instrument monitoring, data are acquired periodically, but some signals, such as alarms, require immediate attention. Trend graph rely on periodic data collection and transmission, and periodic data display. Most background activities only require best effort data transmission and reception. Some of these tasks have real-time constraints, while others may have no real-time requirements. The real-time tasks demand synchronous (S) or asynchronous (A) data transmission and blocking (B) or non-blocking (N) data reception. In general, the synchronous tasks constitute the more urgent time critical tasks, while the asynchronous ones are periodic in nature. The non-real-time activities can be serviced with best effort. Based on these observations, we have developed the following scheme for mapping tasks into a set of unique priorities, one for each task type, namely, SB real-time, AN real-time, and non-real-time. All SB tasks are assigned to the P.sup.one band, and will have priorities according to a deadline monotonic algorithm. That is, within the P.sup.one band, SB tasks have unique priorities determined by the deadlines of the SB tasks. In many tasks, e.g., plant monitoring, there are only one or two such entities in a system. All AN data with timing constraints are initially assigned to priority P.sup.four band according to the deadline monotonic algorithm, and promoted to P.sup.two when a deadline is imminent. All the non real-time tasks are assigned a priority in the P.sup.three band. These priorities can either be assigned according to their requested execution rate, or randomly when the tasks do not specify any rate. Dual Priority Scheduling Dual priority scheduling executes real-time tasks such that the tasks will not miss their timing constraints while at the same time improving the responsiveness of the system to non real-time tasks. This is in contrast with always giving real-time tasks the higher priority. To this end, each real-time task normally executes with a priority lower than the priorities assigned to non real-time tasks. When the time comes for a real-time task to execute in preference to a non real-time task, so that the real-time task will not miss its deadline, the priority of the real-time task is increased to be above that of the non real-time task. This time, called the priority promotion time, PT.sub.i, for each invocation of a task T.sub.i with deadline D.sub.i, is determined according to the following formulation, where R.sub.i is the response time, w.sub.i is the blocking time, J.sub.i is the release jitter, C is the total compute time, and hp(i) is the set of tasks with priorities higher than T.sub.j : PT.sub.i =D.sub.i -R.sub.i R.sub.i =w.sub.i +J.sub.i w.sub.i.sup.m+1 =C.sub.i +S.sup.0 +.SIGMA..sub.j.epsilon.hp(i).left brkt-top.(w.sub.i.sup.m +J.sub.j)/T.sub.j.right brkt-top.C.sub.j and where the iterations start with w.sub.i.sup.0 =0, and ends when w.sub.i.sup.m+1 =w.sub.i.sup.m, or w.sub.i.sup.m+1 >D.sub.i. Thus, promotion times take into consideration the release jitter J.sub.i described below, and a one-quantum blocking time S.sup.0 that is due to the transmission of uniform sized messages from a lower priority task, also described below. In other words, the server executes at the lowest priority P.sup.four while servicing real-time tasks, until one of two events happens, the arrival of a request to serve a non-real-time task, or the occurrence of the promotion time of a real-time task. Upon the arrival of a request to service a non-real-time task, the priority of the server can be raised to P.sup.three, and stays at P.sup.three as long as there are any non-real-time tasks to service, or until the promotion time of one or more real-time tasks occurs. When the promotion time of one or more real-time tasks occurs, the priority of the server is raised to P.sup.two, or P.sup.one depending on whether the task is an SB task or an AN task. In some sense, the priority P.sup.one can be viewed as lying at the high-end of the P.sup.two band. In summary, the server services all the communication tasks according to their respective periodicity, data size, and priority. It implements the dual priority algorithm such that task data communications are explicitly scheduled within the server. Rate Controlled Uniform Size Message Using the server described above, there is still a possibility of a high priority data being blocked by lower priority data for the duration of the time that it takes to transmit the lower priority data. In the worst case, this blocking time can be quite long where message sizes are not bounded. To limit this blocking time to a small quantum, we transmit data as fixed size messages. This allows transmission with fixed preemption points. This way, high priority data will be blocked no longer than it takes to transmit the fixed size message. The optimal message size for a particular system configuration can be determined empirically by, for example, a profiler. The message size will serve as the non-preemptive unit of computation time and transmission time. Henceforth, the term S.sup.0 can serve as either the unit message size, or the time needed to send and receive a message of this size. All data pushed are organized into messages of unit size S.sup.0. Profiler To service real-time tasks, we need to know the computation time, communication overhead and propagation delay for the scheduler to be effective. Our solution is to use profiling at system configuration time on the specific nodes that our scheduler is used. The profiler includes two tasks that collect timing information as data are transferred in an operational system. A first profiler task (P1 in FIG. 1) determines the end-to-end timing parameters of different entities in the network, for example, the individual tasks and agents. Other profiling data that can be collected include memory-to-memory round trip time, computation time and overhead of asynchronous DPA server scheduler, jitter introduced at each stage of the data path, time to complete read and write calls, time to push the data, and optimal size for messages. A second profiler task (P2) can determine throughput at the reader node. This information can be used to tune memory allocations. In summary, we provided a system that allows both real-time and non-real-time tasks to execute concurrently on standard off-the-shelf hardware and software components. Our system does not require an apriori knowledge of the specific processor speed, or the knowledge of network propagation delay. In particular, we present user-level scheduling schemes for communicating tasks that are practical and are based on simple primitives that can be found in most of today's commonly used operating systems. The scheduling schemes comprise a combination of server-based execution, rate control of message communication, and dual priority scheduling. In addition, it is well known that in order to predictably schedule real-time activities the worst case execution time is required. To obtain these times, we provide a profiler together with our scheduler support to extract the execution time and message delay parameters for the server components. This invention is described using specific terms and examples. It is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.
|
Same subclass Same class Consider this |
||||||||||
