Process for coordinating the course of cooperating processes which access a common object6487579Abstract A coordinator object that contains access patterns indicating the sequences of accesses of the cooperating processes to the object is allocated to the common object. The coordinator object controls the accesses to the object requested by the processes with the assistance of the access patterns. The change from one access type to another access type ensues dependent on a switch condition that can be set. The switch condition is met when all accesses of a group of processes with respect to an access type or when a predetermined plurality of specific accesses have been implemented. The goal of the method is to coordinate the parallel processing of processes in a multi-processor system both given common address space as well as given distributed address space. Claims What is claimed is: Description BACKGROUND OF THE INVENTION
int vsize = v.size(); // the length of the vector
float sum = 0; // for calculation of the scalar product
for(int i = 1; i <= vsize; i++)
sum+= v[i] * v[i]; // calculate scalar product
float norm = sqrt(sum); // calculate norm by taking the
square root
for( i = 1; i <= vsize; i++) // all components of v
v[i] /= norm; // are divided by the norm
This program can be parallelized in that the vector is subdivided into strips and each strip is processed by a different process. Each process calculates a part of the scalar product; the partial sums are then added (in common). As soon as this value is available (i.e. when each process has attached its partial sum), each process can carry out the norm calculation and division (actually, the calculation of the norm only has to occur once, which is not done here for the sake of simplicity). A very frequently occurring access pattern derives: For calculation of the overall scalar product, a coordinator object is declared that is written exactly once by each process before it is allowed to be read. This likewise occurs exactly once per process but is not critical here since the coordinator object is not written again here after the reading. Here, the unit of the parallel execution is the function `normalize`. This has the lower and upper boundary of the sub-vector that is to be handled as well as a reference to the entire vector as parameters. The function is called by the participating processes with respectively suitable lower and upper indices. The programming of the process generation and work distribution is not co-indicated here because it is not significant for the understanding of the programming with coordinator objects.
coordinator<float> dotProd
(write(each(allProcs,1)),
read(each(allProcs,1)));
void normalize(int myLower, int myUpper, vector& v) {
float partial = 0;
for 9intj = myLower; j <= myUpper; j++)
partial += v[j] * v[j]; // calculation of scalar
sub-product
collectiveAdd(dotProd,partial); // cumulative addition
to coordinator
objects
float norm = sqrt(dotProd); // each process reads
the coordinator object
for (j = myLower; j <= myUpper; j++)
v[1] = v[1] / norm; // norming by division
return;
}
This example shows how a controlled access to `dotprod` occurs without any and all programming of synchronization or communication. To this end, the user-defined function `collectiveAdd` is specified as a write operation with respect to its first argument--as collective write operation to be more precise. Collective operations are explained in greater detail in the following section. The global switch condition that triggers the transition into the read status and thus allows reading by the function `sqrt` is only valid when all processes have written their part by calling `collectiveAdd`. The following transition from the read status into the write status has no practical effects here because the program fragment is ended. If it was executed within a loop, the switching from the read status into the write status would likewise take effect. 7. Collective Write Operations Collective operations (also partly called reduction operations) are particularly utilized in the field of scientific computing (see, among others, A. Geist, A. Beguelin, J. Dongarra, H. Jiang, R. Manchek, V. Sunderam: PVM 3.0 Users Guide and Reference Manual, Oak Ridge National Laboratory, 1993; The MPI initiative, netlib@ornl.gov, 1994; R. Butler, E. Lusk, Monitors, messages, and clusters: The p4 parallel programming system, Parallel Computing 20, 1994; D. Culler, SPLIT C, //http.cs.berkeley.edu/.about.culler, 1996 and often allow a significantly simplified notation of algorithms when the value of an object must be repeatedly initialized. Coordinator objects offer support for collective operations in a very simple way by an automatic re-initialization at the respective beginning of a write phase. To this end, a coordinator object implementation must make a generic function `collective` available to the user that makes it possible to also defined user-defined collective operations--the function `collectiveAdd` used in the above example--, for example in the following way:
float collectiveAdd (coordinator<float> & result, const float
arg) {
return collective(operator+,result,0,arg);
}
The signature of the function `collective` is established by:
template <class T> T & collective {
T (*f) (T,T),
coordinator<t> & result,
T initValue,
Targ);
The semantics of `collective` can be described in the following way: A call of `collective` implements a memory access to the argument `result`. When a call of `collective` is the first call within a write phase, then the new value of the coordinator `result` derives as result of the function call `f(initValue,arg)`. In following calls, the new value of the coordinator `result` derives as result of the function call `f(result,arg)`. With respect to `arg`, `collective` is defined as read access; no coordinators dare be handed over as arguments `f` and `initValue`. 8. Semantics and Implementation of Coordinator Objects Access patterns define a parallelism behavior that can be referred to best as an automatic coordination. This behavior is described in greater detail below. First, the semantics of coordinator objects is presented on the basis of a central contol program for a coordinator object. This represents a minimum demand of every implementation of coordinator objects. Implementation versions are discussed subsequently. 8.1 A Central Control Program for a Coordinator Object A central control program for a coordinator object that defines the semantics of the coordinator object is presented below. An instance of this program exists conceptually for each coordinator object. Implementations are allowed to combine the tasks of a plurality of such instances in a program or to distribute the tasks of the control program to a plurality of cooperating programs. In systems with a shared memory, the communication with the control program described below can also occur without explicit message exchange when it is implemented in some other way. However, the logical execution--except for time non-determinisms--should remain the same due to such changes. 8.2 Event-Controlled Functioning For introduction into the functioning of the control program, the various possible accesses of a process to a coordinator object are described in the following sections on the basis of flowcharts. These show the communication with the control program (control program) respectively occurring in conjunction with an access. The messages to the control program are also referred to as events below. First, we shall show the basic pattern for the communication connected with an access, FIG. 4. The process being considered sends a request (requireAccess) to the control program and waits for the request confirmation of the control program before it continues its program execution. From the point of view of the process, this execution thus behaves like a `remote procedure call (rpc)`. The control program registers the request as an event. The event is possibly delayed when it cannot be immediately processed in the momentary status of the coordinator object. To this end, the control program administers the events as four different event sets that are described in greater detail later. When an event is finally selected for processing, then the control program initially executes a few event-related activities, what is referred to as the prologue. Given a distributed memory, the prologue contains the offering of the coordinator data at the requesting process, among other things. Subsequently, the request confirmation (rpcFinish) is sent to the requesting process, so that the latter can continue. When the following code of the process contains accesses to the coordinator data, then a termination message (accessDone) to the control program must ensue after the end of these accesses so that it can implement the terminating event-related activities--the epilogue. The requesting process need not wait for the conclusion of the epilogue since every further access to the coordinator object presumes a new request to the control program. Prologue and epilogue are not explicitly shown in the other diagrams (FIGS. 5-8). By way of example, time-meshed accesses of a further process (concurrent process) are shown that influence, i.e. delay, the executive sequence behavior of the process under consideration (process of interest). 8.2.1 Reading or, Respectively, Writing Access The diagram, FIG. 5, shows a reading access (requireAccess(read)), but is analogously valid for writing accesses (requireAccess(write)). The accessing process is only allowed to access when it has received the confirmation (rpcFinish) of its access request from the control program. It is assumed in the diagram that the read accesses are exclusive, for which reason a wait (1) is implemented for the end of the already started, competing access before the confirmation. The control program must be informed of the end of the access (accessDone), as a result whereof potentially delayed events proceed to processing (2). 8.2.2 The procReadyWith Instruction The `procReadyWith` instructions signals the coordinator object that the calling process has ended its part in a write or, respectively, read phase. It is only meaningful and allowed when an `each( . . . )` access pattern was declared for the corresponding phase. Given an access pattern `each(group, numberOfAccesses)`, the process is potentially ended for the process before reaching the declared number of accesses `numberOfAccesses`, i.e. all further accesses are delayed until the next time the appertaining phase is reached. Since the procReadyWith instruction does not involve an access to the coordinator object by the requesting process, the communication protocol is simpler here: The `accessDone` message to the control program can be omitted. The exemplary diagram, FIG. 6, shows the communication execution of a `procReadyWith` instruction for the read phase. The calling process must be delayed at least until the coordinator object is in the status to which the instruction refers--this is the read status in the example that is reached (1) here by a `procReadyWith` instruction of the competing process. Such a waiting situation can occur, for example, when, following the end of its write phase, the process does not implement a single read access to the coordinator object in the following read phase but immediately ends its read phase with the `procReadyWith` instruction. A following write access is delayed until the coordinator object again switches (2) into the write status, see FIG. 6. 8.2.3 The SwitchFrom Instruction By contrast to the `procReadyWith` instruction, the `switchFrom` instruction is a global status change signal. The status of the coordinator object changes after the end of all accesses already begun. The instruction is allowed for all access patterns; given access patterns with counter, the status also changes when the declared number of accesses has not yet been reached in the momentary phase. The diagram, FIG. 7, shows the communication execution of a `switchFrom` instruction for the read phase. A following write access is delayed until the momentary read access of the other process has ended, since the status of the coordinator object does not change (1) until this point in time. A delay by the control program is also possible given the `switchFrom` instruction, and the confirmation of the control program is therefore required, for example when a process that accesses the coordinator object exclusively in reading fashion successively implements two `switchFrom` instructions without having implemented a read access in the interim. 8.2.4 The Release Instruction The `release` instruction ends the coordination of the data supervised by the coordinator object. These data are made exclusively available to the calling process (as return value of the `release` instruction). The process must therefore wait until the phase indicated in the instruction has been ended--in the example, this occurs by ending the read access of the competing process (1). The conclusion of a `release` instruction is the last allowed action of the control program, i.e. every further access of an arbitrary process to the coordinator object is viewed as an error, FIG. 8. 8.3 Event Processing by the Control Program The above accesses (events) can alternate arbitrarily within a process (with the exception, of course, of the `release` instruction). The control program will delay the process at every event as long as it is allowed according to the access pattern specified for the coordinator object. In order to enable this, the control program administers the events arriving from the various processes in four different event sets, which are described below. The term "sets" (and not "waiting queues") is intentionally used here since it is assumed that the processing sequence of existing events is arbitrary. 8.3.1 The Set of Active Events This set contains all events not yet processed that are pending for processing at the moment. These are either "new" events (that were never yet considered by the control program) or events that, due to processing of an event from one of the waiting sets listed below, were transmitted anew into the set of active events. The control program takes a respective event from the set of active events by calling the procedure `nextEvent`. The procedure waits given an empty set, potentially until a process generates an event. The method according to which the event to be taken is selected given a plurality of existing events is left up to the implementation. A possible prioritizing of the taking is as follows: 1. `accessDone` events in order to enable further exclusive accesses or a status change; 2. `switchFrom` events since this signals that a phase is ended overall; 3. `procReadyWith` events since a process has met its sub-task--a phase change can be potentially accelerated; 4. `requestaccess` events--given a distributed memory, only those of processes that already have the coordinator data locally present--as a result whereof unnecessary data exchange can be avoided; 5. other `requestAccess` events; 6. `release` event. 8.3.2 Set of Events that can be Processed in the Next Write Phase All events for which the control program determines that they can be processed (at the earliest) in the next write phase are accepted into this waiting set by a call `delayForPhase(write,event)`. Given the next transition of the coordinator object into the write status, all events collected in this set are transferred back into the set of active events by calling `enableEventsDelayedForPhase(write)`. 8.3.3 Set of Events that can be Processed in the Next Read Phase Analogous to the set just cited, there is the waiting set for the read phase. Events are added by `delayForPhase(read,event)` and, given transition into the read status, are transferred back into the set of active events by `enableEventsDelayedForPhase(read)`. 8.3.4 Set of Events that Require Exclusive Coordinator Access In addition to the events that must be delayed until a following phase, there is a set of `requireAccess` events that can still be processed in the momentary phase but only after an exclusive access or another process that has already begun has ended. The procedure `delayForExclusiveAccess(event)` serves for adding an event to this set. After the end of an exclusive access, the events collected in this set since the beginning of the access are re-activated by `enableEventsDelayedForExclusive`. 8.4 Variables that Describe the Status of the Coordinator Object The control program works with a number of variables that describe the momentary status of the coordinator object. These variables are listed here with their initial values: pendingAccesses=0; Number of processes that are allowed access to the coordinator object at the moment. The value of the variable must be 0 at a phase change. When accesses of a phase are not free of conflict, `pendingAccess` is maximally allowed to have the value 1. globalReady=false; This is a flag that indicates whether the status of the coordinator object changes (true) after the conclusion of accesses already begun or whether further events must be processed (false) before the change. globalcount=0; In case of `arbitrary(numberOfAccesses)` access patterns, this variable serves as global counter of the accesses in a phase. procReady[ ]={false, . . . ,false}; In case of `each( . . . )` access patterns, this table of flags indicates for every process of the appertaining process group whether the respective process has ended its write or, respectively, read phase. procCount[ ]={0, . . . ,0}; In case of `each(group,numberOfAccesses)` access patterns, this table contains the process-related access counters. releaseByProcess=.o slashed.; This variable is set when processing a `release` event and then indicates the process to which the data supervised by the coordinator object are made available given termination of the momentary phase. notWritten=true; This flag serves for supporting what are referred to as collective operations. At the beginning of a write phase, the variable respectively receives the value `false` in order to inform the first write within the phase that it is the first. An automatic re-initialization of the coordinator value in every write phase is enabled as a result thereof. 8.5 Notation The operational semantics definition is provided in the form of pseudo-code that is based on C or, respectively, C++ notation. However, the semantics of some additionally employed constructs must still be explained: forall id in group: block executes `block` for every process of the process group `group` once; the execution, however, is not continued for further processes when the control flow (for example, due to return) leaves the `block`. In the `block`, the variable `id` is linked to the respectively processed process. match (object) { fits pattern: block fits pattern or pattern: block . . . } implements a pattern comparison (pattern matching) with respect to `object`. A component part of a pattern can be constant values (identified by quotations (for example, `EACH`), that must then correspond to the corresponding value in `object` (for example, EACH) or, on the other hand, variables to which the corresponding values of the predetermined pattern `object` are linked. The appertaining `block` is executed for the first successful comparison. Such a pattern comparison is, among other things, implemented with the access pattern of the coordinator objects valid for the respective access phase `phase` that is respectively supplied by the function `accessSpec(phase)`. The possible patterns are: (EACH,processGroup,maxCount) (EACH,processGroup) (ARBITRARY,maxCount) (ARBITRARY) The values `processGroup` and `maxCount` are thereby linked to the variables indicated in the matching pattern in order to be able to access them within the `block`. In the program context, an event is interpreted as a structured data type that must have the following components available to it: eventSpec the type of event, i.e. one of the values `requireAccess`, `accessDone`, `procReadyWith`, `switchFrom` or `release`; process the accessing process, i.e. the process that generated the event; phase the phase for which the event is intended, i.e. either `write` or `read`; 8.6 The Control Program The meaning of some of the functions/procedures employed were already explained. These and a few other procedures are not specified by pseudo-code. They are either self-explanatory due to their naming or are yet to be addressed in the following discussion of the implementation. All functions/procedures for which a pseudo-code specification exists are emphasized by bold face given their employment.
mainLoop {
while (true) { // endless loop write-read cycle
initForPhase(write); // set initial status for write phase
enableEventsDelayedForPhase(write);
doPhase(write); // implement the write phase
initForPhase(read); // set initial status for read phase
enableEventsDelayedForPhase(read);
doPhase(read); // implement read phase
}
};
initForPhase(phase) { // produce the initial status of the coordinator
object for the
phase to be implemented (write or read)
match(accessSpec(phase)) {
fits (`EACH`, processGroup,maxCount) : {
globalReady = false;
forall process in processGroup: {
procReady[process] = false;
procCount[process]= 0;
}
}
fits (`EACH`,processGroup): {
globalReady = false;
forall process in processGroup: {
procready[process] = false;
}
}
fits (`ARBITRARY`,maxCount): {
globalReady = false;
globalCount = 0;
}
fits (`ARBITRARY`): {
globalready = false;
}
}
if (phase == write) {
notWritten = true; // is used for collective write
operations
}};
doPhase(phase) {
// processing of events within a write/read
phase
while(!checkSwitchCondition(phase)) {
// process the next existing or arriving
event
event
ProcessEvent(phase,nextEvent());
}
if (releaseByProcess != .o slashed.) {
// enable data of the coordinator object and end the control
program
if (distributed MemoryImplementation( )) {
// potentially copy the data (from the last accessor) to the
process
provideDataFor(releaseByProcess);
}
rpcFinished(releaseByProcess);
exit;
}};
boolean checkSwithCOndition(phase) {
// check whether all conditions for ending the current phase
are given
if (pendingAccesses > 0) // ongoing accesses ended first
return false;
// test global switch condition
else if (globalReady) {
return true; // switchFrom event or global maxCount reached
}
// test local switch conditions (procReady or maxCount
reached?)
else {
match(accessSpec(phase)) {
fits (`EACH`,processGroup,maxCount)
or (`EACH`,processGroup) : {
forall process in processGroup: {
if (!procReady[process])
// local switch condition not met for a
process
return false;
}
return true; // local switch condition met for all
processes
}
fits (`ARBITRARY`,maxCount)
or (`ARBITRARY`) : {
// there are no local switch conditions
return false;
}
}
}
};
processEvent(phase,event) {
// processing an event. This is either used, i.e. the request
of a process
// is granted and the process is potentially informed of the
grant
// or the processing of the event is delayed and processed anew
later
// by processEvent
match (event.eventSpec) {
fits `requireAccess`: {
if (!checkDelay(Phase.event)) {
accessProlog(event.process,phase);
// Reactivate process for access to
coordinator data.
if (phase == write) {
// Flag for collective write operations must be
observed
rpcFinish(event.process, notWritten);
notWritten = false;
}
else {
rpcFinish(event.process);
}
}
}
fits `accessDone`: {
// The end of an access is never delayed since
accesses
// are always ended in the phase in which they were
begun
// (checkSwitch Condition waits for this)
accessEpilog(event.process,phase);
// Process does not wait for confirmation.
}
fits `procReadyWith`: {
if (!checkDelay(phase,event)) {
setProcessReady(event.process,phase);
// Process is allowed to continue after processing of the
event.
rpcFinish (event.process);
}
}
fits `switchFrom`: {
if (!checkDelay(phase,event)) {
// set the global switch condition
globalReady = true;
// Process is allowed to continue after processing of the
event.
rpcFinish (event.process);
}
}
fits `release`: {
if (!checkDelay(phase,event)) {
// set release flag
releaseByProcess = event.process;
// Process is allowed to continue only after the end of
the phase.
}
}
}
;
setProcessReady(process,phase) {
// Reaction to `procReadyWith` event; potentially set the
process-local
// switch condition
match(accessSpec(phase)) {
fits (`EACH`, processGroup, maxCount)
or (`EACH`,processGroup) : {
procReady[process] = true;
}
fits (`ARBITRARY`,maxCount)
or (`ARBITRARY`) : {
error("`procReadyWith` not allowed for `arbitrary` patterns");
}
}
};
accessEpilog(process,phase) {
// necessary administration tasks given end of an
access
pendingAcceses-;
if (exclusiveAccessRequired(phase)) {
// coord was blocked because the accesses of this phase
// are not free of conflict (see
checkDelayForExclusiveAccess)
// all events waiting for data access are re-activated
// (implementations can, of course, optimize)
enableEventDelayedForExclusiveAccess();
}
56 ;
accessProlog(process,phase) {
// administration tasks necessary before an
access
if (distributedMemoryImplementation()) {
// potential copying of the data (from the last accessor) to
the process
provideDataFor(process);
}
// The counters must be incremented before the access so
that
// so that further accesses are not erroneously allowed
// before the end of an access.
pendingAccesses++;
updateCounters(process,phase);
// the asking (waiting) process can now be allowed to
access the
// effective data (this occurs by rpcFinish())
}
;
updateCounters(process,phase) {
// log an access, i.e. potentially increment the global
or
// process-local counter, and check whether the global
// or process-focal switch condition is met due to the
access
match(accessSpec(phase)) {
fits (`EACH`,processGroup,maxCount) : {
procCount[process]++; // increment access count for current
process
if (procCount[process] == maxCount) {
procReady[process] = true; // set local switch
condition
}
}
fits (`ARBITRARY`,maxCount) : {
globalCount++; // increment global access count
if (globalCount == maxCount) {
globalReady = true; // set global switch
condition
}
}
fits (`EACH`, processGroup)
or (`ARBITRARY`) : {
// there is no counter; i.e. nothing to do,
either
}
}
};
boolean checkDelay(phase,event) {
// check whether there is a reason for the delay and
implement
// the delay, potentially by entry into one of three
waiting sets.
if (checkDelayForOtherPhase(phase.event)) {
// request is not intended for the current
phase
return true;
}
else if (checkDelayForExclusiveAccess(Phase.event)) {
// Exclusive access is not possible at the
moment
return true;
}
else {
// no delay of the event is necessary
return false;
}
}
;
boolean checkDelayForOtherPhase(phase event) {
// Check whether the event can be processed in the current
access
// phase of the coordinator object s [sic]; if not, then the
event is placed
// into the waiting queue for events of the other phase, i.e.
the
// the processing of this event is attempted again at the next
phase change.
if (event.phase != phase) {
delay ForPhase(event.phase,event);
return true;
}
else {
return false;
}
}
;
boolean checkDelayForNextSamePhase(phase,event) {
// Check whether the processing is still allowed in this phase;
if not,
// a wait must be carried out until the next but one (same)
phase, i.e.
// the event is put into the waiting queue for events of the
current
// phase whose processing is attempted again in the phase
change
// after the next.
// This is also true for `procreadyWith` and `switchFrom`
events, i.e. the
// the user must take note that only one `switchFrom` event
occurs per
// phase or one `procReadyWith` event occurs per process.
match(accessSpec(phase)) {
fits (`EACH`,processGroup,maxCount)
or (`EACH`,processGroup): {
if (!(member(event.process,processGroup))) {
error ("process is not allowed to access the coordinator");
}
if (globalready .vertline..vertline.
procReady[event.process]) {
delayForPhase(phase,event);
return true;
}
else {
return false;
}
}
/ . . .
// . . .
fits (`ARBITRARY`,maxCount)
or (`ARBITRARY`) : {
if (globalReady) {
delayForPhase(phase.event);
return true;
}
else {
8.7 Architecture-Dependent Implementation of Coordinator Objects Every implementation must first realize the communication of the processes accessing the coordinator object with the control program, including the blocking of the processes waiting for an answer. In architectures with shared memory, this will typically occur by synchronized accesses to a shared memory area as well as the respectively offered means for process control (locks, signals, etc.); by message exchange with blocking reception routines for the waiting processes in architectures with distributed memory. In architectures with distributed memory, a coordinator object implementation must, however, additionally see to it that the processes--as soon as they have received the permission from the control program to access the coordinator data--respectively encounter a consistent copy of these data in their local memory. This task is expressed in the above semantic specification by calls of the function `provideDataFor` and contains the exchange of messages that, in addition to containing internal coordinator status information, particularly contain the current value of the data object visible for the user. A coordinator object implementation thus assures an automatic data migration between the processes. The exact semantics of such a migration must be observed, particularly given complex user-defined data types that contain pointers or references to other objects: Which of the components of a structured data type are considered in the transmission to another process? Does the recipient of a message interpret these data in a way that is consistent with the interpretation of the sender? Is the property of an object that two sub-components are pointers or references to one and the same object also met after a migration of this object? The first two points can generally not be decided by the system alone; rather, user support is necessary in order to be able to correctly implement the conversion of user data into the internal exchange format as well as the reverse conversion at the recipient side. This procedure is often referred to as data marshalling and must be supported by the system insofar as a correct conversion of the basic data types is possible. The conversion of the basic data types then forms the basis for the specification of conversion routines for user data types (regarding the topic of data marshalling, see, among others, Open Software Foundation: DCE Application Development Guide. The property addressed in the third point is also referred to as intra-object sharing and is a property that must be assured by the coordinator object implementation by itself. Intra-object sharing, including the implementation aspects connected therewith, is discussed in M. Herlihy, B. Liskov: A Value Transmission Method for Abstract Data Types, ACM Transactions on Programming Languages and Systems Oct. 4, 1982. The entire administration of the status of the coordinator objects occurs by the system. In the distributed case, this means that coordinator object status information is kept consistent between different process spaces. The independent processes always behave as though the coordinator objects lay in a (virtual) shared memory; except for possible semantic deviations due to the (user-defined!) data marshalling, the user sees no difference between the two models. Appendix A: A More Extensive Example A method for the iterative solution of linear equation systems, the conjugated-gradient method (cg-method) is considered. The following C++ program sequentially implements the cg-method in an obvious way. Proceeding from an initial approximation, the vector x is modified in the body of the function until the equation A*x=b (A and b predetermined) is solved adequately exactly. The operations utilized are vector-vector, matrix-vector and vector-scalar operations. In particular, scalar products are frequently calculated.
void cgm(const matrix& A, const vector& b, vector& x) {
vector p = b - (A * x);
vector r = p
int i;
for ( i = 1; i <= A.widt( ) ;i++) {
if((p * p) < limit) return; // compute pp
double respective = r * r; // compute rr1
vectorAp = A * p; // compute Ap
double a = respective / ( p * Ap); // compute pAp
x += (p * a); // compute x
r -= (Ap *= a); // compute r
double bk = (r * r) / respective; // compute rr2
p = r + (p * bk);
};
return;
)
This program can be parallelized in that the required vectors and the matrix (logical) are cut into strips that are handled by different processes. Each of the processes implements the function cgm on its own data area. Individual results, however, are required by all processes, particularly the scalar products. In order to be able to access this in controlled fashion, these are declared as coordinator objects. The same is true of the vector fullP that is written and read by all processes. The "strips" are modelled as vector sections and matrix sections. Sections are administration structures that contain an upper boundary and a lower boundary of a sub-vector or of a sub-matrix as well as the complete vector and/or the complete matrix itself. For understanding the following parallel algorithm, it is meaningful to initially ignore the coordination. The parallelization idea (the parallel strips or, respectively, sections) should be in the foreground. The above-recited sequential algorithm is only modified insofar as some operations become two-stage: first, a strip is operated upon, then the strips are handled in common. (collectiveAdd).
void cgm(const matrixSection& A, const vectorSection& b,
vectorSection& x, // sub-vector that is modified
vector& fullX, // the entire vector is additionally read
Group& workers)
{
vectorSection localP = b - (A * fullX);
vectorSection r = localP;
// Coordinator declarations: (ignore at first read)
coordinator <double> pp (write(each(workers,1)),
read(each(workers,1)));
coordinator <double> respective (write(each(workers,1)),
read(each(workers,2)));
coordinator <double> pAp (write(each(workers,1)),
read(each(workers,1)));
coordinator <double> bk (write(each(workers,1)),
read(each(workers)));
coordinator <vector>
fullP(write(each(workers,1)),read(each(workers,1)));
for(int i = 1; i <= A.width( );i++) {
double localPp = localP * localP;
collectiveAdd(pp,localPp)
if (pp < limit) return
double localRr = r * r;
collectiveAdd(rr,localRr);
fullP.assemble(localP);
vector Ap = A * fullP;
double localPAp = (p * Ap);
collectiveAdd(pAp,localPAp);
double a = rr / pAp;
x += (p * a);
r -= (Ap *= a);
double localBk = (r * r);
collectiveAdd(bk,localBk);
p = r + ( p * bk/rr);
}
return;
}
Comments The method `assemble` copies sub-vectors (the sections localP) into the target vector `fullP` component-by-component. The function `colectiveAdd` has already been described. This algorithm assumes that the section x is sent back at the end of the processing. This occurs automatically by programming models such as, for example, the workpool described in [KnRe96b] but also in DCE[DCE]. It should be noted that rr is read twice per iteration. This is reflected in the coordinator function. This algorithm contains a great deal of coordination. However, as long as the coordinator objects are ignored, not much of this is visible. This demonstrates the advantage of coordinator objects: parallel algorithms can be developed and coordination can merely be made runnable later by the definition of the correct access patterns. The software development process is thereby critically improved. Nonetheless, the programs can then be efficiently run on both architecture types (shared memory and distributed memory). Although other modifications and changes may be suggested by those skilled in the art, it is the intention of the inventors to embody within the patent warranted hereon all changes and modifications as reasonably and properly come within the scope of their contribution to the art. BIBLIOGRAPHY [BaStTa] H. Bal, J. Steiner, A. Tanenbaum: Programming Languages for Distributed Computing Systems ACM Computing Surveys, September 1989, [BHJLC] Distribution and Abstract Types in Emerald IEEE Transactions on Software Engineering SE 13, Jan. 1, 1987 [DCE] Open Software Foundation: DCE Application Dvelopment Guide [HeLi] M. Herlihy, B. Liskov: A Value Transmission Method for Abstract Data Types, ACM Transactions on Programming Languages and Systems Oct. 4, 1982 [mpi] The MPI initiative netlib@ornl.gov, 1994 [NiLo] Bill Nitzberg, Virginia Lo: Distributed Shared Memory: A Survey of issues and Algorithms IEEE COMPUTER, vol. 24, no. Aug. 8, 1991 [PVM] A. Geist, A. Beguelin, J. Dongarra, H. Jiang, R. Manchek, V. Sunderam: PVM 3.0 Users Guide and Reference Manual Oak Ridge National Laboratory, 1993 [P4] R. Butler, E. Lusk Monitors, messages, and clusters: The p4 parallel programming system, Parallel Computing 20, 1994 [SPLIT-C] D. Culler SPLIT C, //http.cs.berkeley.edu/.about.culler, 1996
|
Same subclass Same class Consider this |
||||||||||
