System and method for implementing parallel image processing
5721883
Abstract
A system for performing parallel processing of images. The image data to be processed is supplied to multiple arithmetic processors by a data supply control unit. Each of the processors operates on a portion of the data to produce multiple partial image processing results, and the partial results are integrated to form a final result, thereby allowing image processing to be conducted at an increased speed.
Claims
What is claimed is:
Description
TECHNICAL FIELD
______________________________________
...
vfor(Vinit=0; Vinit<512; Vinit+=8) {      ... (1-1-1)
  hfor(Hinit=0; Hinit<768; Hinit+=8) {    ... (1-1-2)
    read(x, 8, 8);                        ... (1-1-3)
    ...
    tmp=x[0]+...                          ... (1-1-4)
    ...
    write(y, 8, 8);                       ... (1-1-5)
    ...
    total=total+tmp;                      ... (1-1-6)
  }                                       ... (1-1-7)
}                                         ... (1-1-8)
...
vfor(Vinit=0; Vinit<512; Vinit+=8) {      ... (1-2-1)
  hfor(Hinit=0; Hinit<768; Hinit+=8) {    ... (1-2-2)
    read(x, 8, 8);                        ... (1-2-3)
    ...
    tmp=total+...                         ... (1-2-4)
    ...
    write(y, 8, 8);                       ... (1-2-5)
    ...
  }                                       ... (1-2-6)
}                                         ... (1-2-7)
______________________________________
The parameters within the parentheses of the vfor and hfor statements of the source program, i.e., Vinit=0; Vinit<512 and Hinit=0; Hinit<768, indicate, as shown in FIG. 4(a), that the picture consists of 512×768 pixels. Vinit+=8 and Hinit+=8 indicate that the small areas obtained by dividing the picture are blocks of 8×8 pixels. Moreover, Vinit=0 and Hinit=0 indicate, as shown in FIG. 4(b), that read or write processing of the small areas starts at the upper left of the picture and is carried out toward the right end by the hfor statement, and that processing then moves, by the vfor statement, to the small area at the left end of the next line. The read statement and the write statement are used for read or write processing of a small area. The first argument of the read statement and the write statement (e.g., x) is the name of the buffer (array) that stores the small area, the second argument designates the size in the horizontal direction, and the third argument designates the size in the vertical direction. By focusing on the vfor statements (1-1-1) and (1-2-1), the hfor statements (1-1-2) and (1-2-2), the read statements (1-1-3) and (1-2-3), and the write statements (1-1-5) and (1-2-5), the host computer 10 can recognize, at step S11, the parameters of the small area that are necessary for generating execution codes of the data flow control section 2 by analyzing the source program. Namely, by analyzing the source program, the host computer 10 can recognize the sizes in the horizontal and vertical directions, the initial position, and the manner of movement of one small area. The total number of small areas to be processed over the entire picture can likewise be determined from the vfor statement and the hfor statement.
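As an illustration of the traversal just described, the following minimal Python sketch enumerates the small areas in the order implied by the vfor and hfor statements. The function name small_areas and its parameter names are hypothetical and do not appear in the source program; only the picture size (512×768) and the block size (8×8) are taken from the text.

```python
def small_areas(height=512, width=768, block_v=8, block_h=8):
    """Yield the (Vinit, Hinit) origin of each small area in raster order."""
    for vinit in range(0, height, block_v):      # vfor: move down to the next line of blocks
        for hinit in range(0, width, block_h):   # hfor: move from the left end to the right end
            yield vinit, hinit


areas = list(small_areas())
print(len(areas))   # (512 / 8) * (768 / 8) = 64 * 96 = 6144 small areas
print(areas[:3])    # [(0, 0), (0, 8), (0, 16)]
print(areas[96])    # (8, 0): left end of the next line of blocks
```

The count of 6144 is the total number of small areas that the host computer can derive from the loop bounds alone, without executing the program.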
Then, at step S12, the host computer 10 prepares an information table for the data flow control section 2 on the basis of the analysis result of step S11, and the processing operation proceeds to step S13. At step S13, the host computer 10 prepares an information table relating to the processing of the arithmetic processors within the source program. This table collects the information obtained by analyzing, for the processing of the respective small areas, the portions of the source program enclosed by the syntax keywords, e.g., vfor of the statement (1-2-1) or hfor of the statement (1-2-2). Then, at step S14, the host computer 10 takes out the information of the first arithmetic block. Namely, the host computer 10 reads in information indicating the arithmetic processing that an arithmetic processor carries out on the small areas, e.g., the statement (1-1-4) between the read statement (1-1-3) and the write statement (1-1-5), or the statement (1-2-4) between the read statement (1-2-3) and the write statement (1-2-5). Here, an arithmetic block is a block that is enclosed by vfor and hfor and describes the operations that the arithmetic processors carry out. Then, at step S15, the host computer 10 examines whether a calculation formula for carrying out integration processing is included in the arithmetic block read in at step S14. Where there is no calculation formula, such as the statement (1-1-6), that carries out integration processing with respect to the image processing of the small areas (No), the processing operation proceeds to step S17. On the other hand, where there is such a calculation formula, e.g., the statement (1-1-6) (Yes), the processing operation proceeds to step S16.
At step S16, the host computer 10 raises, e.g., an integration flag (sets it to "1") so that the arithmetic processor within the data flow control section 2 performs the operation relating to the calculation formula, and the processing operation proceeds to step S17. Note that, at step S15, the host computer 10 can also examine whether the result of the integral operation is distributed. Namely, where there is no calculation formula such as the statement (1-2-4) (No), the processing operation by the host computer 10 proceeds to step S17. On the other hand, where there is a calculation formula such as the statement (1-2-4) (Yes), the processing operation may proceed to step S16, where a distribution flag is raised (set), and then to step S17. At step S17, the host computer 10 examines whether a fraction is produced when the arithmetic processors are assigned to the small areas of the picture. This fraction is the remainder obtained when the total number of small areas into which the picture is divided is not evenly divisible by the total number of arithmetic processors. Naturally, the number of small areas in the fraction is less than the number of arithmetic processors. The processing operation by the host computer 10 proceeds to step S18 when a fraction exists (Yes), and to step S19 when no fraction exists (No). At step S18, the host computer 10 sets a fraction flag, and the processing operation proceeds to step S19. At step S19, the host computer 10 judges whether the analysis of all arithmetic blocks has been completed. When the analysis has not been completed (No), the processing operation proceeds to step S19N.
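The flag setting of steps S15 to S18 can be sketched as follows. This is only an illustrative sketch: the text does not say how the host computer recognizes an integration or distribution formula, so the naive string matching on an accumulator named total is an assumption, as are the names BlockFlags and analyze_block.

```python
from dataclasses import dataclass


@dataclass
class BlockFlags:
    integration: bool = False   # S16: block accumulates a value across small areas
    distribution: bool = False  # S16: block consumes an already integrated value
    fraction: bool = False      # S18: small areas are not a multiple of the processors


def analyze_block(statements, n_areas, n_processors, accumulator="total"):
    """Derive the flags for one arithmetic block (hypothetical detection rule)."""
    flags = BlockFlags()
    for stmt in statements:
        lhs, _, rhs = stmt.partition("=")
        lhs, rhs = lhs.strip(), rhs.strip()
        if accumulator in rhs:
            if lhs == accumulator:
                flags.integration = True    # e.g. total=total+tmp, statement (1-1-6)
            else:
                flags.distribution = True   # e.g. tmp=total+..., statement (1-2-4)
    # S17: a fraction exists when the small areas do not divide evenly.
    flags.fraction = n_areas % n_processors != 0
    return flags


BLOCK_1 = ["read(x, 8, 8);", "tmp=x[0]+...", "write(y, 8, 8);", "total=total+tmp;"]
BLOCK_2 = ["read(x, 8, 8);", "tmp=total+...", "write(y, 8, 8);"]

flags_1 = analyze_block(BLOCK_1, n_areas=6144, n_processors=4)
flags_2 = analyze_block(BLOCK_2, n_areas=6144, n_processors=5)
print(flags_1)
print(flags_2)
```

With 6144 small areas and four processors no fraction arises; with five processors the remainder is 4, which is less than the number of processors, as the text notes.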
At step S19N, the host computer 10 reads in the information of the next arithmetic block to continue the analysis. Namely, after the host computer 10 reads in the information of the next arithmetic block, the processing operation proceeds to step S15. On the other hand, when the analysis of all arithmetic blocks has been completed at step S19 (Yes), the processing operation by the host computer 10 returns, completing the subroutine SUB1.
Parallel processing will now be explained for a processing program that includes integration processing with respect to, e.g., four pictures. This processing program has, e.g., the format shown below.
______________________________________
...
vfor(Vinit=0; Vinit<512*2; Vinit+=512) {      ... (2-1)
  hfor(Hinit=0; Hinit<768*2; Hinit+=768) {    ... (2-2)
    vfor(Vinit=0; Vinit<512; Vinit+=8) {      ... (2-3)
      hfor(Hinit=0; Hinit<768; Hinit+=8) {    ... (2-4)
        read(x, 8, 8);                        ... (2-5)
        ...
        tmp=x[0]+...                          ... (2-6)
        ...
        write(y, 8, 8);                       ... (2-7)
        ...
        total=total+tmp;                      ... (2-8)
      }                                       ... (2-9)
    }                                         ... (2-10)
  }                                           ... (2-11)
}                                             ... (2-12)
...
______________________________________
This processing program represents the case where, e.g., four arithmetic processors exist in the arithmetic section 3 and image processing is carried out on four pictures, each consisting of 512×768 pixels, as shown in FIG. 6, for example. When whole pictures are used as the unit of operation in this way, image processing that determines, e.g., the density mean of all the pictures may be carried out. In order to carry out such image processing on the basis of the model corresponding to the previously described architecture, the host computer 10 discriminates, in subroutine SUB2, between picture division processing including integration processing, as analyzed in subroutine SUB1, and picture parallel processing, in which the corresponding processors operate on the pictures in parallel, one frame per processor. For example, as shown in FIG. 5, at step S20 of the flowchart of subroutine SUB2, the host computer 10 takes out the information of the first arithmetic block, and the processing operation proceeds to step S21. At step S21, the host computer 10 discriminates, by the presence or absence of the integration processing flag, whether there is integration processing. When there is integration processing, as indicated by the statement (2-8), for example, the integration processing flag should be raised (Yes), and the processing operation by the host computer 10 proceeds to step S22. When there is no integration processing (No), the processing operation proceeds to step S25. At step S22, the host computer 10 discriminates whether the number of pictures to be processed is equal to 1. When the number of pictures to be processed is 1 (Yes), the processing operation by the host computer 10 proceeds to step S25.
When the number of pictures to be processed is 2 or more (No), i.e., when the program is as indicated by the statements (2-2) to (2-5), for example, the processing operation proceeds to step S23. At step S23, the host computer 10 discriminates whether the number of pictures to be processed is evenly divisible by the number of arithmetic processors of the arithmetic section 3. When it is divisible (Yes), the processing operation by the host computer 10 proceeds to step S24. When it is not divisible (No), the processing operation proceeds to step S25. At step S24, the host computer 10 sets a picture parallel processing flag, and the processing operation proceeds to step S25. At step S25, the host computer 10 judges whether the analysis of all arithmetic blocks has been completed. When the analysis of the arithmetic processing method has not been completed for all the arithmetic blocks (No), the processing operation proceeds to step S26. At step S26, the host computer 10 reads in the information of the next arithmetic block to continue the analysis. Namely, after the host computer 10 reads in the information of the next arithmetic block, the processing operation proceeds to step S22. On the other hand, when the analysis of all arithmetic blocks has been completed at step S25 (Yes), the processing operation by the host computer 10 returns. Thus, the subroutine SUB2 is completed.
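The decision made for one arithmetic block in steps S21 to S24 reduces to three tests, which the following Python sketch writes out. The function name picture_parallel_flag and its parameter names are hypothetical; the sketch covers a single arithmetic block and leaves out the loop over blocks in steps S25 and S26.

```python
def picture_parallel_flag(integration_flag, n_pictures, n_processors):
    """Return True when the picture parallel processing flag would be set (S24)."""
    if not integration_flag:                 # S21 (No): no integration processing
        return False
    if n_pictures == 1:                      # S22 (Yes): a single picture
        return False
    if n_pictures % n_processors != 0:       # S23 (No): pictures not evenly divisible
        return False
    return True                              # S24: set the flag


print(picture_parallel_flag(True, n_pictures=4, n_processors=4))   # True
print(picture_parallel_flag(True, n_pictures=1, n_processors=4))   # False
print(picture_parallel_flag(True, n_pictures=6, n_processors=4))   # False
```

The first call corresponds to the four-picture, four-processor case of the program above, where each processor can be given one whole picture.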
By discriminating between picture parallel processing and picture division processing in this way, the model based on, e.g., the previously described architecture is kept from being broken when execution codes are generated for the program in the subsequent subroutine SUB3. Compiling in accordance with the analysis result and the processing control of the source program obtained in this way is carried out following the flowchart of subroutine SUB3 shown in FIG. 7. Namely, at step S30, the host computer 10 discriminates whether the source program delivered for compiling is a program for the data flow control section 2. Where the source program is a program for the data flow control section 2 (Yes), the processing operation by the host computer 10 proceeds to step S31. Where it is not (No), it is regarded as a program for the arithmetic processors, and the processing operation proceeds to step S40. At step S31, the host computer 10 discriminates whether the picture parallel processing flag is raised. When the flag is raised (Yes), the host computer 10 judges that the number of pictures is evenly divisible by the number of arithmetic processors, and the processing operation proceeds to step S32. When the flag is not raised (No), the processing operation proceeds to step S33. At step S32, the host computer 10 generates execution codes of the data flow control section 2 that deliver all the data of one picture to one corresponding arithmetic processor, so that image processing of a plurality of pictures is carried out in parallel. Thereafter, the processing operation proceeds to step S43.
As a more practical example of image processing, the processing of this step is processing for obtaining, e.g., the density mean value of one picture. Thus, the execution codes generated for the data flow control section 2 are, e.g., execution codes that send the data of one picture needed for the operation to one arithmetic processor of the arithmetic section 3. At step S33, the host computer 10 judges whether the integration flag is raised. When the integration flag is not raised (No), the processing operation proceeds to step S37. When the integration flag is raised (Yes), the processing operation proceeds to step S34. At step S34, the host computer 10 judges whether the fraction flag is raised. When the fraction flag is raised (Yes), the processing operation proceeds to step S35. When the fraction flag is not raised (No), the processing operation proceeds to step S36. At step S35, the host computer 10 generates execution codes of the data flow control section 2 that carry out picture division processing including integration processing and fraction processing, and the processing operation proceeds to step S43. This picture division processing includes sending data, such as the result of an integral operation, as numeric values to the respective arithmetic processors of the arithmetic section 3 as occasion demands, whereupon the respective arithmetic processors perform their operations on the basis of the numeric values delivered to them.
Namely, the execution codes of the picture division processing are generated from a source program that repeatedly carries out the following control: the processor of the data flow control section 2 reads out, from the shared memory within the input/output section 5, data of as many small areas as there are arithmetic processors in the arithmetic section 3, delivers them to the distributed memories within the respective arithmetic processors, and, after the operations of the respective arithmetic processors have been completed, collects the processed data into the shared memory within the input/output section 5. Moreover, the execution code for fraction processing increases the number of repetitions by one to cover the picture data of the remainder area, and includes a conditional statement that inhibits writing into the shared memory of the input/output section 5 any data processing result for the portion beyond the total number of small areas. Namely, fraction processing means that the data flow control section 2 writes into the shared memory of the input/output section 5 only the operation results for the actual small areas, and does not write any other operation results. Then, at step S36, the host computer 10 generates execution codes of the data flow control section 2 that carry out picture division processing including integration processing, and the processing operation proceeds to step S43. On the other hand, when the processing operation proceeds to step S37 as a result of the judgment (No) at step S33, the host computer 10 judges at step S37, as at step S34, whether the fraction flag is raised. When the fraction flag is raised (Yes), the processing operation by the host computer 10 proceeds to step S38.
When the fraction flag is not raised (No), the processing operation proceeds to step S39. At step S38, the host computer 10 generates execution codes of the data flow control section 2 that carry out picture division processing including fraction processing, and the processing operation proceeds to step S43. At step S39, the host computer 10 generates execution codes of the data flow control section 2 in the ordinary manner, and the processing operation proceeds to step S43. On the other hand, at step S40, the host computer 10 judges whether the integration processing flag is raised. When the integration processing flag is raised (Yes), the processing operation by the host computer 10 proceeds to step S41. When the integration processing flag is not raised (No), the processing operation proceeds to step S42. At step S41, the host computer 10 generates execution codes, including integration processing, of the arithmetic processors of the data flow control section 2 and the arithmetic section 3, and the processing operation proceeds to step S43. At step S42, the host computer 10 generates execution codes of the arithmetic processors of the arithmetic section 3 in the ordinary manner, and the processing operation proceeds to step S43. At step S43, the host computer 10 judges whether generation of the execution codes for the arithmetic processors of the data flow control section 2 and the arithmetic section 3 has been completed for all arithmetic blocks. When compiling has not yet been completed (No), the processing operation by the host computer 10 returns to step S30 to repeat the above-described procedure. On the other hand, when compiling has been completed (Yes), the processing operation by the host computer 10 returns, completing the subroutine SUB3, which carries out this execution code generating processing.
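The branching of subroutine SUB3 (steps S30 to S42) can be summarized as a single selection function. The sketch below is an illustration only: the function name select_codegen, its parameters, and the returned label strings are hypothetical, and the loop back from step S43 to step S30 is omitted.

```python
def select_codegen(for_data_flow_control, picture_parallel, integration, fraction):
    """Return a label naming the code generation step chosen for one arithmetic block."""
    if for_data_flow_control:                        # S30 (Yes)
        if picture_parallel:                         # S31 (Yes)
            return "S32: picture parallel processing"
        if integration:                              # S33 (Yes), then S34
            if fraction:
                return "S35: picture division with integration and fraction processing"
            return "S36: picture division with integration processing"
        if fraction:                                 # S37 (Yes)
            return "S38: picture division with fraction processing"
        return "S39: ordinary data flow control code"
    if integration:                                  # S40 (Yes)
        return "S41: arithmetic processor code including integration processing"
    return "S42: ordinary arithmetic processor code"


print(select_codegen(True, False, True, True))    # starts with S35
print(select_codegen(False, False, False, False)) # starts with S42
```

Note that the picture parallel flag is tested first, so when it is raised the integration and fraction flags no longer influence the code generated for the data flow control section.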
As stated above, processing is switched depending on the number of pictures delivered and the number of arithmetic processors, so that the respective arithmetic processors of the arithmetic section 3 carry out arithmetic processing in units of small areas or of one frame. Integral arithmetic processing is carried out by the arithmetic processor of the data flow control section 2, which delivers the result to the respective arithmetic processors of the arithmetic section 3 as occasion demands, so that each arithmetic processor can devote itself to image processing of its given area. By automatically selecting the more efficient method, the kinds of execution codes generated can be limited to one without destroying the model that satisfies the above-described architecture. Thus, the quantity of execution codes generated can be held to a minimum, and the program can also be optimized. Moreover, the identifier indicating parallel processing is detected in the source program, the processing of the data flow control section 2 is distinguished from that of the arithmetic section 3 on the basis of the identifier, and execution codes are generated for the plurality of arithmetic processors of the arithmetic section 3. This permits the program developer to write a program without being conscious of how roles are shared among the arithmetic processors, which shortens program development time. Thus, the development cost can be held down.
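The division of labor summarized above can be illustrated with a density mean computed over small areas. In this Python sketch the processors are simulated sequentially, the picture is a small synthetic 16×32 array rather than 512×768, and the names tiles, density_mean, and subtract_mean are hypothetical. The second pass (subtracting the mean) is only an assumed example of an operation that uses the integrated value, since the source program elides what statement (1-2-4) computes.

```python
def tiles(picture, block=8):
    """Split a picture (list of rows) into block x block small areas in raster order."""
    for v in range(0, len(picture), block):
        for h in range(0, len(picture[0]), block):
            yield [row[h:h + block] for row in picture[v:v + block]]


def density_mean(picture, n_processors=4, block=8):
    all_tiles = list(tiles(picture, block))
    total = 0                                   # kept by the control section
    for start in range(0, len(all_tiles), n_processors):
        batch = all_tiles[start:start + n_processors]
        # Each simulated arithmetic processor computes tmp for its own small area.
        partial = [sum(map(sum, tile)) for tile in batch]
        total += sum(partial)                   # integration, as in total=total+tmp
    return total / (len(picture) * len(picture[0]))


def subtract_mean(picture, mean, block=8):
    # Second pass: the integrated value is delivered to every processor as a number.
    return [[[p - mean for p in row] for row in tile] for tile in tiles(picture, block)]


picture = [[r + c for c in range(32)] for r in range(16)]
mean = density_mean(picture)
centered = subtract_mean(picture, mean)
print(mean)            # 23.0
print(len(centered))   # 8 small areas
```

Each simulated processor only ever touches its own small area; the running total lives in one place, which mirrors the role the text gives to the data flow control section.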
Further, the above-described fraction processing is carried out in accordance with the fraction (remainder) of the number of divided small areas with respect to the number of arithmetic processors, which makes efficient processing possible with far less execution code than is conventionally required. In the image processing method, operation results are collected from the arithmetic processors, whose roles are shared depending on the number of divided pictures, and integration processing is carried out on them. This greatly reduces the quantity of execution codes generated in the image processing method, so that the efficiency of image processing can be increased. Thus, development can be completed in a short term, and the development cost can be held down. Moreover, in the image processing method, the number of pictures to be processed and the number of arithmetic processors are examined in accordance with the processing to be executed, such as, for example, density mean processing, and a processing method is selected to generate execution codes, thereby permitting a higher processing speed than that of conventional image processing. Thus, also in an application program, for example, image processing is carried out without destroying the model that satisfies the architecture required for parallel processing, again permitting a higher processing speed than that of conventional image processing.
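The fraction processing referred to above, one additional repetition plus a conditional write, can be sketched as follows. This is a sequential simulation under stated assumptions: the name run_rounds, the dictionary standing in for the shared memory, and the placeholder operation are all hypothetical.

```python
def run_rounds(n_areas, n_processors, process=lambda index: index * index):
    """Process n_areas small areas in rounds of n_processors, guarding the last round."""
    shared = {}                                   # stands in for the shared memory
    rounds = -(-n_areas // n_processors)          # ceiling division: one extra round for the remainder
    for r in range(rounds):
        for p in range(n_processors):             # conceptually executed in parallel
            index = r * n_processors + p
            result = process(index)               # every processor runs the same code
            if index < n_areas:                   # conditional statement: inhibit writes
                shared[index] = result            # beyond the total number of small areas
    return rounds, shared


rounds, shared = run_rounds(n_areas=10, n_processors=4)
print(rounds)        # 3
print(len(shared))   # 10
```

Because the same loop body serves both full rounds and the remainder round, no separate code path is needed for the leftover small areas, which is consistent with the reduction in execution code quantity described in the text.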
In addition, in the image processing apparatus, the respective arithmetic processors of the arithmetic section carry out arithmetic processing of the data delivered by the data flow control section, and the data flow control section carries out an integral operation using the operation results collected from the respective arithmetic processors and distributes the result back to the respective arithmetic processors as occasion demands. This allows the respective arithmetic processors of the arithmetic section to devote themselves to the calculation for their given small areas, so that image processing of the delivered picture signals is implemented by parallel processing at a higher speed than in the prior art.
Industrial Applicability
In this invention, when a program prepared in a programming language is compiled to generate execution codes that allow a plurality of arithmetic processors to carry out image processing in parallel, the control statements to which an identifier indicating parallel processing in the image processing is attached are identified and analyzed in the prepared program, and execution codes of parallel processing are generated for the respective arithmetic processors in accordance with the analysis result and the number of arithmetic processors. The respective arithmetic processors then carry out image processing on the basis of the generated execution codes, and the operation results of the respective arithmetic processors undergo integration processing, whereby the program developer can write a program without being conscious of how roles are shared among the respective processors. As a result, the development time of the program can be shortened. Thus, the burden on the program developer can be lessened, and the development cost can be held down.
