Archival document image processing and printing system
5187750

Abstract

A document processing, archival storage and printout system such as for handling customer checking accounts. Original checks/documents are processed into digital image data then stored temporarily in magnetic media and transferred to optical long-term archival storage. The system retrieves and accumulates monthly groups of digital image data, then sorts one-day's worth (1/22 of 22 business-day accumulation) by account number so that printing means can print statements, each day, covering 1/22 of the total accounts existing for that month. Massive amounts of data can be accumulated and stored, for example, for 500,000 to 1,000,000 customer accounts, while the system operates to rapidly retrieve and printout sufficient customer statements each day so that each of the many customers will still receive a personal monthly updated statement during the appropriate month.

Claims

What is claimed is:

Description

FIELD OF THE INVENTION
TABLE I
__________________________________________________________________________
CYCLE                        STATEMENT      PRINT
NO.    ACCOUNT RANGE         CYCLE DATE     DATE
__________________________________________________________________________
1      1-18,000              1/1-2/1        2/1
2      18,001-36,000         1/2-2/2        2/2
3      36,001-54,000         1/3-2/3        2/3
4      54,001-72,000         1/4-2/4        2/4
5      72,001-90,000         1/5-2/5        2/5
.      .                     .              .
.      .                     .              .
.      .                     .              .
22     382,001-400,000       1/28-2/28      2/28
__________________________________________________________________________
Note: Cycle #2 covers account numbers 18,001-36,000 (see Table IV).
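The account ranges in the first rows of Table I follow a simple pattern, which can be sketched as below. This is an illustration only: the function name `cycle_account_range` is hypothetical, and it assumes every cycle covers a uniform block of 18,000 consecutive account numbers. Under that assumption cycle 22 ends at account 396,000 (22 x 18,000), not at the 400,000 shown in the last row of the table.

```python
from typing import Tuple


def cycle_account_range(cycle: int, accounts_per_cycle: int = 18_000) -> Tuple[int, int]:
    """Return the (first, last) account number for a daily print cycle.

    Assumes uniform, consecutive blocks of accounts per cycle.
    """
    if cycle < 1:
        raise ValueError("cycle numbers start at 1")
    first = (cycle - 1) * accounts_per_cycle + 1
    last = cycle * accounts_per_cycle
    return first, last


# Print the ranges for the first five cycles of Table I.
for cycle in range(1, 6):
    first, last = cycle_account_range(cycle)
    print(f"cycle {cycle}: accounts {first:,}-{last:,}")
```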
Thus, over the 22 cycles, each cycle involves the printout of 18,000 account statements on a given day of the month. For example, on February 1 the system prints out 18,000 statements; on February 2 it prints out another 18,000 statements; and so on until February 28, when it prints out the final group of 18,000 statements which cover the month-of-January transactions. Under these assumptions, on-line storage for 30 days would be required in the jukebox 50.sub.j in order to complete the statement printing for a single month for each and every one of the customer accounts involved. The jukebox 50.sub.j has optical platters 50.sub.p which are handled by storage drive 52 and retrieval drive 54.

It is assumed that, since the check images are written in the captured order, the image data could be found in any one of the optical platters 50.sub.p of FIG. 2. It is further assumed that each optical platter 50.sub.p has a capacity of 10 gigabytes, where one gigabyte is 1,000,000,000 bytes (1 billion, or 10.sup.9). In the worst-case configuration noted in FIG. 2, the image data of the check documents are placed in the optical platters 50.sub.p according to the sequence in which they are captured, that is to say, in the captured order. For retrieval purposes, the images for any given account range (see Table I) are retrieved from the on-line storage (jukebox 50.sub.j) and then transferred to the print server 44 by means of the archive server 30. The work flow in this system can be better understood in reference to FIG. 3 and FIG. 6B.

RETRIEVAL ACTIVITY ANALYSIS: The goal is to retrieve all the data for any particular cycle from the jukebox 50.sub.j and transfer it to the print server 44. As an example, it may be helpful to look at cycle 1 (Table I), with the account range 1 to 18,000, to observe the sequential functions.
STEP 1: Loading of platters and transferring image data in binary digits to the print server 44, which is done as follows:
(a1) Access the first platter (50.sub.p, FIG. 2) of the month involved by loading it into the optical drive 54, FIG. 2;
(b1) Transfer all the data (by addressing the index file) for the range of accounts (1-18K) for that day to the magnetic disk storage (30.sub.d of FIG. 2) of the archive server 30;
(c1) Transfer all of the data to the print server 44 and store it in the magnetic disk 44.sub.d of FIG. 2;
(d1) Load the next platter (50.sub.p) of the optical disk jukebox 50.sub.j, FIG. 2; scan the account range (1-18K), and transfer the check image binary data to the print server 44 via the archive server 30;
(e1) Continue the process until all of the first (1-18K) account range has been scanned and transferred to print server 44.

At this stage, it is seen that all the data for the account range (1-18,000) for cycle 1 is in the magnetic storage 44.sub.d of the print server 44. The amount of data stored in the print server 44 for "customer accounts only" would be as follows:

18,000 accounts .times. 17 KB (average image size, front only) .times. 20 (number of checks) = 6.12 gigabytes (customer)

Then, for 12,500 "commercial accounts", each having a 25-KB image size:

12,500 accounts .times. 25 KB (average image size, front only) .times. 20 (number of checks) = 6.25 gigabytes (commercial)

The first step, as illustrated in FIGS. 2 and 3, involved the loading of the optical platters and the transfer of data to the print server 44.

STEP 2: The second step involves the sorting of data and the printing of statements. This is done by print server 44 as follows:
(a2) Sort all of the index file data (done via Archive Server 30) by the account number, with the checks of each account placed in sequential order. This is accomplished by the "sort algorithm" discussed hereinafter.
At this juncture, it is necessary to copy the "Master Print Index File" from the archival server 30 into the print server 44; (b2) After the customer account numbers are sorted, they will require decompression by the Print Server 44 before they can be printed in full copy, since the image capture unit 8.sub.i originally compressed the image data; (c2) Then the task of printing is distributed by the software in Print Server 44 to the available printers involved. The total estimated magnetic storage in disk 44.sub.d that might be required for the print server 44 to print customer, commercial, and reconciliation accounts on the very same day are indicated as follows in Table II:
TABLE II
______________________________________
Type of Account       Storage Required
______________________________________
Customer              6.12 GB
Commercial            6.25 GB
Reconciliation        0.50 GB
Print Index File      0.80 GB
______________________________________
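The customer and commercial entries of Table II follow directly from the account counts, average image sizes and the 20 checks per account stated earlier. A minimal sketch of that arithmetic is given here; the helper name `image_storage_gb` is hypothetical, and decimal units (1 GB = 1,000,000 KB) are assumed because they reproduce the figures in the text.

```python
KB_PER_GB = 1_000_000  # assumption: decimal units, which match the stated totals


def image_storage_gb(accounts: int, avg_image_kb: float, checks_per_account: int = 20) -> float:
    """Disk space needed on the print server for one day's cycle of front images."""
    return accounts * avg_image_kb * checks_per_account / KB_PER_GB


customer_gb = image_storage_gb(18_000, 17)    # customer accounts, 17 KB front image
commercial_gb = image_storage_gb(12_500, 25)  # commercial accounts, 25 KB front image

print(f"customer:   {customer_gb:.2f} GB")
print(f"commercial: {commercial_gb:.2f} GB")
```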
The Print Index File requires 100 bytes for each "index record" for storage for a period of 30 days. The total maximum magnetic disk storage required may be estimated at 6.25 GB+0.80 GB+overhead=7.05 GB. Added overhead would be required for the Print Server 44. In regard to doing a performance analysis wherein the "front" and the "back" of a document are in one file, the following assumptions are made: 1. The "ON-US" statement printing is done during the night. 2. The window time for printing is 10 to 12 hours. 3. Printing is done in off-business hours and in a batch mode. 4. The images of the "front" side of the check are printed in the actual statements. 5. The printer speed is set at 90 pages per minute (PPM). 6. The time required to change platters via optical retrieval drive 54 is 6 seconds. 7. The capacity of each platter 50.sub.p is 10 gigabytes (GB). 8. The "front" and the "back" images are stored together in the same files in the captured order. The front images are only retrieved for printing purposes while the back images are skipped. Now, using as an example, a CYGNET 1800 Jukebox with a Hitachi drive, the following projections can be made. The CYGNET 1800 Jukebox is manufactured by CYGNET SYSTEMS, INC. of Sunnyvale, Calif., whose address is 601 West California Avenue, Sunnyvale, Calif. 94086. The Table III hereinbelow indicates the various factors involved in the first (worst case) configuration regarding the CYGNET 1800 Jukebox and the forthcoming higher capacity jukebox drive.
TABLE III
______________________________________
                                      FORTHCOMING
Specifications        CYGNET 1800     CYGNET DRIVE
______________________________________
Media capacity        2.6 GB/Platter  10 GB/Platter
Average Seek Time     200 ms          50 ms
Average Latency       50 ms           25 ms
Transfer Rate         440 KB/sec      800 KB/sec
Seat & Spinup Time    4.5 sec         1.5 sec
Spindown Time         3.5 sec         1.5 sec
______________________________________
Using the above-mentioned assumptions, an analysis can be made which would indicate that the total storage required for 22 days (a banking month) would come to 436.36 GB. The total "ON-US" items storage for one month would be 60 percent of 436.36 GB, which comes to 261.81 GB. It is estimated that the effective usage for each platter is at the level of 95 percent. Thus, the total number of optical platters required for "two months storage" would be approximately 46; that is to say, the required storage of 436.36 GB, divided by the 0.95 effectiveness factor and by the 10 GB capacity of each platter, comes to approximately 45.9, or about 46 optical platters needed for two months. Using platters each holding 10 GB, 46 platters would provide a total storage of 460 GB, which could handle the required 436.36 GB for the 22-day bank month. The system operates such that it would always be accessing 28 platters for every given cycle of monthly statement printing.

It has been estimated that the average document (customer account) image size (front and back) comes to 27.5 KB. Likewise, the estimate for the average image size for statement printing of the "front" image (customer account) only would be 17 KB. The total number of images stored on one platter would be 345,454, while the total number of images stored on "one side only" of a platter would be 172,727. The total number of "ON-US" images on one platter would be estimated at 207,272, while the "average" ON-US images/cycle/platter would come to 9,421. Thus the average ON-US images/cycle/one-side of platter would come to 4,710. The ratio of the total number of images to the images retrieved per platter would come to 36.6, which means that approximately one in every 37 images will be retrieved.

Now, in order to "retrieve" one image (on an average basis), it is necessary to move 37 images, to wait for latency, and to read the desired image from the optical platter.
This requires approximately 10 milliseconds,+25 milliseconds+(17.times.1000/800) milliseconds. To retrieve images from one side of a platter (4,710 images), it is necessary to drive the platter, seat the platter and use spin-up time+spin-down time+read time which comes to an estimate of 1.5 seconds+1.5 seconds+0.05625.times.4710 which equals 267.93 seconds. Thus dividing 267.93 by 4710 results in a time of 0.0568 seconds per image on an average basis. Thus, it would take 56.8 milliseconds per image to retrieve an image from one side of a platter. Thus, in order to print a daily cycle of 18,000 customer account statements, the print time required would be 18,000 (customer accounts).times.20 (checks per account).times.0.0568 seconds (image retrieval) which comes to 20,448 seconds or 5.6 hours. Likewise, for "commercial accounts", the estimated time to retrieve one image would come to 31.25 milliseconds and to retrieve images from one side of the platter (4710 images) 1,516 seconds and which is equal to 0.421 hours, or approximately one-half hour. It is assumed that there is no additional time required to transfer all image data from the archive server 30 over to the print server 44 as the images are "transferred" to the print server 44 in the 5.6 hour "retrieval window" which is also used for the archive server. In this situation for the "reconciliation accounts", the time to "sort" 18,000 accounts covering the prior month, could occupy from a few minutes to approximately 0.5 hour. PRINT PERFORMANCE ANALYSIS: After retrieval, the system then functions to execute the printout cycle. Here the following assumptions are made: P1: Compressed check-sized images will be printed. P2: Each printout page can hold up to eight images. P3: Printing speed will operate at a speed of 90 PPM. P4: The printer is assumed to operate on an 80 percent duty cycle. 
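The retrieval-time arithmetic in the analysis above (per image, per platter side, and per daily cycle) can be sketched as follows. The function names are hypothetical; the 10 ms move time, 25 ms latency, 800 KB/sec transfer rate and 1.5 sec spin-up and spin-down times are the figures used in the text. The text rounds the per-image average down to 0.0568 seconds before multiplying, so totals computed without rounding differ slightly from the quoted 20,448 seconds.

```python
def per_image_ms(move_ms: float, latency_ms: float, image_kb: float,
                 transfer_kb_per_s: float) -> float:
    """Time to reach and read one image: move + latency + transfer."""
    return move_ms + latency_ms + image_kb * 1000 / transfer_kb_per_s


def side_retrieval_s(images: int, image_ms: float,
                     spinup_s: float = 1.5, spindown_s: float = 1.5) -> float:
    """Time to retrieve all wanted images from one side of a platter."""
    return spinup_s + spindown_s + images * image_ms / 1000


image_ms = per_image_ms(10, 25, 17, 800)        # read time per 17 KB front image
side_s = side_retrieval_s(4710, image_ms)       # one platter side, 4,710 images
avg_s = side_s / 4710                           # average, including spin-up/down
cycle_hours = 18_000 * 20 * avg_s / 3600        # 18,000 accounts x 20 checks

print(f"per image: {image_ms:.2f} ms, per side: {side_s:.2f} s")
print(f"average per image: {avg_s * 1000:.1f} ms, daily cycle: {cycle_hours:.2f} h")
```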
It will be noted from the previous analysis that the average "retrieval time" per image for customer checks would be 56.8 milliseconds (ms). Thus the "retrieval time" for a total of 18,000 accounts, each having 20 checks, would come to 18,000.times.20.times.0.0568 seconds, which comes to 5.6 hours.

SITUATION 1: FOR PRINT OUTS AVERAGING THREE PAGES PER ACCOUNT AND INCLUDING TEXTS

Under this first situation, the total number of pages required to be printed comes to 54,000 pages, since there are 18,000 accounts handled per daily cycle multiplied by three pages for each account. The time required to print 54,000 pages on only "one" printer would come to 12.5 hours, that is to say, 54,000 pages divided by (90.times.60.times.0.8 duty cycle) pages per hour equals 12.5 hours.
The time required to print 54,000 pages with two printers would be 6.25 hours, and the time required to print 54,000 pages with three printers would come to 4.16 hours; while using four printers, this would come to 3.125 hours. Thus, by combining the 5.6 hours required for "retrieving" 18,000 accounts with 20 checks each, plus the 12.5 hours required to "print" 54,000 pages on one printer, plus the one-half hour (0.5) required to sort 18,000 accounts, the total clock time for the complete print cycle comes to 18.6 hours. This would complete one cycle of Table I so that 18,000 statements would be completed on February first.

For commercial accounts, where the total number of pages requiring printing would come to 37,500 pages, that is to say, 12,500 accounts at three pages each, the calculated total print cycle time would come to 8.6 hours, which is 37,500 divided by (90.times.60.times.0.8). Likewise, for reconciliation functions, where the total number of pages would be 3,000 pages, or 1,000 items.times.3 pages, the total print cycle time would come to 41.6 minutes.

SITUATION 2: TOTAL AVERAGE OF FOUR PAGES PRINTED PER ACCOUNT INCLUDING TEXT

In this situation, the total number of pages to be printed in ON-US customer statements would be 72,000 pages, and with the use of one printer this would take 16.6 hours. With two printers, it would be 8.33 hours; with three printers, this would involve 5.53 hours; while with four printers, this would only take 4.15 hours. Here the total clock hours required (on one printer) would be 22.7 hours, which is made up of 5.6 hours for retrieving 18,000 accounts with 20 checks each, plus 16.6 hours which is the time required to print 72,000 pages on one printer, plus one-half hour (0.5) which is the time to sort 18,000 accounts.
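The print-time figures for both situations come from one formula: pages divided by the effective page rate of the available printers. The sketch below is illustrative, with hypothetical function names; it uses the 90 PPM speed and 80 percent duty cycle from assumptions P3 and P4, and the 5.6 hour retrieval and 0.5 hour sort times from the text.

```python
def print_hours(pages: int, printers: int = 1, ppm: float = 90,
                duty_cycle: float = 0.8) -> float:
    """Hours needed to print `pages` when the work is split evenly over printers."""
    pages_per_hour = ppm * 60 * duty_cycle * printers
    return pages / pages_per_hour


def cycle_clock_hours(accounts: int, pages_per_account: int, printers: int = 1,
                      retrieval_h: float = 5.6, sort_h: float = 0.5) -> float:
    """Retrieval + printing + sorting, assuming the three phases run back to back."""
    return retrieval_h + print_hours(accounts * pages_per_account, printers) + sort_h


for pages_per_account in (3, 4):
    pages = 18_000 * pages_per_account
    times = [round(print_hours(pages, n), 2) for n in (1, 2, 3, 4)]
    print(f"{pages:,} pages on 1-4 printers: {times} hours")
```

Because the text truncates 16.67 hours to 16.6 before dividing, its three- and four-printer figures for Situation 2 (5.53 and 4.15 hours) are slightly lower than the values this sketch prints.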
Likewise, using four pages of printing in commercial account statements, then for "commercial" accounts, the total number of pages required would be 50,000 pages which is 12,500.times.4 and the total print time would come to 11.57 hours. Likewise for the reconciliation account function, then for the retrieval of 1,000 documents printed on 4 pages, this would come to 4,000 pages and the total print time would be 55.55 minutes. Referring to FIG. 3, it will be seen that the check image archives are kept in the optical jukebox 50.sub.j and the check images are stored in their capture order. For one statement cycle (which covers 22-banking days), all of the check images are transferred to the print server 44 from the jukebox 50.sub.j by means of the archive server 30. The print server 44 gets a "list" of items to be printed from the host processor 6 which is also called the demand deposit account host or DDA host 6. The print server 44 retrieves check images for any statement cycle by scanning each of the platters 50.sub.p in the optical jukebox 50.sub.j for a 30-day period. Then as seen in Block 44.sub.s of FIG. 3, the sorting of check images is done according to their sequential "account number". Data is provided to print the statements required for the given printing cycle. Thus, in a typical, medium-size, modern bank, there can be provided a daily print cycle which retrieves and prints statements (of data from the past month of 22 banking days) on a daily basis to print some 18,000 account statements per day. Thus, over a 30-day month of 22 work days (bank work days), the system would be capable of printing out 396,000 account statements, or more. Referring to FIG. 3, the "ICPS" is the Image Check Processing System providing software for various capabilities. The SRM images 10.sub.i are available for reading and sorting. Likewise, the images in the SRM can be accessed for amount entry, for image data correction, and for balancing accounts. 
The embodiment of the storage/retrieval and print out system of the present disclosure makes use of a sorting algorithm which is graphically represented in FIGS. 6A, 6B, 6C and 7. The sort algorithm involves the following steps:

A. Creating an index file: This involves the following steps:
(a) Create a "Print Index file" in disk 44.sub.d to keep important information about the checks which have been processed. This is placed in the magnetic disk 44.sub.d of FIG. 2. This is done by using the extraction method of the IPS (image processing system) using the IDS disk 7. The index file in disk 44.sub.d will have fields such as: Date (of capture of document into the system); Account Number; Check Number; Amount of Check. (The capture date is placed on the original document via magnetic ink encoding.)
(b) Copy the modified Print Index file from disk 44.sub.d to the Archive Server 30. Then add two more fields to that particular account file which will correlate the (i) platter number and (ii) record number. These can be received as a return value after writing the document data into the optical platter in the Jukebox 50.sub.j. This file is designated as the "Master Print Index File" (lower half FIG. 6A).
(c) Build this file up to a capacity of 30 days by adding daily extractions to the original index file. In this system, it is contemplated to use only 30 days of indexing information (one monthly statement cycle) for statement printing. A brand-new print index file may be created after 30 days to be used for the next month's statement printing cycle. A total of 12 Master Print Index Files would be created for the yearly period. Table IV, shown hereinbelow, indicates the appearance of the Master Print Index file from a complete cycle.
TABLE IV
______________________________________________________________
Merged Master Print Index File
Capture   Platter   Account   Check     Record
Date      Number    Number    Number    Number     Amount
______________________________________________________________
3/14      1         0001      5         1          20.00
3/14      1         0002      7         2          22.00
.         .         .         .         .
.         .         .         .         .
.         .         .         .         .
3/14      1         10,000    6         312,500    220.00
3/14      2         12,000    1         1          180.00
3/14      2         13,000    3         2          182.00
.         .         .         .         .
.         .         .         .         .
.         .         .         .         .
3/14      2         18,000    9         200,000    190.00
3/15      2         18,001    6         200,001    650.00
3/15      2         5,000     2         312,500    630.00
3/15      3         30,000    19        1          250.00
3/15      3         00,001    7         2          170.00
3/15      3         0001      7         9          25.00
.         .         .         .         .
.         .         .         .         .
.         .         .         .         .
3/15      3         32,000    18        312,500    176.00
3/15      4         33,000    17        1          177.00
3/15      4         00,001    19        2          195.00
.
.
3/16      4         0001      9         7          35.00
______________________________________________________________
(d) Before any printing occurs, the complete index file with 30-days of information data will be transferred to the print server 44. B. Sorting the index file: The next sequence of steps for "sorting" the index file (by account number) operates as follows: (a) The index file is now sorted (via the software in server 44) by its "account number" before any digital image data will be transferred from the archive server 30 to the print server 44. After the index file has been sorted, the newly "sorted" Master Index File in server 44 will show the account information in the order as seen in Table V.
TABLE V
______________________________________
"SORTED" MASTER INDEX FILE
Account Check Platter Record
Number Number Number Number
______________________________________
1 1 1 10
1 2 2 20
. . . .
. . . .
. . . .
1 20 27 300,000
2 1 3 100
2 2 7 40,000
. . . .
. . . .
. . . .
2 19 24 200,000
2 20 26 250,000
______________________________________
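The step from the Merged Master Print Index File (Table IV) to the "Sorted" Master Index File (Table V) can be sketched as a filter on the cycle's account range followed by a sort on account number and check number. This is a minimal illustration, not the disclosed software: the `IndexRecord` type, the `sort_for_cycle` function and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class IndexRecord:
    """One row of the merged index (field names are illustrative)."""
    capture_date: str
    platter: int
    account: int
    check: int
    record: int
    amount: float


def sort_for_cycle(master: Iterable[IndexRecord], first_account: int,
                   last_account: int) -> List[IndexRecord]:
    """Keep one cycle's accounts, ordered by account number, then check number."""
    in_cycle = [r for r in master if first_account <= r.account <= last_account]
    return sorted(in_cycle, key=lambda r: (r.account, r.check))


# Hypothetical sample rows, stored in capture order as on the platters.
master_index = [
    IndexRecord("3/15", 3, 2, 2, 7, 170.00),
    IndexRecord("3/14", 1, 1, 5, 1, 20.00),
    IndexRecord("3/15", 2, 18_001, 6, 200_001, 650.00),
    IndexRecord("3/14", 1, 2, 1, 2, 22.00),
    IndexRecord("3/16", 4, 1, 2, 7, 35.00),
]

cycle_1 = sort_for_cycle(master_index, 1, 18_000)
for r in cycle_1:
    print(r.account, r.check, r.platter, r.record)
```

Each sorted row keeps its platter and record number, so the print server can still locate the image that belongs to a given account and check.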
The "Sorted Master Index File" on the print server 44 will follow the pattern as shown in Table V since it is transferred from the archive server 30 to the print server 44. The digital image data retrieved from the jukebox 50.sub.j, according to the platter sequence, is transferred to the print server 44. The print server 44 (with its magnetic disk buffer 44.sub.d) has a large file in the same platter sequence order just as is set up in the jukebox 50.sub.j. The large file 44.sub.d in the magnetic media of the print server 44 may take an appearance similar to the FIG. 7 upper left block. As seen in FIG. 7 in the upper left block, there are a series of account numbers such as A/C-1, CK 5, which indicates that this record is the fifth check in the Account 1. As another example, the 18,000th account is designated as A/C-18,000 and the nineteenth check is designated as CK-19. It will be noted that the various account numbers and check records are allocated to various "logical platters" so that the upper most group is stored in the logical platters 1 through 5, and similarly the lower groups of data are stored in logical platters 21-42 which corresponds to areas of the jukebox 50.sub.j. The lower portion of FIG. 7 shows the logical sequence of data according to the account number, the check number, the logical platter number, and the record number on the platter, which would then correlate with the amount of the check and the date of the check. The logical platter number in FIG. 7 is software information which relates the "physical" platter location to the check image data in magnetic disk 44.sub.d. This is the "sorted index file" for only one print cycle of 18,000 customer accounts. By using the Master Print Index File, this sorted index file can be created for every single print cycle. In FIG. 6C, the digital image data retrieved from the jukebox 50.sub.j by platter sequence is transferred to the print server 44. 
The magnetic disk buffer 44.sub.d of the print server 44 has a large file in the platter sequence order just as is done in the jukebox 50.sub.j. When a record is searched by platter number, a "pointer" (via software in print server 44) will be moved to the appropriate area of the magnetic disk buffer file 44.sub.d in order to access that particular record. The magnetic buffer 44.sub.d of print server 44 (FIG. 2) will contain check images (digital data) as indicated in FIG. 6C where the Sorted Print Index File shows the optical platter sequence for each account number. However, due to the situation indicated in FIG. 6E, the Sorted Print Index File of FIG. 6C is burdened with many areas of "blank disk space". In order to eliminate the time-consuming factors that this would entail, it is desirable to develop a Final Print Index File, such as indicated in FIG. 6F and FIG. 6D, which eliminates the blank disk space involved. This is accomplished by software in the Print Server 44 which operates to eliminate the blank areas (compression of data), and in so doing, replaces the "old record number" with a "new record number" as indicated in FIG. 6D. There is no time penalty involved in this conversion since this compression of data from the Sorted Print Index File to the Final Print Index File is executed during the same time period as the retrieval of data is occurring from the optical juke box 50.sub.j to the magnetic disk 44.sub.d of the print server 44. FIG. 6B indicates the step of the Compression Routine, done via software whereby blank record areas are eliminated so that the old record number in FIG. 6D is replaced with a "new" record number to form a "Final Print Index File" as shown below in Table V-A.
TABLE V-A
______________________________________
Account Check Record
Number Number Number
______________________________________
1 1 1
1 2 4
. . .
. . .
. . .
2 1 3
2 2 2
. . .
. . .
. . .
2 20 5
______________________________________
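One plausible reading of the Compression Routine is sketched here: images arrive in the magnetic buffer in platter sequence, so each wanted image can be given a new, consecutive record number equal to its position in that arrival order, which removes the gaps left by images that were skipped. The function name and tuple layout are hypothetical, the input rows are taken from the visible rows of Table V, and the resulting numbers are not meant to reproduce the illustrative values of Table V-A.

```python
from typing import List, Tuple

SortedEntry = Tuple[int, int, int, int]  # (account, check, platter, old_record)
FinalEntry = Tuple[int, int, int]        # (account, check, new_record)


def assign_new_record_numbers(sorted_index: List[SortedEntry]) -> List[FinalEntry]:
    """Replace (platter, old record) locations with dense buffer positions."""
    # Order in which images are read from the jukebox: by platter, then record.
    arrival = sorted(sorted_index, key=lambda e: (e[2], e[3]))
    new_number = {(platter, record): position
                  for position, (_, _, platter, record) in enumerate(arrival, start=1)}
    # Keep the account/check order of the sorted index in the output.
    return [(account, check, new_number[(platter, record)])
            for account, check, platter, record in sorted_index]


sorted_index = [
    (1, 1, 1, 10),
    (1, 2, 2, 20),
    (1, 20, 27, 300_000),
    (2, 1, 3, 100),
    (2, 2, 7, 40_000),
    (2, 20, 26, 250_000),
]
final_index = assign_new_record_numbers(sorted_index)
for row in final_index:
    print(row)
```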
The record numbers are new and reassigned for each check number and account number. While the archive server disk 30.sub.d requires 0.80 GB to hold 30 days of data in the Master Print Index File, the print server disk 44.sub.d only requires 1.8 MB (Table IV) for holding "one day's worth" of Master Print Index File data, to enable one day's work of statement printing. It may be noted that the platter numbers shown in FIG. 7 have already been mapped to the magnetic disk 44.sub.d (FIG. 5) of the print server 44, which data is shown in FIG. 7 upper left block. (f) At this stage, the digital image data for the given monthly print cycle is now available in the magnetic buffer 44.sub.d of the print server 44. (g) The next step is the reading of the "correct record" as printing gets started. The record access is directly accomplished by the use of pointers (FIG. 6C) operated through software in the print server 44. The time to search one record will be approximately 15 milliseconds and the time to search all 18,000 accounts will be approximately 90 minutes. This can be seen as 18,000 accounts.times.20 checks (average) per account which gives us 360,000 records Then the 360,000 records.times.15 milliseconds results in 5,400 seconds, or 90 minutes, for searching all 18,000 accounts. Since the printing time, using four printers, requires approximately 8 to 10 hours, it is possible to search records for image data and print at the same time (except at the very beginning of the printing process). It is necessary to pre-process and keep the print file ready before it is possible to go ahead with the printing. After a few minutes of pre-processing, it is possible to continue to do both the pre-processing and printing at the same time, i.e., the searching and accessing of image data records can occur concurrently with printing operations. The effective time required for sorting is considered to be approximately 30 minutes. FIG. 
6A, in the upper portion, shows the initial print index file which is extracted from the image processing system database in IDS 7. This is done on a daily basis wherein the corresponding platter and record number are correlated. Thus, as seen in the upper part of FIG. 6A, a correlation link is started between a given capture date, the account number, the check number which goes with the account number, and the amount of money involved. This is correlated with a platter number such as Platters 1 and 2 for the "date" of 3/14 (FIG. 6A). Then additionally, each platter number is "associated" with a given "record number" to indicate the location of that digital image check data. This is done by software in archive server 30. For example, Platter 1 has the first record at position 1 and the second record at position 2 (FIG. 6A). Then Platter 2 has the image data for check 3 of account number 13,000, located on Platter 2 at record position 2. The lower portion of FIG. 6A shows a duplicate of the information arranged similarly to the upper portion of FIG. 6A, but the two files are "merged" to Archive Server 30 (FIG. 6B) to create one Master Print Index File. The top two files of FIG. 6A are created daily and merged to create the Master Print Index File which is then placed in disk 44.sub.d. These data are maintained throughout the monthly statement cycle. Subsequently, this master file can be deleted after one monthly statement cycle has been printed; and then a new one may be created. In the overall summary, check documents are imaged into digital image data and placed in the Storage/Retrieval Modules 10 (SRM), then transferred to optical platters 50.sub.p in the jukebox 50.sub.j via Archive Server 30. The system creates an "Initial Print Index File" (FIG. B) from IDS disk 7 (FIG. 1A) with detailed checking data in digital form. 
The Archive Server 30 "merges" its Archive Index (of each dated digital image entry with its location of platter number and record number) with the Initial Print Index File to develop the "Merged Master Print Index File" in the Archive Subsystem 50. The Archive Server 30 and Archive Disk 30.sub.d accumulate data in the Merged Master Print Index File to cover a 30-day monthly period (of 22 business banking days). The Archive Server 30 extracts and copies one day's worth of index records, of its accumulated 30 days of data, into the Magnetic Disk 44.sub.d of the Print Subsystem 42 (FIG. 1B). The Print Server 44 "sorts" (FIG. 6C) the account numbers in the Merged Master Print Index File (holding one day's worth of the 30 days of data) to develop a set of sequential account numbers for each date of the 30-day cycle together with correlated data indicating (for each check document) the location of the platter and record number of image data now residing in disk 44.sub.d (which was copied from the jukebox 50.sub.j). Then the Print Subsystem 42 prints out N statements each day, where N is equal to 1/22 of the total number of customer accounts.

FIG. 4 indicates a second alternative configuration which uses a specialized drive called the LMSI drive together with the jukebox 50.sub.j. The LMSI 6100 Series may include ten drives and 50 platters, which involves one drive and five platters per unit. The LMSI is a unit manufactured by Laser Magnetic Storage International (LMSI) Company, located at 2914 East Katella Avenue, Orange, Calif. 92667. In the second alternative configuration of FIG. 4, the jukebox 50.sub.j showing 120 optical platters is used for archival storage when it is necessary to retain and maintain data for over 30 days. The LMSI 6100 Series drives and platters, designated as 57 and 58 of FIG. 4, are used for the storage and retrieval of data which will only be held for a 30-day period.
Since the LMSI drives have a higher transfer rate and shorter access time, it is possible to speed up the retrieval/sorting and printing processes involved. Thus the check images will be stored in the LMSI based optical platters 57 and 58 (FIG. 4) for a period of 30 days. Subsequently, the information on these platters will be transferred to the jukebox 50.sub.j after the statements have been printed. The assumptions for this configuration are the same as those for the first-mentioned design configuration, except that no time period is required to change platters, since the platters here are always connected with the drives, and the capacity of each platter is 10 GB. Additionally, in this second design configuration (FIG. 4), the drive can seek/read from one side of a platter at a time, even though it has two seek-heads on two sides of the drive. The following Table VI indicates the specifications that are projected and based on the CYGNET 1800 Jukebox which has a Hitachi Drive.
TABLE VI
______________________________________
                       LMSI-LD 6100    FORTHCOMING
SPECIFICATIONS         (AVL 1Q/92)     LMSI DRIVE
______________________________________
Media Capacity         6.0 GB/Platter  10 GB/Platter
Average Seek Time      75 ms           50 ms
Average Latency        25 ms           25 ms
Transfer Rate          75 KB/sec       900 KB/sec
Seat and Spinup Time   1.5 sec         1.5 sec
Spindown Time          1.5 sec         1.5 sec
No. of Seek Heads      2               2
______________________________________
PERFORMANCE ANALYSIS: An analysis similar to that previously done for the first configuration would indicate, in this second configuration (FIG. 4), that the total storage required for 22 days (banking month) would be 436.36 GB. The total "ON-US" item storage for one month would come to 261.81 GB, and the total number of optical platters required for two months storage would be 46 platters. The total number of images stored on one platter would be 345,454 and the average "ON-US" images per platter would be 207,272 (60% of 345,454). The average "ON-US" images per daily (1/22 of 207,272) cycle per platter would come to 9,421 images. The ratio of the total number of images to the images retrieved would come to 36.6. This would mean that approximately one in every 37 images would be retrieved. In order to retrieve one image (on the average), it would require that the system move 37 images, wait for latency, and read the front image. This would take 53.8 milliseconds per image when reading the front image. In order to retrieve images from one side of a platter (4,710 images), the read time would involve 0.0538.times.4,710 images which comes to 253.3 seconds, or 53.8 milliseconds per image. For customer account statements, assuming 18,000 accounts each having 20 checks, and each taking a read time of 53.8 milliseconds, this retrieval would come out to 19,368 seconds or 5.3 hours. Likewise, for commercial accounts, the total time to retrieve the data for 12,500 accounts would come to 4.3 hours. For the reconciliation accounts, the time to retrieve image data from one side of the platter would come to 20.87 seconds for retrieving 333 images, and this would come to 0.0627 seconds per image. Thus, the total time required to retrieve, on a daily basis, 1,000 reconciliation accounts would come to 0.34 hours. There is no significant time required to transfer all the image data from the archive server 30 to the print server 44.
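The retrieval estimate for the second configuration is plain arithmetic, and a short sketch can make the steps easier to check. The Python below is illustrative only: the function names are hypothetical and are not part of the described system, and the inputs are simply the figures quoted in the text.

```python
def images_per_daily_cycle(on_us_images_per_platter: int, cycles: int = 22) -> int:
    """ON-US images on one platter that belong to a single daily cycle."""
    return on_us_images_per_platter // cycles


def retrieval_hours(accounts: int, checks_per_account: int,
                    seconds_per_image: float) -> float:
    """Total read time, in hours, for one daily statement cycle."""
    return accounts * checks_per_account * seconds_per_image / 3600.0


# Figures quoted in the text for the FIG. 4 configuration.
per_cycle = images_per_daily_cycle(207_272)        # 9,421 images
skip_ratio = 345_454 / per_cycle                   # roughly one image in 37 is read
consumer_h = retrieval_hours(18_000, 20, 0.0538)   # 19,368 seconds in total

print(per_cycle, round(skip_ratio, 1), round(consumer_h, 2))
```

With the quoted inputs the consumer total works out to about 5.38 hours, which the text gives as 5.3 hours.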
PRINT FUNCTION ANALYSIS: Here the retrieval time for a total of 18,000 accounts with 20 checks each would come to 5.3 hours. SITUATION--OF THREE PAGES PRINTED PER ACCOUNT: this would require the total printing of 54,000 pages and using one printer, this would take 12.5 hours. However, the total clock hours required for the printing would involve 5.3 hours for retrieval time, plus 12.5 hours for printing the 54,000 pages, plus the 0.5 hours to sort the 18,000 accounts. This would result in a total of 18.3 hours. Likewise, for commercial accounts in this second configuration the total print time would take 8.6 hours, and the reconciliation accounts would take a total print time of 41.6 minutes. SITUATION--OF FOUR PAGES OF PRINTING REQUIRED PER ACCOUNT: here the total number of pages required to be printed would be 72,000 pages, and using one printer, this would take 16.6 hours. However, with two printers, this would take 8.33 hours; with three printers, this would take 5.53 hours; and with four printers, this would take 4.15 hours. The total clock hours required for printing the four-page-per-account situation would involve a total of 22.7 hours. For commercial accounts requiring four pages per account, this would require the printing of 54,000 pages and the total print time would take 11.57 hours. Likewise, for the reconciliation accounts using a four-page account printout, the total number of pages would be 4,000, and the total print time would be 55.55 minutes. FIG. 4 shows how the SRM document image data in 10.sub.i are transferred to the archive server 30 which has additional storage on a magnetic disk 30.sub.d. The archive server can transfer the data to the LMSI drives and platters 57 and 58 which can hold 30 days' worth of data. Data which is to be held for longer than 30 days would be transferred to the 120 platters shown in the jukebox 50.sub.j. In FIG. 4, the archive server 30 has a small computer systems interface (SCSI) connection to the print server 44 which has an auxiliary magnetic disk 44.sub.d. Attached to the print server is an image workstation 12 through which an operator can access image data for display. The print server 44 provides and transfers its data to the printers 46 and 48 for eventual printing of the required account information. FIG. 5 indicates the third format or design configuration for the storage/retrieval and printout system involved herein. This third design configuration involves the storing of the ON-US images and other document images "separately" in the jukebox platters 50.sub.p, FIG. 5. As seen in FIG. 5, the jukebox 50.sub.j includes two optical storage drives 52 and 54 for storage operations and a third operating drive 56 for retrieval operations. In FIG. 5, the jukebox platters 50.sub.p are shown numbered from platter 1 to platter 120. The archive server 30 connects to the storage drives 52, 54 and the retrieval drive 56. The platters, in addition to being connected to the print server 44 provides output to two image printers, printers 46 and 48. The storing of documents in the archive subsystem media 50 of FIG. 5 may be done as follows: (a) The checks are segregated (by code number to identify the group from MICR data) into three groups after the amount entry and the date of correction is done in the ICPS (Image Check Processing System) applications. This is done by the print server software in 44. The three groups into which the checks are sorted are: (i) ON-US items (personal and commercial accounts); (ii) Reconciliation account items; (iii) Transit items (items from other banks which are passing through the local bank); (b) The complete images of all documents will be sorted in the archive subsystem 50 shown in FIG. 5, within jukebox 50.sub.j, in the following order: (1) ON-US items will be stored in the jukebox via one of the drive units such as 52.
The ON-US items are separated from the other items and are stored only in the first 28 optical disk platters which involve the image data only required for a period of 30 days. A total number of 28 disks (60 percent of 46) are required as was previously indicated in the first "worst case" design configuration. This situation permits an increase in performance, as a lesser number of platters are required to handle the sorting of check images. (2) Reconciliation account items and transit items are stored in the jukebox via a second drive such as drive 54, FIG. 5. The remaining 40 percent of information data (these are non "ON-US" items such as items related to other outside banks) will be stored in platters numbered 29 through 46 of the jukebox 50.sub.j. PERFORMANCE ANALYSIS FOR THIRD DESIGN CONFIGURATION: Here the front and back images are located in one file. The following assumptions are made:
1. ON-US statement printing is done at night.
2. The window time for printing is 10 to 12 hours.
3. Printing is done in off-business hours and in a batch mode.
4. The images of the "front" side of the check are printed in actual statements given to the customer.
5. The printer speed is maintained at 90 PPM (pages per minute).
6. The time required to change the optical platters is 6 seconds.
7. The capacity of each of the optical platters is 10 GB.
8. The "front" images are stored together in the same files in the captured order. The front images only are retrieved for printing purposes while the back images are held in storage.
The following Table VII indicates the specifications projected, for the third design alternative of FIG. 5, which are based on a CYGNET 1800 Jukebox with Hitachi drive.
TABLE VII
______________________________________________________________
                                        FORTHCOMING
                                        SPECIFICATIONS
                       CYGNET 1800      DRIVE
______________________________________________________________
Media Capacity         2.6 GB/Platter   10 GB/Platter
Average Seek Time      200 ms           50 ms
Average Latency        50 ms            25 ms
Transfer Rate          440 KB/sec       800 KB/sec
Seat and Spinup Time   4.5 sec          1.5 sec
Spindown Time          3.5 sec          1.5 sec
______________________________________________________________
Thus, based on the above "forthcoming" specifications, it is possible to estimate the following performance figures. PERFORMANCE ANALYSIS: The total storage required is for 22 days (a banking month) and this would require 436.36 GB. The total ON-US item storage for one month is 60 percent of this which comes to 261.81 GB. The total number of optical platters 50.sub.p (FIG. 5) required for one month would be 27 optical platters (assuming the effective usage of each platter is 95 percent). The system will be accessing 54 platters (worst case situation) for every single daily cycle of the statement printing. The average image size (for the front and back) is 27.5 KB, while the average image size for statement printing of the "front" only would be 17 KB. The total number of images stored in one platter (using the average of 17 KB) would be 345,454, while the total number of images stored on one side of the platter would be 172,727. The total number of ON-US images on one platter 50.sub.p would be 345,456 while the average of ON-US images per daily cycle per platter would come to 15,702. (This is 345,456 divided by 22 banking days equals 15,702.) This would average out to 7,851 ON-US images per daily print cycle on one side of the platter. The ratio of the total number of images to the number of images retrieved would be 22.0 per platter. This indicates that approximately one in every 22 images will be retrieved. In order to retrieve one image, on the average, it is necessary to move 22 images, wait for the latency period, and read the front image. This would involve a total of 5 milliseconds plus 25 milliseconds plus 21.25 milliseconds which comes to 51.25 milliseconds per image read. This means that the system moves 19.5 images per second from the disk 44.sub.d to the printers 46, 48.
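The 51.25-millisecond figure can be reproduced from its three components. The sketch below is only an illustration with hypothetical names. It takes the 5-millisecond move time and 25-millisecond latency as given in the text, and assumes the 21.25-millisecond read time is the 17 KB front image divided by the 800 KB/sec transfer rate of Table VII, which matches the quoted value.

```python
def per_image_read_ms(move_ms: float, latency_ms: float,
                      image_kb: float, transfer_kb_per_s: float) -> float:
    """Time to skip to the next wanted image, wait for latency, and read it."""
    transfer_ms = image_kb / transfer_kb_per_s * 1000.0   # 17 KB at 800 KB/sec
    return move_ms + latency_ms + transfer_ms


read_ms = per_image_read_ms(move_ms=5.0, latency_ms=25.0,
                            image_kb=17.0, transfer_kb_per_s=800.0)
images_per_second = 1000.0 / read_ms

print(f"{read_ms:.2f} ms per image, {images_per_second:.1f} images per second")
# prints: 51.25 ms per image, 19.5 images per second
```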
In order to retrieve images from one side of a platter (holding 7,851 images), it is necessary to seat and spin up the drive and then to spin down the drive, in addition to the read time, which leads to an estimate of 0.0516 seconds per image or an average of 51.6 milliseconds per image (retrieval time from one side of a platter). For customer account statements, the retrieval would involve 18,000 accounts.times.20 checks per account.times.0.0516 seconds per image which would entail 5.1 hours. For commercial accounts, the time to retrieve one document image would average out to 61.25 milliseconds per image. This means that 16.32 images per second are retrieved. Now to retrieve images from only one side of the platter (4,710 images), the time involved would be 61.5 milliseconds per retrieval of an image, on the average. Thus, the total time to retrieve 12,500 commercial accounts, each having 20 checks, at the rate of 61.5 milliseconds per image would come to 4.2 hours. For the reconciliation accounts, the process of retrieving images from one side of the platter (1000/3 equals 333 images), where the total number of platters per cycle (28/22) equals 1.27, that is, 3 sides to be accessed, would come to a total of 23.47 seconds for retrieving 333 images--which comes to 0.0704 seconds per image. Thus, the total time required to retrieve, on a daily basis, 1,000 reconciliation accounts would come to 1,000 accounts.times.20 items per account.times.0.0704 seconds which comes to 0.39 hours. The time required to transfer all the images from the archive server 30 to the print server 44 is negligible and not counted. Thus no additional time is required for images to be transferred to the print server since they are transferred to the print server in the same 5.1-hour retrieval window used for the archive server (during customer account retrieval). Thus, the time to sort the 18,000 accounts would be only one-half hour.
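The half-hour sort estimate rests on the Print Server sorting short index records (account number together with platter and record location) and leaving the image data where it is. A minimal sketch of that idea follows. The record layout, field names, function names and sample values are all hypothetical and are used only for illustration; the text does not specify a file format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class IndexRecord:
    """One short, fixed-format index entry; field names are hypothetical."""
    account_number: int
    check_date: str   # ISO date string so that text order equals date order
    platter: int      # optical platter holding the image
    record: int       # record number on that platter


def select_cycle(records: Iterable[IndexRecord], low: int, high: int) -> List[IndexRecord]:
    """Keep only the records whose account falls in one cycle's account range."""
    return [r for r in records if low <= r.account_number <= high]


def sort_for_printing(records: Iterable[IndexRecord]) -> List[IndexRecord]:
    """Order index records by account, then by date; no image data is moved."""
    return sorted(records, key=lambda r: (r.account_number, r.check_date))


# Made-up sample entries standing in for a merged print index.
merged_index = [
    IndexRecord(17_500, "1992-01-20", platter=3, record=812),
    IndexRecord(42, "1992-01-07", platter=1, record=15),
    IndexRecord(20_001, "1992-01-09", platter=2, record=90),
    IndexRecord(42, "1992-01-03", platter=1, record=4),
]

# Cycle 1 covers accounts 1 to 18,000 (Table I).
cycle_1 = sort_for_printing(select_cycle(merged_index, 1, 18_000))
for rec in cycle_1:
    print(rec.account_number, rec.check_date, rec.platter, rec.record)
```

Each sorted record still carries its platter and record number, so the images can be fetched in statement order after the sort.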
______________________________________________________________________
                    Print      Print      Print      Print
                    Window     Window     Window     Window
Account Type        with       with       with       with
Per Cycle           1 IPTR     2 IPTR     3 IPTR     4 IPTR
(Daily)             in Hours   in Hours   in Hours   in Hours
______________________________________________________________________
DESIGN NO. 1 WORST CASE CONFIGURATION (See FIG. 2)
Case 1: Average 3 pages including text [IPTR = Image Printer]
Consumer            18.6       12.35      10.26      9.25
Commercial          13.7       9.4        7.96       7.25
Reconciliation      1.60       1.26       1.14       1.09
Case 2: Average 4 pages including text
Consumer            22.7       14.4       11.63      10.25
Commercial          16.67      10.88      8.9        7.99
Reconciliation      1.47       1.19       1.10       1.02
DESIGN NO. 2 USING LMSI DRIVE CONFIGURATION (See FIG. 4)
Case 1: Average 3 pages including text
Consumer            18.3       12.05      9.96       8.95
Commercial          13.4       9.1        7.66       6.95
Reconciliation      1.52       1.18       1.06       1.01
Case 2: Average 4 pages including text
Consumer            22.4       14.1       11.33      9.95
Commercial          16.37      10.58      8.6        7.69
Reconciliation      1.39       1.11       1.02       1.01
DESIGN NO. 3 STORING ON-US AND TRANSIT ITEMS SEPARATE IN THE JUKEBOX (See FIG. 5)
Case No. 1: Average 3 pages including text
Consumer            18.1       11.85      9.76       8.75
Commercial          13.3       9.0        7.56       6.85
Reconciliation      1.57       1.23       1.11       1.06
Case No. 2: Average 4 pages including text
Consumer            22.2       13.9       11.13      9.75
Commercial          16.27      10.48      8.5        7.59
Reconciliation      1.44       1.16       1.07       0.99
______________________________________________________________________
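The consumer rows of the table appear to follow a simple model: retrieval time, plus sort time, plus the single-printer print time divided by the number of printers. That model is an inference from the figures rather than a formula stated in the text, and the function name below is hypothetical. The sketch uses the second-configuration values given earlier (5.3 hours retrieval, 0.5 hours sort, 12.5 hours to print 54,000 pages on one printer).

```python
def print_window_hours(retrieval_h: float, sort_h: float,
                       single_printer_h: float, printers: int) -> float:
    """Estimated clock hours for one daily cycle with the given printer count."""
    if printers < 1:
        raise ValueError("at least one printer is required")
    # Assumption: print work divides evenly across printers, while
    # retrieval and sorting are not shortened by adding printers.
    return retrieval_h + sort_h + single_printer_h / printers


for n in (1, 2, 3, 4):
    hours = print_window_hours(5.3, 0.5, 12.5, n)
    print(f"{n} printer(s): {hours:.2f} hours")
```

For one and two printers this reproduces the tabulated 18.3 and 12.05 hours. The tabulated values for three and four printers differ from the model by a few hundredths of an hour, presumably from rounding in the original calculation.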
Described herein has been a versatile and flexible document image storage and retrieval system suitable for groups involved with massive amounts of transactions which have to be stored, retrieved, displayed, corrected and amended, and printed out on a regular basis. One typical archival storage and retrieval system described herein can store, for example, 400,000 accounts (mid-size bank) and on each day of the banking month (of 22 days) sort the stored image data from 18,000 accounts, retrieve them, and then print them in multiple-page statements for each account. Each statement will cover the check transactions in that account over the last 30-day period. This sorting, retrieving and printing can be accomplished within time frames such as the following: with one printer, the complete cycle would take 18.6 hours per day for a three-page statement and 22.7 hours per day for a four-page statement; with two printers, the complete cycle would take only 12.35 hours for a three-page statement and 14.4 hours for a four-page statement; with four printers, the entire cycle would be accomplished in 9.25 hours per day for a three-page statement and 10.25 hours per day for a four-page statement. A major advantage of this system is that formerly used sorting methods would have required 10-20 hours just for the sorting, while the present system requires one-half hour or less for the sorting. This is due to the fact that no check images are sorted, but rather only the index numbers of the images are sorted, thus saving the long, drawn-out time periods that were required for the old sorting systems. The Print Index Files are short, fixed records and very easy to sort as compared with older systems which have variable records and involve large database records for each check image. While the described system is capable of variable configurations, it should be understood that the invention is encompassed by the following claims:
