Method for creating a toll-free number audit tape6594659Abstract A sorting method for use with a database having randomly hashed records stored therein. The records are sorted in accord with a predetermined parameter and the database exists on a storage medium. At least one stage of the method includes sorting the records in accord with a subgroup of the predetermined parameter and into record groups stored in the storage medium. Records are further sorted in later stages and may be read onto a second medium for sorting. The sorting method allows the amount of processor memory used for sorting to be limited, such that adverse effects on normal call processing are limited or nonexistent. Claims We claim: Description TECHNICAL FIELD
Bucket no. Record size range (in bytes)
1 Up to 80
2 81 to 144
3 145 to 500
. .
. .
. .
38 32,001 to 64k
39 64,001 to 128k
40 128,001 to 220k
Each record is stored in the record group that holds records of corresponding size. For example, assuming the record size for toll free number 888-777-6666 is 280 bytes, the database will include a pointer to bucket three for that number. The record for 888-777-6666 will be in bucket three. There are a large number of possible toll-free numbers. The following table represents all possible toll-free number combinations:
0 800 - X.sub.9 X.sub.8 X.sub.7 - X.sub.6 X.sub.5
X.sub.4 X.sub.3
1 888 - X.sub.9 X.sub.8 X.sub.7 - X.sub.6 X.sub.5
X.sub.4 X.sub.3
2 877 - X.sub.9 X.sub.8 X.sub.7 - X.sub.6 X.sub.5
X.sub.4 X.sub.3
3 866 - X.sub.9 X.sub.8 X.sub.7 - X.sub.6 X.sub.5
X.sub.4 X.sub.3
4 855 - X.sub.9 X.sub.8 X.sub.7 - X.sub.6 X.sub.5
X.sub.4 X.sub.3
5 844 - X.sub.9 X.sub.8 X.sub.7 - X.sub.6 X.sub.5
X.sub.4 X.sub.3
6 833 - X.sub.9 X.sub.8 X.sub.7 - X.sub.6 X.sub.5
X.sub.4 X.sub.3
7 822 - X.sub.9 X.sub.8 X.sub.7 - X.sub.6 X.sub.5
X.sub.4 X.sub.3;
where the second and third numbers must be identical, the first three digits cannot be "899" or "811," and X.sub.9 cannot be a "0" or a "1". Thus, there are 64 million possible toll-free number combinations. As explained above, there is occasionally a need to provide an audit tape of toll-free records. Currently, national standards require that the records on the audit tape be indexed by toll-free number in numerical order. However, toll-free numbers are assigned randomly and the records in the toll-free database are randomly hashed, i.e., are not sorted in numerical order. Thus, the records must be sorted in numerical order prior to writing to the audit tape. Referring to FIG. 3, therein is shown a storage medium 54 having a plurality of randomly hashed records 60 stored therein. Storage medium 54 is accessed on a continual basis for retrieving toll-free records upon request from the SCP. As illustrated, the records are not stored in numerical order. The records must be sorted in numerical order prior to writing to the audit tape. As described in detail below, the method of the present invention includes allocating space within processor memory for pre-sorting the records, reading the requested data for each record, temporarily copying the requested data for each record to a record group within the allocated space, transferring the records from each record group to a corresponding record group within a second storage medium, sorting the records in each record group, and consecutively writing each sorted record group to tape. In the presently preferred embodiment, the records are sorted in complete numerical order in two stages. As will become apparent, the records may be sorted in complete numerical order in more than two stages. The present embodiment employs a sorting algorithm that utilizes a predetermined, limited amount of processor memory for sorting. By limiting the amount of processor memory for sorting, the extent of adverse effects on call processing can be controlled. In the present embodiment, in the first sorting stage each record is copied from the database and pre-sorted to a group in processor memory. Pre-sorted groups may be written to storage medium 54 or, in an alternate embodiment, to a second storage device. The pre-sorted record groups are consecutively accessed and the records in each pre-sorted group are sorted in processor memory and written to tape, providing a list of records sorted in complete numerical order. In other embodiments, more than two sorting stages may be utilized. In the first stage of the present embodiment, the records are sorted into a number N of record groups. Each record group corresponds to a toll-number numerical subgroup. A toll-number subgroup consists of less than all ten digits of the toll-free number. For example, a subgroup may consist of the first five numbers of the toll-free number. It is noted that sorting the records in the first (pre-sort) stage by the first five digits of the toll free number will yield 640 record groups (8.times.8.times.10) in this stage. As discussed above, the amount of processor memory used for sorting may have a direct impact on whether the sorting process affects normal call processing. Thus, a definite amount of memory, referred to as M4S (memory for sorting), may be allocated for sorting. In the present embodiment, the records are first distributed, based upon a subset of the criteria for sorting, to a plurality of groups allocated within M4S. A group does not have to be large enough to contain at one time all of the records from the database that belong to that group. As a group approaches capacity, the present contents of the group may be written to a storage medium. Preferably, a second storage medium is included for receiving the contents of each group. The storage medium has space allocated for each group and each group in the storage medium is large enough to store all of the records for that group. Writing record groups from the processor memory to a second memory device requires SCP write cycles. Write cycles may affect call processing. To limit the effect on call processing due to write cycles, a variable for the permissible disk write frequency, or RTE, is designated. It may also be desirable to designate the duration of the period, PRD, over which sorting must occur. For example, it may be predetermined that for a four hour period, it is tolerable to allow 9 disk write cycles per second. Thus, RTE is 9/second and PRD is 4 hours. In the present example, as illustrated in FIG. 6, it is not necessary to include the full record for each number on the audit tape. For example, presently the national organization that regulates the distribution of toll-free numbers requires that each record on the audit tape 601 include only the toll-free number 611, the identity of the owner 612, and the most recent date 613 that the record was updated. In this case, a predetermined number of bytes may be allocated for each record:
Number of bytes Field
X number
Y identity
Z date
X + Y + Z total
In the above example, the first five digits corresponds to the identity of each record group. Thus, within each record group, a record may be identified by the last five digits of the toll-free number. In the binary system, a five digit number (base 10) may be adequately represented by 3 bytes. The number of bytes used to identify the owner is selectable. In the presently preferred embodiment, the identity of the owner is adequately represented by 5 bytes. In the present embodiment, the number of bytes provided for representing the date field is 4. Thus, each record consists of a total of 12 bytes. As explained above, pre-sorting is based upon a subgroup of digits of the toll free number. The number of subgroup digits determines the number of pre-sort record groups. The following table shows the number of pre-sort record groups required based upon the number of subgroup digits used for pre-sorting:
Number of digits
(starting from most-significant-digit) Number of pre-sort record groups
3 8
4 64
5 640
6 6400
7 64000
As can be ascertained from the above-table, as the number of digits increases, the number of pre-sort groups increases. As discussed above, M4S is predetermined. Thus, as the number of pre-sort record group increases the amount of pre-sort memory allocated for each group decreases. In the presently preferred embodiment, 4 megabytes of processor memory is allocated for pre-sorting. The following table shows the amount of memory allocated for each group for this example:
Number of digits
(starting from Number of pre-sort Memory per
most-significant-digit) record groups record group
3 8 .5 Mbytes
4 64 62.5 kbytes
5 640 6.25 kbytes
6 6400 625 bytes
7 64000 62.5 bytes
In the present embodiment, as records are written to each record group and the processor memory space allocated for a group reaches capacity, the records within the pre-sort group are written to a corresponding record group in a second memory device. Thus, the frequency of writing to the second memory device is dependent upon the amount of processor memory allocated for each record group, i.e., the number of records that can be stored in each record group before writing to the second memory device decreases as the processor memory space allocated for each record group decreases. Returning to the toll-free number database discussed above, there are 64 million possible toll-free number combinations, i.e., records. The following table shows, for the present embodiment, the relation between the amount of memory allocated per record group, the number of records that can be stored between write cycles, and the approximate total number of disk write cycles required.
Number of Approx. No.
Memory per record group records/write cycle disk writes
.5 Mbytes 41,600 1500
62.5 kbytes 5,200 12,300
6.25 kbytes 520 123,000
625 bytes 52 1,230,000
62.5 bytes 5 12,300,000
Referring to the above-two tables, it can be observed that pre-sorting in accord with fewer digits requires fewer disk write cycles, while sorting in accord with more digits requires more disk write cycles. Disk write cycles may affect (slow down) call processing. However, pre-sorting in accord with a small number of digits provides only a small number of pre-sorted groups with many records in each group. Having many records in each group may warrant additional pre-sort stages, as explained below. The relationship between the parameters discussed above may be expressed by the following formula: ##EQU1## where N=Number of record groups, and M4S=Disk Space Available for Sorting TOT=Max. Number of Records Possible RSZ=Record Size PRD=Planned Hours for Sorting Stage RTE=Permissible Disk Write Frequency (per hour) The above equation may be solved for any variable, depending upon the restrictions present within a given sorting system. For exemplary purposes, it may be assumed that M4S, TOT, RSZ, and RTE are assigned predetermined values. Thus, N or PRD may be selected and the equation solved for the other variable. Following the above example, the variables may be assigned the following values: M4S=4 Mbytes p1 TOT=64 M RSZ=12 bytes PRD=4 hours RTE=32,400 (9 disk writes per second) and N=675, which corresponds to the first five digits of the toll-free number, translating to 640 record groups. Referring to FIG. 3, 64 million randomly hashed records 60 are pre-sorted into 640 record groups 64. As each record group reaches capacity, the records therein are written to a corresponding record group in a second storage device 66. Each write cycle 68(a), 68(b), and 68(c) may be referred to as write-bucket. The approximate number of bytes per write-bucket may be expressed as WBB, where: ##EQU2## In the example provided above, WBB=6,000 bytes (approx.). The approximate number of bucket-write cycles per group may be expressed as NBW, where: ##EQU3## In the example provided above, NBW=190 (approx.). Thus, in the preferred embodiment records are pre-sorted to a plurality of record groups in a second storage device. Sorting is completed in the second storage device and the sorted database is then written to tape. In the preferred embodiment, record groups are sorted in numerical order. The records are written to an audit tape after each group is sorted. Referring to FIG. 4, the second storage device 66 includes a processor and memory for sorting the records in each record group. The second storage device processor may be the same processor that executed the pre-sorting stage or may be a separate processor. In the preferred embodiment, the records within the first record group are sorted by the SCP processor in accord with the last five digits of the toll free number. The first record group is then written to an audit tape 68. The records in the next record group are then sorted and written to tape. This process continues until all of the records in the last record group are sorted and written to tape. Returning to the example provided above, there are 100,000 records in each record group in the second storage device. As discussed above, each record consists of 12 bytes. Therefore, the final stage requires only 1.2 megabytes of processor memory, which is well below M4S (4 megabytes). In an alternate embodiment, the randomly hashed records are sorted in more than two stages. In this embodiment, the records are first sorted into a number of groups based upon a first subset of the numerical field. Then the records in each group are sorted into a plurality of subgroups. Further subgroups are created within subgroups and pre-sorting continues until the records are sorted in complete numerical order. The final stage of sorting may be identified when the memory required to sort all of the records within a pre-sort group is less than M4S. Each stage of sorting may be defined by the following equation: ##EQU4## where N=Number of record groups, and M4S=Memory Space Available for Sorting TOT=Max. Number of Records in Group RSZ=Record Size PRD=Planned Hours for Sorting Stage RTE=Permissible Disk Write Frequency (per hour) In each successive stage of pre-sorting, RSZ decreases, as the number of bytes needed to represent the remaining toll-free number digits decreases. As discussed above, the equation may be solved for any dependent variable. Whether a variable is independent or dependent may depend upon record data, characteristics of the storage medium, and restrictions present within a sorting stage. The final stage of sorting occurs when: (NDNB+DNDB).multidot.TOT.ltoreq.M4S, where, NDNB=Number of Bytes per Record Not Including the DN, DNDB=Number of Bytes Needed to Represent the Remaining DN Digits, and TOT=Number of Records in Group In an alternate embodiment, records may be pre-sorted in multiple recursive stages within storage medium 54. Each stage may be defined by the number and placement of contiguous digits to be sorted. The final stage of pre-sorting may be designated as the stage in which, when completed, the remaining unsorted digits can be sorted within M4S and written to tape. Thus, an algorithm for sorting within storage medium 54 may have one or more pre-stages and a final sorting stage. For sorting records in accord with the toll-free number, the maximum number of contiguous digits sorted at each stage is dependent upon the amount of memory available for sorting, i.e., M4S. For example, an initial pre-sorting stage may sort toll-free database records in accordance with the first five digits, as follows:
Initial Stage.sub.1 : 800-20
Initial Stage.sub.2 : 800-21
Initial Stage.sub.3 : 800-22
Initial Stage.sub.4 : 800-23
. .
. .
. .
Initial Stage.sub.638 : 888-97
Initial Stage.sub.639 : 888-98
Initial Stage.sub.640 : 888-99
The memory required for sorting the next stage is dependent upon the number of digits remaining in the designated (toll-free) number. In the above example, there are five digits that remain in the designated number. Therefore, there are 100,000 records in each initial stage. The number of bytes required for each initial stage is defined as: BYTES=REM.multidot.SRSZ where BYTES=memory required for sorting (in bytes) REM=number of records represented by remaining digits RSZ=record size In the above example, REM=100,000 and RSZ=12. Therefore, BYTES=1.2 megabytes. In the present embodiment, Initial Stage.sub.1 may be followed by one or more pre-sorting stages, if necessary, and a final sorting stage. In the above example, M4S=4 megabytes. Because the unsorted records in Initial Stage.sub.1 may be sorted within 1 megabyte, sorting may be completed in a final sorting stage without further pre-sorting stages. The records sorted in Initial Stage.sub.1 may be read out of the processor memory and written to tape. FIG. 5 illustrates one alternate embodiment in which randomly hashed records are sorted in four stages. Multiple sorting stages may be required if there is a large number of records and/or record sizes are large. In this embodiment, a database of randomly hashed records 70 are first sorted into four record groups 72(a), 72(b), 72(c), and 72(d). Next, each record group is sorted into further record groups. In the example shown, record group 72(a) is sorted into record groups 73(a), 73(b), and 73(c). In the next step, each record group is sorted into further record groups. In the example shown, record group 73(a) is further sorted into record groups 74(a), 74(b), and 74(c). In the next step, the records in each record group are sorted in numerical order. Upon completion of sorting all of the records in the final record group, all of the records are sorted in complete numerical order. Similar to the two-stage embodiment, subgroups of record groups may be transferred to a second storage medium. Thus, the present invention provides a proficient method for sorting randomly hashed records. The method can minimize adverse effects on call processing, as seen in the prior art. The method is particularly useful for providing a pre-sorted database from randomly hashed records without effecting the quality of call processing. It is to be understood that the above-described embodiments are merely illustrative of the principle of the invention and that many variations may be devised by those skilled in the art without in departing for the scope of the invention. It is, therefore, intended that such variations be included within the scope of the claims.
|
Same subclass Same class Consider this |
||||||||||
