System for database creation, maintenance and access using event marking and two-dimensional partitioning

6401098

Abstract

A system and method that reduces the access, backup, restore and processing time required, by partitioning data in a database into two dimensions. The first dimension is by event processing date and the second dimension is by partition group. Once partitioned in two dimensions, the data is stored in two-dimensional partitions in the form of rolling tables that include an event marker. This partitioning and marking is done upon receipt of the data, which eliminates the need for further processing to store the data efficiently. The partitioning and marking further reduce the size of the data blocks that must be handled when backing up, restoring, deleting and retrieving data. Thus, extremely large volumes of data can be handled efficiently.

Description

REFERENCE TO MICROFICHE APPENDIX
TABLE 1
Term               Definition
CDR                Call Detail Record
CSR                Customer Service Representative
Billing on demand  Production of a bill when the customer requests it.
ERP                Event Rating and Pricing (computer subsystem)
CBM                Customer Billing Manager (computer subsystem)
CCM                Customer Care Manager (computer subsystem)
Controller         Software subsystem (part of CBM). Finds when a billing cycle is due to begin, initiates the production run and updates the cycle due date.
Event              Contains data related to usage (CDRs) and other customer-related charges (e.g. one-time and recurring charges, adjustments). Also contains the Partition Group and the Event Retention Period, which indicates whether the Event is to be deleted immediately after the production run is executed, deleted after the retention period (e.g. 80 days), or never deleted.
The goal is to keep database partitions at a size that allows routine maintenance (nightly backups) while still providing fast, flexible access to the latest information. Access must be fast enough for a CSR to retrieve up-to-date records that capture the calls a customer has made up to the moment of calling the CSR. Segmenting partitions by bill cycle alone would still produce partitions that are too large, because bill cycles are typically monthly. For example, if the system processes 30 million CDRs per day and records must be kept for 80 days, the database holds 2.4 billion usage events. With each usage event averaging approximately 1000 bytes, the database would be 2.4 Terabytes. Even daily partitions would each be 30 Gigabytes. Partitions this large would degrade performance and make fast backups impossible; for example, the Oracle™ database recommends partitioning anything larger than 2 Gigabytes. A further reason for additional partitioning is that the database must be independent of the bill cycles: a customer could, for example, change his bill cycle from the 1st of the month to the 15th of each month. Further partitioning is therefore needed.

The second dimension for partitioning is by group ID. The group ID has two requirements. First, the assignment of group IDs must be configurable in the production environment (e.g. by initial bill cycle, randomly, as a constant, or by the last digits of the account ID). Second, once assigned, the group ID of an account can be changed, but only on bill instance boundaries. FIG. 1 illustrates the simple case where data is logged over the same period duration (e.g. a day or a week) for each group ID. In FIG. 1, group ID is represented by the vertical axis and time by the horizontal axis. Items 10, 20, and 30 belong to Group ID "A". Items 40, 50, and 60 belong to Group ID "B". Items 70, 80, and 90 belong to Group ID "C".
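The sizing figures above can be checked with a few lines of Python. The constants come from the text; `sizing` and `groups_needed` are hypothetical helpers, and the assumption in `groups_needed` that events spread evenly across partition groups is ours, not the source's. It shows why the second dimension matters: under that assumption, about 15 equal groups would bring each group's daily partition down to the 2 GB guideline.

```python
# Back-of-the-envelope sizing for the example in the text.
# Inputs from the text: 30 million CDRs per day, 80-day retention,
# about 1000 bytes per usage event, and a 2 GB partition-size guideline.
CDRS_PER_DAY = 30_000_000
RETENTION_DAYS = 80
BYTES_PER_EVENT = 1_000
MAX_PARTITION_BYTES = 2 * 10**9  # decimal units, matching the text's figures


def sizing(cdrs_per_day: int, retention_days: int, bytes_per_event: int):
    """Return (events retained, total bytes, bytes in one daily partition)."""
    total_events = cdrs_per_day * retention_days
    total_bytes = total_events * bytes_per_event
    daily_partition_bytes = cdrs_per_day * bytes_per_event
    return total_events, total_bytes, daily_partition_bytes


def groups_needed(daily_bytes: int, max_partition_bytes: int) -> int:
    """Smallest number of partition groups that keeps each group's daily slice
    at or below the ceiling, ASSUMING events are spread evenly across groups."""
    return -(-daily_bytes // max_partition_bytes)  # integer ceiling division


events, total_bytes, daily_bytes = sizing(CDRS_PER_DAY, RETENTION_DAYS, BYTES_PER_EVENT)
print(events)                    # 2400000000
print(total_bytes / 10**12, "TB")  # 2.4 TB
print(daily_bytes / 10**9, "GB")   # 30.0 GB
print(groups_needed(daily_bytes, MAX_PARTITION_BYTES))  # 15
```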
If group IDs represent business customers, it is unlikely that all business customers will have the same level of activity. Larger corporations may generate more events in a day than small ones do in a week. However, in FIG. 1, all partition tables are the same size. Referring to FIG. 2, to keep the size of the partitions manageable, companies with higher numbers of events per day (Group C, items 150, 160, 170, and 180 in FIG. 2) would have a shorter period of time per table than smaller ones (Group A, items 100 and 110 in FIG. 2). Referring to FIG. 3, it is further possible to increase flexibility by allowing different time periods not only for different groups but also within the same group. For example, for Group A, items 200, 210 and 220 in FIG. 3 partition the week into three partitions of different sizes. First, item 200 represents the period Saturday through Tuesday, since call volume is very light over the weekend and moderately low on Monday and Tuesday. Second, item 210 represents a very heavy volume on Wednesday. Third, item 220 contains a moderately heavy volume on Thursday and Friday. By partitioning data as shown in FIG. 3, the processing time required for backup and recovery can be kept low, because the tables are kept small. For example, when the partitions are generated as in FIGS. 1-3, the "previous period" table (e.g. item 200, Partition Group A, Partition Table 1) is backed up at night. The system keeps running and can safely retain "write" access, because new events are inserted into the "current day" table rather than into the "previous period" table being backed up. Once backed up, an "old" partition needs no further backup. The examples shown in FIGS. 4 through 7 examine concurrency and two-dimensional partitions in the form of rolling tables in more detail, using a rating and billing process.
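FIGS. 2 and 3 let the length of each partition table vary with a group's activity, but the text does not say how the period boundaries are chosen. The sketch below uses one hypothetical policy: close the current partition table whenever adding the next day would push it past a size cap. The function name `plan_partitions`, the daily volumes and the cap are all invented for illustration; with these numbers the rule reproduces the Saturday-Tuesday / Wednesday / Thursday-Friday split described for Group A in FIG. 3.

```python
from typing import List, Tuple


def plan_partitions(daily_volume: List[Tuple[str, int]], cap: int) -> List[List[str]]:
    """Greedily group consecutive days into partition tables so that no table
    exceeds `cap`. A single day larger than `cap` still gets its own table."""
    partitions: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for day, volume in daily_volume:
        if current and current_size + volume > cap:
            partitions.append(current)  # close the current partition table
            current, current_size = [], 0
        current.append(day)
        current_size += volume
    if current:
        partitions.append(current)      # the still-open "current" table
    return partitions


# Hypothetical weekly volumes for Partition Group A (millions of events per day).
week = [("Sat", 2), ("Sun", 2), ("Mon", 7), ("Tue", 7),
        ("Wed", 19), ("Thu", 9), ("Fri", 9)]
print(plan_partitions(week, cap=20))
# [['Sat', 'Sun', 'Mon', 'Tue'], ['Wed'], ['Thu', 'Fri']]
```

A greedy rule is only one option; a real system might instead fix boundaries from historical calling patterns, as the FIG. 3 narrative suggests.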
The example contains six monthly bill cycles, each with a pricing delay of one day. The pricing delay is the interval between the bill cycle due date and the start of the bill run. Whether to have a pricing delay is entirely up to the user; the present invention does not require one. For each Partition Group, the database is assumed to contain 80 days' worth of data. The example uses different dates for the bill cycle due date and the pricing due date. The Bill Cycle Due Date is the date on which the Controller stores the new bill periods (in the example, for April 1 through May 1), sets the bill cycle due date to the next month and creates a new partition for it. The Pricing Due Date is the start of the bill run, which actually produces the printed bill for the customer. The bill run can happen on the Bill Cycle Due Date, later than the Bill Cycle Due Date, or on customer demand. The earliest possible pricing due date is the cycle due date itself, but pricing can start later. In practice, however, telecommunications companies normally want to minimize this difference and send bills out as soon as possible.
TABLE 2
Bill Cycle Name   Cycle Due   Billing Period           Pricing Due Date     Partition
(Partition Group) Date        Start and End Dates      (start of bill run)  Table Number
01                February 01 January 01-January 31    February 02          1
                  March 01    February 01-February 28  March 02             7
                  April 01    March 01-March 31        April 02             13
                  May 01      April 01-April 30        May 02               19
05                February 05 January 05-February 04   February 06          2
                  March 05    February 05-March 04     March 06             8
                  April 05    March 05-April 04        April 06             14
10                February 10 January 10-February 09   February 11          3
                  March 10    February 10-March 09     March 11             9
                  April 10    March 10-April 09        April 11             15
15                February 15 January 15-February 14   February 16          4
                  March 15    February 15-March 14     March 16             10
                  April 15    March 15-April 14        April 16             16
20                February 20 January 20-February 19   February 21          5
                  March 20    February 20-March 19     March 21             11
                  April 20    March 20-April 19        April 21             17
25                February 25 January 25-February 24   February 26          6
                  March 25    February 25-March 24     March 26             12
                  April 25    March 25-April 24        April 26             18
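The date arithmetic behind Table 2 is regular enough to sketch. In this hypothetical helper, a cycle's billing period starts on the same day of the previous month and ends the day before the cycle due date, and the pricing due date adds the one-day pricing delay. The partition-numbering formula (six tables per month, numbered by cycle) is inferred from the table rather than stated in the text. The year 1999 is an arbitrary non-leap year chosen so that February has 28 days, as in the table.

```python
from datetime import date, timedelta

CYCLE_DAYS = [1, 5, 10, 15, 20, 25]  # the six bill cycles (partition groups) of Table 2
FIRST_DUE_MONTH = 2                  # February: first cycle due month in Table 2
PRICING_DELAY = timedelta(days=1)    # one-day pricing delay used by the example
YEAR = 1999                          # hypothetical non-leap year


def table2_row(cycle_day: int, due_month: int):
    """Return (cycle due date, (period start, period end), pricing due date,
    partition table number) for one cycle in one month (February-May only)."""
    cycle_index = CYCLE_DAYS.index(cycle_day)
    cycle_due = date(YEAR, due_month, cycle_day)
    period_start = date(YEAR, due_month - 1, cycle_day)  # same day, previous month
    period_end = cycle_due - timedelta(days=1)            # day before the cycle due date
    pricing_due = cycle_due + PRICING_DELAY               # start of the bill run
    # Numbering inferred from Table 2: six partition tables per month, by cycle.
    partition = (due_month - FIRST_DUE_MONTH) * len(CYCLE_DAYS) + cycle_index + 1
    return cycle_due, (period_start, period_end), pricing_due, partition


due, (start, end), pricing, part = table2_row(25, 4)
print(due, start, end, pricing, part)
# 1999-04-25 1999-03-25 1999-04-24 1999-04-26 18
```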
Table 2 is used to illustrate the concept of concurrency. Consider, for example, Bill Cycle 01, shown in FIG. 4, with a Cycle Due Date of April 1 as shown in Table 2 above. (The Partition Table Numbers in Table 2 refer to specific partitions in FIGS. 4-6.) On March 31 (before the Cycle Due Date), no Bill Cycle is running, and Events are inserted into the highest-date partitions (13 through 18), as shown in FIG. 4 and operation 430 in FIG. 9.

The discussion of the examples in FIGS. 4-7 refers to FIGS. 8 and 9. FIG. 8 is a diagram showing the modular configuration of an embodiment of the present invention used to accomplish the two-dimensional partitioning of data. FIG. 9 is a flowchart showing the process and method used by an embodiment of the present invention to accomplish the two-dimensional partitioning of data. ERP Event Inserter 300, shown in FIG. 8, is used for all Event-specific maintenance. The ERP Event Inserter 300 invokes the stored procedure 330 that creates a new partition when a new Bill Cycle is created or an existing Cycle Due Date is increased. ERP Event Inserter 300 is also used for deletion, once a parameter-driven number of days (e.g. 80 days) has passed since the Production Run was distributed and the partition is no longer needed online. ERP Retriever 310, shown in FIG. 8, retrieves the Events that qualify for a Bill Cycle. Controller 320, shown in FIG. 8, finds a cycle that is due to begin; it initiates the production run and updates the Cycle Due Date. Create Cycle GUI 350, shown in FIG. 8, is used to create a new Cycle, e.g. for a new customer. This request is sent to the ERP Event Inserter.

Referring to FIG. 9, execution of the present invention begins in operation 400 with the acquisition of a new customer on March 1 (see Table 2 above). In operation 410 of FIG. 9, the Create Cycle GUI 350 creates a new Cycle for the new customer and sends a message to the ERP Event Inserter 300, shown in FIG. 8, to set the customer billing period, including its start and end dates, and to create a customer partition group by calling stored procedure 330 ("create partition"). In this example, the billing period starts on March 1 and ends on March 31 (Partition Table Number 13 in Table 2). Once this setup is complete, Events are captured in Partition Table No. 13, as illustrated by operation 430 in FIG. 9. Note that collected Events can be inserted into more than one Partition Table within a billing period, to keep each Partition Table at a manageable size. In that situation, ERP increments the Partition Table before the Cycle Due Date is reached (operations 440 and 450). In the simple example of Table 2, the Partition Table is incremented monthly, in this case from Partition Table Number 13 to 19. When the current Cycle Due Date is reached (operation 460), the CBM Controller 320 advances the Cycle Due Date to the next month (operation 470). (In the example from Table 2, the billing period advances from March 1-March 31 to April 1-April 30.)

Referring to FIGS. 5 and 9, on April 1 the Bill Cycle comes due and the Controller sets the Cycle Due Date to the next month, May 1, in operation 470 of FIG. 9. At the same time as it updates the Cycle Due Date, the Controller 320 calls the ERP Event Inserter 300, which uses the stored procedure 330 ("Create Partition") shown in FIG. 8 to create a new partition for Partition Group 01 with the date May 1, in operation 420 of FIG. 9. In operation 430 of FIG. 9, Events inserted by the ERP Event Inserter 300 from April 1 onward go into the new partition created in operation 450 of FIG. 9, so Event insertion now targets partitions 14 through 19, as shown in FIG. 5.

Referring to FIG. 6, on April 2 the pricing due date is reached and the full billing run begins. The determination that the current date is the pricing due date is made in operation 490 of FIG. 9. At that point, in operation 500 of FIG. 9, the ERP Retriever 310 of FIG. 8 extracts events from all partitions of Partition Group 01 for the Bill Period that corresponds to March 1 through April 1. Qualifying retrievals are events with the same Event Sequence Number for a given Account ID. A bill cycle may contain several accounts, each with its own sequence number. The qualifying events may be found in the partitions that were created or updated during the qualifying period (in this example, March 1 through March 31). Retrieval is faster because of partition elimination, which the ERP Retriever (CBM) 310 accomplishes by calling stored procedures 330. Partition elimination means that, in the example shown in FIG. 6, only 4 of the 19 partitions are read. Meanwhile, in operation 430 of FIG. 9, the ERP Event Inserter 300 continues inserting into partitions 14 through 19. In addition, contention is minimized, since the only partition accessed by both insertion and retrieval is partition 19. If the creation of a new partition is coupled to the increase of the sequence number in a given bill cycle, contention can be eliminated completely.

FIGS. 4 through 6 present a simplified view of the issues to be solved. They are only a variation of FIG. 1, in which the time duration of each cycle and each Partition Table is constant. However, as illustrated in FIGS. 2 and 3, real situations are more complex, as bill cycles cannot be assumed to fall neatly on the "time" boundaries of the Partition Tables. Consider, for example, Partition Group A in FIG. 3, and assume that the pricing due date falls in the middle of Partition Table 3 (item 220) and that there is a 2-day "lag" between the Billing Period End Date and the Pricing Date (i.e., the Billing Period End Date is 4/15 and the Pricing Date is 4/18). From 4/16 through 4/17 the system is still entering data into the same partition 220. Thus, using only the mechanism described above for FIGS. 4-6, the ERP Event Inserter 300 shown in FIG. 8 would not be able to price only the events that occurred before 4/16, as it would have no way to distinguish the Events in the Billing Period ending 4/15 from later ones: all of them are entered into the same Partition Table 3 (item 220). This is because Events with the same sequence number can spill over several partitions. The mechanism described above is therefore enhanced to handle this situation, as discussed below.

The solution is for the ERP Event Inserter 300, shown in FIG. 8, to maintain an Event Sequence Number in each partition table, as done in operation 430 of FIG. 9. This number is incremented when going from the n-th run of Bill Cycle X to the (n+1)-th run of the same Bill Cycle, as provided in operation 480 of FIG. 9. Each event is marked with a sequence number by the ERP Event Inserter 300 before it is inserted into the database 340 (i.e. into the Partition Tables). Such a sequence number can also be thought of as a bill number. For example, take the same case as in the previous paragraph: in FIG. 3, Partition Group A (items 200, 210 and 220) represents a set of new customers whose Bill Cycle starts on 3/16 and finishes on 4/15. Further assume that Partition Table 1 (item 200) starts on 3/16, Partition Table 3 (item 220) finishes on 4/20, and the Pricing Due Date is 4/18. The following sequence of events then occurs:

1. All events for a given customer in Partition Group A, Partition Tables 1 and 2 (items 200 and 210 in FIG. 3), are marked with Event Sequence Number 1 by the ERP Event Inserter 300 shown in FIG. 8, in operation 430 of FIG. 9.
2. All events for the same customer in Partition Group A, Partition Table 3 (item 220 in FIG. 3), up to and including 4/15 are marked with Event Sequence Number 1 by the ERP Event Inserter 300 shown in FIG. 8, in operation 430 of FIG. 9.
3. All events for the same account in Partition Group A, Partition Table 3 (item 220 in FIG. 3), starting on 4/16 are marked with Event Sequence Number 2 by the ERP Event Inserter 300 shown in FIG. 8, in operation 480 of FIG. 9.
4. On 4/18, the ERP Retriever 310 shown in FIG. 8 reads data from Partition Group A, Partition Tables 1, 2 and 3 (items 200, 210 and 220 in FIG. 3), and bills only the Events with Event Sequence Number "1" for that account (as well as, of course, the other accounts in Partition Group A that are due, with their own sequence numbers). This is done in operation 500 of FIG. 9 by the ERP Retriever 310 shown in FIG. 8.

All of the foregoing Event Sequence Numbers are created by the ERP Event Inserter 300 shown in FIG. 8 just before the Events are inserted into the Partition Tables (database 340 shown in FIG. 8). In the second Bill Cycle, a month later, ERP will read data from Partition Tables 3, 4 and 5 (Partition Tables 4 and 5 are not shown), and bill the Events marked with Event Sequence Number "2". This means that only the events marked "2" in Partition Table 3 are included; in this example, those are the events starting on 4/16. Once pricing is done, the "old" event sequence number is the link between the account, the summary events and the individual events belonging to one bill. The Event Sequence Number is incremented by the CBM Controller 320 shown in FIG. 8, based on date and time. This Event marking, combined with horizontal/vertical partitioning, has several positive side effects, which are discussed in detail below.
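The marking-and-retrieval sequence above can be modelled in memory to make the idea concrete. This is a sketch under stated assumptions, not the patented implementation: `Event`, `PartitionTable` and `RollingStore` are hypothetical names, the sequence number is derived here from each account's billing period end dates (the text says only that the increment is date- and time-based), and the year, table boundaries and amounts are invented. Each event is marked once, before it is written, and is never updated; retrieval skips tables of other groups and tables outside the billing period (a simple form of partition elimination), then filters on the sequence number, so the 4/15 and 4/16 events sharing Partition Table 3 land on different bills.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class Event:
    account_id: int
    event_date: date
    amount_cents: int
    seq: int = 0  # Event Sequence Number; assigned once, at insertion time


@dataclass
class PartitionTable:
    group: str   # first dimension: partition group
    number: int  # partition table number within the group
    start: date  # second dimension: event date range (inclusive)
    end: date
    rows: List[Event] = field(default_factory=list)


class RollingStore:
    """Hypothetical in-memory stand-in for the two-dimensional rolling tables."""

    def __init__(self, tables: List[PartitionTable], period_ends: Dict[int, List[date]]):
        self.tables = tables
        self.period_ends = period_ends  # account_id -> billing period end dates

    def _sequence_for(self, ev: Event) -> int:
        # Sequence n covers events up to and including the n-th period end date.
        return 1 + sum(1 for end in self.period_ends[ev.account_id] if ev.event_date > end)

    def insert(self, group: str, ev: Event) -> None:
        ev.seq = self._sequence_for(ev)  # marked before the write, never updated later
        for t in self.tables:
            if t.group == group and t.start <= ev.event_date <= t.end:
                t.rows.append(ev)
                return
        raise LookupError(f"no partition table for group {group} on {ev.event_date}")

    def retrieve(self, group: str, account_id: int, seq: int,
                 period_start: date, period_end: date) -> List[Event]:
        hits: List[Event] = []
        for t in self.tables:
            # Partition elimination: skip other groups and non-overlapping date ranges.
            if t.group != group or t.end < period_start or t.start > period_end:
                continue
            hits.extend(e for e in t.rows if e.account_id == account_id and e.seq == seq)
        return hits


# The 3/16-4/15 example; year, table boundaries and amounts are hypothetical.
Y = 1999
store = RollingStore(
    tables=[PartitionTable("A", 1, date(Y, 3, 16), date(Y, 3, 31)),
            PartitionTable("A", 2, date(Y, 4, 1), date(Y, 4, 10)),
            PartitionTable("A", 3, date(Y, 4, 11), date(Y, 4, 20))],
    period_ends={100: [date(Y, 4, 15), date(Y, 5, 15)]},
)
for (month, day), cents in [((3, 20), 500), ((4, 5), 300), ((4, 15), 200),
                            ((4, 16), 700), ((4, 19), 100)]:
    store.insert("A", Event(100, date(Y, month, day), cents))

bill_1 = store.retrieve("A", 100, 1, date(Y, 3, 16), date(Y, 4, 15))
bill_2 = store.retrieve("A", 100, 2, date(Y, 4, 16), date(Y, 5, 15))
print(sum(e.amount_cents for e in bill_1))  # 1000
print(sum(e.amount_cents for e in bill_2))  # 800
```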
Performance improves because the database is written to only once. Traditional systems go back to the database at billing time and mark the events then, which puts an unnecessary load on the machine. This also complicates backups, as the tables must be backed up a second time once the bill number has been added, which hurts performance a second time. By keeping track of Event sequences, the system knows which events have been billed and which are yet to be billed, without physically modifying the database records. Keeping track of Event sequences also lets a given bill be retrieved easily (through "read" database access) without the overhead of a complex selection algorithm. Incrementing the sequence numbers is coordinated with bill production, i.e. ERP knows which sequence number goes to each bill instance. Flexibility is increased, since bill cycles can be changed easily to react swiftly to business circumstances or to balance load among different bill cycles, because creating partitions is independent of the bill cycle. Different accounts can be on different bill cycles (e.g. Account ID 100 can be on bill cycle 2 while Account ID 200 is on bill cycle 11). As the system is bill-cycle independent, system setup can be driven by production requirements. For example, partitions can be created daily, every two days, etc., depending on system administration needs. If production requirements change (e.g. partitions that were created every two days now need to be created daily), the more frequent partition creation can be implemented without impacting rating and billing. System administration (e.g. backups, restores) and ongoing operational requirements do not clash. For example, a system may be set up to create a new partition daily at a fixed time, e.g. at 2 am.
When this time is reached, ERP continues rating and inserts new events into the new partition, while the "old" partition can be set "off-line" and backed up safely. All of these activities are independent of the bill cycle. This flexibility is illustrated by support for threshold billing, where a customer receives a bill when a certain dollar threshold is reached rather than on a particular date. For example, a bill may be generated for a particular customer each time the bill reaches $10,000.00. For some very large customers this amount may be reached every few days, so such a customer would receive many bills per month. In this situation there is no "bill cycle" at all. Thus, the concept of bill cycles is not required for the partitioning/sequence number concept to be employed.

Architecture

FIG. 7 is an example of a 3-tier architecture that supports this invention. Database partitioning is done on the database server 1200. Applications using such partitions run either solely on the application server 1100 (e.g. the batch applications ERP and CBM) or on both the application server 1100 and the PC client 1000. An example of the latter is a CSR responding to a customer query, who uses the Customer Care Manager (CCM) subsystem to retrieve data about the events (i.e. phone calls) from the database server 1200. Thus, "Application" on the application server 1100 can be ERP, CBM or CCM. Application views 1001 on the PC client 1000 relate to the CCM subsystem used by the CSR. The main advantage of the 3-tier architecture shown in FIG. 7 is that if the application server 1100 is overloaded, the customer can simply add application servers 1100 to the network without rewriting any application software. In the typical telecom environment, millions of transactions per day are likely to occur.
If the invention were used in a very small environment, it could use the PC client system 1000 and one server, in which case the "business logic" shown on the application server 1100 would run on the PC client system 1000. Such a configuration is called a 2-tier architecture. The problem with a 2-tier architecture is that if the PC client system 1000 is overloaded, there is nothing that can be done to spread the load (other than perhaps adding memory to the PC or buying faster PCs, neither of which may solve the problem). A 2-tier system would therefore be extremely limited and would not be able to handle the anticipated load. Referring to FIG. 7, the PC client system 1000, application server 1100 and database server 1200 communicate with one another using TCP/IP 1007. The PC client system 1000 uses the Windows NT operating system 1006. Application programs written with Microsoft Visual C++ and Microsoft Foundation Classes (MFC) 1004 run on the PC client system 1000. All application logic resides on the application server 1100. Communication can take place between the PC client system 1000 and the application server 1100, and between the application server 1100 and the database server 1200. All communication with the database server 1200 goes through the application server 1100. Still referring to FIG. 7, both synchronous and asynchronous communication are supported. The communication protocols, methods, etc. are provided by ACL 1103 ("AMS Class Libraries"). ACL 1103 is a set of common functions used by all applications; examples include database access (read, write), communications access, and messaging. Still referring to FIG. 7, all data resides on the database server 1200. As the invention described is an object-oriented (OO) system, the translation between objects and the relational database on the database server 1200 is done through a persistence layer (not shown).
This layer is responsible for "mapping" objects to database tables. The persistence layer (not shown) is part of ACL. Note that a typical system is installed on either a Local Area Network (LAN) or a Wide Area Network (WAN), supporting hundreds of clients and tens of application servers and database servers. Table 3 below defines all items that appear in FIG. 7.
TABLE 3
Layer                        Description
ACL 1103                     AMS Class Library, which provides infrastructure support for server-based processing.
ACL Common GUI 1002          ACL classes that provide infrastructure support on a PC client.
Application 1101             The CCB server-based application software. This layer includes the implementation of the business objects defined in the CCB object model designs.
Application View 1001        That part of the CCB on-line application software that provides a user interface.
Common Domain Objects 1102   These objects provide common classes that can be leveraged in different parts of the application to provide support for common services and functions.
HP C++ 1107                  C++ programming language.
HP-UX 1108                   UNIX for HP servers.
Iona Orbix™ 1005             CORBA 2.0 Object Request Broker (ORB).
Message Queuing 1106         Provides guaranteed delivery for messages sent between processes. CCB uses a custom approach for CCB 2.0. In a later CCB version, Arcor may wish to replace this with a third-party product such as IBM's MQ Series.
MS Visual C++™, MFC 1004     Microsoft C++ compiler and Microsoft Foundation Class libraries.
Oracle™ 1202                 Oracle client and server software.
Stored Procedures 1202       Application-specific Oracle stored procedures.
TCP/IP 1007                  Network communication protocol.
Tools 1003                   On servers, this includes third-party products. For ERP, Tools.h++ in ACL is used. For creating bills in CBM, ISIS Papyrus is used.
Alternate Embodiments

Although the current implementation runs under the HP-UX operating system on the database server 1200 and the application server 1100, and under Windows NT on the PC client 1000, the invention is not limited to any specific technical implementation or software platform. It could run in an n-tier environment or even on a mainframe. Similarly, although the database used is Oracle™ 1202, the invention could be implemented on non-Oracle databases (e.g. with Sybase, each rolling table could be a separately named table). The invention is not limited to the terms and examples included in this description. The approaches described represent the inventor's preferred implementation, but the second-dimension partitioning of the invention can be implemented in different ways. Also, the invention is not limited to the telecommunications industry. It can be applied to any other industry that requires fast access to high volumes of data at the database end, combined with the ability to perform maintenance (backups, restores, etc.). The present invention has been described with respect to a system that reduces access, backup, and processing time by partitioning data in a database by partition group, further partitioning each partition group by event processing date into two-dimensional partitions in the form of rolling tables, and using event sequence numbers. As discussed above, although the present invention is implemented in a 3-tier client-server architecture, a person of ordinary skill in the art would appreciate that it could be implemented on any architecture, including a mainframe. All the examples discussed above relate to large business customers. However, the present invention may also be used for small business or residential customers.
In the case of small business or residential customers, the telecommunications company would group customers into partitions, rather than giving each residential (or small business) customer its own partition. The many features and advantages of the invention are apparent from the detailed specification and, thus, it is intended by the appended claims to cover all such features and advantages of the invention which fall within the true spirit and scope of the invention. Further, since numerous modifications and changes will readily occur to those skilled in the art, it is not desired to limit the invention to the exact construction and operation illustrated and described, and accordingly all suitable modifications and equivalents may be resorted to, falling within the scope of the invention.
