|
|
|
Manipulating data structure (e.g., compression, compaction, compilation) |
Method and system for parallel processing of database queries6968335
Abstract
A system and methods for parallel processing of queries to one or more databases are described herein. One or more databases may be distributed among a subset of slave nodes of a global-results processing matrix. A query to the database may be generated using a query-based high-level programming language. The query-based source code then may be converted to intermediary source code in a common programming language and then compiled into a dynamic link library (DLL) or other type of executable. The DLL is then distributed among the slave nodes of the processing matrix, whereupon the slave nodes execute related portions of the DLL substantially in parallel to generate initial query results. The initial query results may then be provided to master node of the global-results processing matrix for additional processing, whereby the master node is adapted to execute one or more associated portions of the DLL on the initial query results.
Claims
1. A system for processing at least one query to a database, the system comprising:
a first processing matrix comprising:
a master node of a first type having a processor and receiving at least one executable compiled from a query; and
a first plurality of slave nodes of a first type each comprising a processor and means for storing data, each first type slave node storing a portion of the database, each first type slave node processor adapted to execute a first portion of the at least one executable on its stored database portion to generate query results, the plurality of first type slave nodes executing the first executable portion substantially in parallel to collectively generate a set of initial query results; and
a second processing matrix receiving the initial query results generated by the first processing matrix and comprising:
a second plurality of slave nodes of a second type each comprising a processor and means for storing a portion of the initial query results, each second type slave node processor adapted to execute a second portion of the at least one executable on its stored initial query results portion to generate a portion of intermediary query results, the plurality of second type slave nodes executing the second executable portion substantially in parallel to collectively generate a set of intermediary query results.
2. The system of claim 1, wherein the second processing matrix further comprises a master node of a second type comprising a processor and means for storing data, the second type master node receiving and storing the intermediary query results generated by the second plurality of slave nodes, the second type master node processor adapted to execute a third portion of the at least one executable on the intermediate query results to generate a resultant set of query results.
3. The system of claim 2, wherein the second plurality of slave nodes receives at least the second portion of the at least one executable from the second master node.
4. The system of claim 1, wherein the second processing matrix further comprises a master node of a second type comprising a processor and means for storing data, the second master node being adapted to receive at least one other executable compiled from an other query, wherein at least a portion of the at least one other executable is executed on the intermediary query results to generate a resultant set of query results.
5. The system of claim 4, wherein the at least one other executable portion is executed by the second master node.
6. The system of claim 4, wherein the at least one other executable portion is executed by the second plurality of slave nodes.
7. The system of claim 4, wherein the other query is based on a user query submission and the system generates and compiles the other query into the at least one other executable, executes the at least one other executable, generates the resultant set of query results, and presents the resultant set of query results in an essentially real-time fashion.
8. The system of claim 7, wherein the resultant set of results is presented to the user having made the query submission.
9. The system of claim 1, wherein each second type slave node stores a substantially different portion of the initial query results and the second plurality of slave nodes collectively stores substantially all of the initial query results.
10. The system of claim 1, further comprising a query server operably connected to at least one of the first processing matrix and the second processing matrix, the query server being adapted to:
generate intermediary source code from query source code, the query source code representing at least one database operation on the database and wherein the query source code is formatted based in part on a query-based programming language; and
compile the intermediary source code to generate the at least one executable.
11. The system of claim 10, wherein the intermediary source code is one of a group consisting of: C source code, C++ source code, Fortran source code; and Pascal source code.
12. The system of claim 10, further comprising a query agent operably connected to the query server and at least one of the first processing matrix and the second processing matrix, the query agent being adapted to:
submit the at least one executable to at least one of the first processing matrix and the second processing matrix for execution; and
monitor execution of the at least one executable.
13. The system of claim 12, further comprising a scheduling service module operably connected to the query agent and the query server and being adapted to:
schedule a time of execution of the at least one executable based on at least one characteristic of the at least one executable; and
direct the query agent to submit the executable to at least one of the first processing matrix and the second processing matrix at the scheduled time.
14. The system of claim 13, wherein the at least one characteristic of the at least one executable includes one of a group consisting of: an expected execution time of the at least one executable; a priority of the at least one executable; a type of the at least one database operation represented by the at least one executable; and a number of slave nodes expected to execute the at least one executable.
15. The system of claim 12, further comprising a work-unit reporting module operably connected to the query agent and being adapted to store at least a portion of the resultant query results.
16. The system of claim 10, further comprising a query builder module operably connected to the query server and being adapted to facilitate a client in generating the query source code using the query-based language.
17. The system of claim 16, wherein the query builder module includes a graphical user interface.
18. The system of claim 16, wherein the query builder module is further adapted to display at least a portion of the resultant query results.
19. The system of claim 16, wherein the query builder module is further adapted to submit a representation of the query source code to the query server over a network.
20. The system of claim 1, further comprising a staging zone adapted to distribute the database among the first plurality of slave nodes, each first type slave node receiving a substantially different database portion.
21. The system of claim 20, wherein data of the database is distributed randomly among the first type slave nodes.
22. The system of claim 20, wherein the staging zone is further adapted to distribute the database using a hash function on a record field of each record of the database.
23. The system of claim 1, wherein the at least one executable includes at least one dynamic link library (DLL).
24. The system of claim 23, wherein each slave node of the first processing matrix further includes a software agent being adapted to execute a portion of the at least one DLL, wherein a software agent at the first master node is configured to execute a second portion of the DLL.
25. The system of claim 23, wherein each slave node of the second processing matrix further includes a software agent being adapted to execute at least one portion of the at least one DLL.
26. The system of claim 1, wherein the first slave nodes are substantially homogeneous.
27. The system of claim 1, wherein the second slave nodes are substantially homogeneous.
28. The system of claim 1, wherein the first type slave nodes and the second type slave nodes are substantially homogeneous.
29. The system of claim 1, wherein the first type slave nodes, the first master node, the second type slave nodes, and the second master node are substantially homogeneous.
30. The system of claim 1, wherein the processors of one or more of the second master node, second type slave nodes, first master node, and first type slave nodes are of the type found in general purpose, single-user microcomputers.
31. The system of claim 1, wherein the at least one database operation represented by the first portion of the at least one executable includes one of a group consisting of: linking, matching, filtering, scoring, sorting, counting, and joining.
32. The system of claim 1, wherein the at least one database operation represented by the second portion of the at least one executable includes one of a group consisting of: sorting, linking, collating, filtering, counting, deduping, joining, appending, merging, purging, non-hierarchical linking, neural-net scoring, and formatting.
33. The system of claim 1, wherein the plurality of first type slave nodes includes between ten slave nodes and one thousand slave nodes.
34. The system of claim 1, wherein the plurality of first type slave nodes includes between fifty slave nodes and 500 slave nodes.
35. The system of claim 1, wherein the plurality of first type slave nodes includes between 100 slave nodes and 300 slave nodes.
36. The system of claim 1, wherein the plurality of first type slave nodes includes at least one thousand slave nodes.
37. The system of claim 1, wherein each first type slave node stores a substantially different portion of the database and the first plurality of slave nodes collectively stores substantially all of the database.
38. The system of claim 1, wherein the at least one executable represents at least one database operation.
39. The system of claim 38, wherein the first and second portions of the at least one executable both represent at least one database operation.
40. The system of claim 1, wherein the first type slave node means for storing includes memory and at least one disk storage.
41. The system of claim 40, wherein the first type slave node stores the portion of the database at least in memory.
42. The system of claim 1, wherein the second type slave node means for storing includes memory and at least one disk storage.
43. The system of claim 42, wherein the second type slave node stores the portion of the initial query results at least in disk storage.
44. A system for processing at least one query to a database, the system comprising:
a general-purpose (GP) query processing matrix comprising:
a GP master node having a processor and receiving at least one executable compiled from a query; and
a plurality of GP slave nodes each comprising a processor and means for storing data, each GP slave node storing a portion of the database, each GP slave node processor adapted to execute a first portion of the at least one executable on its stored database portion to generate query results, the plurality of GP slave nodes executing the first executable portion substantially in parallel to collectively generate a set of initial query results; and
a global-results (GR) processing matrix receiving the initial query results generated by the general-purpose processing matrix and comprising:
a plurality of GR slave nodes each comprising a processor and means for storing a portion of the initial query results, each GR slave node processor adapted to execute a second portion of the at least one executable on its stored initial query results portion to generate a portion of intermediary query results, the plurality of GR slave nodes executing the second executable portion substantially in parallel to collectively generate a set of intermediary query results; and
a GR master node comprising a processor and means for storing data.
45. The system of claim 44, wherein the GR master node receives and stores the intermediary query results generated by the plurality of GR slave nodes, the second type master node processor adapted to execute a third portion of the at least one executable on the intermediate query results to generate a resultant set of query results.
46. The system of claim 44, wherein the second plurality of slave nodes receives at least the second portion of the at least one executable from the second master node.
47. The system of claim 44, wherein the GR master node is adapted to receive at least one other executable compiled from an other query, wherein at least a portion of the at least one other executable is executed on the intermediary query results to generate a resultant set of query results.
48. The system of claim 47, wherein the at least one other executable portion is executed by the GR master node.
49. The system of claim 47, wherein the at least one other executable portion is executed by the plurality of GR slave nodes.
50. The system of claim 47, wherein the other query is based on a user query submission and the system generates and compiles the other query into the at least one other executable, executes the at least one other executable, generates the resultant set of query results, and presents the resultant set of query results in an essentially real-time fashion.
51. The system of claim 50, wherein the resultant set of results are presented to the user having made the user query submission.
52. The system of claim 44, wherein each second type slave node stores a substantially different portion of the initial query results and the second plurality of slave nodes collectively stores substantially all of the initial query results.
53. The system of claim 44, further comprising a query server operably connected to at least one of the GR processing matrix and the GP processing matrix, the query server being adapted to:
generate intermediary source code from query source code, the query source code representing at least one database operation on the database and wherein the query source code is formatted based in part on a query-based programming language; and
compile the intermediary source code to generate the at least one executable.
54. The system of claim 53, wherein the intermediary source code is one of a group consisting of: C source code, C++ source code, Fortran source code; and Pascal source code.
55. The system of claim 53, further comprising a query agent operably connected to the query server and at least one of the GR processing matrix and the GP processing matrix, the query agent being adapted to:
submit the at least one executable to at least one of the GR processing matrix and the GP processing matrix for execution; and
monitor execution of the at least one executable.
56. The system of claim 55, further comprising a scheduling service module operably connected to the query agent and the query server and being adapted to:
schedule a time of execution of the at least one executable based on at least one characteristic of the at least one executable; and
direct the query agent to submit the executable to at least one of the GR processing matrix and the GP processing matrix at the scheduled time.
57. The system of claim 56, wherein the at least one characteristic of the at least one executable includes one of a group consisting of: an expected execution time of the at least one executable; a priority of the at least one executable; a type of the at least one database operation represented by the at least one executable; and a number of slave nodes expected to execute the at least one executable.
58. The system of claim 55, further comprising a work-unit reporting module operably connected to the query agent and being adapted to store at least a portion of the resultant query results.
59. The system of claim 53, further comprising a query builder module operably connected to the query server and being adapted to facilitate a client in generating the query source code using the query-based language.
60. The system of claim 59, wherein the query builder module includes a graphical user interface.
61. The system of claim 59, wherein the query builder module is further adapted to display at least a portion of the resultant query results.
62. The system of claim 59, wherein the query builder module is further adapted to submit a representation of the query source code to the query server over a network.
63. The system of claim 44, further comprising a staging zone adapted to distribute the database among the GP slave nodes, each GP slave node receiving a substantially different database portion.
64. The system of claim 63, wherein data of the database is distributed randomly among the GP slave nodes.
65. The system of claim 63, wherein the staging zone is further adapted to distribute the database using a hash function on a record field of each record of the database.
66. The system of claim 44, wherein the at least one executable includes at least one dynamic link library (DLL).
67. The system of claim 66, wherein each slave node of the GP processing matrix further includes a software agent being adapted to execute a portion of the at least one DLL, wherein a software agent at the GP master node is configured to execute a second portion of the DLL.
68. The system of claim 66, wherein each slave node of the GR processing matrix further includes a software agent being adapted to execute at least one portion of the at least one DLL.
69. The system of claim 44, wherein the GP slave nodes are substantially homogeneous.
70. The system of claim 44, wherein the GR slave nodes are substantially homogeneous.
71. The system of claim 44, wherein the GP slave nodes and the GR slave nodes are substantially homogeneous.
72. The system of claim 44, wherein the GP slave nodes, the GP master node, the GR slave nodes, and the GR master node are substantially homogeneous.
73. The system of claim 44, wherein the processors of one or more of the GR master node, GR slave nodes, GP master node, and GP slave nodes are of the type found in general purpose, single-user microcomputers.
74. The system of claim 44, wherein the at least one database operation represented by the first portion of the at least one executable includes one of a group consisting of: linking, matching, filtering, scoring, sorting, counting, and joining.
75. The system of claim 44, wherein the at least one database operation represented by the second portion of the at least one executable includes one of a group consisting of: sorting, linking, collating, filtering, counting, deduping, joining, appending, merging, purging, non-hierarchical linking, neural-net scoring, and formatting.
76. The system of claim 44, wherein the plurality of GP slave nodes includes between ten slave nodes and one thousand slave nodes.
77. The system of claim 44, wherein the plurality of GP slave nodes includes between fifty slave nodes and 500 slave nodes.
78. The system of claim 44, wherein the plurality of GP slave nodes includes between 100 slave nodes and 300 slave nodes.
79. The system of claim 44, wherein the plurality of GP slave nodes includes at least one thousand slave nodes.
80. The system of claim 44, wherein each first type slave node stores a substantially different portion of the database and the first plurality of slave nodes collectively stores substantially all of the database.
81. The system of claim 44, wherein the at least one executable represents at least one database operation.
82. The system of claim 81, wherein the first and second portions of the at least one executable both represent at least one database operation.
83. The system of claim 44, wherein the first type slave node means for storing includes memory and at least one disk storage.
84. The system of claim 83, wherein the first type slave node stores the portion of the database at least in memory.
85. The system of claim 44, wherein the second type slave node means for storing includes memory and at least one disk storage.
86. The system of claim 85, wherein the second type slave node stores the portion of the initial query results at least in disk storage.
87. A system for processing at least one query to a database, the system comprising:
a general-purpose (GP) query processing matrix comprising:
a GP master node comprising a processor, memory, disk storage, and a network interface, the GP master node receiving at least one executable compiled from a query; and
a plurality of GP slave nodes each comprising a processor, memory, disk storage, and a network interface, each GP slave node storing a portion of the database, each GP slave node processor adapted to execute a first portion of the at least one executable using the stored database portion to generate query results, the first portion of the at least one executable representing at least one database operation, the GP slave node executing the first executable portion substantially in parallel with other GP slave nodes to collectively generate a set of initial query results; and
a global-results (GR) processing matrix receiving the initial query results generated by the general-purpose processing matrix and comprising:
a plurality of GR slave nodes each comprising a processor, memory, disk storage, and a network interface, each GR slave node storing a portion of the initial query results, each GR slave node processor adapted to execute a second portion of the at least one executable on the stored initial query results portion to generate a portion of intermediary query results, the second portion of the at least one executable representing at least one database operation, the GR slave node executing the second executable portion substantially in parallel with other GR slave nodes to collectively generate a set of intermediary query results; and
a GR master node comprising a processor, memory, disk storage, and a network interface, the GR master node receiving and storing the intermediary query results generated by the GR slave nodes, the GR master node processor adapted to execute a third portion of the at least one executable on the intermediate query results to generate a refined set of query results, the third portion of the at least one executable representing at least one database operation.
88. The system of claim 87, wherein the GP slave node stores the portion of the database at least in memory.
89. The system of claim 87, wherein the GR slave node stores the portion of the initial query results at least in disk storage.
90. The system of claim 87, wherein the GR master node stores the intermediary query results at least in disk storage.
91. A system for processing at least one query to a database, the system comprising:
a query server for receiving and processing a query;
a general-purpose (GP) query processing matrix having a plurality of processing nodes comprising:
a GP master node having a processor and at least one data storage device, the GP master node being adapted to receive at least one executable compiled from a query;
a plurality of GP slave nodes operably connected to the GP master node, wherein each GP slave node is adapted to:
store a substantially different portion of the database in memory at the slave node; and
execute a first portion of at least one executable using the stored database portion to generate query results, the first portion of the at least one executable representing at least one database operation to be performed on the stored database portion; and
wherein the plurality of GP slave nodes collectively generates a set of initial query results; and
a global-results (GR) processing matrix operably connected to the query server and including:
a plurality of GR slave nodes, each GR slave node being adapted to:
store a substantially different portion of the initial query results on disk storage of the slave node; and
execute a second portion of the at least one executable using the stored initial query results portion to generate a portion of intermediary query results, wherein the plurality of GR slave nodes execute the second portion substantially in parallel to generate a set of intermediary results; and
a master node having a processor and at least one storage device and being operably connected to the plurality of slave nodes.
92. The system of claim 91, wherein the GR master node receives and stores the intermediary query results generated by the plurality of GR slave nodes, the GR master node processor being adapted to execute a third portion of the at least one executable on the intermediate query results to generate a resultant set of query results.
93. The system of claim 92, wherein the plurality of GR slave nodes receives at least the second portion of the at least one executable from the GR master node.
94. The system of claim 91, wherein the GR master node is adapted to receive at least one other executable compiled from an other query, wherein at least a portion of the at least one other executable is executed on the intermediary query results to generate a resultant set of query results.
95. The system of claim 94, wherein the at least one other executable portion is executed by the GR master node.
96. The system of claim 94, wherein the at least one other executable portion is executed by the plurality of GR slave nodes.
97. The system of claim 94, wherein the other query is based on a user query submission and the system generates and compiles the other query into the at least one other executable, executes the at least one other executable, generates the resultant set of query results, and presents the resultant set of query results in an essentially real-time fashion.
98. The system of claim 97, wherein the resultant set of results is presented to the user having made the query submission.
99. A system for processing at least one query to at least one database, the system comprising:
a query server being adapted to:
generate intermediary source code from query source code, the query source code representing least one database operation using the database and wherein the query source code is formatted based in part on a query-based programming language; and
compile the intermediary source code to generate at least one executable;
a general-purpose query processing matrix operably connected to the query server and including:
a plurality of slave nodes, each slave node storing in memory a different portion of the database and being adapted to execute a first portion of the at least one executable using the stored database portion to generate a portion of initial query results, the first portion of the at least one executable representing at least one database operation on the stored database portion, the first portion being executed by the slave node substantially in parallel with an execution of the first portion by other slave nodes;
at least one level of collator nodes, each collator node of each level storing in memory query results from a lower level;
a global-results processing matrix including:
a plurality of slave nodes, each slave node storing on disk storage a different portion of the database and being adapted to execute a first portion of the at least one executable using the stored database portion to generate a portion of initial query results, the first portion of the at least one executable representing at least one database operation on the stored database portion, the first portion being executed by the slave node substantially in parallel with an execution of the first portion by other slave nodes; and
a master node operably connected to the plurality of slave nodes and being adapted to:
store the initial query results on the disk storage of the master node; and
execute a second portion of the at least one executable on the stored initial query results to generate final query results, the second portion of the at least one executable representing at least one database operation on the initial query results.
100. A method for processing at least one query to a database distributed among a plurality of slave nodes, each slave node storing a substantially distinct database portion on disk storage, the method comprising the steps of:
receiving a query in a query-based language source code and compiling at least one executable from the query source code;
executing, at each slave node, a first portion of the at least one executable using the stored database portion to generate a portion of initial query results, the first portion of the at least one executable representing at least one database operation on the stored database portion, the first portion being executed by the slave node substantially in parallel with an execution of the first portion by other slave nodes;
storing the initial query results to disk storage of a master node; and
executing, at the master node, a second portion of the at least one executable using the stored initial query results to generate resultant query results, the second portion of the at least one executable representing at least one database operation on the initial query results.
101. The method of claim 100, further comprising the steps of:
generating intermediary source code from query source code, the query source code representing least one database operation using the database and wherein the query source code is formatted based in part on a query-based programming language; and
compiling the intermediary source code to generate the at least one executable.
102. In a global-results processing matrix including a master node and a plurality of slave nodes, each of the master node and slave nodes including a processor and disk storage, a computer readable medium, the computer readable medium comprising:
a first set of executable instructions being adapted to manipulate the processor of each slave node to execute a first portion of the at least one executable using the a portion of a database stored at the disk storage of the slave node to generate a portion of initial query results, the first portion of the at least one executable representing at least one database operation on the stored database portion, the first portion being executed by the slave node substantially in parallel with an execution of the first portion by other slave nodes; and
a second set of executable instructions being adapted to manipulate the processor of the master node to execute a second portion of the at least one executable using the initial query results to generate final query results, the second portion of the at least one executable representing at least one database operation using the initial query results.
103. The computer readable medium of claim 102, further comprising a third set of executable instructions being adapted to manipulate a processor to:
generate intermediary source code from query source code, the query source code representing least one database operation using the database and wherein the query source code is formatted based in part on a query-based programming language;
compile the intermediary source code to generate the at least one executable; and
provide the at least one executable to the master node.
104. The computer readable medium of claim 102, wherein the at least one executable includes at least one dynamic link library (DLL).
Description
FIELD OF THE INVENTION
The present invention relates generally to database management and more particularly to parallel processing of database queries in a parallel processing system.
BACKGROUND OF THE INVENTION
The rapid increase in the amount of data generated by companies, agencies, and other organizations has taxed the capabilities of current relational database management systems (RDMSs). To illustrate, some organizations have access to databases having hundreds of millions, and even billions, of records available through a RDMS. In such RDMSs, certain database operations (e.g., database joins, complex searches, extract-transform-load (ETL) operations, etc.) can take minutes, hours, and even days to process using current techniques. This processing lag often prevents access to the data in a timely manner, thereby inhibiting the client in its use of the requested information.
In response to the increasing lag time resulting from increased database sizes, software manufacturers and data mining/storage companies have strived to create more efficient RDMSs and data query techniques. In particular, a number of database management systems have been developed to implement parallel processing for performing database management and database operations.
A typical parallel-processing RDMS implementation includes using a symmetric multiprocessing (SMP) system for database operations. In general, SMP systems incorporate a number of processors sharing one or more system resources, such as memory or disk storage. The data representing the database(s) is stored in the memory and/or disk storage shared by the processors. Each processor is provided a copy of the database operation to be performed and executes the database operation on the data in parallel with the other processors.
While SMP systems have the potential to improve the efficiency of database operations on large databases by removing the processor as the bottleneck, current implementations have a number of limitations. For one, the shared memory/disk storage often becomes the limiting factor as a number of processors attempt to access the shared memory/disk storage at the same time. Simultaneous memory/disk storage accesses in such systems typically result in the placement of one or more of the processors in a wait state until the memory/disk storage is available. This delay often reduces or eliminates the benefit achieved through the parallelization of the database operation. Further, the shared memory/disk storage can limit the scalability of the SMP system, where many such systems are limited to eight processors or less.
Another limitation common to SMP database systems is the cost of implementation. SMP systems, as a result the underlying architecture needed to connect multiple processors to shared resources, are difficult to develop and manufacture, and are, therefore, often prohibitively expensive. In many cases, the SMP database systems implement a proprietary SMP design, requiring the client of the SMP database system to contract with an expensive specialist to repair and maintain the system. The development of operating system software and other software for use in the SMP database system is also often complex and expensive to develop.
The performance of parallel processing database systems, SMP or otherwise, is often limited by the underlying software process used to perform the database operation. In general, current parallel-processing database systems implement one or more interpreted database-enabled programming languages, such as Simple Query Language (SQL), Perl, Python and the like. In these systems, the database operation is constructed as one or more instructions in the interpreted programming language and the set of instructions are submitted to the SMP system. The SMP system, in turn, typically provides one or more of the instructions to each of the processors. Each processor implements an interpreter to interpret each instruction and generate the corresponding machine-level code. Instruction sets constructed using an interpreted language typically are transformed into a parse tree. The interpreter (executed by the processor) then "walks-down" the parse tree and, at each node, instructs the processor to execute a predefined library code segment associated with the syntax at the node.
It will be appreciated by those skilled in the art that the use of an interpreted language is inherently inefficient from a processing standpoint. For one, the step of interpreting and then executing a predefined library code segment at run-time often requires considerable processing effort and, therefore, reduces overall efficiency. Secondly, interpreters often use a predetermined machine-level code sequence for each instruction, thereby limiting the ability to optimize the code on an instruction-by-instruction basis. Thirdly, because interpreters consider only one node (and its related child nodes) at a time, interpreters typically are unable to globally optimize the database operation by evaluating the instructions of the database operation as a whole.
Current techniques for data storage in conventional parallel-processing database systems also exhibit a number of limitations. As noted above, current parallel-processing database systems often implement shared storage resources, such as memory or disk storage, which result in bottlenecks when processors attempt to access the shared storage resources simultaneously. To limit the effects of shared storage, some current parallel-processing systems distribute the data of the database to multiple storage devices, which then may be associated with one or more processing nodes of the database system. These implementations, however, often have an inefficient or ineffective mechanism for failure protection when one or more of the storage devices fail. When a failure occurs, the storage device would have to be reinitialized and then repopulated with data, delaying the completion of the database operation. Additionally, the data may be inefficiently distributed among the storage devices, resulting in data spillover or a lack of proper load-balancing among the processing nodes.
Accordingly, improved systems and techniques for database management and access would be advantageous.
SUMMARY OF THE INVENTION
The present invention mitigates or solves the above-identified limitations in known solutions, as well as other unspecified deficiencies in known solutions. A number of advantages associated with the present invention are readily evident to those skilled in the art, including economy of design and resources, transparent operation, cost savings, etc.
The present invention provides a number of systems and methods for efficiently processing database operations on a relatively large database. In at least one embodiment, a database management system including one or more query servers, one or more query agents, and a computing matrix are used to process one or more queries submitted by a client. The computing matrix may comprise one or more of a global-results processing matrix, a general-purpose query processing matrix, and an index-based query processing matrix. Each processing matrix may comprise a plurality of interconnected processing nodes, at least a portion of which are adapted to process in parallel. In at least one embodiment, each of the processing nodes is a "shared nothing" processing node having a separate processor, memory, disc storage(s), and network interface. Further, in one embodiment, each processing node includes components from widely-available general-purpose, single-user microcomputers, such as a microcomputer motherboard, processor, random access memory (RAM), hard drive, network interface card (NIC), and the like.
The client preferably provides a set of query-based programming instructions representative of the desired query. The query server then may be adapted to convert the query-based programming instructions to source code in a high-level programming language (e.g., C++), which the query server may then optimize for more efficient execution. The query server then compiles the source code to generate one or more executables in machine-level code, such as a dynamic link library (DLL) or a fully-linked "program."
After generating the executable, the query server can provide the executable(s) to the query agent. In the event that the database operation(s) represented by the executable are not relatively processor-intensive, the query agent can be adapted to execute the executable(s) itself. Alternatively, or in addition, the query agent can provide the executable to one or more of the processing matrices of the computing matrix for processing. Upon receipt of the executable at a processing matrix, a subset of the processing nodes of the processing matrix execute one or more portions of the executable in parallel on the portion of the database at each processing node. The results of the execution may then be returned to the client, stored, or provided to another processing matrix for additional processing.
Also disclosed are a system and method for failure recovery in a multiple processing node system in accordance with at least one embodiment of the present invention. Each node can be adapted to store a backup copy of its database portion and/or results to disk storage or memory of at least one other node. In the event of a failure of a node, the replacement node can be adapted to transfer or copy the backup copy of the database portion of the failed node from the failed node's neighbors to the replacement node's disk storage or memory in between database operations. Before the transfer or copy of the backup copy is completed, the replacement node can be adapted to perform database operations in part on the portion of the backup copy the replacement node has already received and in part on the backup copy stored at the neighboring node(s).
In accordance with one embodiment of the present invention, a system for processing at least one query to a database is provided. The system comprises a a first processing matrix comprising a master node of a first type having a processor and receiving at least one executable compiled from a query and a first plurality of slave nodes of a first type each comprising a processor and means for storing data, each first type slave node storing a portion of the database, each first type slave node processor adapted to execute a first portion of the at least one executable on its stored database portion to generate query results, the plurality of first type slave nodes executing the first executable portion substantially in parallel to collectively generate a set of initial query results. The system further comprises a second processing matrix receiving the initial query results generated by the first processing matrix, the second processing matrix comprising a second plurality of slave nodes of a second type each comprising a processor and means for storing a portion of the initial query results, each second type slave node processor adapted to execute a second portion of the at least one executable on its stored initial query results portion to generate a portion of intermediary query results, the plurality of second type slave nodes executing the second executable portion substantially in parallel to collectively generate a set of intermediary query results.
In accordance with another embodiment of the present invention, a system for processing at least one query to a database is provided. The system comprises a general-purpose (GP) query processing matrix comprising a GP master node having a processor and receiving at least one executable compiled from a query and a plurality of GP slave nodes each comprising a processor and means for storing data, each GP slave node storing a portion of the database, each GP slave node processor adapted to execute a first portion of the at least one executable on its stored database portion to generate query results, the plurality of GP slave nodes executing the first executable portion substantially in parallel to collectively generate a set of initial query results. The system further comprises a global-results (GR) processing matrix receiving the initial query results generated by the general-purpose processing matrix. The GR processing matrix comprises a plurality of GR slave nodes each comprising a processor and means for storing a portion of the initial query results, each GR slave node processor adapted to execute a second portion of the at least one executable on its stored initial query results portion to generate a portion of intermediary query results, the plurality of GR slave nodes executing the second executable portion substantially in parallel to collectively generate a set of intermediary query results. The GR processing matrix further comprises a GR master node comprising a processor and means for storing data.
In accordance with yet another embodiment of the present invention, a system for processing at least one query to a database is provided. The system comprises a general-purpose (GP) query processing matrix having a GP master node comprising a processor, memory, disk storage, and a network interface, the GP master node receiving at least one executable compiled from a query and a plurality of GP slave nodes each comprising a processor, memory, disk storage, and a network interface, each GP slave node storing a portion of the database, each GP slave node processor adapted to execute a first portion of the at least one executable using the stored database portion to generate query results, the first portion of the at least one executable representing at least one database operation, the GP slave node executing the first executable portion substantially in parallel with other GP slave nodes to collectively generate a set of initial query results. The system further comprises a global-results (GR) processing matrix receiving the initial query results generated by the general-purpose processing matrix and having a plurality of GR slave nodes each comprising a processor, memory, disk storage, and a network interface, each GR slave node storing a portion of the initial query results, each GR slave node processor adapted to execute a second portion of the at least one executable on the stored initial query results portion to generate a portion of intermediary query results, the second portion of the at least one executable representing at least one database operation, the GR slave node executing the second executable portion substantially in parallel with other GR slave nodes to collectively generate a set of intermediary query results, the GR processing matrix further includes a GR master node comprising a processor, memory, disk storage, and a network interface, the GR master node receiving and storing the intermediary query results generated by the GR slave nodes, the GR master node processor adapted to execute a third portion of the at least one executable on the intermediate query results to generate a refined set of query results, the third portion of the at least one executable representing at least one database operation.
In accordance with an additional embodiment of the present invention, a system for processing at least one query to a database is provided. The system comprises a query server for receiving and processing a query and a general-purpose (GP) query processing matrix having a plurality of processing nodes including a GP master node having a processor and at least one data storage device, the GP master node being adapted to receive at least one executable compiled from a query. The GP query processing matrix further includes a plurality of GP slave nodes operably connected to the GP master node, wherein each GP slave node is adapted to store a substantially different portion of the database in memory at the slave node and execute a first portion of at least one executable using the stored database portion to generate query results, the first portion of the at least one executable representing at least one database operation to be performed on the stored database portion. The plurality of GP slave nodes collectively generates a set of initial query results. The system further comprises a global-results (GR) processing matrix operably connected to the query server and including a plurality of GR slave nodes, each GR slave node being adapted to store a substantially different portion of the initial query results on disk storage of the slave node and execute a second portion of the at least one executable using the stored initial query results portion to generate a portion of intermediary query results, wherein the plurality of GR slave nodes execute the second portion substantially in parallel to generate a set of intermediary results. The GR processing matrix further comprises a master node having a processor and at least one storage device and being operably connected to the plurality of slave nodes.
In accordance with another embodiment of the present invention, a system for processing at least one query to at least one database is provided. The system comprises a query server being adapted to generate intermediary source code from query source code, the query source code representing least one database operation using the database and wherein the query source code is formatted based in part on a query-based programming language and compile the intermediary source code to generate at least one executable. The system further comprises a general-purpose query processing matrix operably connected to the query server and including a plurality of slave nodes, each slave node storing in memory a different portion of the database and being adapted to execute a first portion of the at least one executable using the stored database portion to generate a portion of initial query results, the first portion of the at least one executable representing at least one database operation on the stored database portion, the first portion being executed by the slave node substantially in parallel with an execution of the first portion by other slave nodes. The general-purpose query processing matrix further comprising at least one level of collator nodes, each collator node of each level storing in memory query results from a lower level. The system further comprises a global-results processing matrix including a plurality of slave nodes, each slave node storing on disk storage a different portion of the database and being adapted to execute a first portion of the at least one executable using the stored database portion to generate a portion of initial query results, the first portion of the at least one executable representing at least one database operation on the stored database portion, the first portion being executed by the slave node substantially in parallel with an execution of the first portion by other slave nodes. The global-results processing matrix further includes a master node operably connected to the plurality of slave nodes and being adapted to store the initial query results on the disk storage of the master node and execute a second portion of the at least one executable on the stored initial query results to generate final query results, the second portion of the at least one executable representing at least one database operation on the initial query results.
In accordance with yet another embodiment of the present invention, a method is provided for processing at least one query to a database distributed among a plurality of slave nodes, each slave node storing a substantially distinct database portion on disk storage. The method comprises the steps of receiving a query in a query-based language source code and compiling at least one executable from the query source code and executing, at each slave node, a first portion of the at least one executable using the stored database portion to generate a portion of initial query results, the first portion of the at least one executable representing at least one database operation on the stored database portion, the first portion being executed by the slave node substantially in parallel with an execution of the first portion by other slave nodes. The method further comprises the steps of storing the initial query results to disk storage of a master node and executing, at the master node, a second portion of the at least one executable using the stored initial query results to generate resultant query results, the second portion of the at least one executable representing at least one database operation on the initial query results.
In a global-results processing matrix including a master node and a plurality of slave nodes, each of the master node and slave nodes including a processor and disk storage, a computer readable medium is provided in accordance with an additional embodiment of the present invention. The computer readable medium comprises a first set of executable instructions being adapted to manipulate the processor of each slave node to execute a first portion of the at least one executable using the a portion of a database stored at the disk storage of the slave node to generate a portion of initial query results, the first portion of the at least one executable representing at least one database operation on the stored database portion, the first portion being executed by the slave node substantially in parallel with an execution of the first portion by other slave nodes. The computer readable medium further comprising a second set of executable instructions being adapted to manipulate the processor of the master node to execute a second portion of the at least one executable using the initial query results to generate final query results, the second portion of the at least one executable representing at least one database operation using the initial query results.
BRIEF DESCRIPTION OF THE DRAWINGS
The purpose and advantages of the present invention will be apparent to those of ordinary skill in the art from the following detailed description in conjunction with the appended drawings in which like reference characters are used to indicate like elements, and in which:
FIG. 1 is a schematic diagram illustrating an exemplary parallel-processing database management system in accordance with at least one embodiment of the present invention.
FIG. 2 is a schematic diagram illustrating an exemplary system for monitoring a work state of the system of FIG. 1 in accordance with at least one embodiment of the present invention.
FIG. 3 is a flow diagram illustrating an exemplary method for performing one or more database operations using the system of FIG. 1 in accordance with at least one embodiment of the present invention.
FIG. 4 is a flow diagram illustrating an exemplary method for generating a compiled executable from a set of query-based language instructions in accordance with at least one embodiment of the present invention.
FIG. 5 is a flow diagram illustrating an exemplary method for generating a second compiled executable from a first executable having at least one embedded query-based language instruction in accordance with at least one embodiment of the present invention.
FIG. 6 is a block diagram illustrating an exemplary graphical client interface for creating a query from a query-based programming language in accordance with at least one embodiment of the present invention.
FIGS. 7A and 7B are schematic diagrams illustrating an exemplary general-purpose query processing matrix of the system of FIG. 1 in accordance with at least one embodiment of the present invention.
FIG. 8 is a flow diagram illustrating an exemplary operation of the general-purpose query processing matrix of FIGS. 7A and 7B in accordance with at least one embodiment of the present invention.
FIGS. 9A and 9B are schematic diagrams illustrating an exemplary global-results processing matrix of the system of FIG. 1 in accordance with at least one embodiment of the present invention.
FIGS. 10A and 10B are flow diagram illustrating exemplary operations of the global-results processing matrix of the system of FIG. 9 in accordance with at least one embodiment of the present invention.
FIGS. 11A and 11B are flow diagrams illustrating exemplary methods for sorting data across multiple nodes of the global-results processing matrix of FIG. 9 in accordance with at least one embodiment of the present invention.
FIG. 12 is a schematic diagram illustrating an exemplary implementation of a homogeneous agent at each node of a processing matrix for executing at least part of an executable.
FIGS. 13A and 13B are schematic diagrams illustrating an exemplary system for providing failover protection in the system of FIG. 1 in accordance with at least one embodiment of the present invention.
FIG. 14 is a schematic diagram illustrating an exemplary system for distributing database data within the system of FIG. 1 in accordance with at least one embodiment of the present invention.
FIG. 15 is a flow diagram illustrating an exemplary method for distributing database data using the system of FIG. 14 in accordance with at least one embodiment of the present invention.
FIG. 16 is a schematic diagram illustrating an exemplary hardware architecture for the system of FIG. 1 in accordance with at least one embodiment of the present invention.
FIG. 17 is a flow diagram illustrating an exemplary method for configuring the system of FIG. 1 using the hardware architecture of FIG. 16 in accordance with at least one embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The following description is intended to convey a thorough understanding of the present invention by providing a number of specific embodiments and details involving parallel processing of database queries. It is understood, however, that the present invention is not limited to these specific embodiments and details, which are exemplary only. It is further understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.
A processor is generally understood in the art to include any of a variety of digital circuit devices adapted to manipulate data or other information by performing one or more tasks embodied as one or more sets of instructions executable by the digital circuit device. Processors typically include some form of an arithmetic logical unit (ALU) adapted to perform arithmetic and/or logical functions, internal memory resources such as registers, cache, on-chip random access memory (RAM) or read only memory (ROM), and the like, and a control unit adapted to load instructions and/or data from external memory and/or the internal memory resources and execute the instructions using the ALU and other processor resources as appropriate. Processors can be adapted for general processing, such as a central processing unit (CPU) of a personal computer, or processors can be adapted to perform more specific functions, such as a digital signal processor (DSP) used in, for example, cellular phones. Examples of processors include microprocessors (also known as central processing units or CPUs), microcontrollers, and the like. An exemplary general-purpose processor suitable for use in at least one embodiment of the present invention includes the Pentium® III processor operating at, for example, 1.26 gigahertz (GHz) available from Intel Corporation of Santa Clara, Calif.
A database generally is understood in the art to include one or more data sets arranged in any of a variety of ways known to those skilled in the art, such as one or more tables having one more records. A database operation generally includes any primitive transform supported at the database layer, such as a sort operation, a join operation, a select operation, and the like. A database operation may be viewed as analogous to a single instruction in SQL. For example, the "SELECT" instruction in SQL represents a database operation whereby data in the target database meeting the criteria specified in the "SELECT" SQL command is located and output to the client in the specified format. In this case, the "SELECT" command represents a database operation. By extension, a query typically includes a sequence of one or more database operations intended to provide a desired result using the data of the a data dictionary and/or one or more databases.
Referring now to FIG. 1, an exemplary database management system 100 for processing queries to one or more databases is illustrated in accordance with at least one embodiment of the present invention. In the illustrated example, the system 100 includes a query server 102, a query agent 104, a query builder module 106, a repository 110, a naming services module 112, a scheduling services module 114, and a computing matrix 116. The computing matrix 116 can comprise one or more parallel-processing matrices, such as a global-results processing matrix 118, a general-purpose query processing matrix 120, an index-based query processing matrix 122, and the like. Although the illustrated exemplary embodiment includes one of each type of processing matrices 118-122, any number and/or combination of processing matrices may be implemented in accordance with at least one embodiment of the present invention.
In at least one embodiment, the system 100 is adapted to receive and process one or more queries received from one or more clients. Queries submitted by clients can include, for example, linking, matching, filtering, scoring, simple searching, neural net scoring, data sorting, merge operations, purge operations, heuristic propensity scoring, data formatting, extract-transform-load (ETL) operations, and the like. Queries submitted by a client to the query server 102 preferably are formatted using a query programming language having specified syntax and structure, similar to high-level programming languages such as C++. This programming language, referred to herein as Enterprise Control Language (ECL), can include actions (also referred to as "functions"), constants, variables, expressions and operations, keywords, workflow services, and the like. To illustrate, to generate a list of people sorted by age, the simple query formatted in ECL as "T:=SORT(Person, Person.age)" could be generated, where the attribute "T" represents the resulting record set of people sorted by age, "SORT" represents the sorting function, "Person" represents the record set of people, and "Person.age" represents the attribute defining the age field of each "Person" entry of the record set "Person". In other embodiments, the query can be described using any of a variety of techniques and/or programming languages as appropriate. For example, rather than using the ECL language, a client could generate a query using SQL or Perl and submit the SQLPerl query to the query server 102 for processing.
In at least one embodiment, the query builder module 106 is adapted to facilitate the client in generating queries. The query builder module 106 can include software executed on, for example, client computer 108 and can implement a graphical client interface (GUI) to receive client input. To illustrate, the query builder module 106 could include software adapted to receive command-line input in the format of the ECL language or other appropriate programming language. Alternatively, the query builder module 106 could include a GUI used by the client to enter one or multiple lines of ECL language or other query-based language representing one or more queries. In another embodiment, the query builder module includes an XML template generated by the query server 102 and displayed on, for example, a web browser at the client computer 108. Using this displayed template, a client may input one or more queries in the input fields provided.
Regardless of the technique used to input a desired query to the query builder module 106, the query builder module 106 is adapted to generate a representation of the query (query representation 132) and provide the representation to the query server 102. The query representation 132 can take any of a variety of forms. As noted above, in one embodiment the query builder module 106 is implemented as an XML web page, whereby the client can submit queries to the query server 102 via a network, such as the Internet. In this case, the query builder module 106 could receive the query input from the client, generate a hypertext markup language (HTML) or extensible markup language (XML) document representing the query input, and transmit the document to the query server 102 for processing using, for example, the Simple Object Access Protocol (SOAP). Alternatively, the query builder module 106 could include a stand-alone software program or integrated utility executed by the client computer 108, whereby the query provided from a client is transmitted to the query server 102. For example, the query may be transmitted as a text file having the set of high-level programming language instructions representative of the query (one embodiment of the query representation 132). Various implementations of the query builder module 106 are discussed below with reference to FIG. 6.
Upon receipt of the query representation 132 from the query builder 106, the query server 102, in one embodiment, is adapted to convert the query representation 132 into intermediary source code, such as source code segment structured in C, C++, Fortran, Pascal, and the like. The query server 102 then compiles the intermediary source code to generate one or more executables (i.e., the executable machine code representation of the source code). The executable(s) preferably include dynamically-linked executables, such as dynamic link libraries (DLLs), parts or all of which can be executed dynamically by another executable (such as a homogenous agent, discussed below). Alternatively, the executable(s) could include a fully linked executable or a shared library. For purposes of explanation, a particular implementation of the executable as a DLL is described herein. The generation of one or more executables for execution by the computing matrix 116 is discussed in greater detail below with reference to FIGS. 3-5. For explanatory purposes, an exemplary implementation wherein a single DLL representing an entire query is generated and processed by the system 100 is illustrated herein. Using the guidelines provided herein, those skilled in the art can adapt the system 100 for generation and processing of multiple DLLs or other types of executables for a single submitted query.
In the course of generating a DLL, the query server 102 may utilize one or both of the repository 10 and the naming services module 112. As discussed in greater detail herein, an ECL-based query submitted by the query builder 106 may include one or more attributes, where attributes can include client- or system-defined values, actions, expressions, and the like. Attributes also may be nested. To illustrate, consider the following ECL coding sequence for determining those people represented in a financial record set that have more than five credit accounts: In the first line, the attribute "CountTrades" implements the action "COUNT" and is defined as a total number of credit accounts (i.e., "Trades") associated with a record entry. In the second line, the attribute "IsBigSpender" implements a boolean expression and the "CountTrades" attribute and is defined as all entries of a record set having more than five credit accounts. In the third line, the "OUTPUT" action is used to output the last names of those entries of the record set "Person" having more than five credit accounts.
In the course of creating the ECL-based, attributes defined in the submitted query can be added to the repository 110. During the compilation of an ECL-based query into a DLL, the query server 102 can access the definitions of those attributes included in the ECL-based query from the repository 110. The repository 110 therefore can be viewed as a database or library of attributes used by clients to generate ECL queries and by the query server 102 in the generation of the corresponding DLL.
The repository 110 can be implemented in any of a variety of ways. The repository 110 could include a file server for a plurality of files, each file having the definition of one or more attributes. Preferably, however, the repository 110 is implemented as a structured query language (SQL) or an XML query language (XQL) database server, such as the Adaptive Server Enterprise available from Sybase, Inc. of Dublin, Calif.
Domain Name Service (DNS) often is used to translate domain names into Internet Protocol addresses for the corresponding network devices. In a similar manner, the naming services module 112 is adapted to translate the names of various data sets or databases referenced in a query into the actual location of the referenced name. To illustrate using the previous exemplary ECL code sequence, the query server 102 could submit the name "Persons" representative of the "persons" data set to the naming services module 112. The naming services module 112 could search its database for the physical location of the data set (e.g., a file located at "datasetspersons.sql") corresponding to the name "Persons" and return this file location to the query server 102. The query server 102 then can incorporate the location into the DLL compiled from the submitted query. Alternatively, as discussed in greater detail below, the compiled DLL can include a generic reference that the naming services module 112 resolves at runtime when the DLL is executed by one or more of the processing matrices 118-122. As with the repository 110, the naming services module 112 can be implemented in any of a variety of ways, preferably as a SQL or XQL database server.
In at least one embodiment, the system 100 includes a plurality of query servers 102 and/or a plurality of query agents 104 to process multiple queries. The scheduling services module 114, in one embodiment, is adapted to prevent one or more queries (represented by DLLs) from being submitted to one or more components of the computing matrix 116 while those components are occupied processing another database operation. Accordingly, the query server 102 can be adapted to submit a scheduling request to the scheduling services module 114 after generating a DLL representing a submitted query. The scheduling request can include an estimated execution time of the DLL in whole or in part, a priority indicator, an indicator of the number and/or type(s) of processing matrices needed to process the DLL, and the like. After submitting the scheduling request, the query server 102 may then submit the DLL (DLL 150) to the query agent 104 for processing.
Using the submission request information, the scheduling services module 114 determines the next available time that the query can be processed and generates a token associated with the scheduling request. The token is provided to the query agent 104 having the corresponding DLL 150, either directly or via the query server 102. The query agent 104 then informs the scheduling services module 114 that it has received the token and requests that the scheduling services module 114 notify the query agent 104 when it has permission to proceed. At the designated time, the scheduling services module 114 notifies the query agent 104 to proceed with the submission of the DLL 150 to the computing matrix 116. In the event that the processing of a previously submitted DLL is running ahead of or behind schedule, the scheduling services module 114 can adjust the submission time of the next DLL accordingly.
In at least one embodiment, the computing matrix 116 includes one or more types of parallel-processing processing matrices adapted to perform various database operations. In the illustrated embodiment, the computing matrix 116 is shown having three processing matrices (or sub-matrices): a general-purpose query processing matrix 120 adapted to perform database operations on preferably hierarchical data, an index-based query processing matrix 122 customized for index-based queries, and a global-results processing matrix 118 adapted to perform various operations on a large amount of data, such as sorting, collating, counting, duplicate record resolution (i.e., "deduping"), joining, appending, merging, purging, non-hierarchical linking, formatting, and the like. The processing matrices 118-122 are discussed in greater detail with reference to FIGS. 7-17. Although a particular configuration of processing matrices is illustrated, the computing matrix 116 can include any number and combination of processing matrices 118-122 as appropriate without departing from the spirit or the scope of the present invention.
Depending on the particular query, the query agent 104 can provide the DLL 150 to a specific type of processing matrix or the query agent 104 can use multiple processing matrix types in sequence or in parallel to process the query represented by the DLL 150. To illustrate, consider a query to a state's motor vehicle registration database resulting in a list of all registered drivers who own a black automobile, sorted by last name. This query requires at least two operations: identifying the registered drivers who own a black car in the database and sorting the identified registered drivers by last name. Since the general-purpose query processing matrix 120, in one embodiment, is particularly well suited for identification analysis, the query agent 104 can direct the general-purpose query processing matrix 120 to perform the identification operation of the DLL 150 and to provide the results to the global-results processing matrix 118. The query agent 104 then can direct the global-results processing matrix 118 to perform the sorting operation of the DLL 150 on the results generated by the general-purpose query processing matrix 120. Alternatively, two DLLs could be generated, one representing the identification operation and one representing the sorting operation, the former assigned to the general-purpose query processing matrix 120 and the latter assigned to the global-results processing matrix 118. The results (i.e., the sorted list) from the global-results processing matrix 118 then can be provided back to the query agent 104 for storage and/or delivery to the client via, for example, the query builder module 106. In a similar manner, the results from an operation performed by the index-based processing matrix 122 can be provided to the global-results processing matrix 118 for additional processing.
In some instances, the query agent 104 can be adapted to process the DLL 150 in whole or in part prior to or after receiving permission from the scheduling services module 114. The processing performed by the query agent 104 using the DLL 150, in at least one embodiment, is dependent on the type of query represented by the DLL. For relatively simple queries involving a few database operations on a relatively small data set, the query agent 104 can be adapted execute the DLL 150 itself. For more complex queries, the query agent 104 is adapted to submit the DLL 150 or some derivative of the DLL 150 to one or more of the processing matrices 118-122 of the computing matrix 116 for processing. The query agent 104 also can be adapted to report various events to the scheduling services module 114, such as time of submission of the DLL 150, status of the processing of the DLL 150, time of completion, errors, and the like.
The query agent 104 can submit the DLL 150 to the processing matrices 118-122 of the computing matrix 116 in a variety of ways. For queries involving the global-results processing matrix 118, the query agent 104 can provide the DLL 150 directly to the processing matrix 118. In at least one embodiment, however, the general-purpose query processing matrix 120 and the index-based query processing matrix 122, are adapted simulate the operation of, for example, a SQL server wherein the query agent 104 submits an SQL or XQL query to one or both of the processing matrices 120, 122 for execution. The SQL/XQL query can be embedded in the DLL 150 by the query server 102, extracted by the query agent 104, and then provided to the processing matrix 120/processing matrix 122. Upon receipt of the SQL/XQL query, the master node of the processing matrix 120/122 is adapted to generate another executable (e.g., another DLL) from the embedded SQL/XQL instructions. The master node then provides the newly generated DLL to a subset of the processing nodes of the processing matrix 12/122 for execution. Alternatively, the query agent 104 can be adapted to extract the embedded SQL/XQL instructions from the DLL 150 and compile a new DLL 152 from the extracted SQL/XQL instructions. The DLL 152 then may be submitted to the processing matrix 120/processing matrix 122 for execution.
The results of a database operation by the computing matrix 116 can be managed in a variety of ways. Depending on the query, the results can remain in data storage or memory of the processing matrices, especially when the results are known or expected to be used in subsequent database operations. The results can be forwarded to the query agent 104 for further processing and/or the results can be stored in a common work-unit storage module (as discussed in greater detail with reference to FIG. 2). The results also could be transmitted back to the client by the query agent 104 via, for example, as a file transferred over a network.
Once the execution of a submitted query has been completed, the query agent 104 can be adapted to report to the scheduling services module 114. The scheduling services module 114 can adjust the scheduling of subsequent queries, if necessary, and then notify the next scheduled query server that its DLL can now be submitted to the computing matrix 116 for processing. Part of the scheduling process may include determining which processing matrices of the computing matrix 116 should be used for the optimum utilization of the system. To illustrate, the computing matrix 116 may implement two global-results processing matrices 118, each having five nodes, a global-results processing matrix 118 having 20 nodes, and a global-results processing matrix 118 having one hundred nodes. It will be appreciated that the use of the hundred node processing matrix 118 to perform a database operation suitable for a five node processing matrix 118 is relatively inefficient or at least consumes system resources that could be used to satisfy another query. Accordingly, the scheduling services module 114 can be adapted to analyze the processing demands of all submitted requests to determine the most appropriate allocation of the database operations among the processing matrices as well as the timing of their submission.
Referring now to FIG. 2, an exemplary system 200 for recording the state of the system 100 is illustrated in accordance with at least one embodiment of the present invention. The system 200 includes a work-unit reporting module 202 preferably connected to one or more of the query server 102, the query agent 104, the query builder module 106, the scheduling services module 114, the index-based query processing matrix 122, the general-purpose query processing matrix 120, and the global-results processing matrix 118, as well as other components of the system 100 as appropriate. The work-unit reporting module 202 preferably includes a read-write data store, such as a transactional-hierarchical database server implemented on one of the nodes of the system 100. In at least one embodiment, the work-unit reporting module 202 is adapted to maintain a work-unit (e.g., work-units 204-208) for each query submitted to the system 100 for processing. The work-unit for a query can include a log of the various events in the system 100 while processing the query, messages between components, and if the results of the query are of an acceptable size, the work-unit can include the results of the query. Alternatively, the query results may be stored elsewhere, such as in a data store (not shown), on nodes of one or more of the processing matrices 118-122, and the like. In this case, the related work-unit can store references to the storage locations of the query results. For example, if the query results are stored as a file on a networked device, the file reference could include the network address of the networked device and the filename of the file storing the query results.
When a client submits a query (e.g., through query builder module 106), the work-unit reporting module 202, in one embodiment, creates a new work-unit associated with the submitted query. The query can be included with the work-unit. As the query server 102 processes the query to generate a DLL, the query server 102 can submit various events to the work-unit reporting module 202 for inclusion with the stored work-unit. For example, the query server 102 can be adapted to perform a syntax check on the query and report the result of the syntax check to the work-unit reporting module. Likewise, the query server 102 can report to the work-unit reporting module 202 when the DLL is generated and when the DLL is submitted to the computing matrix 116 (FIG. 1) for processing. The query server 102 also can submit an estimate of the processing time required for the query at each processing matrix of the computing matrix 116 expected to be used in processing the DLL.
Errors or events during the processing of the DLL by the query agent 104 and/or the computing matrix 116 can be reported to the work-unit reporting module 202 for inclusion in the associated work-unit. Such events and errors can include, but are not limited to, a failure of a node of a processing matrix, the transfer of results between processing matrices 118-122, data integrity errors, the time of completion of the query, and the like. Further, the results of the query can be stored with the corresponding work-unit if the data is of an acceptable size, as well as a description of the results, such as the number of records returned, the size of the data, and the like.
In addition to maintaining a log of the events involved with the processing of a query, the work-unit reporting module 202 can be adapted to facilitate communication between the components of the system 100. To illustrate, rather than sending a DLL directly to the query agent 104, the query server 102 instead can write the DLL to the associated work-unit on the work-unit reporting module 202. Subsequently, a query agent 104 can obtain the DLL from the corresponding work-unit at the module 202 for further processing. Likewise, one or more of the processing matrices 118-122 may be adapted to store a completion indicator in the corresponding work-unit when the processing matrix completes its portion of the processing for the query as well as an indicator of the location of the results. The other components of the system 100 can be adapted to access the work-unit to determine if a portion of a query has been completed and the location of the results, if any.
Referring now to FIG. 3, an exemplary method of operation of the system 100 of FIG. 1 is illustrated in accordance with at least one embodiment of the present invention. The exemplary method 300 initiates at step 302 wherein a query is generated and submitted to the query server 102. As note above, the query preferably is represented as ECL source code generated using, for example, the query builder module 106 (FIG. 1). The generation of ECL-based queries is discussed in greater detail below with reference to FIG. 6. Alternatively, the query can be structured using one or more conventional programming languages useful in programming queries, such as SQL, XQL, Java, Perl, C, C++, Fortran, and the like. After the query is generated, it can be formatted into a format suitable for transmission to the query server 102 (FIG. 1), such as an XQL, XML, HTML file, or text file. The formatted query then is transmitted to the query server 102.
At step 304, the query server 102 receives the query and compiles a DLL 330 (or multiple DLLs) from the submitted query. The step 304 preferably includes a series of substeps 402-412, illustrated with reference to FIG. 4. In the event that a query is submitted by a client using an XML template, the query server 102 converts the input of the XML template to an ECL-based query at step 402. At step 404, the query server 102 (FIG. 1) performs a syntax check on the submitted query to ensure that the format of the query is in compliance with the guidelines of the query language (ECL, SQL, XQL, etc.) used to prepare the query. Furthermore, the syntax check can include determining that all attributes, actions, and the like are defined. As noted above, the system 100 can include the repository 110 (FIG. 1). Client-defined attributes can be stored in the repository 110 and then retrieved by the query server 102 when performing a syntax check to confirm that the attributes are properly defined.
At step 406, the definitions for the client-defined attributes are substituted into the query by the query server 102. To illustrate, if the query included the code line: where "COUNT" is a counting operation, the query server 102 could check the repository 110 for the definition of the attribute "BigDog." The attribute "BigDog" could be defined in the repository 110 as: Accordingly, at step 406, the query server 102 could substitute the definition of the attribute "BigDog" into the query, resulting in: This process can be repeated as necessary for some or all of the attributes of the query. Likewise, the process can be repeated recursively for nested attributes.
At step 408, the query server 102 converts the ECL-based (or SQL/XQL-based) query to intermediary source code in a conventional high-level or medium-level programming language, such as C++, Java, Perl, Fortran, Assembler, and the like. For ease of discussion, the use of the C++ programming language is discussed herein. The query, in one embodiment, is converted to the intermediary source code by using predefined code segments for the components of the query, where the code segments have been developed and refined for execution by the parallel-processing matrices of the computing matrix 116 (FIG. 1). The use and reuse of predefined code segments by the query server 102 often gives incentive for increased scrutiny and review, oftentimes resulting in the code segments used by the query server being more efficient and reliable.
To convert the query to the intermediary source code using predefined code segments, the query server 102 converts the source-code instructions of the submitted query into a parse tree (also known as a syntax tree). The query server 102 then analyzes each node as it traverses the parse tree. At each node, the query server 102 selects the most appropriate predefined code segment based on the analysis of the node. General methods for converting source code using parse trees are well known to those skilled in the arts (see generally, Daniel Friedman, et al., Essentials of Programming Languages, (3 ed., McGraw Hill, 1992) and Alfred Aho, et al., Compilers: Principles, Techniques, and Tools, (Addison-Wesley, 1986)). Additionally, the query server 102 can be adapted to optimize the parse tree using a number of graph optimization methods, such as well-known graph coloring techniques (see generally, Michael Molloy, et al., Graph Colouring and the Probabilistic Method, (Springer Verlag, 2001)).
As described in greater detail below, in one embodiment, each of a subset of the processing nodes of the general-purpose query processing matrix 120 and/or the index-based query processing matrix 122 are adapted to function as a pseudo-SQL database, each processing node of the subset having as its own database a portion of a larger database. Accordingly, the query server 102 can embed one or more SQL statements representative of database operation(s) to be performed by these processing nodes. Upon receipt of the DLL having one or more embedded SQL statements, the master node (discussed below) of the processing matrix 120/122 can be adapted to generate intermediary source code from the embedded SQL statement(s), compile the intermediary source code to generate an executable (e.g., a DLL), and provide the DLL to the subset of processing nodes for execution. Accordingly, step 408 can further include the step of embedding one or more SQL instructions into the intermediary source code as appropriate. The SQL instructions can be embedded in the predefined code segments, added subsequently, and the like. The processing of a DLL having embedded SQL statements is discussed in greater detail with reference to FIG. 5.
At step 410, the intermediary source code preferably is optimized using any of a variety of optimization techniques, such as copy propagation, dead code elimination, reduction variables, strength reduction, and the like. Appendix A illustrates an exemplary ECL-based query and the exemplary intermediary source code (in C++) generated from the ECL-base query based on steps 402-410 as described above.
At step 412, the intermediary source code is compiled by the query server 102 into a DLL (or other type of executable). Since the intermediary source code, in one embodiment, is generated using a common high-level or mid-level programming language (e.g., C++), the query server 102 can utilize a widely-available (i.e., off-the-shelf) compiler to compile the high-level source code. Exemplary compilers for the C++ language that may be implemented by the query server 102 include, for example, the GNU C++ compiler available from http://www.gnu.org, Borland® C++ Compiler 5.5 available from Borland Software Corporation of Scotts Valley, Calif. and Microsoft® Visual C++.NET compiler available from Microsoft Corp. of Redmond, Wash.
As noted above, queries submitted to a conventional database system often are in the form of an interpreted query language, such as SQL. The client formats a query using SQL and submits the SQL query to a conventional database system, which then employs an SQL interpreter to interpret the code of the SQL query. As the SQL interpreter traverses the parse tree representing the submitted SQL query, the SQL interpreter passes execution to a library representative of the particular portion of the parse tree under consideration. As a result, there is considerable delay as the SQL identifies the proper library, the processor performs a context switch between the interpreter and the library, and performs another context switch between the library and the interpreter when the library has finished executing. Furthermore, the SQL interpreter generally considers only the local portion of the parse tree when selecting a library function for execution and therefore is often unable to optimize the overall process of the query. By adapting the query server 102 to generate an intermediary source code representation of the submitted ECL-based, optimize the intermediary source code, and then compile the intermediary source code into one or more executables, the efficiency problems associated with queries formatted using interpreted query languages can be avoided. Further, by using predefined code segments, considerably improved efficiency, accuracy, and reliability may be achieved compared to custom source code manually generated for each database operation or query.
Referring again to FIG. 3, step 304 of the method 300 continues with the query server 102 providing the DLL 330 to one or more of the processing matrices 118-122 of the computing matrix 116 (FIG. 1) via the query agent 104. Those processing matrices of the computing matrix 116 selected to receive the DLL 330, as well as the order in which the processing matrices receive the DLL 330, is based at least in part on the query submitted. Should the query involve relatively minimal processing, such as searching for the lowest value of 1,000 data entries, the query agent 104 can process the DLL 330 by itself at step 306. As such, the query agent 104 can be viewed as a relatively low-powered component of the computing matrix 116. The results of the execution of part or all of the DLL 330 by the query agent 104 are processed at step 308 and, at step 310, the results may be provided to the client via, for example, the query builder module 106 (FIG. 1), stored in the corresponding work-unit at the work-unit processing module 202 (FIG. 2), stored to disk or tape, provided to one or more of the processing matrices for additional processing, and the like.
In some instances, the submitted query can involve database operations using certain fields that are indexed by the index-based query processing matrix 122 (FIG. 1). Accordingly, the query agent 104 can provide the DLL 330 to the index-based query processing matrix 122 at step 310. The index-based query processing matrix 122 can provide the results of the database operation(s) to the global-results processing matrix 118 at step 318 and/or provide the results to the query agent 104 at step 312.
Some or all of the operations of a submitted query may involve the analysis of relatively large amounts of data. Examples of such database operations can include, but are not limited to, sorting, collating, counting, cleansing, duplicate record resolution (i.e., "deduping"), joining, appending, merging, purging, cleansing, non-hierarchical linking, formatting, and the like. In this case, the query agent 104 can provide the DLL 330 to the general-purpose query processing matrix 120 (FIG. 1) at step 314, whereupon the DLL 330 is executed by the processing matrix 120. The general-purpose query processing matrix 120 is discussed in greater detail with reference to FIGS. 7 and 8.
As with the index-based query processing matrix 122, the results of the execution of the DLL 330 at the general-purpose processing matrix 120 can be stored to disk or tape, provided to the client via the query agent 104, stored to the corresponding work-unit at the work-unit processing module 202, and the like (step 316). In some instances, however, it may be desirable to process the query on multiple processing matrices, where the results generated by one processing matrix are provided to another for additional processing. Particularly, many queries involve one or more database operations performed by the general-purpose query processing matrix 120 and/or the index-based query processing matrix 122 followed by one or more database operations performed by the global-results processing matrix 118 on the results from the processing matrices 120/122. To illustrate, an exemplary submitted query could include a sequence of two database operations. The first operation could include identifying those people having an age greater than thirty years from a criminal records database. The second operation could include sorting the identified people by last name. Accordingly, the identifying operation could be performed by the general-purpose query processing matrix 120 and the identified results provided to the global-results processing matrix 118 in no particular order. The global-results processing matrix 118 then could perform the sort operation on the results provided from the processing matrix 120.
Accordingly, at step 320 the results from one or more database operations performed by the general-purpose query processing matrix 120 are provided to the global-results processing matrix 118. The results can be provided in any of a variety of ways. Preferably, the results stored in the memory of a node of the general-purpose query processing matrix 120 are transferred to the disk storage of a corresponding node of the global-results processing matrix 118. Alternatively, the results could be transferred to storage and the general-purpose query processing matrix 120 could provide a reference to the storage location of the results to the global-results processing matrix 118 directly or via the work-unit processing module 202.
In addition to, or rather than, using two or more types of processing matrices to process a query, the system 100 can be adapted to process the query using two or more of the same type of processing matrices in sequence or in parallel. For example, a query could include two database operations, one operation to identify records having a certain characteristic in one database, and the other operation to identify records having a certain characteristic in another database. Accordingly, the query agent 104 could provide the DLL 330 to one processing matrix 120 to select the records from the first database and provide the DLL 330 to another processing matrix 120 to select the records from the second database. In another example, a query could include two database operations, one operation to identify records of a large database having a certain characteristic, and another operation to identify those records identified by the first operation as having a second characteristic. In this case, the query agent 104 could be adapted to supply the DLL 330 to a first processing matrix 120 having a relatively large number of processing nodes to identify the records having the first characteristic. The identified records and the DLL 330 then could be supplied to a second processing matrix 120 to identify those records from the first processing matrix 120 that have the second characteristic.
Some or all of the database operation(s) of a submitted query may be beneficially performed by the global-results processing matrix 118, either separately or in conjunction with the results generated by another processing matrix of the computing matrix 116. Accordingly, the query agent 104 can provide the DLL 330 to the global-results processing matrix 118. At step 322, the global-results processing matrix 118 can execute some or all portions of the DLL 330 using the results generated by another processing matrix, data previously distributed to the nodes of the global-results processing matrix 118, or a combination thereof. At step 324, the results of the execution of the DLL at the global-results processing matrix 118 can be stored to disk or tape, provided to the client via the query agent 104, stored to the corresponding work-unit at the work-unit processing module 202, provided to another processing matrix of the computing matrix 116, and the like. The operation of the global-results processing matrix 118 is discussed in greater detail with reference to FIGS. 9 and 10.
Referring now to FIG. 5, an exemplary method 500 for generating a second DLL from a DLL having embedded SQL instructions is illustrated in accordance with at least one embodiment of the present invention. As noted above, certain processing nodes of the processing matrix 120/122 may be adapted to function as individual database systems on their individual portion of a database. Accordingly, it may be beneficial to embed in the DLL 500 supplied to the matrix 120/122 one or more SQL statements 502, 504 representative of the database operation(s) the processing nodes are to perform on their respective database portion. The master node of the matrix 120/122 may then implement exemplary method 500 to generate a second DLL for use by the certain processing nodes.
Method 500 initiates at step 506, whereby the master node (master node 702, FIG. 7) of the matrix 120 (or matrix 122) is adapted to identify and extract the SQL statements 502, 504 from the DLL 500. At step 508, the SQL statements are converted into a parse tree and the master node traverses the parse tree to generate intermediary source code (e.g., C++ source code), preferably using predefined code segments as with step 408 (FIG. 4). At step 510 (analogous to step 410, FIG. 4), the intermediary source code is optimized and then compiled into machine-level code at step 512 (analogous to step 412, FIG. 4). The newly generated DLL may then be provided to the subset of processing nodes for execution, as discussed in greater detail with reference to FIGS. 7 and 8.
Referring now to FIG. 6, an exemplary implementation of the query builder module 106 (FIG. 1) is illustrated in accordance with at least one embodiment of the present invention. As discussed above, the query builder module 106 can include any of a variety of interfaces adapted to receive query input from a client. In one embodiment, the query builder module 106 includes a GUI 602 adapted to facilitate the programming of a client using ECL.
In the illustrated example, the GUI 602 includes a query list window 604, an ECL reference list window 606, a query code window 608, a results display window 610, and a variety of client-selectable objects (i.e., "buttons"), such as open button 622, new button 624, send button 626, syntax button 628, clear button 630, save button 632, export button 634, and clear button 636. The query list window 604 includes a graphical listing of queries previously generated and/or submitted for processing. The query code window 608 is adapted to graphical display the ECL code associated with a query listed in the query list window 604. To open a previously-generated query, the client may select one of the queries listed in the query list window 604 by, for example, selecting the corresponding query name listed in the window 604 with a mouse, keyboard, or other client-input device. Alternatively, the client could select the open button 622 locate and load a previously-generated query.
To generate or modify a query, a client can use the ECL code window 608 to add, delete or modify the ECL code representing the query. The ECL reference list window 606 can be used to navigate the attributes, actions, constants, operators, and other elements of ECL. Further, the GUI 602 can be adapted to include an element of ECL in the ECL code displayed in the window 608 when the element is selected from the ECL reference list window 606 using a client-input device.
After generating or modifying ECL code 640 representative of part or all of a desired query, the client can select the syntax button 628 to direct the query builder module 106 (FIG. 1) to perform an ECL syntax check on the ECL code 640 in the ECL code window 608. If the syntax is correct and the client is satisfied with the query, the client can select the send button 626 to submit a representation of the ECL code 640 to the query server 102 (FIG. 1) for processing as a query. Alternatively, the client can select the clear button 630 to clear the ECL code 640 from the ECL code window 608.
In some instances, a submitted query may be formatted to return certain results to the client. These results can be received by the query builder module 106 and the results (results 650) then displayed in the appropriate format in the results display window 610. In at least one embodiment, the GUI 602 is adapted to provide for display (in the same window 610 or a separate window) the raw data associated with an element of the results selected by the client. The client may chose to save the results by selecting the save button 632, export the results as a particular file type (e.g., a Microsoft Excel spreadsheet) by selecting the export button 634, or clear the displayed results from the window 610 using clear button 636.
It should be understood that the results may be displayed in a variety of ways, which may be user-definable or user-selectable, e.g., subject profile, composite report, summary report, continuous string, and others. Additional tools may be provided to tenable the user to manipulate, edit, and perform other tasks, on the results. The user may also edit the search parameters, perform additional searches or take other desirable actions.
The GUI 602 may be further understood by considering the following example. In this example, a client desires to display an unsorted list of the people having entries in the "Persons" data set by the person's city and by the person's first name. The client can select the "OUTPUT" action from the Actions section (generally represented as one of Action—1—Action—3) of the ECL reference list window 606, whereby the "OUTPUT" action includes an ECL action directing the output of entries in an identified data set that meet indicated criteria. The client can identify the "Persons" data set by selecting it from the data set section of the ECL reference list window 606 and indicate the certain criteria (i.e., output by city and first name) by selecting the "Person.per_full_city" and "Person.per_first_name" fields of the "Persons" database as listed in the attributes section of the ECL reference list 606. The resulting ECL code 640 would then be:
The client could check that the syntax is correct by selecting the syntax button 628 and then submit the ECL code 640 to the query server 102 for processing by selecting the send button 626.
The query server 102 then generates a DLL representing the submitted query and provides the DLL to the query agent 104 (FIG. 1) for processing by the computing matrix 116. The query agent 104 then supplies the results to the query builder module 106, whereby the city and last name of each person of the "Persons" data set are displayed as a two-column matrix (results 660) in the results display window 610.
Referring now to FIGS. 7A, 7B, and 8, an exemplary implementation and operation of the general-purpose query processing matrix 120 are illustrated in accordance with at least one embodiment of the present invention. In the illustrated embodiment of FIG. 7A, the processing matrix 120 includes a plurality of interconnected processing nodes 702-720 operating in parallel. Each node includes at least one processor and memory accessible by the processor(s) of the node. Each node also may include one or more storage devices, such as disk storage, tape drives, and the like. In a preferred embodiment, a processing node includes a common general-purpose, single-user microcomputer configuration having a motherboard, one or more processors, random access memory (RAM), one or more disk drives, a network interface, as well as various support components, such as read only memory (ROM), direct memory access (DMA) controller, various busses, and the like. An exemplary implementation could include, for example, a general-purpose, single-user microcomputer motherboard having an Intel® Pentium® III processor and 2 GB of RAM; two 32 GB EIDE or SCSI hard disk drives; and an Ethernet network interface card (NIC).
The nodes of the processing matrix 120 preferably are logically arranged in an n-ary tree structure of N levels. The node at the root of the tree is designated as the master node and each node at the bottom level of the tree structure is dedicated as a slave node. Those nodes at intermediate levels of the tree between the top level and the bottom level are designated as collator nodes. In the illustrated example, the processing matrix 120 includes three levels, where the master node 702 is located at the first level, collator nodes 704-708 are located at the second level, and slave nodes 710-720 located at the third level. Alternatively, if the processing matrix 120 included, for example, four levels, the nodes 710-720 also would be collator nodes and the children of the nodes 710-720 would then be the slave nodes. Note that although FIGS. 7A, 7B illustrates an exemplary implementation of the processing matrix 120 having a three-level tree structure where the parent to child ratio for the master node is 1:3 and 1:2 for the master node collator nodes, respectively, any number of tree levels and/or any ratio or combination of ratios of parent node to children nodes may be implemented without departing from the spirit or the scope of the present invention.
In one embodiment, the master node 702 is adapted to prepare the processing matrix 120 for processing a DLL/SQL query received from the query agent 104; to distribute the DLL to its children; and to process the results supplied from its children. The slave nodes of the processing matrix 120 can be viewed as the "workhorses" of the processing matrix 120 by performing the processing-intensive operations of the submitted query. Each collator node between the slave nodes and the master nodes manages the results from its children and then provides the results of its processing to its parent node, which may include another collator node or the master node. The master node then processes the results from its children nodes.
In at least one embodiment, each node of the processing matrix 120 executes the same software application, referred to herein as a "homogenous agent" or "HomAgent". In one embodiment, the HomAgent is adapted to receive a DLL; dynamically link to a specified portion of the DLL while operating; and execute the specified portion of the DLL. It will be appreciated, however, that after executing multiple DLLs in this manner, there is the potential for corruption of the memory space of the HomAgent. Accordingly, in another embodiment, rather than linking to and executing the specified portion, the HomAgent invokes another process to link to and execute the specified portion of the DLL. For ease of discussion, reference to the HomAgent executing a DLL or performing another act also extends to the execution of the DLL or the execution of the act by a process invoked by the HomAgent, unless otherwise noted.
The relationship between the HomAgent and the DLL can be viewed as analogous to the relationship between, for example, a word processor application and a device driver (i.e., a type of DLL) for a printer. When the word processor is directed to output a document to a printer for printing, the word processor invokes generic print commands. These generic print commands in turn are dynamically linked to the printer-specific device driver that directs the operation of the printer. As such, the word processor can be adapted to print to a plurality of different printers by engaging device drivers specific to each printer. In the same manner, the HomAgent allows each node to perform a wide variety and combination of operations by using generic commands that are dynamically linked to specific portions of the DLL. The operations coded in different entry portions of the DLL determine the specific operations performed by a particular HomAgent. The HomAgent is discussed in greater detail with reference to FIGS. 12-13.
In at least one embodiment, each slave node 710-720 operates essentially as a separate database management system on a respective portion of one or more databases 742. Accordingly, in one embodiment, the global-results processing matrix 118 segments the database 742 into separate database portions 750-760 and then distributes the portions 750-760 among the slave nodes 710-720 prior to the processing of one or more database operations on the database 742. Any of a variety of distribution techniques may be implemented to distribute the data of the database 742. The data of the database 742 may be, for example, equally distributed among the nodes 710-720 by providing the first x records of the database 742 to node 710, the next x records of the database 742 to the node 712, and so on. In this example, x represents the total number of records divided by the number of slave nodes (six in this case), across which the records are to be distributed.
In many instances, however, it is desirable to randomly, rather than sequentially, distribute the data of the database 742 across the nodes 710-720. Accordingly, the global-results processing matrix 118 can be adapted to use of one or more hash functions on one or more fields of the records of the database 742. For example, the database 744 could represent a credit history database, each record of the database having a social security number field, a name field, an address field, and a number of credit-related fields. In this example, the records could be distributed among the nodes 710-720 using a hash function keyed to the social security number associated with each record. The distribution of the database 744 is illustrated in greater detail with reference to FIGS. 14 and 15.
In at least one embodiment, the data portions 750-760 of the database 742 are stored in the memory of the corresponding slave node (memory 730-740), which preferably comprises random access memory (RAM). The slave nodes then perform database operation(s) using the data distributed into their memories. It will be appreciated that memory accesses typically are much faster than disk storage accesses, and are often at least two to three orders of magnitude faster. Accordingly, database operations performed by the slave nodes typically can be performed much faster than those performed by conventional database query systems that process queries from data stored in non-volatile storage, such as hard disk, tape, optical disk, and the like. The distribution of data into node memory from one or more databases is discussed in greater detail below with reference to FIGS. 14-15.
FIGS. 7B and 8 illustrate an exemplary operations 800 of the general-purpose query processing matrix 120. Using the exemplary method 300 (FIG. 3), the query server 102 generates a DLL 700 and provides the DLL 700 to the master node 702 of the processing matrix 120. In the illustrated example, the DLL includes three portions A-C, each portion to be executed by processing nodes of a specified level of the tree. The HomAgent at the master node 702 (or a process invoked by the HomAgent), upon receipt of the DLL 700, is configured to execute portion A of the DLL 700 (step 801, FIG. 8). Portion A may direct the HomAgent of the master node 702 to generate a new DLL from SQL instructions embedded in the DLL 700 (method 500, FIG. 5) and provide the new DLL to the collators 704-708 (step 802, FIG. 8). Alternatively, portion A may direct the HomAgent of the master node 702 to directly transfer a copy of the DLL 700 to each of the collators 704-708. For ease of discussion, subsequent reference to the DLL 700 refers to either the original DLL 700 from the query agent 104 or the DLL 700 generated by the master node 702 from the original DLL unless otherwise indicated.
Upon receipt of the DLL 700 (or a newly generated DLL), the HomAgent at each collator node 704-708 is adapted to execute portion B of the DLL 700 substantially in parallel (steps 804-808, FIG. 8), where portion B may direct the HomAgent of each collator node 704 to provide a copy of the DLL to each of the collator node's children nodes. The step of providing the DLL from parent node to its children nodes is repeated until the DLL is received by the slave nodes at the lowest level of the tree, in this case, the slave nodes 710-720. The HomAgent at each of the slave nodes 710-720, in turn, is configured to execute portion C of the DLL 700 substantially in parallel (steps 810-820, FIG. 8). In this case, the portion C of the DLL 700 represents the one or more database operations to be performed by the slave nodes 710-720 on their respective database portions. This portion of the DLL typically includes the processor-intensive operations of the submitted query, such as performing complex calculations, locating certain data in the data set at each node, evaluating complex boolean expressions, and the like, all on a relatively large number of data set entries.
In one embodiment, the slave nodes 710-720 transmit their results in parallel to one or more the global-results processing matrices 118 (steps 840-850, FIG. 8). As discussed in greater detail below, in one embodiment the global-results processing matrix 118 is implemented as a two-level tree having a single master node and a plurality of slave nodes. Accordingly, the slave nodes 710-720 of the general-purpose query processing matrix 120 can be adapted to directly transfer their results to one or more slave nodes of the global-results processing matrix 118. The results from a slave node of the general-purpose query processing matrix 120 can be allocated to the slave nodes of the global-results processing matrix 118 in any of a variety of ways. With consideration to the storage capacity of the slave nodes of the processing matrix 118, the results from each of slave nodes 710-720 can be distributed among some or all of the slave nodes of the processing matrix 118, all of the results could be concentrated in one or more slave nodes of the processing matrix 118, subsets of the slave nodes 710-720 could be associated with each of the slave nodes of the processing matrix 118, and the like.
Method 800 typically is implemented in a query wherein the results of one or more database operations by the general-purpose query processing matrix 120 receive further processing by the global-results processing matrix 118. To illustrate, consider the following exemplary query: where the operation "JOIN" results in the generation of a new dataset "j" that represents the union of the entries of the dataset "Persons" having an "age" value greater than 20 and those entries of the "Cars" dataset having a "color" value equal to "blue". In this example, the computing matrix 116 of system 100 (FIG. 1) includes two general-purpose query processing matrices 120 and a global-results processing matrix 118. Accordingly, the exemplary query above could be constructed by the query server 102 (FIG. 1) into three database operations:
The first "FETCH" operation being assigned for processing by one of the general-purpose query processing matrices 120 and the second "FETCH" operation being assigned for processing by the other general-purpose query processing matrices 120. The results of the "FETCH" operations by the processing matrices 120 are provided to the global-results processing matrix 118, whereupon the global-results processing matrix joins the results into a single data set "j".
The operation of the processing matrix 120 may be better understood by considering the following example. In this example, a query for the last names of the ten oldest people in a motor vehicle registration database of 60,000 entries is submitted to the processing matrix 120. At a prior time, the 60,000 records of the database 742 are randomly, but evenly, distributed among the memories 730-740 of the slave nodes 710-720, each memory storing 10,000 records. A DLL 700 representing the query is generated by the query server 102 (FIG. 1) and then provided to the processing matrix 120, where the DLL 700 then is distributed down the tree levels of the processing matrix 120 to the HomAgents of the slave nodes 710-720. Upon receipt of the DLL 700, the HomAgents of the slave nodes 710-720 (or processes spawned by the HomAgents) each execute the portion of the DLL 700 associated with the slave nodes, whereby each HomAgent is directed by the portion of the DLL 700 to identify the ten oldest people from the 10,000 entries stored in the memory of the slave node. Each slave node returns ten entries corresponding to the ten oldest people in the slave node's portion of the database to its parent collator node.
The results from the slave nodes are stored in the memory of the parent collator node. The HomAgents at the collator nodes 704-708 then each execute the collator portion of the DLL 700 substantially in parallel, whereby the HomAgent is directed to identify and return ten entries corresponding to the ten oldest people of the twenty entries received from its child slave nodes (ten entries from each slave node). The identified entries of the ten oldest people at each collator are stored in the memory of the master node 702. As directed by the master node entry portion of the DLL 700, the HomAgent at the master node 702 then identifies the ten entries corresponding to the ten oldest people of the thirty entries received from the collator nodes 704-708 and provides these entries to the query agent 104 for transmission to the client and/or stores these ten entries in the corresponding work-unit, e.g., work-unit 202 of FIG. 2. The master node portion of the DLL 700 also could direct the HomAgent of the master node 702 to perform one or more additional operations on the ten entries before transmitting them to the query agent 104, such as sorting the ten entries by last name.
Referring now to FIGS. 9 and 10, an exemplary implementation and operation of the global-results processing matrix 118 is illustrated in accordance with at least one embodiment of the present invention. In the illustrated embodiment of FIG. 9, the global-results processing matrix 118 includes a bi-level tree architecture having a master node 902 connected to one or more slave nodes 912-918. Additionally, each slave node preferably is connected to at least one other slave node via a network and more preferably is connected to every other slave node of the processing matrix 118. As with the processing matrix 120, in at least one embodiment, each processing node of the processing matrix 118 executes the same HomAgent software application.
As noted above, in one embodiment, the results generated by one or more processing matrices 120/122 are stored to the slave nodes 912-918 for further processing by the global-results processing |