Method and device for model resolution and its use for detecting attacks against computer systems7043755Abstract The present invention relates to a method and device for model resolution and its use for detecting attacks against computer systems. The device comprises adapter software for translating the information from the log file, formulated in the specific language of the machine, into a language understandable by the interpreter, an interpreter receiving the information from the adapter and receiving the formulation of the specification in the temporal logic in a specification formula in order to expand this formula and fill in the table and the stack of worked subformulas described above resulting from the scanning of the machine's log file, and a clause processing algorithm for resolving the Horn clauses using the information from the table and the stack of worked subformulas, this clause processing algorithm generating an output file or generating an action. Claims What is claimed is: Description FIELD OF THE INVENTION
Another object is to provide great flexibility. This object is achieved through the fact that the method uses a temporal logic for the formulation of the specification. According to another characteristic, the table is a matrix and is indexed in columns by the subscripts of the formulas appearing in the negative part of the Horn clauses, and the lines are the Horn clauses exactly. According to another characteristic, the table is preferably represented in the form of a sparse matrix, the columns being represented by means of chained lists and the lines remaining implicit. According to another characteristic, an optimization of the expansion of the formulas is obtained through a hash table in order to ensure that the same formula is not expanded more than once in each record. According to another characteristic, the log file is scanned only once from beginning to end. Another object is to offer a device that makes it possible to implement the method. This object is achieved through the fact that the high-performance specification resolution device comprises:
BRIEF DESCRIPTION OF THE DRAWINGS Other characteristics and advantages of the present invention will emerge more clearly through the reading of the following description, given in reference to the attached drawings, in which: FIG. 1 represents a schematic view of the hardware and software elements that make it possible to implement the method. FIG. 2 represents the contents of the table, of the counters of the formulas or subformulas present in the negative parts of the clauses, and of the stack, and their evolution during the implementation of the method. DESCRIPTION OF THE PREFERRED EMBODIMENTS FIG. 1 represents the elements required to implement the method according to the invention. The log file (1) is generally present in all the machines and can be the network log file when the machine is connected to a network, or a system log file, or any other file in which one wishes to verify a specification. A machine is understood to mean a computer comprising storage means, means for reading and for executing a program, and means for interacting with a user (for example screen, keyboard, mouse) and means for connecting to the network. This file communicates with an adapter (2), which is a software program for translating the information contained in the log file and expressed in the specific language of the machine into a high-level language understandable by an interpreter (3). The interpreter (3) also receives from a module (4) the formula of the specification to be verified, expressed in a temporal logic. This interpreter (3) performs the expansion of the formula into subformulas and the scanning of each record Ei (Annex 2) of the log file (1) in order to generate, by means of this expansion and this scanning, a resulting table and stack expressing Horn clauses stored in a memory (5). The concept of a Horn clause is well known to one skilled in the art, and is described for example in Goubault-Larrecq, Jean and Mackie, Ian, "Proof Theory and Automated Deduction," published by Kluwer, 1996). This table and this stack are used by a clause processing algorithm (6), which receives a start order from the interpreter (3) once the latter has filled in the table (5) containing a table of counters (7) and a stack (18), after having scanned all the records Ei of the file. This algorithm will look for the resolution of the specification for all of the records. When the completed scan of the record file (1) is detected, the clause processing algorithm generates either an output file or an action of the system or the machine. In an optimization of the method according to the invention, the phase for filling in the table (5) and the stack (18) and the phase for processing the clauses are performed concomitantly, so that the clause processing algorithm can generate the output file or the action of the system or the machine as soon as possible, and generally before the detection of the completed scan of the record file (1). To provide a better understanding of the method implemented, the latter will be explained with the help of an example whose formulas appear in an annex at the end of the specification. First of all, a log file is a set of records E=E1, . . . EN), as represented in Annex 2. Each record Ei comprises a certain amount of information such as the date, the operation in question, the machine, a result, a subject, this list being non-limiting. Thus, E1 indicates that the machine user has tried to connect but has failed. To formulate a specification, as represented in Annex 1, that can be detected or resolved, a specification formula in a temporal logic is used. This formula is described according to the following formula production in the grammar of the BNF format well known to one skilled in the art (Aho, Alfred V., Sethi, Ravi and Ullman, Jeffrey D., Compilers: Principles, Techniques and Tools, Addison-Wesley, 1986):
The operators between formulas are the operator " ##CHR4## " for expressing a logical "OR", "U" for expressing the formulation "until", and "W" for expressing the formulation "waiting for", "O" for expressing the formulation "on the next line, which exists", "Õ" for expressing the formulation "on the next line, if it exists", "⋄" for expressing the formulation "on the current line or on a subsequent line", for expressing the formulation "on the current line and on every subsequent line." This notation is well known to one skilled in the art, (see for example Manna, Zohar and Pnueli, Amir, The Temporal Logic of Reactive and Concurrent Systems Specification, Springer, 1992). Thus, the temporal formulation F=F1 W F2 allows for an easy formulation of a specification to be verified. Let us assume that the operator has entered, by means of a man-machine interface (4) that allows the generation of a temporal formula, a temporal formula like the one appearing in Annex 1. The interface (4) will translate this formula in Annex 1 into a temporal formula where F and H are atomic formulas in which F represents {op="connection", result="failed", etc.} and H represents {op="connection", result="success", etc. Furthermore, let us assume that the log file (1) contains the records E1 through E3 represented in Annex 2. First, the interpreter (3) performs an expansion of the formula for each record E1, E2, E3, as represented in Annex 6, by generating subformulas for each record in order to deduce from them Horn clauses that express the logical implications that exist between a formula and its subformulas, and the possibility of satisfying the atomic formulas, as represented in Annex 6. Thus, for the record E1, the formula is expanded into the subformula F to which the clause (f2) corresponds, into the subformula ⋄H to which the clause (f2) ##CHR5## (f3)→(f1) means that if the configuration f2 is verified and the configuration f3 is verified, then the configuration f1 is verified. The clause f7→f3 means that if that the configuration f7 is verified, then the configuration f3 is too. Furthermore, during the expansion of the formulas by the interpreter (3), the latter stored in a stack (18) the positive clauses corresponding to the formulas that can be satisfied. Thus, at the end of the expansion, the formulas f2 and f8 are in the stack (181), as shown in FIG. 2, and the table of the counters of negative literals in the clauses of the table is constituted by the information represented by the reference (71) in this figure. In the resolution phase, the clause processing algorithm (6), when it is activated by the interpreter once the latter has filled in the tables (5, 7 and 18), after having examined the lines of the records in the log file, will begin by examining the top of the stack (18) and extracting from it the information that the configuration f8, in this case, is satisfied. The algorithm then examines, in the table (5), the clauses that have this configuration in the negative part, in this case the configuration f7, and deduces from them the counter that it must decrement. The counter (72) represents the evolution in the counter (71) of the counter that is associated with the formula represented in the positive part. The algorithm decrements the corresponding counter, in this case that of the configuration f7, and places at the top of the stack the value "7" of the configuration that is true, as shown in the box (182), which represents the evolution of the stack (18), while the column (72) represents the evolution of the counter. Next, the clause resolution algorithm will proceed by iteration, searching for the clauses that have the configuration f7 in the negative part in order to deduce from them that the configuration f3 is true and decrement the counter corresponding to this line of configurations, as shown in the column (73). The algorithm (6) continues in this way until the stack (18) is empty or contains configurations already processed, and the only configuration f1 that verifies the specification is obtained in the stack (185). The expansion algorithm avoids unnecessarily replicating identical configurations, represented by their pointers, by establishing a hash table. The hash table data structure and the associated algorithms are well known to one skilled in the art, (see for example Knuth, Donald Erwin, The Art of Computer Programming, Vol. 3, "Sorting and Searching," Addison-Wesley, Second Edition, 1998). Furthermore, it is also possible to achieve optimizations in the expansion of the formulas, in order to avoid several steps. Thus, instead of expanding the formula ⋄F into F ##CHR6## G where either F or G can be evaluated as false in the current state, the expansion of the formula is halted. The method developed by the invention has an advantage over the known method of the prior art, in which a truth table like the one represented in Annex 4 is first established for each atomic formula, then secondly, truth tables (Annex 5) are established for the non-atomic subformulas using the truth table of Annex 4. The model verification is then performed in two stages. First, it verifies whether the atomic formulas are true or false, which requires a scanning of the states for each formula, then secondly, in order to establish the truth of the subformulas, it is necessary to see how each atomic formula behaves in each state, which amounts to performing several scans of the records. This means performing backward returns in the log file with all the ensuing read and set operations which, given the large size of a log file, can be very time-consuming. The method developed by the invention is much more high-performance and economical, in terms of size and the memory required to store the intermediate states. To provide a better understanding of the algorithm, we will describe it briefly, then present it formally. The specification file, Fs, is considered to be a finite set of formulas Fs whose syntax and semantics are defined above. Let us use the notation F for the set of all the formulas whose syntax and semantics are defined above, and (R1, . . . , R|N|)(with N equal to the number of records in the file) for the log files. Log files are files that record everything that happens in a system (for example, a file that traces the users' connections to and disconnections from the machines). A record is a function R with a finite domain and codomain from Σ* to Σ*, or the set of character strings R:Σ*→Σ* Let us use the notations dom(R) and codom(R), respectively, for the domain of R and the codomain of R.
A log file is therefore a (finite) set of records R1, . . . , R|N|. Let "Current" and "Next" be sets of formula representations (in the remainder of the description, "formula" will be used to mean "formula representation"); Current is the set of formulas to be examined in the current state and Next is the set of formulas that must be examined in the next state. In each state, the set "Current" is the union of the set "Next" and the formulas Fs associated with the current state. That is what step 2) of the algorithm says. The current state is represented by the integer i; 1≦i≦|N|. The "log" file is scanned in one pass, and during this scan, in each state, i.e. in each record of the file, the formulas of the set Current that are verified are revealed, and those that contain future operators are added to the set "Next" so they can be examined in the next state. That is what the "Expand" procedure in step 3) of the algorithm does. This procedure extracts the subformulas from each formula recursively, stores the logical implications that concern them in the form of Horn clauses in a matrix M (for example, for a formula F=F1 ##CHR7## F2, we have the clauses F1→F and F2→F). and for those that are atomic, if they are verified in the current state (which is what the "match" procedure appearing in "Expand" looks for), it stores them in a stack (Stack), which is a stack of formula representations. Once all the formulas have been expanded in the current state, those that are resolvable are resolved with the help of the matrix and the stack (this is what the "resolve_matrix" procedure in step 4) of the algorithm does). Thus, as a result of the atomic formulas that have been resolved and the clauses, all the formulas that are verified are stored in the file "ResForm" (which is a set of formula representations). These steps are iterated until the end of the "log" file (as seen in step 4) of the algorithm). Finally, when the entire log file has been scanned, the "Satis" procedure of step 5) compares the formulas of the file ResForm, which are all formulas verified in a certain state but which are subformulas of formulas of the specification file, to the formulas of the specification file, in order to see which ones are verified, and in which state(s). Here is the algorithm itself:
We will now define the various procedures used in the algorithm. "Expand(f)" procedure, where f is a formula representation. For greater clarity, this procedure will be presented with the help of a table whose meaning will now be explained:
"match(f)" procedure, where f is a formula representation In the case of form(f):
Notation: ρ(x) is a partial function of the set of variables V to the set of character strings Σ*
Notation: E is the environment constituted by the pairs whose first component is taken from the set of variables and whose second component is taken from the set of character strings
Insert-clause procedure (H), where H is a Horn clause having one or two formula representations in the negative part: Notation: If M is a matrix m x n, m, n ∈ N, let mi,f be the element of the ith line indexed by f, and likewise mf,i and mf1,f2 In the case of H:
resolve-matrix procedure
Satis: If Stack≠stack-empty then:
It should be clear to those skilled in the art that the present invention allows embodiments in many other specific forms without going outside the field of application of the invention as claimed. Consequently, the present embodiments should be considered as examples, but can be modified in the field defined by the scope of the attached claims. ANNEX Annex 1
Annex 5
Annex 7
Annex 6
|
Same subclass Same class Consider this |
|||||||||||||||||||||||||
