System and method for classifying vehicles6828920Abstract A system and method have been provided for classifying electronic signatures, obtained through the detection of a vehicle with a single loop inductive sensor, into one of a plurality of vehicle classification groups. A neural networking process is able to learn the plurality of vehicle classifications. In response to an electronic signature stimulus, the neural networking process is able to recall the classification group corresponding to the signature. Claims We claim: Description BACKGROUND OF THE INVENTION
TABLE 1
DATA STRUCTURE
Byte Description Length (bytes)
1 Header 1
2 Loop id 1
3 Gap 1
4 Speed 1
5 headway 1
6 signature 3
The CPU 402 is not limited to any particular design or architecture. Obviously, a CPU with a higher operating speed multi-threading capability for the simultaneous processing of multiple channels, and an architecture with integrated functions (fewer commands) permits the signature analysis to be performed more quickly and simultaneously on multiple channels. In turn, a faster CPU may permit a more detailed or more complex analysis algorithm. In one aspect of the invention, a Motorola DSP 56300 24 bit processing family device is used, in particular the 56362 which operates as a 100 or 120 MHz processor. This processor is capable of 100 or 120 MIPS (2 56 bit MAC.fwdarw.20 MIPS or 120 MOPS) and permits parallel 24.times.24 bit MAC 1 in6 instruction (1 clock cycle/instruction), Hardware nested do loops, 24 bit internal data buss, 2 k.times.24 bit on chip Program RAM, 11 k.times.24 bit on chip Data RAM, 12 k.times.24 bit on chip Data ROM, and 192.times.24-bit bootstrap ROM. Alternately, a PowerPC 700CX processor (EBM) can be used operating at 550 MHz. The PowerPC device permits multi-threading, has a 32-bit data bus expandable to 64-bits, 32 k of L1 Cache, 256 of L2 Cache, and 32-64 bit registers for the floating unit. Other processors, or updated versions of the above-mentioned example processors could be adapted for the same purpose by those skilled in the art. FIG. 6 is a diagram illustrating the allotted time processing requirements using a DSP and a PowerPC processor. Neural networks originated as attempts to mimic the function of animal nervous systems, implemented as either hardware or software. While many network configurations are possible, they share the common features of being built up from simple processing elements and of being inherently parallel in operation by virtue of massive interconnectivity among large numbers of these elements. Neural networks are nonparametric and make weak or no assumptions about the shapes of the underlying distributions of the data. They have been successfully used as classifiers, multidimensional function approximators, and are a natural choice for data and multi-hypothesis fusion applications. A neural network process was selected for the problem of classifying vehicle signatures because of its large decision space and its large feature space. The feature spaces have nonlinear boundaries that distinguish the different classes. The advantages and limitations of neural networks are often complementary to those of conventional data processing techniques. The neural networks have been shown to be most useful in providing solutions to those problems for which: there is ample data for network training; it is difficult to find a simple first-principles or model based solution; and the processing method needs to be immune to modest levels of noise in the input data. Moreover, calculation of the output of a trained neural network represents, in essence, several matrix multiplications. Thus, the model encoded in the network during the training process may be calculated quickly and with a minimum of computing power. This is a huge advantage of the neural approach and makes it particularly suitable for real-time applications and where the speed of processing is important. FIGS. 7a through 7c illustrate characteristics of a multilayer perceptron neural network. FIG. 7a depicts a neural network assembled by interconnecting layers of processing elements; FIG. 7b depicts a single processing element with multiple inputs x.sub.i, input weights W.sub.i, bias .theta., and output function a; and FIG. 7c depicts a sigmoid function (an example of function f as shown in FIG. 7b. As shown in FIG. 7a, the network consists of a large number of interconnected processing elements. As shown schematically in FIG. 7b, a processing element typically has many inputs that are processed into one or a few outputs. In FIG. 7a, the processing elements have been organized into three layers of processing nodes--two "hidden" layers and an output layer (the input elements are fan-out nodes rather than processing nodes and are not counted as a layer). This is a feed-forward configuration--connections run from an element in one layer to an element in the next layer in the direction of input to output. At the processing element level, each input x.sub.i is multiplied by an associated weight W.sub.i, and the sum of weighted inputs and a constant bias .theta. is passed through a "squashing" function to the output. A typical sigmoid squashing function is shown in FIG. 7c. The squashing function accomplishes two important ends: it bounds the output value, and it introduces a nonlinearity. Due to the nonlinearity of the sigmoid applied at the processing elements, neural networks can capture a highly nonlinear mapping between the input and the output. Neural networks are not so much programmed as trained by example. Training requires a set of "exemplars"--examples of inputs of known types, and their associated outputs. Inputs are presented to the network, processing elements perform their calculations, and output layer "activations" (the output values) result. An error measure is formed from the root-mean-square (rms) of all differences between activations and "truth" values (i.e., the known output of the mapping being trained for). Corrections to all the interconnection weights are estimated, and the weights are adjusted with the intent of lowering the overall rms error. The training process consists of repeating this cycle until the error has been reduced to an acceptably low level. The most popular algorithm for adjusting the weights is back-propagation, a gradient descent technique that seeks to minimize the total sum of the squared differences between the computed and desired responses of the network. Other techniques, including genetic algorithms, the conjugate gradient, and refinements of the back-propagation algorithm, are available and may be used to shorten the training time. There are many important properties that a classifier must possess. These properties fall into two categories: learning and recall. "Learning" refers to how a system acquires and explains the class decision boundaries that are formed. "Recall" refers to the operation of the classifier once the decision boundaries have been formed (i.e., after training). These desirable properties are summarized in Table 2. FIGS. 8a and 8b illustrate a simple two-dimensional feature space example of learning nonlinear decision boundaries. For example, Feature 1 can be the length of an object and Feature 2 can be the weight of an object. The "circles" plotted represent one category of objects and the "boxes" can represent a different category of objects. FIG. 8a depicts a two dimensional feature space example, and FIG. 8b depicts a linear decision boundary that separates the two object categories. One way to separate (classify) the two categories of objects is to draw a line between them (linear decision boundary) as shown in FIG. 8b.
TABLE 2
Desirable Classifier Properties.
Desirable Classifier Learning Properties
Nonlinear The ability to learn nonlinear decision boundaries is
Classification an important property for a classifier to have.
The decision boundaries for the collision
avoidance problem can be extremely complex
and, when extending this problem to a high-dimensional
feature space, this capability becomes critical.
Classify In complex systems, a single class can be represented
Multimodal by many different feature vectors. It is desirable to
Feature Space have a classifier that can handle these various feature
Distributions vector realizations a single class may exhibit.
Automatic The classifier will need to handle a massive amount
Learning of data. As such, the classifier should be able to
automatically learn class decision boundaries from the
data with minimal human intervention.
Incremental The classifier will need to be updated regularly and
Learning quickly. Many classifiers require complete retraining
when new data is added. Complete retaining can be slow
and require a great deal of storage for all the feature
vectors, yet is typically done off-line and can easily
be accommodated.
Minimal All classifiers have some number of tuning parameters
Tuning that are used to fine-tune the learning process.
Parameters It is important that there be as few parameters as
possible. Furthermore, the behavior that results from
the adjustment of these parameters should be well
understood.
Verification The ability to explain the decision-making process is an
and Validation important property for real-world systems. Because of the
nature of the collision avoidance system problem, this
capability is intensified.
Minimize Mis- The classifier should be capable of minimizing
classifications the misclassification rate when two classes overlap.
Desirable Classifier Recall Properties
Graded A classifier should be able to report the degree to
Membership which a feature vector belongs to each of the classes
in the system.
Novelty One interpretation of graded membership is the ability
Detection to perform novelty detection. Novelty detection refers
to the ability to determine if the current feature vector
sufficiently matches any of the known classes.
Incomplete The classifier system will perform feature extraction
Data from available data, but the data might be incomplete.
A classifier should be capable of making a decision
when a reasonable number of features are missing.
Class Some classifiers have the ability to generalize, or
Generalization increase the size of, class decision boundaries
During Recall during recall. This is desirable when the training
data does not represent test data well and when
(re)training time intervals are lengthy.
Confidence The ability to weight the confidence in extracted
Weighting feature metrics is a desirable property for some
classifiers. Some features are more reliable than
others. Feature metrics with greater confidence can
lead to decisions that are more reliable.
FIGS. 9a and 9b illustrate a "real world" problem that makes the implementation of neural networks difficult. FIG. 9a shows another simple two dimensional feature space example. Yet in this example, the best decision boundary to separate the two classes is not a line but an ellipse (nonlinear decision boundary) as shown in FIG. 9b. When extending this problem to a higher dimensional feature space, the capability to learn nonlinear decision boundaries often becomes critical to achieving good performance. Table 3 provides a listing of notable vector classifiers with a discussion of how well they meet each of the properties discussed in Table 2. Two classifiers not listed in Table 3, the k.sup.th -Nearest Neighbor and the Fisher Linear Discriminant, can be grouped under "classical" pattern recognition techniques, yet should still be considered as valid potential solutions to a classification problem. The classifiers listed in Table 3 are neural network classifiers, with the multilayer perceptron being one of the most widely studied and used in practice. The disadvantage column describes some traits, such as "processing missing and weighted features," as "difficult." Nevertheless, these difficulties can be overcome via model-based approaches to training or by selecting appropriate neural network parameters. Neural networks have added a new dimension to solving classification problems. Classical pattern recognition techniques have been used in the past by a small community, but since the advent of neural networks, many disciplines in science and engineering have ventured into this area because of the ease in training and implementing neural networks and also the powerful properties they exhibit. Many types of networks lend themselves to efficient parallel processing implementations with reasonable computational and memory requirements. They can be implemented by writing a neural network program to run on a personal computer and they can be implemented in hardware as a chip embedded with software instructions. FIGS. 10 and 11 illustrate differing parsing systems for partitioning feature space. There are many types of neural networks that have been applied to many different problems. Yet they can be placed into two broad categories: clustering neural networks and error criteria minimization neural networks. Clustering neural networks attempt to parse up a feature space using some set of basis functions. FIGS. 10 and 11 are a good example of parsing the feature space into two sections. FIG. 10 depicts ten radial basis units to partition the feature space, and FIG. 11 depicts four elliptical basis units to partition the feature space. A good example of a clustering neural network is a Basis Function Classifier (BFC). Error criteria minimization neural networks operate on a training database and attempt to minimize the classification error between a true class vector and the neural network output. The most widely known network of this type is the Multilayer Perceptron (MLP). As opposed to discussing neural networks in general, we will present some detail on the two above mentioned neural networks regarding architecture and training methods. Table 3 shows a brief comparison of these classifiers. The BFC provides useful information about how the decision boundaries are drawn. Real-world automatic classification systems, especially those that make decisions that lives and pocketbooks depend on, should be able to explain why a decision was made. Knowing these decision boundaries allows the basis function classifier to easily identify objects or events that are novelties, that is, different from the training set data. Novelty detection can be useful in flagging events not yet encountered. The MLP, in general, does not provide decision boundary information. The only way to obtain it is through extensive testing, and with a high-dimensional feature space, the task is all the more difficult. The BFC uses a basis function (a popular choice is a multivariate Gaussian density) that may be a poor basis function for the feature space; the MLP does not have this limitation and can draw any nonlinear decision boundary. The basis function classifier has a well-understood recall (during testing) parameter that allows the generalization of decision boundaries, the MLP does not. The MLP often requires less memory and is often more computationally efficient than the BFC. The basis function classifier and MLP classifiers are similar as well. Both can learn nonlinear decision boundaries and have training parameters that aid in generalizing decision boundaries. Both also have a graded membership capability that enables them to report the degree to which a feature vector belongs to each of the classes in the system.
TABLE 3
Comparison of Basis Function Classifier
and the Multilayer Perceptron Classifier
Classifier Brief Description Advantages Disadvantages
Basis Determines the H Able to create The basis
Function best mean vectors nonlinear decision function selected
Classifier needed to represent boundaries. may be a poor
Neural the feature space Provides decision choice for the
Network spanned by a given boundary feature space.
set of input information. Clustering neural
vectors, uses the Provides a graded networks
mean vectors as the membership and degenerate to a
center of a basis novelty detection. k.sup.th nearest
function, and then Decision boundary neighbor
forms linear generalization classifier if all
combinations of parameters during events in the
these to make training and recall. classifier are very
classification Framework allows unique (k-basis
decisions. the use of any basis units).
function type.
Multilayer A possible nonlinear Able to create No generalization
Perceptron mapping between nonlinear decision parameters during
(MLP) feature vectors and boundaries. recall.
Neural classes is learned Approaches Bayes Does not provide
Network by performing a decisions. decision
gradient descent Provides a graded boundary
in error space membership. information.
using the back- Decision boundary Not able to
propagation generalization perform novelty
algorithm. parameters during detection.
training.
With respect to the classification of vehicles, the MLP neural network processing method has generally been found to be most optimal considering the hardware available, practical software implementations, and the problems to be solved. The MLP process reduces the computational burden in using fewer multiply and addition operations than other neural network processes such as elliptical Basis Units. MLP has a structure that makes for easily implementable Dot product operations. However, as mentioned above, the other neural network processes have advantages that may make them more attractive for the solution of particular problems, as advances are made in hardware/software processing. FIG. 12 is a block diagram of a multilayer perceptron neural network. This network has two functional layers of processing between the input and output, yet is often called a "three layer network" because the input is counted as a layer. It shows graphically the feed-forward operations of a two-layer network. The feed-forward operation for each node is given by y=sgm(w.sup.T x+w.sub.bia) (1) where w is the K.infin.1 adaptive weight vector, x is the K.infin.1 input vector, and w.sub.bias is the adaptive bias weight, y is the output, and ##EQU1## The most widely used and known training algorithm for MLP's is backpropagation. Before describing the algorithm, first some notation is provided for an MLP with three functional layers. The square error derivative associated with the jth mode in layer 3 is defined as .delta..sub.j.sup.(3) =sgm'(s.sub.j.sup.(3))(d.sub.j -y.sub.j)for j=1 to N.sub.3 (3) where d.sub.j is the desired response from node j, N.sub.3 is the number of nodes in layer 3, and ##EQU2## The square error derivative associated with the j'th node in layer 2 is defined as ##EQU3## where N.sub.2 is the number of nodes in layer 2. The square error derivative associated with the j"th node in layer 1 is defined as ##EQU4## where N.sub.1 is the number of nodes in layer 1. Some trainers are designed so that a weight update occurs after all training templates are presented to the network (form of batch processing). The square error derivatives calculated in the trainer are actually the average of all the template's square error derivatives, e.g., ##EQU5## The instantaneous gradient vector estimate for node j in layer 3 with inputs from layer 2 is defined as ##EQU6## The instantaneous gradient vector estimate for node j' in layer 2 with inputs from layer 1 is defined as ##EQU7## The instantaneous gradient vector estimate for or node j" in layer 1 with inputs from layer 0 (input vector) is defined as ##EQU8## The most significant improvements are obtained by changing the way the weights update. The weight update equation for the original trainer at iteration k (layer and node notation dropped for convenience) is given by w.sub.k+1 =w.sub.k +.DELTA.w.sub.k (11) where .DELTA.w.sub.k =.alpha..delta..sub.k X.sub.k (12) and .alpha. is a fixed parameter for all weights and is called the learning rate. Practical .alpha. values range from 0.01 to 1.0. A simple improvement to speed up training is the implementation of an adaptive learning rate for each weight. The learning rate update equation is given by .alpha..sub.k+1 =.kappa..alpha..sub.k if .gradient..sub.k.gradient..sub.k-1 >0 .alpha..sub.k+1 =.lambda..alpha..sub.k if .gradient..sub.k.gradient..sub.k-1 <0 (13) where .kappa. is a constant greater than unity (typically 1.02) and .lambda. is a constant less than unity (typically 0.9). If the past and present instantaneous gradient estimates are of the same sign, this indicates that a minimum lies ahead and the learning rate should increase to speed up the learning. If the past and present instantaneous gradient estimates differ in sign, this indicates that a minimum is being jumped over and the learning rate should decrease to recover quickly. As known in the art, other methods to speed up MLP training are QuickProp, Delta-Bar-Delta, and ALECO. FIG. 13 is a flowchart depicting a method for identifying a vehicle. Although the method is depicted as a sequence of numbered steps for clarity, no order should be inferred from the numbering unless explicitly stated. The method begins with Step 1300. Step 1302 generates electronic signatures in response to receiving data from a single sense point. Step 1304 analyzes the signatures. Step 1306 classifies vehicles in response to analyzing the signatures. Step 1301 electrically senses vehicles at the single sense point. Generating electronic signatures in Step 1302 includes generating electronic signatures in response to sensing vehicles. Electrically sensing vehicles at the single sense point in Step 1301 includes sub-steps. Step 1301a supplies an electrical signal. Step 1301b generates a field at the first sense point in response to the electrical signal. Step 1301c measures changes in the electrical signal in response to changes in the field. Generating electronic signatures in Step 1302 includes generating electronic signatures in response to the measured changes in the field. In some aspects of the invention, electrically sensing vehicles at a single sense point in Step 1301 includes using a single loop inductive sensor as the sense point. Supplying an electrical signal in Step 1301a includes supplying an electrical signal to the inductive loop. Generating a field in response to the electrical signal in Step 1301b includes generating a field with the electrical signal supplied to the inductive loop. FIG. 14 is a flowchart illustrating additional details of the method of FIG. 11. The method begins with Step 1400. Step 1402 electrically senses vehicles at the single sense point using a single loop inductive sensor. Step 1402a supplies an electrical signal to the inductive loop. Step 1402b generates a field with the electrical signal supplied to the inductive loop. Step 1402c measures changes in the electrical signal in response to changes in the field. Step 1404 generates electronic signatures in response the measured changes in the field received from the single sense point sensing vehicles. Step 1406 analyzes the signatures. Step 1408 establishes a plurality of vehicle classification groups. Step 1410 selects a vehicle classification group in response to each analyzed signature. In some aspects of the invention, establishing a plurality of vehicle classification groups in Step 1408 includes establishing vehicle classifications selected from the group including passenger vehicles, two-axle trucks, three-axle vehicles, four-axle vehicles, five or more axle vehicles, buses, and motorcycles. In some aspects, establishing a plurality of vehicle classification groups in Step 1408 includes establishing vehicle classifications based upon criteria selected from the group including vehicle length, which is related to the number of axles, and the proximity of the vehicle to the ground (the loop), which is an indication of weight. Step 1401 learns a process to form boundaries between the plurality of vehicle classification groups. Analyzing the signatures in Step 1406 includes recalling the boundary formation process. Selecting a vehicle classification group in Step 1410 includes making a decision to associate a signature with a vehicle classification group. Step 1412 converts the classified vehicle into a symbol. Step 1414 supplies the symbol for storage and transmission. In some aspects of the invention, learning and recalling a process to form boundaries between the plurality of vehicle classification groups in Steps 1401 and 1406 includes using a multilayer perceptron neural networking process. Step 1411a determines vehicle lengths in response to vehicle classifications. Step 1411b calculates vehicle velocities following the determination of vehicle length. In some aspects of the invention, analyzing signatures in Step 1406 includes determining vehicle transition times across the single sense point. Calculating vehicle velocities in Step 1411b includes calculating velocities in response to the determined vehicle lengths and the determined vehicle transition times. A system and method have been provided for identifying vehicles with a single inductive loop. Examples have been given of highway applications, but the invention is generally applicable to any system that seeks to identify passing objects with an inductive, or alternate sensing detector. Other variations and embodiments will occur to those skilled in the art.
|
Same subclass Same class |
||||||||||
