System and method for personalized information filtering and alert generation6381594Abstract A search engine that forms a compact representation of a plurality of user queries to efficiently find desired information in an information network. The search engine comprises a profile processor having logic to receive the queries from the users and a search module. The search module is coupled to the profile processor and has logic to receive the information content, to combine the user queries into a master query, and to match the master query with the information content to determine matching content. The search engine also includes logic to analyze the matching content to determine if any of the queries has been satisfied. Claims What is claimed is: Description FIELD OF THE INVENTION
1. police + sting Used to find all information content that includes
the keyword "police" and the keyword "sting."
2. python - monty Used to find all information content that includes
the keyword "python" but not the keyword
"monty."
3. "great barrier reef" Used to find all information content that includes
the phrase "great barrier reef."
User Profile FIG. 4 show a block diagram of a user profile 400 constructed from information provided by a user to the profile processor 204. The user profile 400 contains several types of information relating to the user and the user's desired search criteria. For example, the user profile 400 contains user identification information 402 which may include the user's network address and a time stamp for one or more queries contained in the user profile. The user profile 400 also contains user contact information 404. The user contact information 404 may include rules that are use to contact the user regarding the results of any particular query. For example, the user may submit a query and request to be notified of the results by fax, email, or by an automated voice message to a landline or wireless telephone. Any type of communication mechanism may be used to contact the user regarding a search result. The rules included in the contact information may also specify a time for reporting the search result. For example, the user may wish to be notified regarding a search result in one hour, later that day, or even periodically every day. Thus, it is possible for the user to enter specific contact rules to set up a variety of notification scenarios regarding the results of any particular query. The user profile 400 also includes user queries 406. The user queries are in the form of strings that include keywords and Boolean expressions specifying the information desired by the user. Query Pre-Processing In addition to matching strings, the search engine compares numbers that indicate quantities or prices. For example, a user may want to be alerted when an auction item reaches a certain price or when a shopping item drops below a certain price. For example, if a user would like to query for when the price of an item is less than $25.00, a query could be created as follows: price<2500 Since exact price matching for a very large number of users is a computationally expensive operation, one embodiment of the present invention reduces exact price matching in some instances to the matching of price ranges. Each price range is represented by a predefined keyword for which the above keyword matching techniques will apply. Price range matching may be applied to query properties where additional precision is not relevant, thus maintaining efficiency without losing accuracy. Matching Numbers and Prices.paragraph. The prices of items in a product category typically fall into a general price range. For example, desktop computers may range in price from $500 to $2500. The price range may be partitioned into (N+2) intervals, where N can be any reasonably small number (e.g. 10). For example, if N is 4, then the six intervals of pricing for the price range of desktop computers would be [0, 500], [500, 1000], [1000, 1500], [1500, 2000] [2000, 2500] [2500, infinity]. Other numeric al characteristics, such as price ranges above or below a specific price, can be encoded with (2N+2) keywords, where each keyword includes an interval limit. For example, the interval limit (below.sub.-- 1000) means that the price is $1000 or below and the interval limit (above.sub.-- 1000) means that the price is $1000 or above. Therefore, referring to the example of the desktop computer, if N is 4, it is possible to form (2N+2) keywords to represent 10 price ranges for desktop computers as follows. 1. computer_desktop_price_below.sub.-- 500 2. computer_desktop_price_below.sub.-- 1000 3. computer_desktop_price_below.sub.-- 1500 4. computer_desktop_price_below.sub.-- 2000 5. computer_desktop_price_below.sub.-- 2500 6. computer_desktop_price_above.sub.-- 500 7. computer_desktop_price_above.sub.-- 1000 8. computer_desktop_price_above.sub.-- 1500 9. computer_desktop_price_above.sub.-- 2000 10. computer_desktop_price_above.sub.-- 2500 Suppose the user chose to be notified when the price range of a desktop computer is between 1000 and 1500. It is possible to create a user profile formed by only two keywords connected by the Boolean AND operator, as follows: computer.sub.13 desktop_price_above.sub.-- 1000 AND computer_desktop price_below.sub.-- 1500 Therefore, for a given (N), the price of each product item that comes into the search system can be expanded into (N+1) or (N+2) keywords. For example, if a desktop computer sells for $1395, it can be automatically expanded into (N+1) keywords as follows. computer_desktop_price_above.sub.-- 500 computer_desktop_price_above.sub.-- 1000 computer_desktop_price_below.sub.-- 1500 computer_desktop_price_below.sub.-- 2000 computer_desktop_price_below.sub.-- 2500 As a result, the $1395 desktop computer will match the example user criteria since it contains both of the specified keywords, namely: computer_desktop_price_above.sub.-- 1000 AND computer_desktop_price_below.sub.-- 1500 In a case where the price is on the boundary of the specified range, it is possible to expand the price to (N+2) keywords. For example, if the price of the desktop computer in the above example was $1000, the following (N+2) keywords are derived: computer_desktop_price_above.sub.-- 500 computer_desktop_price_above.sub.-- 1000 computer_desktop_price_below.sub.-- 1000 computer_desktop_price_below.sub.-- 1500 computer_desktop_price_below.sub.-- 2000 computer_desktop_price_below.sub.-- 2500 One advantage of doing the above expansion to the user query is that it can occur dynamically during matching time, while the query size in the profile stays small. Therefore, the cost of the required expansion space remains a constant; it is always (N+1) or (N+2) keywords. To summarize, once (N) is selected, price interval keywords can be defined. Both the user queries and the price of items in received documents can be mapped to those interval keywords, so that it is possible to detected when the price and query keywords match. Query Normalization One embodiment of the present invention operates to normalize queries that are input by users as part of the pre-processing stage. Normalization may also occur on queries that have been expanded as described in the pricing examples above. In general, any query can be represented by a series of"conjunctions" connected by the Boolean OR operator. For example, given the following search query: keyword1 AND (keyword2 OR keyword3) the following normalized query containing two conjunctions can be created: (keyword1 AND keyword2) OR (keyword1 AND keyword3) The portions of the normalized query in parenthesis represents conjunction portions. Depending on the query, the conjunction portions may contain one or more keywords and include the Boolean operators AND or NOT. Search Processor Operation FIG. 5 shows a detailed block diagram of the search processor 206 constructed in accordance with the present invention. The search processor is used to process all the user queries into a master query, which is matched with the incoming information content stream. The search processor 206 includes a processor 502, a query hash 504, a keyword hash 506, and a conjunction hash 508, all located in a shared memory 510. The search processor 206 also includes a private query hash 512 and a private conjunction hash 514, both located in a private memory 516. The shared memory and the private memory may be formed from different memories or from a single memory. Query Hash The processor 502 receives the user queries from the profile processor 204 via input 518, performs the query expansions and normalizations as necessary and creates the master query by filling in the hash tables (506, 504, 508, 512, 514) in the shared 510 and private 516 memories. After the master query is created, the processor 502 receives information content via input 520 and matches the information content with the master query. The results are output to the memory 210 and the notification processor 208 via the output 522. The following description will reference the following exemplary user queries from four users, which are shown below as conjunctions having keyword "kw" entries. For example, kw1 and kw2 can represent price interval keywords as demonstrated above, while query4 searches for an exact numerical match to kw5. User1 (query1): (kw1 AND NOT kw2) User2 (query2): (kw2 AND kw3) User3 (query3): (kw2 AND "pw1 pw2 pw3") User4 (query4): (kw5<100) Keyword Hash Table FIG. 6 shows hash tables in the shared 510 and private 516 memories completed from the exemplary user queries 602 defined above. The keyword hash table 506 includes a Keyword column 604 where each keyword in the user queries is entered. Each keyword in the keyword hash table is associated with a conjunction pointer found in a ConjunctionPtrList column 606. The conjunction pointers point to all query conjunctions that use that particular keyword. The conjunction pointers are also associated with a NOT Flag indicator that indicates if the keyword in a particular conjunction was used with the NOT attribute. In this case, a zero means the keyword was used without the NOT attribute, while a one indicates that the keyword was used with the NOT attribute. In addition, a Value 607 parameter is associated with each conjunction, wherein a value is included for exact number matching. For example, query4 conjunction1, which corresponds to kw5 and includes a value of 100 as shown at 609. Each keyword in the Keyword column 604 is further associated with one of five different keyword types shown in a Types column 608. The keyword types are:
regular (r) This type keyword has a unordered list of
Conjunction pointers.
lessThan (lt) This type keyword has conjunction pointers sorted
in increasing order of value.
lessOrEqual (le) This type keyword has conjunction pointers sorted
in increasing order of value
greaterThan (gt) This type keyword has conjunction pointers sorted
in decreasing order of value.
greaterOrEqual (ge) This type keyword has conjunction pointers sorted
in decreasing order of value.
null This type keyword is the first word of a phrase.
With respect to query4, the type for kw5 is "lt" as shown at 611. Each keyword in the Keyword column is further associated with a phrase length value shown in a MaxPhraseLength column 610. This column has entries that represent how many words are included in a keyword phrase, with a phrase length value of zero being assigned to a one-word keyword phrase and a phrase length value of 1 being assigned to a two-word keyword phrase, and so forth. FIG. 6 also shows the query hash table 504. The query hash table 504 associates the user queries in a query ID column 612, with conjunctions in a ConjunctionPtr column 614. FIG. 6 also shows the conjunction hash table 508, which is used to assemble information about every conjunction. Every conjunction is represented by a ConjunctionID 616, which is associated with a counter default 618 that stores the number of keywords in each conjunction. A WordPointerList column 620 contains a pointer to the keywords in the WordsHash table 506 for each conjunction. The Keywords hash, Query hash and Conjunction hash can be stored in a shared memory 510, so that several matching processes can read the hash contents concurrently. The query hash and conjunction hash have corresponding tables in a private memory 516. A private query hash 626, includes a match column 628 that is used during processing to indicate when a query in the query ID column 629 matches any incoming information content. A private conjunction hash 630, includes an Eval counter column 632 that is used during processing to keep track of the number of keywords found in the incoming information content for each conjunction. The records in the private memory contain state information that is local to the execution of each process, so that it is possible to have several private memories in use during operation of the invention. Each matching process performs the matching of incoming documents against the master query and stores information during the matching in its associated private memory. In order to increase the throughput of a matching processor, multiple matching processes can be executed simultaneously. In this case, each process uses its private memory for non-shareable state information. Search Processing (Keyword and Phrase Matching) FIG. 7 shows a search method 700 for searching incoming information content in accordance with the present invention. When starting a matching process for each incoming document, the Private Query Hash 626 and the Private Conjunction Hash 630 are created in the following manner: Read lock the Query Hash table 404 to prevent changes during creation of private memory; Iterate over all queries and create the Private Conjunction Hash 630 for each conjunction; and Release Read lock on the Query Hash 404. After creating the private memory, the searching method provided in FIG. 7 is used to match all user queries with each document of the incoming information content. At block 702, the search method 700 begins by receiving information content which may comprise, for example, a stream of documents relating to real-time weather reports or auction information. At block 704, the incoming documents are filtered to remove duplicate words. At block 706, the EvalCounter 632 is set to 0 for all query conjunction entries and Match flag 628 is set to zero for all queries. At block 708, a check is made to determine if any words remain to be matched. This check determines a condition where an entire document has been checked and, if no words remain to be checked, results in a branch to block 710, which is discussed in detail below. At block 712, a word is retrieved from the filtered document for matching. At block 714, a test is performed to determine if the retrieved word is in the word hash table 406. If the word is not in the word hash table, the method proceeds to block 708 to look for the next word. If the word is in the word hash table, the method proceeds to block 736. At block 736, a test is performed to determine if the keyword is part of a phrase. If the maxphraselength parameter associated with the keyword is zero, then the keyword is not part of a phase, and so the method proceeds to block 716. If the maxphraselength is greater than zero, the keyword is part of a phrase, and so the method proceeds to block 738. At block 738, a phrase is built from the original unfiltered document by starting at the current keyword and including additional words until the phrase has a length equal to the maxphraselength associated with the keyword. The method then proceeds to block 740. At block 740, the newly constructed phrase is substituted for the word retrieved from the filtered document at block 712. The method then proceeds to block 714 where the test at that block determines if the phrase is in the keyword hash. If the phrase is found, the associated maxphraselength will be zero and so the method will flow through the test at block 736 and proceed to block 716. At block 716, the first entry in the conjunction pointer list 606 associated with the keyword (or phrase) is retrieved. At block 718, the NOT flag associated with the conjunction pointer is tested. If the not flag is set, the method proceeds to block 720, where the Eval counter entry for the conjunction is set to 255. This indicates that this conjunction has not been matched. At block 722, if the NOT flag associated with the conjunction is not set, then the Eval counter is incremented by 1, which indicates that a match occurred between the keyword and the conjunction. At block 724, a test is made to determine if there are any more conjunctions entries associated with the keyword. If there are not, the method proceed to block 708 to retrieve the next word in the document. If there are additional conjunction entries, the method proceeds to block 716 to get the next entry in the list for the or each word in the received article a test is made to determine if the word is in the keyword hash table. At block 710, after each word in the document has gone through the matching process, the method proceeds here to analyze the results. At this block, the conjunction Eval counter and default value for a selected conjunction associated with a selected query are retrieved. For example, referring to query1, in the query hash table, the Eval counter and default counter for conjunction Query1_Conj1 is retrieved. At block 726, a test is made to determine if the Eval counter is equal to the counter default. If the Eval counter is not equal to the counter default, then the conjunction has not been satisfied and the method proceeds to block 730. If the Eval counter matches the counter default, then the conjunction has been satisfied and the method proceeds to block 728. At block 728, the match flag for the query is set to one since the keywords specified by the conjunction were matched in the document. Thus, the query has at least one conjunction that matches the information in the document. The method then proceeds to block 732. At block 730, if the Eval counter did not match the default counter, a test is performed to determine if any more conjunctions are associated with the current query. If there are more conjunctions to be tested, then the method proceeds to block 710 to test these conjunctions. If there are no more conjunctions associated with the query then the method proceeds to block 732 to process other queries. At block 732, a test is performed to determine if there are any additional queries to be tested. If so, the method proceeds to block 710. If all queries have been tested, the method proceeds to block 734. At block 734, notification messages are sent to users whose queries have a match value equal to 1. The notification can be immediate or delayed as required by the notification rules as discussed in other sections of this document. Exact Number Matching Exact number matching can be used when a user query is searching for an exact price instead of a price within a specified range. In one embodiment, sorted lists are used for the exact number matching. When the list is sorted in increasing order, it is simple to step through the list from the beginning to determine all users that have signed up for an alert upon finding the desired exact value (in most cases the value would be a price). The problem with large sorted lists is that as queries are added or removed, the INSERT and DELETE operations become computationally expensive. In order to alleviate the problem of high computational costs, a data tree structure is used. A binary search tree of height h can implement any of the basic set operations--such as INSERT and DELETE--in O(h) time. The set operations are fast, if the height of the search tree is small, but if its height is large, their performance may be no better than a linked list. Red-black trees are one of many search-tree schemes that are "balanced" in order to guarantee that basic set operations take O(log n) time in the worst case. A red-black tree is a binary search tree with one extra bit of storage per node: its color, which can be either red or black. By constraining the way nodes can be colored on any path from the root to a leaf, red-black trees ensure that no such path is more than twice as long as any other, so that the tree is approximately balanced. In one embodiment of the invention, a red-black tree is used to order and maintain the ConjunctionPointerList 606. The red-black tree can return for any value (price) the set of surrounding intervals and during operation of the method 700 above, the Evalcounter variable 632 in private ConjunctionsHash can be incremented when a match occurs. Auto Suspend Alerts In one embodiment, notification alerts may be suspended for a period of time after they have been tripped. For example, a stock price alert for XYZ>80 would need to be suspended after the stock has been traded over $80 for the first time on a given day. A user would not want to be alerted repeatedly for the rest of the trading day if the price stays over $80. In this case, the alert is suspended for the rest of the trading day after it has been tripped. Auto Delete Alerts In one embodiment, the notification alert is removed after the alert has been tripped. For example, a search query such as "alert me when the movie Casa Blanca is released on DVD" will only happen at one point in time. Therefore, the notification alert is not needed after the alert has been tripped. In this case, tripping the alert would also result in an action to remove the alert from the system. Indexing Incoming Articles To further improve the matching performance of the search engine, one embodiment included in the present invention indexes the incoming articles into a pre-organized set, which can be processed in the following way. A set of n articles (4 articles A1 to A4) is collected. Each word in the article set gets assigned a bit vector of length n. When a bit is set in the bit vector, the particular word is present in the corresponding article. The bit vectors get initialized before the search method 700 is executed. Another column in the Keyword Hash Table 506 is included so that the keyword entries in the Keyword Hash Table have an additional pointer to the bit vector for an individual word. After the method 700 has been executed, one additional step is required to determine the set of documents that needs to be returned to a single user. For all Conjunction IDs with an Evalcounter equal to it respective Counterdefault, the WordPointerList pointer is followed to obtain the pointer to the bitvector for all keywords in the conjunction. All bitvectors are AND together to produce the bitmask of all articles that matched the conjunction. For example, to index 4 articles consider the following two queries: ConjId1=blue AND black ConjId2=Sunnyvale AND Rent_600 The Bitvectors for each word in the article collection can be expressed as follows:
Incoming Articles
Words A1 A2 A3 A4
blue 0 0 1 1
black 1 0 1 0
sunnyvale 0 1 1 0
Based on the above, the Keyword Hash Table would include a Document Bit Vector Pointer column that would contain the following information.
KeyWord Hash Table
Keyword DocumentBitvectorPtr ConjunctionsPointerList
blue 0011 ConjId1
black 1010 ConjId1
sunnyvale 0110 ConjId2
rent_600 NULL ConjId2
If the keywords "blue" and "black" are searched for, then by ANDing together the corresponding bit vectors, it can be determined that article 3 contains both and thus present a match to the query. Therefore, the above article indexing allows multiple articles to be organized and searched simultaneously, thereby reducing processing costs. EXAMPLES Shopping/Classifieds/Auctions User Interface In most cases the Boolean query language used in embodiments of the present invention may be hidden by an HTML form where the user selects from a set of predefined choices. Alternatively, a persistent query can be derived from a regular search query that the user entered or from the category that the user is currently browsing. Shopping for Computer An HTML interface for the computer category may have the following attributes: Brand: XXX, YYY, ZZZ, XYZ, YYZ, ZZX Processor (at least): 486, Pentium, Pentium Pro Minimum Price: 800, 1000, 1200, 1400 Maximum Price: 800, 1000,1200, 1400 Memory (at least): 8, 16, 32, 48, 64, 128 Hard Disk (at least): 1, 1.5, 2, 3, 4, 6, 8 CD Rom (at least): 2x, 4x, 6x, 8x, 10x DVD(at least): 2x, 4x, 6x, 8x, 10x Modem (at least): 14.4, 19.2, 28.8, 33.6, 56 From the above information, a search query could be generated. For example, if a user is looking for a XXX computer in the price range of $1200 to $1400, with at least a Pentium processor, the HTML query system would generate the following query: Shopping_Computer AND Brand_XXX AND Proc_Pentium AND ShopPrice_above.sub.-- 1200; Shopping_Computer AND Brand_XXX AND Proc_Pentium AND ShopPrice_below.sub.-- 1400; Shopping_Computer AND Brand_XXX AND Proc_PentiumPro AND ShopPrice_above.sub.-- 1200; Shopping_Computer AND Brand_XXX AND Proc_PentiumPro AND ShopPrice_below.sub.-- 1400. Classifieds Classified ads for may include attributes other than price, such as automobile make, model, location, phone area code, price range, year, mileage, for sale by (owner/dealer). To accommodate these other attributes, embodiments of the notification engine allow queries such as, "Find a car in the Bay Area in the price range from $2000 to $5000", to be processed into: Classifieds_Car AND Location_CA_SFO AND CarPrice_above.sub.-- 2000 Classifieds_Car AND Location_CA_SFO AND CarPrice_below.sub.-- 5000 Auctions Auctions are a good candidate to apply exact number matching. With exact number matching it is possible to immediately notify users if their current bid gets outbid. For example, a query can be constructed to automatically determine when a bid is exceeded, such as: Auction_Toy AND blue AND eye AND furby AND CurrentBid>51.75 The above description is illustrative and not restrictive. Many variations of the invention will become apparent to those of skill in the art upon review of this disclosure. The scope of the invention should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the appended claims along with their full scope of equivalents.
|
Same subclass Same class Consider this |
||||||||||
