Method and product for integrating an object-based search engine with a parametrically archived database
5802524
Abstract
The invention relates to directing one or more object-based search engines to an object store, which stores information archived according to parametric classifiers. In order for the object-based search engine to access the objects, a special index class, in the form of a SQL table, is created. The index class comprises attributes of a part number, an object identification number, an item identifier, a search state attribute, and a search engine identifier, an object being assigned a corresponding value for each attribute and each object being uniquely identified by the item identifier attribute. The search engine is activated depending on the type of data object to be indexed, in accordance with the search state attribute of the catalog table. A computer readable medium is provided with program code to implement the above integration.
Claims
What is claimed is:
Description
FIELD OF THE INVENTION
TABLE 1
______________________________________
The Search Index Class
Item  Search  Search     Search       Text     Text     Time         Search
ID    Engine  Index      State        Item ID  Part No  Stamp        Info
______________________________________
i1    SM      SMIDX1     tobeupdated  p1       psz1     basic        DEU
i2    SM      SMIDX1     queued       p2       psz2     update time  ENU
i3    SM      SMIDX1     indexed      p3       psz3     update time  DEU
i4    OTHER   FUNNY IDX  tobeupdated  p4       psz4     basic        ENG
i5    OTHER   FUNNY IDX  weirdstate   p5       psz5     update time  CHT
i6    SM      SMIDX2     tobedeleted  p6       psz6     basic        FRA
.                                     .        .
.                                     .        .
.                                     .        .
in                                    pn       pszn
______________________________________
Column ItemID corresponds to a unique identifier for the object data indexed by the catalog. Column SearchEngine identifies which search engine will be used to index or search the object. For Search Manager, the attribute SearchEngine="SM". Other values may also be used. Column SearchIndex corresponds to the index within the search engine that will be used to index the object. For Search Manager, the value of this field will be SMServerSMIndex. Column SearchState indicates the type of processing that is to be performed on the row object. SearchState may assume basic values corresponding to To_Be_Deleted, To_Be_Added, and To_Be_Updated. The inclusion of the search state attribute allows the index class to be updated according to data changes made in the object store such that referential integrity may be maintained between the contents of the index on which a parametric search is performed, and the objects of the index class on which the more comprehensive search may be performed. For Search Manager, the corresponding codes for the SearchState attribute are as follows.
______________________________________
SIM_SEARCH_STATE_TOBEUPDATED      0x100
SIM_SEARCH_STATE_TOBEDELETED      0x200
SIM_SEARCH_STATE_QUEUED_DELETE    0x301
SIM_SEARCH_STATE_QUEUED_UPDATE    0x302
SIM_SEARCH_STATE_INDEXED          0x400
SIM_SEARCH_STATE_ERROR            0xFFF
______________________________________
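As an illustration only, the codes in the table above could be carried in a C header along the following lines. The enumerator names and values are taken from the table; the two helper functions are hypothetical and are not part of Digital Library or Search Manager. The "pending" test reflects the two states that the description names as making a row eligible for processing when the search engine is activated.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Search state codes, copied from the table above. */
enum {
    SIM_SEARCH_STATE_TOBEUPDATED   = 0x100,
    SIM_SEARCH_STATE_TOBEDELETED   = 0x200,
    SIM_SEARCH_STATE_QUEUED_DELETE = 0x301,
    SIM_SEARCH_STATE_QUEUED_UPDATE = 0x302,
    SIM_SEARCH_STATE_INDEXED       = 0x400,
    SIM_SEARCH_STATE_ERROR         = 0xFFF
};

/* Hypothetical helper: printable name of a state code, e.g. for logging. */
const char *search_state_name(unsigned long state)
{
    switch (state) {
    case SIM_SEARCH_STATE_TOBEUPDATED:   return "TOBEUPDATED";
    case SIM_SEARCH_STATE_TOBEDELETED:   return "TOBEDELETED";
    case SIM_SEARCH_STATE_QUEUED_DELETE: return "QUEUED_DELETE";
    case SIM_SEARCH_STATE_QUEUED_UPDATE: return "QUEUED_UPDATE";
    case SIM_SEARCH_STATE_INDEXED:       return "INDEXED";
    case SIM_SEARCH_STATE_ERROR:         return "ERROR";
    default:                             return "UNKNOWN";
    }
}

/* Hypothetical helper: true for the two states that request work
 * from the search engine (update or delete). */
bool search_state_is_pending(unsigned long state)
{
    return state == SIM_SEARCH_STATE_TOBEUPDATED ||
           state == SIM_SEARCH_STATE_TOBEDELETED;
}
```

A value outside the table, such as the "weirdstate" row of Table 1, falls through to "UNKNOWN" rather than being treated as an error code.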
Column TextItemID corresponds to an identifier of a set of objects that satisfies a particular parameter in the parametric search. In Digital Library, objects corresponding to an item are managed by a Folder Manager and Library Client application program interface (API). The parameters p1 . . . pn are identifiers corresponding to the location in the object store where sets of objects associated with the TextItemID value are stored. These identifiers direct Search Manager to the object store so that the set of objects may be processed according to the SearchState attribute. The TextPartNo corresponds to an individual object that satisfies the parameter corresponding to the TextItemID in the parametric search. The TextPartNo, in turn, directs Search Manager to an individual object so that a string search may be made on the object. The parameters psz1 . . . pszn are identifiers corresponding to object store locations holding the object associated with the particular TextPartNo and TextItemID. These identifiers direct the object-based search engine to the object store so that the particular object may be processed. Column TimeStamp corresponds to a time identifier for an object and is also used to ensure referential integrity. The time stamp is set to the NULL value when Digital Library operates to index items added to the object store. The TimeStamp is set to a current Digital Library time upon activation of Search Manager. It is noted that the Search Manager may be periodically "pushed" using a known SM user_exit call or the Search Manager may be periodically "pulled" to self-activate. When a request is made to add, delete or update an object to be indexed by Search Manager, the TimeStamp attribute for that object is set to a non-NULL value so the Search Manager recognizes that the object requires updating.
The changes between the NULL and non-NULL states of the TimeStamp attribute allow the Search Manager to track when modifications are being made to the object, especially in the case when successive updates are requested in Digital Library while the external search engine is performing re-indexing. Column SearchInfo, in Search Manager, corresponds to a three-character language code of the text which is indexed by Search Manager.
The SQL table can be created using conventional database tools. As a specific example, the above index class may be created using the DL API Ip2CreateClass or LibDefineIndexClass(). The details of the operation and parameters for these API's are set forth in VI ImagePlus Applications Programming Reference, Version 1 (1995) (hereinafter "VI Reference"), which is incorporated by reference. Ip2CreateClass is a Folder Manager API available in DL Library for creating a user defined index class for use by the library server. LibDefineIndexClass is a Library Client API available in DL Library for creating, changing, or deleting an index class. Alternatively, the above index class of Table 1 may be created using the SysAdmin graphical user interface available with Digital Library operating on OS/2.
Objects, which are stored in or added to the object stores, are made available to an external search engine through the above index class. It is noted that using the above index table, several different external search engines may be linked to the object stores by setting other values for the SearchEngine attribute. Thus, this implementation of the catalog table expands the search capability on these stored items, which were previously archived and searchable using only the particular parameters recognized by the DL library catalog.
Maintaining the Index Class
The index class must also provide a way to update the index class according to the changes in the contents of the object store.
More particularly, the index class upon which the Search Manager acts must indicate when a particular object has been created, updated, or deleted to maintain referential integrity between the object store and the index class. It is noted that the Digital Library contains its own procedures for updating the information contained in the object stores, specifically, API's SimLibCreateItem and SimLibWriteAttr. However, sole reliance on these existing procedures to maintain the new search class will allow slippage in referential integrity between the object stores and the contents of the new index class. In order to prevent such slippage, three new procedures to create, delete, and update the index class are created once the index class is defined similarly to Table 1. The new procedures for maintaining referential integrity are based on existing API's available in Digital Library.
Adding an Object to the Index Class
When a client requests an object to be added to the object store, the index class must be correspondingly updated. This function is performed by calling the new API SimLibCreateItemPartExtSrch which sets SearchState=SIM_SEARCH_STATE_TOBEUPDATED and the attribute TimeStamp=NULL. The API is based on the existing API's used by Digital Library's object server. The parameters for SimLibCreateItemPartExtSrch largely overlap with the parameters for the DL API's SimLibCatalogObject, SimLibStoreNewObject, and SimLibCreateItem. These DL API's are briefly described as follows. SimLibCatalogObject stores a new object from an existing file. SimLibStoreNewObject stores a new object from memory. Once the object is obtained either from a file or from memory, SimLibCreateItem creates a row in the index class for the added object. API SimLibCreateItemPartExtSrch operates on the following parameters.
The parameters defined for Digital Library include the conventional data types defined for OS/2 VisualAge C++, Version 3 and IBM C SET++ for AIX, Version 3. Details are
______________________________________
ULONG SimLibCreateItemPartExtSrch
      (HSESSION hSession,
       PATTRLISTSTRUCT pAttributeList,
       USHORT usNumOfAttrs,
       USHORT usIndexClass,
       PITEMID pszItemID,
       HOBJ hObj,
       ULONG ulConCls,
       ULONG ulAffiliatedType,
       PVOID pAffiliatedData,
       PSZ pszFullFilename,
       PVOID pObjBuffer,
       ULONG ulObjSize,
       PSZ pszSearchEngine,
       PSZ pszSearchIndex,
       BOOL bCallExtSrch,
       PRCSTRUCT pRC)
HSESSION hSession:
       library session identifier, created by API SimLibLogon
       which verifies client access and privileges
PATTRLISTSTRUCT pAttributeList:
       pointer to an array of ATTRLISTSTRUCT defining the index
       class created to integrate the object-based search engine
USHORT usNumOfAttrs:
       number of attributes in pAttributeList array
USHORT usIndexClass:
       index class
PITEMID pszItemID:
       pointer to item ID of preexisting ITEM; uniquely
       identifies an object in the index class
HOBJ hObj:
       pointer to object handle block, corresponds to the part
       number and text item id number as designated in the
       object store
ULONG ulConCls:
       content of class id
ULONG ulAffiliatedType:
       type of affiliated object
PVOID pAffiliatedData:
       pointer to data structure
PSZ pszFullFilename:
       pointer to filename
PVOID pObjBuffer:
       pointer to memory buffer
ULONG ulObjSize:
       size of object
PSZ pszSearchEngine:
       pointer to name of Search Engine
PSZ pszSearchIndex:
       pointer to Search Index
BOOL bCallExtSrch:
       if TRUE, call SimLibProcExtSrch which will activate the
       object-based search engine to update the index class
PRCSTRUCT pRC:
       a return code to confirm that API call is successful
______________________________________
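The description attaches two either-or conditions to this parameter list: an existing item ID versus an attribute list for a new item, and a file name versus a memory buffer, with the unused alternative NULLed. A caller-side check might look like the sketch below. The typedefs are stand-ins for the OS/2-style types, the CreateArgs bundle and create_args_valid are hypothetical names introduced here, and usIndexClass is left out of the check because the text does not say which value marks it as unused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the OS/2-style types named in the parameter list. */
typedef unsigned long  ULONG;
typedef unsigned short USHORT;
typedef char          *PSZ;
typedef void          *PVOID;

/* Hypothetical bundle holding only the arguments the two rules mention. */
typedef struct {
    PVOID  pAttributeList;   /* stands in for PATTRLISTSTRUCT */
    USHORT usNumOfAttrs;
    PSZ    pszItemID;        /* stands in for PITEMID */
    PSZ    pszFullFilename;
    PVOID  pObjBuffer;
    ULONG  ulObjSize;
} CreateArgs;

/* Returns true when exactly one alternative of each pair is supplied
 * and the supplied alternative is complete. */
bool create_args_valid(const CreateArgs *a)
{
    assert(a != NULL);

    bool item_given  = a->pszItemID != NULL;
    bool attrs_given = a->pAttributeList != NULL || a->usNumOfAttrs != 0;
    bool file_given  = a->pszFullFilename != NULL;
    bool buf_given   = a->pObjBuffer != NULL || a->ulObjSize != 0;

    /* Each pair is mutually exclusive, and one side must be present. */
    if (item_given == attrs_given || file_given == buf_given)
        return false;

    /* A partially filled alternative counts as invalid. */
    if (attrs_given && (a->pAttributeList == NULL || a->usNumOfAttrs == 0))
        return false;
    if (buf_given && (a->pObjBuffer == NULL || a->ulObjSize == 0))
        return false;

    return true;
}
```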
The conditions on the following parameters should be noted when using the above-listed object server API's to implement SimLibCreateItemPartExtSrch. The parameters pszItemID and (pAttributeList, usNumOfAttrs, usIndexClass) are mutually exclusive. Pointer pszItemID should be used if an Item for the object already exists. The parameters (pAttributeList, usNumOfAttrs, usIndexClass) should be used to create the Item. Any parameters not used should be NULLed. The parameters pszFullFilename and (pObjBuffer and ulObjSize) are mutually exclusive. Pointer pszFullFilename should be used for an object obtained from a file on disk. The parameters (pObjBuffer and ulObjSize) should be used for an object obtained from memory. Any parameters not used should be NULLed. The parameters hObj, ulConCls, ulAffiliatedType, pAffiliatedData are used by SimLibCatalogObject or SimLibStoreNewObject depending on the source for the new object, i.e. from an existing file or from memory. The pointers pszSearchEngine and pszSearchIndex are used to provide values for the column attributes SearchEngine and SearchIndex to complete an object row in the catalog table. If the application sets bCallExtSrch to FALSE, it can either depend on the "pull capability" of the Search Manager to self-activate periodically or invoke the new API SimLibProcExtSrch at a later time to index the new item using the object-based search engine. Parameters for pre-existing API's also have the conditions that pAsyncCtl must be NULL and pSMS must be NULL. For SimLibCreateItem, the parameter usItemType must be SIM_DOCUMENT. For SimLibCatalogObject, fCreateControlBITS must be SIM_CLOSE.
Deleting an Object from the Index Class
When a client requests an object to be deleted from the object store, the index class must be correspondingly updated.
This function is performed by calling the new API SimLibDeleteItemPartExtSrch which sets SearchState=SIM_SEARCH_STATE_TOBEDELETED and TimeStamp=NULL. The API is based on the object server API's SimLibDeleteObject and SimLibDeleteItem. The former is used to delete an individual part PartItemID associated with the designated attribute value TextItemID. The latter is used to delete all objects associated with the designated attribute TextItemID. API SimLibDeleteItemPartExtSrch operates on the following parameters.
______________________________________
ULONG SimLibDeleteItemPartExtSrch
      (HSESSION hSession,
       PITEMID pszItemID,
       HOBJ hObj,
       BOOL bCallExtSrch,
       PRCSTRUCT pRC)
HSESSION hSession:
       library session identifier, created by API SimLibLogon
       which verifies client access and privileges
PITEMID pszItemID:
       pointer to item ID of preexisting ITEM
HOBJ hObj:
       pointer to object handle block, corresponds to the part
       number and text item id number as designated in the
       object store
BOOL bCallExtSrch:
       if TRUE, call SimLibProcExtSrch which will activate the
       object-based search engine to update the index class
PRCSTRUCT pRC:
       a return code to confirm that API call is successful
______________________________________
Conditions for the above parameters are as follows. The parameters pszItemID and hObj are mutually exclusive. Use pszItemID with SimLibDeleteItem if an Item stored in the object store is to be deleted with all its parts. Use hObj with SimLibDeleteObject if only specified parts of the Item are to be deleted. NULL the parameters which are not used. The parameters and operations of the DL API's are explained in detail in the VI Reference.
Replacing an Object in the Index Class with an Updated Version
When a client requests an object to be replaced by an updated version in the object store, the index class must be correspondingly updated. This function is performed by calling the new API SimLibReplaceItemPartExtSrch which sets SearchState=SIM_SEARCH_STATE_TOBEUPDATED and TimeStamp=NULL. The API is based on the object server API's SimLibCopyObject and SimLibWriteAttr. The former copies an entire object to another. The latter updates the row information in the attribute table. API SimLibReplaceItemPartExtSrch acts on the following parameters.
______________________________________
ULONG SimLibReplacePartExtSrch
      (HSESSION hSession,
       HOBJ hDestObj,
       HOBJ hSrcObj,
       BOOL bCallExtSrch,
       PRCSTRUCT pRC)
HSESSION hSession:
       library session identifier, created by API SimLibLogon
       which verifies client access and privileges
HOBJ hDestObj:
       pointer to target object handle block, corresponds to the
       part number and text item id number as designated in the
       object store
HOBJ hSrcObj:
       pointer to source object handle block, corresponds to the
       part number and text item id number as designated in the
       object store
BOOL bCallExtSrch:
       if TRUE, call SimLibProcExtSrch which will activate the
       object-based search engine to update the index class
PRCSTRUCT pRC:
       a return code to confirm that API call is successful
______________________________________
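The effect that the three new API's are described as having on an index-class row can be summarized in a short sketch: create and replace set SearchState to SIM_SEARCH_STATE_TOBEUPDATED, delete sets it to SIM_SEARCH_STATE_TOBEDELETED, and all three set TimeStamp to NULL. The IndexRow structure, the IndexOp enumeration and mark_row below are hypothetical names used only for illustration; in Digital Library the row would be a SQL table row rather than an in-memory struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define SIM_SEARCH_STATE_TOBEUPDATED 0x100UL
#define SIM_SEARCH_STATE_TOBEDELETED 0x200UL

/* Hypothetical in-memory view of the two columns the new API's touch. */
typedef struct {
    unsigned long search_state;       /* SearchState column */
    bool          time_stamp_is_null; /* models a SQL NULL TimeStamp */
    time_t        time_stamp;         /* meaningful only when not NULL */
} IndexRow;

/* Hypothetical tag for which of the three new API's was called. */
typedef enum { OP_CREATE, OP_DELETE, OP_REPLACE } IndexOp;

/* Applies the state change the text describes for each operation. */
void mark_row(IndexRow *row, IndexOp op)
{
    assert(row != NULL);

    row->search_state = (op == OP_DELETE)
                            ? SIM_SEARCH_STATE_TOBEDELETED
                            : SIM_SEARCH_STATE_TOBEUPDATED;

    /* All three API's are described as setting TimeStamp to NULL. */
    row->time_stamp_is_null = true;
    row->time_stamp = (time_t)0;
}
```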
An additional condition for the existing API SimLibCopyObject is that fdelete must be TRUE.
Activating the External Search Engine
When one of the above new API's is called through the library client, a Search Manager module user_exit may be synchronously called. The object information is collected, and Search Manager is requested to index the new information according to the added, deleted or updated text. The user_exit call is used to push Search Manager so that the search engine knows that the indexing task is to be performed. The Search Manager examines the index class, and items in the index class are selected for processing if the corresponding TimeStamp attribute value is not the NULL state and the SearchState attribute value is SIM_SEARCH_STATE_TOBEUPDATED or SIM_SEARCH_STATE_TOBEDELETED. After a row-object corresponding to the above attribute values is identified by the Search Manager, the search engine sets the time stamp to a current date and time, and the Search Manager schedules requests for updating the index class according to the SearchState attribute for an item. Thus, referential integrity is maintained, documenting changes in both the indexing for the parametrically archived database and the object-based search engine. An API to activate the Search Manager when information needs to be indexed may be created, using the following parameters.
______________________________________
ULONG SimLibProcExtSrch
      (HSESSION hSession,
       PSZ pszSearchEngine,
       PSZ pszSearchIndex,
       PITEMID pszItemID,
       PSZ pszPartNo,
       PRCSTRUCT pRC)
HSESSION hSession:
       library session identifier, created by API SimLibLogon
       which verifies client privileges
PSZ pszSearchEngine:
       pointer to name of Search Engine
PSZ pszSearchIndex:
       pointer to Search Index
PITEMID pszItemID:
       pointer to item ID of the part
PSZ pszPartNo:
       pointer to part no
PRCSTRUCT pRC:
       a return code to confirm that API call is successful
______________________________________
If it is desired that the search engine operate in batch mode, the pointers pszSearchIndex, pszItemID, and pszPartNo should be set to NULL. In the Search Manager, the corresponding function call to activate the search engine is xxxScheduleRequest(), where the prefix xxx is defined by the platform on which Search Manager is operating. The parameters of the function are as follows.
______________________________________
ULONG xxxScheduleRequest
      (HSESSION hSession,
       PSZ pszSearchEngine,
       PSZ pszSearchIndex,
       PITEMID pszItemID,
       PSZ pszPartNo,
       PRCSTRUCT pRC)
HSESSION hSession:
       library session identifier, created by API SimLibLogon
       which verifies client access and privileges
PSZ pszSearchEngine:
       pointer to name of Search Engine
PSZ pszSearchIndex:
       pointer to Search Index
PITEMID pszItemID:
       pointer to item ID of the part
PSZ pszPartNo:
       pointer to part no
PRCSTRUCT pRC:
       a return code to confirm that API call is successful
______________________________________
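Both activation calls share the same pointer arguments, and the text states that batch mode is requested by setting pszSearchIndex, pszItemID and pszPartNo to NULL. The sketch below shows one way a wrapper might tell the two cases apart before forwarding a request. The RequestKind enumeration and classify_request are hypothetical; treating a NULL engine name or a partially filled request as invalid is an assumption made here, since the text does not say how such calls are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of an activation request. */
typedef enum {
    REQ_BATCH,       /* index, item and part pointers all NULL */
    REQ_SINGLE_PART, /* index, item and part pointers all supplied */
    REQ_INVALID      /* assumption: anything else is rejected */
} RequestKind;

RequestKind classify_request(const char *pszSearchEngine,
                             const char *pszSearchIndex,
                             const char *pszItemID,
                             const char *pszPartNo)
{
    /* Assumption: the engine name is always required. */
    if (pszSearchEngine == NULL)
        return REQ_INVALID;

    /* Batch mode as described in the text: all three pointers NULL. */
    if (pszSearchIndex == NULL && pszItemID == NULL && pszPartNo == NULL)
        return REQ_BATCH;

    if (pszSearchIndex != NULL && pszItemID != NULL && pszPartNo != NULL)
        return REQ_SINGLE_PART;

    /* Assumption: a mixture of NULL and non-NULL pointers is an error. */
    return REQ_INVALID;
}
```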
When a request for indexing is scheduled by the Search Manager user exit on request of an application, an SM document ID is passed together with the request. The document ID has the following structure.
__________________________________________________________________________
DLLibServerName  ItemId  PartNumber  SSitemId  LangCode  TimeStamp
__________________________________________________________________________
(1)              (2)     (3)         (4)       (5)       (6)
__________________________________________________________________________
(1) DL library server
(2) DL item id
(3) DL item part number
(4) DL search service index class
(5) SM language code for the part
(6) Time stamp
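The six fields listed above could be held in a small structure and flattened into a single string when the request is passed along. The sketch below is illustrative only: the SmDocumentId type and sm_document_id_format are hypothetical names, every field is assumed to be a text string, and the '|' separator is an assumption, since the text gives the field order but not the encoding.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the six document ID fields, in table order. */
typedef struct {
    const char *dl_lib_server_name; /* (1) DL library server */
    const char *item_id;            /* (2) DL item id */
    const char *part_number;        /* (3) DL item part number */
    const char *ss_item_id;         /* (4) DL search service index class */
    const char *lang_code;          /* (5) SM language code for the part */
    const char *time_stamp;         /* (6) time stamp */
} SmDocumentId;

/* Writes the fields into out, joined by '|' (an assumed separator).
 * Returns the number of characters written, or -1 if out is too small. */
int sm_document_id_format(const SmDocumentId *id, char *out, size_t out_size)
{
    assert(id != NULL && out != NULL);

    int n = snprintf(out, out_size, "%s|%s|%s|%s|%s|%s",
                     id->dl_lib_server_name, id->item_id, id->part_number,
                     id->ss_item_id, id->lang_code, id->time_stamp);

    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

Checking the return value of snprintf against the buffer size keeps a truncated identifier from being passed on silently.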
Although a particular embodiment of the present invention has been shown and described with respect to Search Manager integration with Digital Library, it will be obvious to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the present invention. Other database arrangements and object-based search engines may be integrated using a similar method and means as described above and encompassed by the appended claims.