Database arranged as a semantic network5870751Abstract A semantic network is described in which data is stored in data storage areas at nodes of the network, there being a root node from which all other nodes depend, in a branched structure, and in which data stored at each node includes relationship information about related nodes other than nodes linked by a branch from said node. The relationship information includes a token identifying the type of relationship. The other node in a relationship may be a virtual node, which does not actually exist, but whose data content is stored within the relationship information. Claims I claim: Description TECHNICAL FIELD
______________________________________
/* describe the header record of the database */
typedef struct {
char eyec›8!; /* SEMNET */
int size; /* total bytes in memory segment
*/
int nfree; /* total free bytes */
int nrel.sub.-- name;
/* total relationship names
*/
int nsubj; /* total subjects */
int nreln; /* total half relationships
*/
int pcat; /* prime category offset
*/
int relnhead; /* head of relation list
*/
int relntail; /* tail of relation list
*/
int relntree; /* tree to find names quickly
*/
int freehead; /* head of free list
*/
int freeblok; /* current block to alloc from
*/
int ncomit; /* number of commits so far
*/
int lockid; /* who has it locked ?
*/
} semnet;
______________________________________
Example 1 shows the structure in memory of a header for a semantic network implemented in accordance with the present invention. "eyec" is the identifier or name used for this release of a semantic network. This may be used, for example, by a browser of the semantic network to determine that the structure in memory is indeed a semantic network and also what version number the semantic network is. "size" is the total number of bytes which this semantic network has been allocated in memory. The memory used may be a shared memory segment of random access memory or it may be a file stored on non-volatile storage. The file is managed by a file manager, which is part of an operating system. In the preferred embodiment, the UNIX operating system is used (UNIX is a registered trademark in the United States and other countries licensed exclusively by X/Open Company Limited). "nfree" is the number of free bytes, within the total number of bytes defined by "size", within the memory segment or file which is free. On initialization of the semantic network, all of the bytes of memory are placed in parcels, but the parcels may contain bytes, all of which are free, or they may contain information about subjects or relationships. "nrel.sub.-- name" is the number of relationship names which exists in this semantic network. Examples of relationship names are "is employed by" and "is father of". Each of these relationships names are defined by the contents of a relationship descriptor parcel (see Example 2). There may be many instances of each of the relationships within a single semantic network. For example, a father of multiple children will have multiple "is a father" relationships. But there is only a single relationship name of "is a father of". "nsubj" is the number of subjects. A subject in the semantic network implemented according to the present invention corresponds to a node 102,104,106 in the semantic network of FIG. 1. Each subject is defined by the contents of a subject descriptor parcel (see Example 3). "nreln" is the total number of half relationships which exists in the network. A half relationship is a relationship between a first node and a second node. In FIG. 1, a half relationship 150 exists between node 114 and 118. The relationship is directional, which is indicated by the arrow head shown in FIG. 1. If node 118 has a similar relationship to node 114 as node 114 has to node 118, then there will be another half relationship joining node 118 to node 114, with the arrowhead shown in the opposite direction. "pcat" is the prime category offset. The prime category is another term used for root node, so this is the offset in the semantic network structure where the information on the root node or subject can be found. The data at that memory location is interpreted in the same way as for any other subject. There may be many different relationship names within a single semantic network. The header includes an anchor pointer for a singly linked list of the relationship descriptors for the semantic network. This list is illustrated in FIG. 3. "relnhead" in Example 1 is a pointer 210 to the first entry 230 in a list of the relationship descriptors 230,232,234,236 defined for this database. Each relationship descriptor has a pointer to the next relationship descriptor. "relntail" is a pointer 220 to the last entry 236 in a list of the relationship descriptors defined for this database. Each relationship descriptor is defined in a parcel (described later), the parcel including a pointer ("next") to the next relationship descriptor in the list. In this way starting from the "relnhead" entry in the header, the linked list of relationship descriptors can be used to find a relationship descriptor quickly starting from the head of the list. In addition, the end of the list can quickly be located using "relntail" in order to insert an additional relationship descriptor. "relntree" is a pointer to a separate, superimposed binary tree which is used in a conventional manner to locate more quickly a relationship descriptor, given its name. The header includes an anchor pointer for a singly linked list of the free blocks 330,332,334,336,338,340,342 in memory allocated for the semantic network. This list is illustrated in FIG. 4. "freehead" is a pointer 310 to the first free block 330 of memory which has not been allocated to either the header itself or to a parcel used to store details of relationships or subject. Each block of free memory has a pointer to the next free block of memory. "freeblok" is a pointer 320 to the largest block 332 of free memory available for allocation. This pointer is used so that, by allocating memory from the largest block available, then over time, there is a greater chance that small blocks of free memory will combine together. In addition, there is also the greatest chance that the data to be inserted in the block will be able to fit into the block of memory allocated. From the description of the function of "freeblok", it can be seen that pointing to the single largest block of memory at all times is not essential, it must point to one of the largest blocks most of the time in order to perform the desired function. "ncomit" is the number of times that data has been committed. This indicates the number of times that data in the semantic network has been updated. In a simplest form of the network, without the ability for the network to be locked, this may be used to determine whether the semantic network has changed since a previous access. "lockid" is the id of the entity which has this semantic network locked. The entity is, for example, a process. The id may be any unique identifier identifying the entity which has the network locked. A lock is a mechanism by which use of a resource such as semantic network is restricted to the holder of the lock. The entity requests the lock for the semantic network. When the entity has been given the lock, the id of the entity is placed in the header of the semantic network. Once the entity receives that lock then no other entity may update the semantic network for which the lock was received. In a preferred embodiment, the entity obtains the lock for the entire semantic network and then locates the subject, relationship or relationship descriptor which it wishes to lock and checks a lock flag associated with that subject, relationship or relationship descriptor. If the lock flag does not indicate that the subject, relationship or relationship descriptor is already locked by another entity, then the entity locks the subject, relationship or relationship descriptor, and then releases the lock on the entire semantic network. When it wishes to release the lock on the subject, relationship or relationship descriptor, the process is repeated to release the lock. Relationship Parcel EXAMPLE 2
______________________________________
/* describe a relationship descriptor */
typedef struct {
unsigned short
len; /* parcel size
*/
unsigned int
next; /* single linked list
*/
int low.sub.-- tree;
/* lower names
*/
int high.sub.-- tree;
/* higher names
*/
unsigned short
rel.sub.-- num;
/* identifier */
char name›1!; /* at least a NULL
*/
} rel.sub.-- name;
______________________________________
Example 2 shows the structure in memory of a relationship descriptor parcel for a semantic network implemented in accordance with the present invention. "len" is the size, or length, of the parcel and is measured in bytes of memory. "next" is a pointer 231,233,235 to the next relationship descriptor parcel in the singly linked list shown in FIG. 3. "next" is used in conjunction with "relnhead" and "relntail", located in the semantic network header to form the list. In a preferred embodiment, a binary tree, separate from the linked list of relationship parcels, keyed on the names of relationships, is superimposed on the network. This binary tree uses "low.sub.-- tree" as a pointer to relationship descriptor parcels having names lower in position in the binary tree of relationship descriptor names. "high.sub.-- tree" is a corresponding pointer to relationship descriptor parcels having names higher in position. Without the binary tree, the search for a relationship name must traverse the pointers in the singly linked list of relationships illustrated in FIG. 3. With the binary tree, the required descriptor is found more quickly. However, such a binary, or any other type of tree is not an essential part of the invention. "rel.sub.-- num" is an identifier or token used to identify this relationship. "rel.sub.-- num" is used in the subject descriptor parcels to identify the type of a relationship. In this way, only a token need be stored in each of the subject descriptor parcels, the name of the relationship being capable of being found out by looking in the relationship descriptor parcel having that token in its "rel.sub.-- num" entry. This makes the subject descriptor parcels much more efficient in terms of storage. "name›1!" is the name of the relationship stored as text and is, for example, "Has member" or "is employed by". The text is terminated by a null character. Subject Parcel EXAMPLE 3
______________________________________
/* describe the various types of subject - integer, float, string and
sequence */
typedef struct {
/* a standard subject header */
/* a standard parcel header */
unsigned short
len; /* parcel size */
unsigned int
next; /* single linked list
*/
/* end of the standard parcel header */
/* remainder of the standard subject header */
unsigned char
sflags; /* see description below
*/
unsigned char
nrel; /* current number of relationships
*/
/* the relationship description below is repeated for each of the "nrel"
relationships */
/* describe a relationship */
unsigned char
rflags; /* see description below
*/
unsigned short
rel.sub.-- num;
/* ID of relationship
*/
union {
int rnum; /* parcel number of related subj
*/
int rival; /* integer - virtual subject
*/
float rfval; /* float */
char rcval›1!;
/* string */
char rsval›1!;
/* sequence */
} s;
/* end of the relationship description */
/* end of the standard subject header */
/* describe the value */
union {
int ival; /* integer - virtual subject
*/
float fval;
char cval›1!
char sval›1!;
} v;
/* end of the subject description
} subj;
______________________________________
Example 3 shows the structure in memory of a subject descriptor parcel for a semantic network implemented in accordance with the present invention. The subject descriptor contains a subject header, having a parcel header and a subject header, which includes the number of relationships which this subject has. There follows details of each of the relationships this subject has. Finally, there is the data for the subject itself. Subject header "len" is the size, or length, of the parcel and is measured in bytes of memory. "next" is a pointer to the subject extension parcel, if any. Subject extension parcels are described below. If there is not a subject extension parcel, then "next" contains no relevant value. "sflags" is a set of flags, bitmapped into a single byte. The flags used in subject parcels and associated with the subject information are defined as follows. The low order two bits are used to define whether the subject itself is an integer, a float, a string or a sequence. The next bit, "category" identifies whether this subject is a container subject for another subject. This is used to enforce the non-deletability of this subject until all of the subjects for which it acts as a container subject have been deleted. The next bit is not used. The next bit is a "lock" bit, which is used to identify the entity, if any, which has this subject locked. The next bit is a "delete" bit, which is used to indicate if this subject has been deleted. The high order two bits are unused. "nrel" is the number of relationships which this subject currently has. Following the first part of the subject header is information identifying each of the "nreln" relationships which this subject currently has. The structure used to define one of these relationships is repeated "nreln" times so as to describe each of the relationships. A subject parcel is actually defined as having a fixed number of relationships, of which only "nrel" is the number which have valid data. In a preferred embodiment, this fixed number is 4. The remainder do not contain valid data, but are filled with valid data when a relationship is added to this subject parcel. "nrel" is updated to reflect the new number of relationships currently in this parcel. "nrel" is never greater than the fixed number of relationships for which this subject parcel has memory allocated. If further relationships are required for a subject parcels beyond the fixed number, then a subject extension parcel, described later, is used. "rflags" is a set of flags, used in subject parcels and associated with relationship information. They differ in content from those described above for subject information. The low order two bits are used to define whether the related subject, that is the subject at the other end of the relationship, is an integer, a float, a string or a sequence. Three of the next four bits are defined in the same way as for "sflags", that is "category", "lock" and "delete". The bit which is not used, that is undefined, in "sflags" is defined in "rflags" as an "extend" bit. This indicates whether there is an extension parcel for a "virtual" subject. "Virtual" subjects are explained in the next paragraph and the effect of the "extend" bit, which is on how "rnum" should be interpreted, is discussed below together with "rnum". The next bit is used to identify if the related subject is a real subject, that is one found in a separate subject parcel, or a "virtual subject", that is one in which the relationship is unidirectional and the related subject (its value) is contained within the relationship itself. An example of the use of a virtual subject is when the subject itself is a data file, and the related subject is the state "read-only" (R/O). The related subject value "R/O" will easily fit into the area of memory used to describe the relationship itself, and so the related subject need not actually be a separate subject with its value and relationships stored at a separate node. In place of the pointer to the related subject, the value itself of the subject is stored. This has the advantage of more efficient usage of storage. However, it does mean that it is not possible to traverse the relationship to the related subject, and from the related subject, to find all of the data files whose status is read-only. The highest order bit "negative" is used to identify if the relationship is a negative one, that is, it points in a direction towards this subject from a related subject, rather than away from it, towards a related subject. "rel.sub.-- num" is the identifier or token used to identify this relationship. Only a token is stored in the subject descriptor parcels, the name of the relationship being capable of being found out by looking in the relationship descriptor parcel having that token in its "rel.sub.-- num" entry. An integer subject is just an integer number, the range of values depending on the number of bits used in the implementation of the programming language. In a preferred embodiment, 4 bytes are used to represent integer numbers. A range between -2.sup.31 (-2,147,483,648) and +2.sup.31 -1 (2,147,483,647) results from the use of 4 bytes. A float subject is a floating point number, again the range of values depends on the number of bits, but is generally between about 10.sup.-38 and 10.sup.+38. String subjects are merely strings of characters terminated by a null character. Sequence subjects are also strings of characters, but without the terminating character, the length of the sequence being obtained by implication from the parcel length stored in the header, less the fixed length of the header itself. For real subjects, that is subjects that are not virtual subjects, "rnum" is the parcel number of the related subject. What is stored here is a pointer in memory to the area of memory used to store the subject parcel at the other end of this relationship. For virtual subjects, the content of the related parcel is stored instead of the pointer to that parcel as "rival" if it is an integer subject, "rfval" if it is a float subject, "rcval" if it is a string subject and "rsval" if it is a sequence subject. In this way the content of the related subject can be accessed with having to find the subject at the other end of the relationship. However, the relationships which the related subject itself has, cannot be determined from the content stored at this end of the relationship, except of course, any relationships to this subject itself. In an exception to the above, if the subject is a virtual subject and the subject is a string (both of which can be determined from the value of "rflags" for this relationship) and the string is more than 4 bytes in length, including its null terminator, then what is stored at this location is a pointer to a relationship extension parcel, described later with reference to Example 5. A relationship extension parcel is merely used as an extension to the value of a virtual subject. The existence of a relationship extension parcel is known from the setting of the "extend" bit in "rflags" mentioned above. The setting of the "extend" bit is then used to determine whether "rnum" should be interpreted as a pointer to the relationship extension parcel, if the "extend" bit is true, or interpreted as the value, if it is false. The content of this subject parcel itself is stored as "sival" if it is an integer subject, "sfval" if it is a float subject, "scval" if it is a string subject and "ssval" if it is a sequence subject. The union {..} v; is a standard compiler structure that means that only one of the terms within the parentheses is used. Subject Extension Parcel EXAMPLE 4
______________________________________
/* subjects have extension blocks */
typedef struct {
unsigned short
len; /* parcel size
*/
unsigned int
next; /* single linked list
*/
unsigned char
mxrel; /* capacity of this block
*/
unsigned char
nrel; /* current number of relationships
*/
/* the relationship description below is repeated for each of the "nrel"
relationships */
/* describe a relationship */
unsigned char
flags; /* see relationship flag definition
*/
unsigned short
rel.sub.-- num;
/* ID of relationship
*/
union {
int rnum; /* parcel number of related subj
*/
int rival; /* integer - virtual subject
*/
float rfval; /* float */
char rcval›1!; /* string */
char rsval›1!; /* sequence */
} s;
/* end of the relationship description */
} subj.sub.-- ext;
______________________________________
Example 4 shows the structure in memory of a subject extension parcel for a semantic network implemented in accordance with the present invention. As mentioned above, a subject descriptor parcel only contains space for a fixed number of relationships. This limitation is because the subject descriptor parcel is created before the number of relationships is known and once created, it cannot be moved, since other subject descriptors refer to its position in their own relationships. This problem is overcome by the use of a subject extension parcel which contains only information on relationships. There is no subject data. The subject extension parcel is chained to the subject parcel to form a single linked list of indefinite length. The linked list starts from the "next" entry in the subject descriptor parcel, which points to the first subject extension parcel for that subject. The list continues with a pointer located in the "next" entry of the first subject extension parcel pointing to the second subject extension parcel. Further parcels can be chained to the list. The final parcel in the list contains no relevant value in the "next" entry. "len" is the size, or length, of the parcel and is measured in bytes of memory. "next" is a pointer to the next subject extension parcel in the single linked list, if any. If there is not a further subject extension parcel, then "next" contains no relevant value. "mxrel" contains the maximum number of relationships which can be stored in this subject extension block. The choice of this value is a trade-off between a large value, which is wasteful of storage, when there are many subject extension parcels with unused space for relationships and a small value, when there will be a need for many further extension parcels. In a preferred embodiment the maximum number of relationships stored is 10. "nrel" is the current number of relationships stored in this subject extension block. The relationship description, that is "flags", "rel.sub.-- num","rnum", "rival", "rfval", "rcval" and "rsval" is the same as that described above that is used in a subject parcel and is repeated as many times as there are relationships stored in this subject extension parcel. Relationship Extension Parcel EXAMPLE 5
______________________________________
typedef struct {
unsigned short
len; /* parcel size */
unsigned int
next; /* actual length of data if this
*/
/* parcel contains a sequence
*/
union {
int vival; /* integer value
*/
float vfval;
char vcval›1!;
char vsval›1!;
} v;
} reln.sub.-- ext;
______________________________________
Example 5 shows the structure in memory of a relationship extension parcel for a semantic network implemented in accordance with the present invention. Relationship extension parcels are used only for supporting virtual subjects having a length of more than the 4 bytes which cannot be fitted in if they are implemented as described above in a subject parcel. The actual value of the virtual subject is stored in this relationship extension parcel and a pointer placed in the subject parcel (in the location identified as "rnum") to point to this relationship extension parcel. "len" is the size, or length, of the parcel and is measured in bytes of memory. "next" is only used where the virtual subject is a "sequence" type, when it is used to store the actual length of the sequence. Otherwise, "next" contains no relevant value. The content of this virtual subject itself is stored as "ival" if it is an integer virtual subject, "fval" if it is a float virtual subject, "cval" if it is a string virtual subject and "sval" if it is a sequence virtual subject. The union {..} v; is a standard compiler structure that means that only one of the terms within the parentheses is used. The value stored in "rflags" location (lower order 2 bits) of the subject parcel to which the virtual subject is a related subject is used to identify the type of content of the virtual subject, and hence which of the definitions within the union statement should be used. Application Programming Interface An application programming interface is provided that allows the semantic network described above to be manipulated. The function of the interface is made available through a "C" interface. Calls are provided for setting up and maintaining the semantic network and for moving around and extracting information from the semantic network. The calls for setting up and maintaining the semantic network include:
______________________________________
db.sub.-- create
Create the semantic network - initialises the
allocated area of memory into parcels and
creates a root node.
db.sub.-- delete
Delete the semantic network.
db.sub.-- reset
Remove all the subjects and all the
relationships.
db.sub.-- create.sub.-- relation
Create a relationship descriptor - given a name
for the relationship, it creates a relationship
descriptor parcel and allocates a token to that
relationship.
db.sub.-- new.sub.-- integer,
Create a new subject. There are separate calls
db.sub.-- new.sub.-- float,
for creating each of the four types of subject
db.sub.-- new.sub.-- string,
presently defined in the embodiment of the
db.sub.-- new.sub.-- sequence,
semantic network described above. Given a
container subject, the type of relationship to
that container subject and a value for the
subject, a subject descriptor parcel is created.
db.sub.-- chg.sub.-- integer,
Change a subject value. There are separate calls
db.sub.-- chg.sub.-- float,
for changing the value of each of the four types
db.sub.-- chg.sub.-- string,
of subject presently defined in the embodiment
db.sub.-- chg.sub.-- sequence,
of the semantic network described above.
Given the subject name and the updated value,
the subject descriptor parcel is updated with the
changed value.
db.sub.-- chg.sub.-- integer,
Change a related subject value. There are
db.sub.-- chg.sub.-- rel.sub.-- float,
separate calls for changing the value of each of
db.sub.-- chg.sub.-- rel.sub.-- string,
the four types of related subject presently
db.sub.-- chg.sub.-- rel.sub.-- sequence,
defined in the embodiment of the semantic
network described above.
db.sub.-- delete.sub.-- subject
Delete a subject. If the subject is a container
subject, that is the CATEGORY bit in the flags
of any of the relationships is set, then this must
fail in order to preserve the acyclic containment
graph. All relationships from this subject are
deleted, with the exception of those used for
containment.
db.sub.-- virtual.sub.-- integer,
Create a new virtual subject. There are separate
db.sub.-- virtual.sub.-- float,
calls for creating each of the four types of
db.sub.-- virtual.sub.-- string,
virtual subject presently defined in the
db.sub.-- virtual sequence,
embodiment of the semantic network described
above. Given the real subject where the virtual
subject is to be actually located, the type of
relationship from the real subject to the virtual
subject and the value of the virtual subject, a
relationship is added to the subject parcel.
db.sub.-- add.sub.-- reln
Add a new relationship. Given the subject to
which a relationship is to be added to, the type
of relationship from that subject to the related
subject and the name of the related subject, a
relationship is added to the subject parcel.
db.sub.-- delete.sub.-- reln
Delete a relationship. If the relation is marked
as a container relationship, that is the
CATEGORY bit in the flags is set, then this
must fail in order to preserve the acyclic
containment graph. The related subject for
which this relationship is acting as a container
relationship must be deleted first.
______________________________________
|
Same subclass Same class Consider this |
||||||||||
