Compound document

Hitmask for querying hierarchically related content entities

6839701

Abstract

A web-based system, method and program product are provided for searching a content object (e.g., a custom compilation or prepublished work) stored in a data repository as a group of hierarchically related content entities. Each noncontainer content object is preferably stored as a separate entity in the data repository. Each content entity is also stored as a row in a digital library index class as a collection of attributes and references to related content entities and containers. Each container and noncontainer is associated with a unique identifier that includes hierarchical information about its position in the hierarchy. Queries are executed on the hierarchical containers and noncontainers through an application or user-interface. The results of the independent searches are merged using hit masks. A hit mask is a string of bits, each bit representing a query. For each container and noncontainer in the result set, a hit mask is generated and ones of the bits are set to indicate which of the queries the container or noncontainer satisfies. Container hit masks are OR-ed with their child containers and/or noncontainers to reflect inheritance. Containers and noncontainers with all bits set comprise the merged result set.


Claims

What is claimed is:

1. In a data repository containing a plurality of hierarchically related content entities, a method for combining search results obtained for a plurality of queries, the queries being performed on entities of different hierarchical levels, comprising the steps of:

associating each entity with an identifier containing information about the hierarchical relationship of that entity to others of the entities;

for each entity in a result set, generating a hit mask comprising n bits, where n equals the number of queries, each bit corresponding to one of the queries, and wherein a value of `1` for any bit indicates that the entity is a hit for the corresponding query; and

using the entity identifiers to determine if any entity of the result set is a container entity that contains other entities in the result, if so, logically OR-ing the hit mask of the container entity with the hit masks of those entities in the result sets contained within the container entity.

2. The method of claim 1, further comprising the step of returning only those entities whose hit mask bits are all 1's.

3. The method of claim 1, wherein the hierarchically related content entities further comprise a parent container type and a child container type, wherein parent containers can contain child containers, and child containers can contain content entities.

4. The method of claim 3, wherein an identifier associated with each entity has the following format:

parentcontainerref.childcontainerref.contententityref

where parentcontainerref is a reference to a parent container, childcontainerref is a reference to a child container and contententityref is a reference to a content entity, thereby indicating the hierarchical level of the entity.

5. The method of claim 4, wherein the parent container type is a book, the child container type is a chapter, and the content entity is a section.

6. The method of claim 1, further comprising the step of creating a row in an entity table for each entity, each row including the entity's identifier.

7. The method of claim 1, further comprising identifying an entity, which in combination with its hierarchically related child entities satisfies the plurality of queries, by locating an entity with a logically OR-ed hit mask in which all bits are set to `1`.

8. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for combining search results obtained for a plurality of queries performed on entities of different hierarchical levels, the entities being stored in a data repository, comprising the steps of:

associating each entity with an identifier containing information about the hierarchical relationship of that entity to others of the entities;

for each entity in a result set, generating a hit mask comprising n bits, where n equals the number of queries, each bit corresponding to one of the queries, and wherein a value of `1` for any bit indicates that the entity is a hit for the corresponding query; and

using the entity identifiers to determine if any entity of the result set is a container entity that contains other entities in the result, if so, logically OR-ing the hit mask of the container entity with the hit masks of those entities in the result sets contained within the container entity.

9. The program storage device of claim 8, further comprising the step of returning only those entities whose hit mask bits are all 1's.

10. The program storage device of claim 8, wherein the entities further comprise a parent container type and a child container type, wherein parent containers can contain child containers, and child containers can contain content entities.

11. The program storage device of claim 10, wherein an identifier associated with each entity has the following format:

parentcontainerref.childcontainerref.contententityref

where parentcontainerref is a reference to a parent container, childcontainerref is a reference to a child container and contententityref is a reference to a content entity, thereby indicating the hierarchical level of the entity.

12. The program storage device of claim 9, wherein the parent container type is a book, the child container type is a chapter, and the content entity is a section.

13. The program storage device of claim 8, further comprising the step of creating a row in an entity table for each entity, each row including the entity's identifier.

14. The program storage device of claim 8, further comprising the step of identifying an entity, which in combination with its hierarchically related child entities satisfies the plurality of queries, by locating an entity with a logically OR-ed hit mask in which all bits are set to `1`.

15. A system for querying a plurality of hierarchically related content entities and combining search results obtained, comprising:

a data repository for storing the plurality of hierarchically related content entities;

means for associating each entity with an identifier containing information about the hierarchical relationship of that entity to others of the entities;

means for generating a hit mask for each entity in a result set, the hit mask comprising n bits, where n equals the number of queries, each bit corresponding to one of the queries, and wherein a value of `1` for any bit indicates that the entity is a hit for the corresponding query; and

means for determining from the entity identifiers if any entity of the result set is a container entity that contains other entities in the result, if so, logically OR-ing the hit mask of the container entity with the hit masks of those entities in the result sets contained within the container entity.

16. The system of claim 15, further comprising means for returning only those entities whose hit mask bits are all 1's.

17. The system of claim 15, wherein the hierarchically related content entities further comprise a parent container type and a child container type, wherein parent containers can contain child containers, and child containers can contain content entities.

18. The system of claim 17, wherein an identifier associated with each entity has the following format:

parentcontainerref.childcontainerref.contententityref

where parentcontainerref is a reference to a parent container, childcontainerref is a reference to a child container and contententityref is a reference to a content entity, thereby indicating the hierarchical level of the entity.

19. The system of claim 18, wherein the parent container type is a book, the child container type is a chapter, and the content entity is a section.

20. The system of claim 15, further comprising means for creating a row in an entity table for each entity, each row including the entity's identifier.

21. The system of claim 15, further comprising a means for identifying an entity, which in combination with its hierarchically related child entities satisfies the plurality of queries, by locating an entity with a logically OR-ed hit mask in which all bits are set to `1`.


Description

REFERENCE TO A COMPUTER LISTING APPENDIX

Appendix A to this application is set forth on a single compact disc and the material recorded thereon is incorporated by reference herein. The following file is recorded on the compact disc: file name: AppendixA.txt; file size: 107 kB; date of creation: May 16, 2002.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to content management, and more specifically, to a system, method and program product for creating compilations of content from hierarchical content stored in a data repository.

2. Background of the Invention

Content management systems have enabled content of all types, e.g., text, still images, moving images, and audio content, to be stored digitally. Content management systems include, for example, relational databases, digital libraries, and media servers. They have further provided functions for manipulating the content, e.g., searching and editing capabilities.

It would be desirable to enable a user to take advantage of vast stores of content to create compilations tailored to the user's needs or desires. For example, a university professor would find value in creating a custom textbook tailored to a specific course from prepublished textbooks stored in a content management system. This compilation could be further enhanced to include associated multimedia materials. As another example, a music lover would benefit from a system that allows him to specify musical selections to be included in a custom album. Such systems would have to partition large content objects (e.g., albums, books, videos) into smaller, selectable objects (e.g., musical selection, chapter section, episode) for inclusion in a compilation.

SUMMARY OF THE INVENTION

A web-based system, method and program product are provided for creating a compilation of content stored in a data repository as a group of hierarchically related content entities, managing, displaying, and searching the content, then creating and exporting compilations of content for publication. Also provided are a system, data structure, method, and program product for storing content into a repository for use in creating a compilation of content.

The content is hierarchical in nature. Accordingly, entities at each level of the hierarchy except the lowest are defined by "containers". For example, in the case of textual content, the hierarchical structure of the data may include book containers, volume containers, chapter containers, and subsections (noncontainers, because they are at the leaf level of the hierarchy). In the case of audio content, the hierarchical containers may be album, compact disk, and musical selection, and excerpts of the musical selections are defined as noncontainers. In the case of video content, the hierarchical containers may include movies and excerpts from each movie, and frames are defined as noncontainers. If desired, the maximum size of a container may be specified. For example, the volume size in a custom book is preferably determined using a threshold value defining the maximum amount of content allowable for that container, and a procedure is provided for managing content entities and containers to maintain this maximum.

The hierarchical data and associated metadata are preferably stored in a digital library that includes search support. A web-based user interface is provided for presenting a user with a plurality of selectable objects, each object representing a subset of the hierarchical data (e.g., chapter subsections, musical excerpts, video excerpts, etc.). The plurality of objects may represent all subsets of the stored content or less than all of the subsets (e.g., by categorizing the content and providing a bookshelf for each category that a user may browse). The user then selects one or more of the objects for inclusion in a compilation (e.g., a custom textbook). Alternatively, the user may search the content by specifying search criteria through the interface. Additionally, the user may create new content, e.g., a new chapter or section, for inclusion in the final compilation by inputting user-provided material through the web interface. The system preferably stores the new content and creates a reusable, selectable object associated with the new content.

Each noncontainer content object is preferably stored as a separate entity in the data repository. Each content entity is also stored as a row in a digital library index class as a collection of attributes and references to related content entities. Each container and noncontainer is associated with a unique identifier that preferably includes hierarchical information about its position in the hierarchy.
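As a minimal sketch of how such an identifier could carry hierarchical information, the following Python assumes the dotted parentcontainerref.childcontainerref.contententityref format recited in the claims. The function names and the sample identifiers are hypothetical and are not taken from Appendix A.

```python
def level_of(identifier: str) -> int:
    """Return the depth encoded in a dotted identifier (1 = top-level container)."""
    return len(identifier.split("."))


def is_ancestor(container_id: str, entity_id: str) -> bool:
    """Return True if entity_id lies strictly below container_id in the hierarchy."""
    container_parts = container_id.split(".")
    entity_parts = entity_id.split(".")
    return (
        len(container_parts) < len(entity_parts)
        and entity_parts[: len(container_parts)] == container_parts
    )


# Hypothetical identifiers: book "B1", chapter "B1.C2", section "B1.C2.S3".
print(level_of("B1.C2.S3"))              # 3
print(is_ancestor("B1", "B1.C2.S3"))     # True
print(is_ancestor("B1.C2", "B1.C3.S1"))  # False
```

Comparing whole dotted components, rather than raw string prefixes, keeps a hypothetical book "B1" from being mistaken for an ancestor of entities in book "B10".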

As the user selects desired objects for inclusion in a compilation, the system arranges the objects hierarchically, e.g., into volumes, chapters and sections according to the order specified by the user. The system then creates a file object (e.g., a CBO) defining the compilation that contains a list or outline of the content entities selected, their identifiers, order and structure. This file object is stored separately in the data repository.

The list or outline is presented to the user at the web interface as a table of contents, and may be edited through the interface. For example, the user may add content, delete content, or move content within and across containers. Editing the list or outline redefines the structure of the compilation. Once the user is satisfied with the organization of the compilation, it is submitted for publication. The submitted compilation is then forwarded to an approval process and is accepted, rejected, or returned to the user with editorial comments appended by the editor.

An aspect of the invention is the calculation of the compilation's cost by estimating the amount of content it contains and determining a content cost based upon the content estimate. Optionally, a cost is assigned to each content entity in the data repository and these actual costs are summed as part of the cost estimation procedure.
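The two costing approaches can be sketched as follows. This is an illustration only: the use of page counts as the measure of content, the per-page rate, and all names and figures are assumptions, since the text does not specify how content is measured or priced.

```python
def estimated_cost_cents(page_counts: dict, cents_per_page: int) -> int:
    """Estimate cost from the amount of content (assumed here to be pages)."""
    return sum(page_counts.values()) * cents_per_page


def summed_cost_cents(entity_ids: list, cost_table_cents: dict) -> int:
    """Sum the actual costs assigned to each selected content entity."""
    return sum(cost_table_cents[entity_id] for entity_id in entity_ids)


# Hypothetical compilation of two sections.
pages = {"B1.C1.S1": 12, "B1.C2.S3": 8}
print(estimated_cost_cents(pages, cents_per_page=5))  # 100

costs = {"B1.C1.S1": 75, "B1.C2.S3": 40}
print(summed_cost_cents(list(pages), costs))          # 115
```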

Another aspect of the invention is to provide permission checking. Occasionally, it may be desired to prevent certain content entities from appearing in the same compilation as other content entities. For example, an author may specify that his work cannot be published in the same compilation as the work of another author. Permission checking first requires associating each container and noncontainer with any mutually exclusive containers or noncontainers. For example, such association may be achieved by defining a set of rules specifying containers and/or content entities that are mutually exclusive. Upon selection of a container or noncontainer to add to the compilation, the permission checking procedure determines if the container or noncontainer is mutually exclusive of any other containers or content objects, e.g., by consulting the rules. If so, the permission checking procedure then analyzes the compilation outline to determine whether any of the other mutually exclusive containers or noncontainers already exists in the compilation. If so, then the selected container or noncontainer is not added to the compilation and the user is notified that the content selected may not be included in the compilation. Otherwise, the content is added.
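The permission checking procedure can be sketched as below. The rule representation (a dictionary from an identifier to the set of identifiers it excludes), the assumption that rules are symmetric, and the function names are all hypothetical. For brevity the sketch matches identifiers exactly and does not expand a container rule to the entities inside that container.

```python
def conflicts(candidate: str, outline: list, exclusions: dict) -> set:
    """Return the entities already in the outline that exclude the candidate."""
    blocked = set(exclusions.get(candidate, set()))
    # Assumption: a rule "A excludes B" also means "B excludes A".
    blocked |= {e for e, others in exclusions.items() if candidate in others}
    return blocked & set(outline)


def try_add(candidate: str, outline: list, exclusions: dict) -> bool:
    """Add the candidate to the outline unless a mutually exclusive entity is present."""
    if conflicts(candidate, outline, exclusions):
        return False  # the caller would notify the user here
    outline.append(candidate)
    return True


# Hypothetical rule: section B1.C1.S1 may not appear with section B2.C3.S2.
rules = {"B1.C1.S1": {"B2.C3.S2"}}
outline = ["B2.C3.S2"]
print(try_add("B1.C1.S1", outline, rules))  # False
print(try_add("B3.C1.S1", outline, rules))  # True
print(outline)                              # ['B2.C3.S2', 'B3.C1.S1']
```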

A further aspect of the invention is to provide prerequisite checking, wherein some entities are associated, e.g., by a set of rules, with content objects that are prerequisites to that entity (e.g., front or backmatter associated with the subsection such as an introduction, appendix, or bibliography), and wherein selection by the user of an entity having prerequisites causes automatic inclusion of all associated prerequisite objects in the final compilation.
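A minimal sketch of prerequisite checking follows. The rule table, the identifiers, and the choice to follow prerequisites of prerequisites are assumptions made for illustration; the text says only that associated prerequisite objects are included automatically.

```python
def with_prerequisites(selected: str, prerequisites: dict) -> list:
    """Return the selected entity followed by every prerequisite it pulls in."""
    ordered = []
    seen = set()

    def visit(entity_id: str) -> None:
        if entity_id in seen:
            return  # already included; also guards against cyclic rules
        seen.add(entity_id)
        ordered.append(entity_id)
        for required in prerequisites.get(entity_id, []):
            visit(required)

    visit(selected)
    return ordered


# Hypothetical rule: a section requires its chapter introduction and the bibliography.
rules = {"B1.C2.S3": ["B1.C2.INTRO", "B1.BIB"]}
print(with_prerequisites("B1.C2.S3", rules))
# ['B1.C2.S3', 'B1.C2.INTRO', 'B1.BIB']
```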

Another aspect of the invention is the provision of a functional layer between the user interface and data repository for facilitating the creation, manipulation, storage and management of content objects in the data repository.

Another aspect of the invention allows a user to create multiple compilations concurrently. Yet another aspect of the invention allows a user to modify a compilation by creating a clone or copy of the compilation and applying user-specified changes to the copy (e.g., in the creation of a new edition or version of an existing work.)

Other aspects of the invention include a configurable model for storing hierarchically related data in a relational database, and a data structure for storing the data and associated metadata, whereby the hierarchical relationship of the data is preserved.

As a further aspect of the invention, queries are executed on the hierarchical containers and noncontainers through an application or user-interface. The results of the independent searches are merged using hit masks. A hit mask is a string of bits, each bit representing a query. For each container and noncontainer in the result set, a hit mask is generated and ones of the bits are set to indicate which of the queries the container or noncontainer satisfies. Container hit masks are OR-ed with their child containers and/or noncontainers to reflect inheritance. Containers and noncontainers with all bits set comprise the merged result set.
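The hit mask merge can be illustrated with a short Python sketch. It assumes that each query returns a set of dotted identifiers of the form described in the claims, and that a container contains every entity whose identifier extends its own; the function name and sample data are hypothetical.

```python
from __future__ import annotations


def merge_hits(results_per_query: list[set[str]]) -> list[str]:
    """Merge independent query results using one bit per query."""
    n = len(results_per_query)
    all_bits_set = (1 << n) - 1

    # Step 1: build a hit mask for every entity appearing in any result set.
    masks: dict[str, int] = {}
    for bit, hits in enumerate(results_per_query):
        for entity_id in hits:
            masks[entity_id] = masks.get(entity_id, 0) | (1 << bit)

    # Step 2: OR each container's mask with the masks of the result-set
    # entities it contains, so the container inherits their hits.
    merged = dict(masks)
    for container_id in masks:
        prefix = container_id + "."
        for entity_id, mask in masks.items():
            if entity_id.startswith(prefix):
                merged[container_id] |= mask

    # Step 3: keep only entities that satisfy every query.
    return sorted(e for e, m in merged.items() if m == all_bits_set)


# Query 0 hits a section of book B1 and book B2 itself;
# query 1 hits another section of B1 and a section inside B2.
query0 = {"B1.C1.S1", "B2"}
query1 = {"B1.C1.S2", "B2.C1.S1"}
print(merge_hits([query0, query1]))  # ['B2']
```

In this example book B2 is returned because it satisfies query 0 directly and inherits the hit for query 1 from one of its sections. Chapter B1.C1 is not returned, because it does not itself appear in any result set and the sketch only ORs masks for containers that do.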

DESCRIPTION OF THE DRAWING

FIG. 1 is a block diagram representing the content management system according to the present invention;

FIG. 2 is a block diagram representing the content input path of the present invention;

FIG. 3 is a block diagram representing a digital library suitable for practicing the present invention;

FIG. 4 graphically represents the structures for storing content parts in a digital library;

FIG. 5 graphically represents the index classes used in storing content in a digital library;

FIG. 6 is a block diagram representing the path for creating and submitting compilations of content according to the present invention;

FIG. 7 represents parts of a compilation of content stored in the digital library;

FIGS. 8A-21B represent the interface of an embodiment of the present invention;

FIGS. 22A-22E represent the system administrator interface of an embodiment of the present invention;

FIG. 23 is a block diagram representing the path for approving and publishing compilations of content; and

FIG. 24 is a state diagram representing the states of a user, request and CBO at various stages of the process for creating compilations of content.

DETAILED DESCRIPTION

I. System Overview

FIG. 1 functionally depicts a system for creating compilations of content. It comprises three parts: a path for inputting content to the data repository (FIG. 2), a path for enabling a user to select content and organization from the data repository through a web-based interface for inclusion in a compilation of content (FIG. 6), and a path that interfaces with a publishing system for creating the compilation of content from the user's specification (FIG. 23). Each path will be described in detail below.

The present invention will now be described in terms of a specific embodiment for creating custom textbooks. The intended user group comprises university professors, for example. The content stored in the system comprises a plurality of published textbooks, broken down into hierarchically related objects: book, volume, chapter and chapter subsection.

Using the proposed system in this context, a university professor is able to access content from a collection of textbooks stored in a digital library and select books, volumes, chapters and/or chapter subsections for inclusion in a custom textbook, and is further able to create content objects for inclusion in the final work.

Although the specific embodiment is provided to facilitate the reader's understanding, it will be understood that the present invention is of a much broader scope and may be applied in the creation of compilations of all types of content including text, image, audio and video content.

A. Receiving and Storing Content

In the exemplary embodiment of the invention, content and other information is input to digital library 20 through the input data path shown in FIG. 2. Briefly, the content and other information is input by a user at an input interface represented by block 8. In the preferred embodiment, the input content is provided in SGML format, although other formats may be supported if desired. The content is forwarded by input application 8 to a converter 10 for conversion into the format expected by data loader 14.

After reformatting, converter 10 outputs the reformatted content and other information to a loader application 14. Loader 14 receives and maps the data for storage in the data repository according to a configuration model 12. According to the present example, the data repository is a digital library 20, and the configuration model 12 is specific to the IBM DB2(R) Digital Library data storage model. Loader 14 interfaces with the digital library 20 through the digital library client application 16. Using the configuration model 12, the content loader 14 is able to map the content and other information it receives in a manner appropriate for the structure of the underlying digital library 20. However, the loader 14 of the present invention may be reconfigured for other types of data repositories by defining a configuration model 12 for each data repository used. Thus, if the data repository type is later changed, the configuration model 12 can be updated to reconfigure the input path without having to reprogram the loader application 14.
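The idea of driving the loader from a replaceable configuration model can be sketched as follows. The field names, attribute names, and the dictionary representation of the model are hypothetical; they are not the actual DB2 Digital Library attribute names or the format of configuration model 12.

```python
def map_record(record: dict, config_model: dict) -> dict:
    """Rename converter output fields to repository attribute names.

    Fields that the configuration model does not mention are dropped.
    """
    return {
        config_model[field]: value
        for field, value in record.items()
        if field in config_model
    }


# Two hypothetical configuration models for two different repositories.
digital_library_model = {"id": "EntityID", "title": "Title"}
other_repository_model = {"id": "entity_key", "title": "heading"}

record = {"id": "B1.C1.S1", "title": "Introduction", "scratch": "ignored"}
print(map_record(record, digital_library_model))
# {'EntityID': 'B1.C1.S1', 'Title': 'Introduction'}
print(map_record(record, other_repository_model))
# {'entity_key': 'B1.C1.S1', 'heading': 'Introduction'}
```

Swapping the model changes where the data lands without touching map_record, which mirrors the stated goal of reconfiguring the input path without reprogramming the loader.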

The elements of the input path will now be described in greater detail.

1. Digital Library

Examples of digital libraries suitable for use in the present invention are described in commonly owned U.S. Pat. No. 5,787,413 entitled "C++ classes for a digital library" issued to Kauffman et al., and U.S. Pat. No. 5,857,203 entitled "Method and apparatus for dividing, mapping and storing large digital objects in a client/server library system" also issued to Kauffman et al.

In the preferred embodiment of the present invention, the data repository comprises the commercially available IBM DB2 Digital Library. However, other commercially available data repositories may be used either in combination with, or in lieu of, the DB2 Digital Library.

Digital libraries are used to store and manage a wide variety of digital objects such as documents, graphics, audio, video, spread sheets and word-processing text. A conceptual view of a conventional digital library client/server system is shown in FIG. 3 and includes a library server 44, one or more object servers 48 and a library client 42. Each of the library and object servers and the library client includes an information store. That is, the library server 44 includes a library catalog 46, the object server 48 includes an object store 50 and the library client 42 includes a client cache 40. The client applications interface to the digital library through an object-oriented API 16. Also, a communications isolator (not shown) is included which allows the library server 44, object server 48 and library client 42 to communicate with one another without concern for complex communications protocols.

The library server, object servers and library clients are connected by a communications network, such as a wide-area network (WAN), but also can be locally connected via a local area network (LAN). In the conventional library client/server system the library client 42 is typically embodied in a workstation, such as a personal computer, and the library server 44 and object servers 48 are typically embodied in a host processor: generally a mainframe computer environment such as an MVS/ESA environment running under CICS. The library server 44 uses a relational database such as the IBM DB2 Universal Database or the Oracle database as a library catalog 46 to manage digital objects and provide data integrity by maintaining index information and controlling access to objects stored on one or more object servers. Object servers can also use a relational database such as IBM DB2 or the Oracle database to manage their contents. Library servers and object servers run, for example, on AIX and Windows NT.

Library Server. The library server 44 directs requests from clients to update or query entries in the library catalog 46, which contains object indexes and descriptive information. Library server 44 additionally performs searches and routes requests to the appropriate object server 48 to store, retrieve, and update objects.

Each user is assigned a set of privileges for access to the library by a system administrator. Library server 44 checks library catalog 46 before processing a request to ensure that the user's name and password are valid, and to ensure that the user has been granted the appropriate privileges to perform the requested action. An example of a library privilege is the ability to delete objects. In typical implementations, there are groups of individuals who need access to the same objects. Therefore, to simplify the process of granting access to objects a system administrator can define patrons as members of a group. When a patron is defined as a member of a group, that patron is able to access any object for which the group has been granted privileges.

The library server 44 also checks to ensure that the object's owner has granted the patron the privileges needed to do what is requested (e.g., update the object). The owner of an object is the user who first stored the object. When an owner stores an object that owner must specify which other patrons are to have access to the object.
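The two checks described above, a privilege assigned by the system administrator and a grant made by the object's owner to a patron or group, can be sketched as follows. The table layouts and names are hypothetical and are not the library catalog's actual schema; name and password validation is omitted.

```python
def may_perform(patron, action, object_id, user_privileges, groups, object_grants):
    """Return True only if both the library-level and object-level checks pass."""
    # Check 1: the administrator has assigned this privilege to the patron.
    if action not in user_privileges.get(patron, set()):
        return False
    # Check 2: the owner has granted the action to the patron or to one of
    # the groups of which the patron is a member.
    principals = {patron} | {g for g, members in groups.items() if patron in members}
    grants = object_grants.get(object_id, {})
    return any(action in grants.get(p, set()) for p in principals)


user_privileges = {"alice": {"update", "delete"}, "bob": {"update"}}
groups = {"editors": {"alice", "bob"}}
object_grants = {"item1": {"editors": {"update"}}}

args = (user_privileges, groups, object_grants)
print(may_perform("bob", "update", "item1", *args))    # True
print(may_perform("bob", "delete", "item1", *args))    # False
print(may_perform("alice", "delete", "item1", *args))  # False
```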

If a client request involves the storage, retrieval, or update of an object, library server 44 forwards the request to the object server 48 that contains or will store the object(s) referred to in the request based upon information provided by library catalog 46. If the client request is a query of the information stored in library catalog 46, library server 44 will interact only with the library catalog 46 and will not contact object server 48.

Library Catalog. The library catalog 46 is analogous to a conventional library's card catalog. It is a set of database virtual tables or index classes which contain an index of all the objects stored in the library system and the object servers owning them. Each row of these virtual tables or index classes references one or more stored objects. Implicitly, the first column of each index class contains a unique digital library item identifier (e.g., the IBM DB2 Digital Library ItemID) for the object referenced by its corresponding row. Other information stored in an index class may include textual descriptions for each object, information on the type of object (e.g., image object, spreadsheet, text document), user names and privileges, access authorization data for each object, links between objects, and an object's properties.

An item is a row in an index class and a part is a file within the object server 48 that is stored in an access managed directory structure. The management access of the directory structure is performed by the object server 48, but the directory structure responsibilities are performed by the operating system (e.g., AIX, NT, MVS).

The library server 44 contains a parts table 62, as shown in FIG. 4, which resides in the library catalog 46. For each part or object in the library system, library server 44 stores information about that part. As shown in the parts table 62 in FIG. 4, the information stored for a part includes the item identifier (ItemID), a part number (PartID), a representation type (REP type) and an object server ID identifying which object server contains the object. In the presently described embodiment of the invention, the REP type is a default value (FRNSNULL).

When a part is stored in the conventional client/server library system 20, library server 44 assigns an item ID and a part number, which are 16 bytes and 4 bytes long, respectively. The item ID is a unique identifier for an item (i.e. row in the library server index class) to which the part belongs. For example, an item could represent a folder in which the part represents a document within that folder. Likewise, the part number is a unique identifier for that part.

The REP type field can be used to indicate the type or class in which the part is classified. For example, if the part is an image stored in a TIFF format, the REP type for that part could indicate that the part is a TIFF formatted image.

Object Servers. An object server 48 maintains objects stored within the library system. Objects are stored or retrieved from an object store 50 by object server 48. Object server 48 receives requests from library server 44 and communicates with library client 42 to complete the requests. Such a library system can contain several distributed object servers. Referring to FIGS. 3 and 4, the object server field in the library server's parts table 62 indicates the identifier for the object server 48 which owns the part. For example, if the part is stored on object store 50 of object server 48, the object server ID field will contain the identifier for object server 48.

Each object server 48 contains an object server table 64 as shown in FIG. 4. The object server 48 uses object server table 64 to manage storage of parts in its storage areas, such as the object store 50. Object server table 64 also contains the same item ID, part number and REP type for the part as does the library server parts table 62. The object server table also contains a file name for the part 66, which indicates the location in object store 50 of stored part 66.
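The relationship between the parts table 62 and an object server table 64 can be sketched with two record types keyed by the same item ID, part number and REP type. The class names, the sample item ID, server ID and file name are hypothetical; only the field list and the FRNSNULL default come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartsTableRow:
    """One row of the library server's parts table."""
    item_id: str
    part_id: int
    rep_type: str
    object_server_id: str


@dataclass(frozen=True)
class ObjectServerRow:
    """One row of an object server's table."""
    item_id: str
    part_id: int
    rep_type: str
    file_name: str


def locate(item_id, part_id, rep_type, parts_table, object_server_tables):
    """Resolve a part to the object server that owns it and its file name."""
    key = (item_id, part_id, rep_type)
    part = parts_table[key]
    row = object_server_tables[part.object_server_id][key]
    return part.object_server_id, row.file_name


key = ("ITEM0001", 1, "FRNSNULL")
parts_table = {key: PartsTableRow("ITEM0001", 1, "FRNSNULL", "OS1")}
object_server_tables = {
    "OS1": {key: ObjectServerRow("ITEM0001", 1, "FRNSNULL", "part0001.dat")}
}
print(locate("ITEM0001", 1, "FRNSNULL", parts_table, object_server_tables))
# ('OS1', 'part0001.dat')
```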

When a user's privileges are defined a default object server can be set for that user. When the user stores an object, it will be stored in his default object server. If it is later determined that an object or a group of objects should be relocated to a different object server, a client application can cause those objects to be moved from one object server to another.

Library Client. The library client 42 is the interface through which application programs can submit requests to the library system. These can include requests to store objects, update/add descriptors to objects, delete objects and query information in the library catalog. Library requests can be submitted through the library client either individually or in batches.

The library client 42 includes a client cache 40 used to locally hold copies of objects that have been stored to or retrieved from the object server 48. These local copies allow very fast access to objects and provide a means for communicating between the library client 42 and the servers 44, 48.

Additional Search Support. IBM DB2 Digital Library includes parametric search support, and is integrated with text search support from the IBM Intelligent Miner for Text. The library server 44 may be further integrated with other search support 52. For example, image querying may be provided by IBM's Query by Image Content(QBIC) technology (see commonly owned U.S. Pat. No. 5,579,471 to Barber et al.).

In the present example for creating compilations of text, library server 44 is preferably coupled to the IBM Intelligent Miner for Text full text search support, allowing the user to automatically index, search, and retrieve documents based on a full text search. Text Miner allows users to locate documents by searching for words or phrases, abbreviations and acronyms, and proper names. In a typical LAN environment, a text search installation comprises one or more servers and several clients. The text search server program is installed on a machine with other Digital Library components. The text search client resides on client workstations and provides access to the server. Text search runs, for example, on AIX and Windows 95 and NT. In addition to the server and client components, text search uses dictionaries to support the linguistic processing of documents in different languages during indexing and retrieval. Dictionaries are installed on the server workstation, and at each client workstation.

Data Flow. Referring to FIGS. 3 and 4, when a requesting library client 42 requests an object, or blob, it sends a request to library server 44. Upon receipt of the request, library server 44 consults the parts table 62, among other tables, in the library catalog 46 and determines which object server 48 owns the requested object and has it stored in its object store 50. Library server 44 then passes the request to that object server 48. The request contains the item ID, part number and REP type of the requested part. Upon receiving the request, object server 48 retrieves the blob from object store 50 by consulting its object server table 64 and sends a copy of it to client 42, storing the blob in client cache 40. When the blob is successfully transmitted to client cache 40, object server 48 sends a response to library server 44 indicating a successful transfer of the blob to client cache 40. Library server 44, in turn, sends a response to requesting library client 42 indicating that the blob was successfully transferred, which allows the client 42 to retrieve the blob from client cache 40 for use by a client application.

When an application program submits a request for storage of an object in the library system, library client 42 creates a copy of the object in its client cache 40 to allow the appropriate object server 48 to retrieve the object. The library client then sends a storage request to library server 44. Included in the storage request is a handle to the object stored in the client cache 40. The handle is an identifier which is used to locate the object in the client cache.

Upon receiving the storage request, library server 44 updates tables in library catalog 46, including the parts table 62 shown in FIG. 4, to identify the object server 48 in which the object is to be stored. Typically, the object server 48 is selected by default based on the user's identity. Library server 44 then sends a request to object server 48 to retrieve the blob from the client cache 40 and store it in the object store 50. Included in the request is the handle of the object stored in client cache 40 and the item ID, part number and REP type of the part.

The object server 48, upon receiving the request to retrieve a copy of the object, retrieves the copy from client cache 40 and stores it in object store 50, then updates its object server table 64 accordingly to indicate a file name for the blob stored in object store 50. The file name uniquely identifies the location of the blob stored in object store 50.

Upon successfully storing a copy of the blob, object server 48 sends a response to library server 44 to notify it that the object was successfully stored. Library server 44 then updates its tables, including the parts table 62, to indicate that the object is successfully stored in object server 48. The library server 44 then sends a response to library client 42 indicating that the object was successfully stored in object store 50, so that the library client 42 can take further action, such as deallocating memory resources for that object in client cache 40.
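
By way of illustration only, the storage and retrieval flows described above can be modeled in a few lines of Python. This is a minimal in-memory sketch, not the Digital Library implementation: the class names, method names, the "blob_N" file naming and the dictionary-based tables are all assumptions made for the example, with the parts table, object server table, object store and client cache each reduced to a dictionary.

```python
# Hypothetical in-memory model of the data flow; every name here is illustrative.
class ObjectServer:
    def __init__(self):
        self.object_store = {}  # file name -> blob
        self.table = {}         # (item ID, part number, REP type) -> file name

    def store_from_cache(self, cache, handle, key):
        # Pull the blob out of the client cache using the handle, then record
        # the file name that locates it in the object store.
        file_name = "blob_%d" % len(self.object_store)
        self.object_store[file_name] = cache[handle]
        self.table[key] = file_name
        return True

    def send_to_cache(self, cache, key):
        # Look up the file name in the object server table and copy the blob
        # into the client cache.
        cache[key] = self.object_store[self.table[key]]
        return True


class LibraryServer:
    def __init__(self, object_servers):
        self.object_servers = object_servers  # server name -> ObjectServer
        self.parts_table = {}                 # key -> owning server and status

    def store(self, cache, handle, key, server_name):
        # Record the owning object server, ask it to fetch the blob from the
        # client cache, then mark the part as stored once it responds.
        self.parts_table[key] = {"server": server_name, "stored": False}
        ok = self.object_servers[server_name].store_from_cache(cache, handle, key)
        self.parts_table[key]["stored"] = ok
        return ok

    def retrieve(self, cache, key):
        # Consult the parts table to find the owning object server.
        server_name = self.parts_table[key]["server"]
        return self.object_servers[server_name].send_to_cache(cache, key)


client_cache = {"h1": b"chapter text"}  # "h1" plays the role of the handle
library = LibraryServer({"os1": ObjectServer()})
key = ("ITEM1", 1, "SGML")              # item ID, part number, REP type

stored = library.store(client_cache, "h1", key, "os1")
del client_cache["h1"]                  # client frees its copy after the response
fetched = library.retrieve(client_cache, key)
```

In this sketch the client never talks to the object server directly; it only sees the blob appear in, or be taken from, its own cache, which mirrors the role of client cache 40 as the exchange point.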

2. Data Model Definition

Storing content for use in creating a compilation of content first requires defining a Data Model, i.e., the constructs for mapping input content and other information in digital library 20. The data model is dependent on the constructs available within the underlying data repository. It is also defined by the nature of the content and information being input.

The content to be stored comprises products such as books, albums, images and videos. The content of each of these products may be organized hierarchically. For example, the hierarchy of a book may be defined by its volumes, chapters and chapter subsections. Since it is desired to create compilations of content from selected entities of these products, the content of the input products is partitioned into selectable entities. Information about the hierarchical relationship is also stored in the data repository. In the present example, other information to be stored includes user and content category definitions.

In the present example, the data repository is a digital library that includes a relational database, and the data model consists of entity groups defining the constructs in which the content is to be organized and stored within the relational database. Each entity group includes index class definitions, and may include part definitions. The parts store the actual content, and outlines describing the hierarchical relationship of the content entities. The index classes define relational tables for storing parametric attributes (i.e., Integer, Float, Date, Time, String, Char, etc.) of the content, programs, and approval requests. The content index classes further include references to the parts containing them.

There are four entity groups in the present example: the Product Entity Group, the Program Entity Group, the CBO Entity Group and the Request Entity Group. The Product Entity Group defines the constructs for storing prepublished works or "products" in the digital library 20. These products provide the content from which a user can build a compilation of content. The Program Entity Group defines categories for content. In the present example these categories consist of academic programs. For example, "Freshman Engineering" is one program defined in the present example. The CBO Entity Group defines the constructs for storing a compilation of content. The Request Entity Group defines the constructs for storing information about requests for approval of compilations of content.

The following tables represent index class definitions, i.e., the meta definitions of the index classes. The rows within the figures define the columns of the index classes. For example, the Product_Aux index class contains 8 columns: SeqID, ProductItem, ParentItem, SiblingItem, ChildItem, Keyword, Value and NextValueItem.

Each primary index class contains a fixed number of columns. The columns of the index class definitions for the primary index classes define the primary index class column name (first column from the left), source of the attribute value (second column), and attribute type (third column) for each column of the index class. In some cases, an attribute value is passed to digital library 20 by the loader 14 application, and the second column of the definition table is used to map the external attribute names to the internal digital library attribute names. In other cases, the attributes are program generated, as is indicated by the value "program generated" in column two. In the index class definition tables below, a fourth column has been added to each table to describe each column. It shall be understood, however, that this column is only provided to facilitate the reader's understanding and is not a part of the index class definitions.

The primary index class columns are restricted to single value attributes. Those columns that are multivalued or were not known when the system was first created are placed into the auxiliary index class.

The Program Index Class, Product Index Class and Request Index Class each have an associated auxiliary index class (ProgramAux Index Class, ProductAux Index Class, and RequestAux Index Class). Use of auxiliary index classes is generally understood by those skilled in the use of digital libraries. Each row within an auxiliary index class defines an additional (theoretical) column to a ROW in the corresponding primary index class (NOT to the entire primary index class). The column is theoretical in the sense that the digital library 20 does not handle auxiliary index class rows as additional columns in the primary index class. Rather, the API layer 30 provides the mapping mechanism to enable this theoretical column notion. Therefore users perceive these auxiliary index class rows as additional columns for a row, but in actuality they are stored as rows within the auxiliary index class. Theoretically, the primary index class appears as a table containing multiple rows and each row contains the columns defined in the primary index class definition plus those columns defined by rows in the auxiliary index class. In other words, these auxiliary index class columns (a.k.a. theoretical columns) are bound to a row within the primary index class and not the primary index class itself.

The manner in which an auxiliary index class defines theoretical columns on rows of a primary index class will now be described with reference to the Product Entity auxiliary index class. The ProductItem column (represented as a row in the auxiliary index class definition, below) contains the ItemID, a unique identifier for each row in the primary index class. This column forms the linkage between a row within the auxiliary index class and the corresponding row of the primary index class.

The keyword column of the auxiliary index class (not to be confused with the Keyword column of the auxiliary index class definition) represents the name of the theoretical column to be added to a row of the primary index class. The current domain of theoretical primary index class column names appears in the Keywords column of the product auxiliary index class definition, below (2nd column from left). For example, one theoretical column name is Pub_Med_Type.

Note: In the present example, the domain is not restricted by the digital library 20 other than that the names must not exceed the length of the keyword column definition. Therefore, the domain of theoretical primary index class column names can be continuously enlarged by simply adding rows carrying new keywords to the auxiliary index class.

The Value column contains the value for the theoretical column identified by the auxiliary index class Keyword column.
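
The mapping from auxiliary rows to theoretical columns can be sketched as follows. This is only an illustration of the idea, assuming rows are held as Python dictionaries; the function name row_view, the ItemID values "P1" and "P2", and the sample titles are hypothetical, and only single-valued keywords are handled.

```python
def row_view(primary_row, aux_rows):
    """Return one primary row extended with its theoretical columns.

    Each auxiliary row whose ProductItem matches the primary row's ItemID
    contributes one extra column named by Keyword and holding Value.
    """
    view = dict(primary_row)
    for aux in aux_rows:
        if aux["ProductItem"] == primary_row["ItemID"]:
            view[aux["Keyword"]] = aux["Value"]
    return view


# Hypothetical sample data: two primary rows, one auxiliary row.
product_rows = [
    {"ItemID": "P1", "SeqID": "0123456789.00.00", "Title": "Statics"},
    {"ItemID": "P2", "SeqID": "0123456789.01.00", "Title": "Forces"},
]
aux_rows = [
    {"ProductItem": "P1", "Keyword": "Pub_Med_Type", "Value": "paper"},
]

views = [row_view(row, aux_rows) for row in product_rows]
```

Only the first row gains a Pub_Med_Type column, which reflects the point that a theoretical column is bound to an individual row of the primary index class and not to the index class as a whole.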

In addition to defining additional theoretical primary index class columns, the auxiliary index class can store multiple valued theoretical columns and hierarchical theoretical columns. Similar to theoretical single valued columns, theoretical multiple valued columns can be represented within a relational datastore model by using rows of an auxiliary index class. In the single valued column, only one row is necessary. In the multiple valued column, two or more rows are necessary (1 row for each value needing to be stored). Each value in the multiple valued column is ordered. This order is then used to chain multiple rows within the auxiliary index class together. Furthermore, the NextValueItem column contains the unique identifier of the auxiliary index class row which follows in the multivalued chain.

For example, suppose one wishes to store a multivalued column, First_Name, with the values Fred and Barney. If the auxiliary index class row containing Barney in the Value column has a unique identifier equal to ABC, then the NextValueItem column for the row containing Fred in the Value column is ABC. Thus, the NextValueItem serves as the pointer to the next value in the multivalued chain.
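
The chain in this example can be walked as an ordinary linked list. The sketch below assumes the auxiliary rows are held in a dictionary keyed by their unique identifiers; the identifier "XYZ" for the Fred row and the function name chain_values are hypothetical, and an empty NextValueItem is assumed to mark the end of the chain.

```python
def chain_values(aux_rows, first_id):
    """Collect the ordered values of a multivalued column.

    Starts at the row identified by first_id and follows NextValueItem
    until a row with an empty NextValueItem is reached.
    """
    values = []
    current = first_id
    while current:
        row = aux_rows[current]
        values.append(row["Value"])
        current = row["NextValueItem"]
    return values


aux_rows = {
    "XYZ": {"Keyword": "First_Name", "Value": "Fred", "NextValueItem": "ABC"},
    "ABC": {"Keyword": "First_Name", "Value": "Barney", "NextValueItem": ""},
}

first_names = chain_values(aux_rows, "XYZ")  # ["Fred", "Barney"]
```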

The ParentItem, SiblingItem and ChildItem columns in the auxiliary index class are used to store hierarchical attributes of a row. Since a book's data model is hierarchical, the concept of hierarchical attribute storage/retrieval is crucial. The ParentItem column of a row in the auxiliary index class contains the unique identifier or ItemID of another row in the auxiliary index class that holds a parent attribute of the current row. Similar to the multivalued columns, the children of a container are ordered (chained together). The unique identifier of the auxiliary index class row containing the next child is stored in the SiblingItem field. A container's first child's unique identifier is stored in the ChildItem column of the container row, thereby constructing a link between the container and its first child, between the first child and the second child, and so on through all remaining children.

For example, the AC_Group column in the product auxiliary index class is a hierarchical attribute. AC_Group contains child attributes: ACFORMID and NUMBERAC. This inheritance is identifiable by the tabbing of the terms in the keywords column of the figure. Each AC_Group attribute contains an ACFORMID and NUMBERAC. Therefore the AC_Group is a kind of container.

This attribute family is represented by three rows within the auxiliary index class: one representing an AC_Group, one representing the ACFORMID and one representing the NUMBERAC. The parentItem column for the AC_Group row is blank to indicate that it is a parent attribute, whereas the parentItem column for the ACFORMID and NUMBERAC rows contains the unique identifier of the AC_Group row. The ChildItem column of the AC_Group contains the unique identifier of the ACFORMID row. The SiblingItem column for the ACFORMID contains the unique identifier of the NUMBERAC row. The NUMBERAC row's SiblingItem is left blank representing the last child of AC_Group.
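
The three rows of this attribute family, and the traversal from the container to its ordered children, can be sketched as follows. The row identifiers "G1", "F1" and "N1" and the function name children_of are hypothetical, and blank columns are represented here as empty strings.

```python
def children_of(aux_rows, container_id):
    """List the keywords of a container attribute's children, in order.

    Follows ChildItem from the container to its first child, then
    SiblingItem from child to child until a blank SiblingItem is found.
    """
    names = []
    current = aux_rows[container_id]["ChildItem"]
    while current:
        names.append(aux_rows[current]["Keyword"])
        current = aux_rows[current]["SiblingItem"]
    return names


aux_rows = {
    "G1": {"Keyword": "AC_Group", "ParentItem": "", "ChildItem": "F1", "SiblingItem": ""},
    "F1": {"Keyword": "ACFORMID", "ParentItem": "G1", "ChildItem": "", "SiblingItem": "N1"},
    "N1": {"Keyword": "NUMBERAC", "ParentItem": "G1", "ChildItem": "", "SiblingItem": ""},
}

group_children = children_of(aux_rows, "G1")  # ["ACFORMID", "NUMBERAC"]
```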

The Product and CBO Entity Groups are associated with Part definitions, since these entities define constructs for storing content in the digital library 20.

Product Entity Group

The Product Entity Group includes two index classes: Product Index Class and ProductAux (Auxiliary) Index Class. These index classes define the storage model for existing content products and their associated attributes to be stored. More specifically, they are used to generate a Product Index class in a relational database representing the content products, and the parts containing the actual content, prerequisite material and hierarchical product outline.

"Product" in this context refers to an existing content product such as a book, album or video. Since users will be selecting excerpts of existing content products to include in a compilation of content, each content product is stored as a group of hierarchically related entities. Entities at each hierarchical level of the work except the lowest is defined by containers. In the present example, the containers are "book", "volume", and "chapter". Each container is described by the subentities or "content entities" it contains. For example, each "book.c" container includes references to all chapters denoted by the keyword, "chapter.c", contained in that textbook product. Similarly, each "chapter.c" container includes references to all sections contained in that chapter. The lowest level of the hierarchy is a "section". All three entities (book.c, chapter.c and section) are described by a unique sequence identifier. Each entity is represented by a row in the Product Index class.

Product Index Class

The product index class defines a relational Product Index Class that is populated with a row for each content entity. Thus for textbook products the resulting product index class includes a row for each book, volume, chapter and section. In addition, each associated component for an entity is also represented by a row in the index class. This index class is used as a quick reference for obtaining attribute information about each product entity, as well as a reference to the actual part numbers containing the product files.

Each product entity is assigned a unique identifier or sequence ID. Preferably, the sequence identifier further includes intrinsic information about the hierarchical level of the entity. To illustrate, the sequence ID used to represent textbook components is in the following form:

XXXXXXXXXX.CC.SS

where XXXXXXXXXX represents a book's ISBN (International Standard Book Number), CC represents the chapter number (if any) and SS represents the section number (if any). The CC and SS portions of a book entity sequence identifier will be zeroes. Similarly, the SS portion of a chapter entity sequence identifier will be zero. Thus the sequence number of a container serves as a reference to the subentities of that container, since all subentities will share the same ISBN and container reference number. For leaf entities, the sequence number is used as a reference to the entity's actual content in the data repository.
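
The way hierarchical position can be read directly from such a sequence identifier may be sketched as follows. The helper names (SeqID, parse_seq_id, entity_level, is_descendant) and the sample ISBN are hypothetical, and the sketch assumes only the three-part book/chapter/section form shown above; volumes are not modeled.

```python
from typing import NamedTuple


class SeqID(NamedTuple):
    isbn: str
    chapter: int
    section: int


def parse_seq_id(seq_id):
    """Split an identifier of the form XXXXXXXXXX.CC.SS into its parts."""
    isbn, cc, ss = seq_id.split(".")
    return SeqID(isbn, int(cc), int(ss))


def entity_level(seq):
    """Infer the hierarchical level from which portions are zero."""
    if seq.chapter == 0:
        return "book"
    if seq.section == 0:
        return "chapter"
    return "section"


def is_descendant(child, container):
    """True if child lies beneath container in the same book."""
    if child == container or child.isbn != container.isbn:
        return False
    if container.chapter == 0:  # a book contains every other entity
        return True
    # A chapter contains the sections that share its chapter number.
    return container.section == 0 and child.chapter == container.chapter


book = parse_seq_id("0123456789.00.00")
chapter = parse_seq_id("0123456789.03.00")
section = parse_seq_id("0123456789.03.02")
```

Because the test for containment uses nothing but the identifiers themselves, no lookup of parent or child references is needed to decide whether one entity falls within another.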

    Index Class
    Attribute Name    ATR Mapping              Type              Description
    SeqID             Seq_ID                   Ext. Alpha [32]   Unique sequence identifier for product entity
                                               INDEXED
    EntityType        PSF                      Ext. Alpha [32]   Entity type, e.g., book, chapter, section
    ParentItem        Program generated        Ext. Alpha [16]   Unique internal ID of any parent entity (e.g., for a
                                                                 section entity, the parent would be its chapter container)
    SiblingItem       Program generated        Ext. Alpha [16]   Unique internal ID of the next sibling entity (e.g., for a
                                                                 section entity, the siblings would be other sections of
                                                                 the same chapter)
    ChildItem         Program generated        Ext. Alpha [16]   Unique internal ID of the first child entity (e.g., a
                                                                 chapter entity's children would be the sections it contains)
    AuxItem           Program generated        Ext. Alpha [16]   Reference to first entry in the auxiliary table for this entity
    ProgramID         PE_ID                    Alpha [4]         Identifier of Program to which the product belongs
                      AC_PE_ID
    Status            Status (SGML)            Alpha [1]         Indicates if entity is available for browse, search or use in a CBO
                      AC_Status (AC)
    Title             Title                    Alpha [250]       Entity title
                      AC_Title
    Subtitle          Subtitle                 Alpha [250]       Entity subtitle, if any
                      AC_Subtitle
    ISBN              ISBN                     Alpha [10]        Product ISBN
                      AC_ISBN
    CDAOID            CDAOID                   Ext. Alpha [8]    Associated component attribute
                      AC_CDAOID
    YearOfPub         Yr_of_Pub                Numeric [1]       *
                      AC_Yr_of_Pub
    Edition           Edition                  Ext. Alpha [2]    *
                      AC_Edition
    Revision          Revision                 Alpha [2]         *
                      AC_Revision
    Version           Content_Ver              Ext. Alpha [8]    *
                      AC_Content_Ver
    PubMediaType                               Ext. Alpha [20]   Media type, e.g., compact disk
    ContentType       Content_Type             Ext. Alpha [8]    Content type, e.g., SGML
                      AC_Content_Type
    ContentFilename   Filename                 Ext. Alpha [254]  Name of file containing the entity's content
                      AC_Graphic_Filename
    ImageType         AC_Image_Type            Ext. Alpha [8]    Type of image, e.g., TIF
    CharCount         SGML_Char_Cnt            Numeric [8]       Number of non-markup characters in content (used to
                                                                 calculate CBO price)
    AC_ImageCount     AC_Image_Cnt             Numeric [3]       Number of associated component images in content
    AvailabilityDate  Date_of_Availability     Ext. Alpha [10]   Date entity is available for use
                      AC_Date_of_Availability
    ExpirationDate    Date_of_Expiration       Ext. Alpha [10]   Date entity is no longer available for use
                      AC_Date_of_Expiration
    CreateDate        Create_Date              Ext. Alpha [14]   Date that table entry was created
                      AC_Create_Date
    CreatedBy         CreatedBy                Alpha [8]         Identifier of user who created entry
                      AC_CreatedBy
    LastModifiedDate  Last_Modified_Date       Ext. Alpha [14]   Last date entry was modified
                      AC_Last_Modified_Date
    LastModifiedBy    Last_Modified_By         Alpha [8]         Identifier of user who last modified entry
                      AC_LastModifiedBy
    PageCount         PageCount                Alpha [6]         Actual page count of content (used in CBO pricing formula)


Part Structures & Text Indices

This table defines the digital library parts used to store each entity. For a row that represents a product entity, Part 1 contains the SGML content for that entity. Parts 5-11 are parts containing subsets of that content that can be searched by Text Miner. The Text Index column contains the Text Miner indices for each of these searchable subsets. For a row that represents an entity's associated component, Part 20 contains the actual associated component file (e.g., an image).
        Part No.      Description              Text Index
            1         Content                  None
            5         Authored Abstract        EABSTRAC
            6         Generated Abstract       EABSTRAC
            7         Index Terms              EIXTERMS
            8         Key Terms                EIXTERMS
            9         Entity Structure Part    None
           10         Teaching Concepts        ETEACHCO
           11         Concepts Topics          ETOPICS
           20         Associated Component     None
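
For illustration, the part table above can be expressed as a simple lookup from part number to text index. The dictionary and function names are hypothetical; the part numbers and index names are taken from the table, with None standing for a part that has no Text Miner index.

```python
# Part number -> Text Miner index name (None where the table lists "None").
PART_TEXT_INDEX = {
    1: None,         # Content
    5: "EABSTRAC",   # Authored Abstract
    6: "EABSTRAC",   # Generated Abstract
    7: "EIXTERMS",   # Index Terms
    8: "EIXTERMS",   # Key Terms
    9: None,         # Entity Structure Part
    10: "ETEACHCO",  # Teaching Concepts
    11: "ETOPICS",   # Concepts Topics
    20: None,        # Associated Component
}


def parts_for_index(index_name):
    """Return the part numbers whose text is covered by the given index."""
    return sorted(p for p, idx in PART_TEXT_INDEX.items() if idx == index_name)


def searchable_parts():
    """Return the part numbers that have any text index at all."""
    return sorted(p for p, idx in PART_TEXT_INDEX.items() if idx is not None)
```

Note that two parts may share one index: under this table a search of EABSTRAC covers both the authored and the generated abstract.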


ProductAux Index Class

In the present example, the auxiliary index class is used to define additional columns in specified rows of the Product Index class. Specifically, each label in the Keywords column corresponding to the Keyword attribute defines the name of an additional column in the Product Index class. The "Value" attribute is the attribute type for each of these keywords. Indentations represent nested keywords. The SeqID, ProductItem, ParentItem, SiblingItem and ChildItem attributes specify the rows in the auxiliary Product Index class for storing hierarchical values. In the present example, "Index_Term" is an example of a multi-valued attribute, meaning that there may be more than one index term defined for each product entity. For performance reasons, the values of a multivalued attribute may be stored in separate rows of the auxiliary Product Index class. Thus the "NextValueItem" attribute identifies the row of the next item in a set of attribute values. Multivalued attributes are structured as linked lists when loaded into digital library 20, and this order is maintained in the auxiliary Product Index class.
    Index Class
    Attribute Name  Keywords                    Type              Description
    SeqID           PSF                         Ext. Alpha [32]   Sequence ID of entity that this attribute belongs to
    ProductItem     Program generated           Ext. Alpha [16]   Unique internal ID of the product index class row that
                                                                  this attribute belongs to
    ParentItem      Program generated           Ext. Alpha [16]   Unique internal ID of the auxiliary product index class
                                                                  row that is this attribute's parent attribute
    SiblingItem     Program generated           Ext. Alpha [16]   Unique internal ID of the auxiliary product index class
                                                                  row that is this attribute's next sibling attribute
    ChildItem       Program generated           Ext. Alpha [16]   Unique internal ID of the auxiliary product index class
                                                                  row that is the first child attribute for this attribute
    Keyword         Pub_Med_Type                Alpha [32]        Media type (e.g., compact disk, audio tape, paper, etc.)
                    AC_Counts                                     AC (Associated Component) attribute group
                      ACFORMID                                    AC type.
                      NUMBERAC                                    The number for each AC type.
                    Index_Term                                    Index term in a product entity
                    Key_Term                                      Key term in a product entity
                    Contrib_Group                                 This group defines properties re: one contributing author
                                                                  of a prepublished book. Since a book can have multiple
                                                                  contributors, more than one contrib_group of properties
                                                                  can exist for that book.
                      Contrib_Role
                      Contrib_Title
                      Contrib_First_Name
                      Contrib_Middle_Name
                      Contrib_Last_Name
                      Contrib_Suffix
                      Job_Title
                      Contrib_Affiliation
                      Contrib_Credentials
                    Use_Actuals                                   Switch variable to determine if actual or estimated page
                                                                  count is to be used in calculating price.
    Value           Value depends on specific   Ext. Alpha [254]  Actual value for the keyword above
                    attribute keyword above
    NextValueItem   Program generated           Ext. Alpha [16]   Unique ID of the auxiliary product index class row that
                                                                  is the next value in a multi-valued attribute.


Program Entity Group

It is sometimes desirable to categorize users and content to facilitate the creation of a compilation. For example, a system user who wishes to compile an album of classical music is not interested in viewing selections from a country music album. Audio content may therefore be categorized according to music type. The user may also be assigned to a particular category, either by default or by personal selection. In a system for creating custom textbooks, subsets are organized according to particular programs or disciplines. For example, prepublished textbooks may be assigned to categories such as Engineering, Mathematics, English, and so on. In the present example, these categories have been defined even more narrowly, e.g., Freshman Engineering, Sophomore Engineering, etc.

A Program Entity Group is used to define categories or "programs" to which users and prepublished content can be assigned.

Program Index Class

The Program Index Class definition below defines a Program Index Class that is populated with a row for each user/content category. This index class is used as a quick reference for obtaining attribute information about each program.
    Index Class
    Attribute Name    ATR Mapping              Type              Description
    Program_ID        PE_Program_ID            Alpha [4]         Program identifier, e.g. "FE" is the identifier for
                                               INDEXED           "Freshman Engineering"
    EntityType        PSF                      Ext. Alpha [32]   Used when programs are nested to define hierarchical
                                                                 level of each program entity
    ParentItem        Program generated        Ext. Alpha [16]   Supporting hierarchical or "parent" programs
    Title             PE_Title                 Alpha [250]       Program Title
    Subtitle          PE_Subtitle              Alpha [250]       Program subtitle, if any
    AvgChrPerImage    PE_AC_Avg_Image_Bytes    Numeric [6]       Average characters per image for products within this
                                                                 program
    AvgChrPerSGMLAC   PE_AC_Avg_SGML_Bytes     Numeric [6]       Average characters per SGML associated component for
                                                                 products within this program
    MaxChrPerUPMTier  PE_Chars_Per_UPM_Tier    Numeric [6]       Maximum number of characters allowed for a UPM in this
                                                                 program
    Status            PE_Status                Alpha [1]         Status indicating whether program entity is currently
                                                                 valid/invalid
    CreateDate        PE_CreateDate            Ext. Alpha [14]   Date table entry created
    CreateBy          PE_CreateBy              Alpha [8]         Identifier of user who created entry
    LastModifiedDate  PE_LastModifiedDate      Ext. Alpha [14]   Date entry was last modified
    LastModifiedBy    PE_LastModifiedBy        Alpha [8]         Identifier of user who last modified entry
    SiblingItem       Program generated        Ext. Alpha [16]   Related sibling programs providing support for
                                                                 hierarchical programs.
    ChildItem         Program generated        Ext. Alpha [16]   Related child programs, if any, providing support for
                                                                 hierarchical programs.
    AuxItem           Program generated        Ext. Alpha [16]   Reference to auxiliary table
    SeqID             PSF                      Ext. Alpha [32]   Unique program identifier, e.g., "FE" for "Freshman
                                                                 Engineering"

ProgramAux Index Class

    Index Class
    Attribute Name  Keywords                    Type              Description
    SeqID           PSF                         Ext. Alpha [32]   Unique identifier (i.e., Sequence ID) of this row.
    ProgramItem     Program generated           Ext. Alpha [16]   Unique internal ID of row within auxiliary program index
                                                                  class that this attribute belongs to
    ParentItem      Program generated           Ext. Alpha [16]   Unique internal ID of row within auxiliary program index
                                                                  class that this attribute's parent attribute belongs to
    SiblingItem     Program generated           Ext. Alpha [16]   Unique internal ID of row within auxiliary program index
                                                                  that this attribute's next sibling attribute belongs to
    ChildItem       Program generated           Ext. Alpha [16]   Unique internal ID of row within auxiliary program index
                                                                  that the first child attribute for this attribute
                                                                  belongs to
    Keyword         PE_Req_Count                Alpha [32]        The next available unique identifier for a request
                    PE_AC_Group                                   This group defines associated component attributes used
                                                                  in the pricing formula
                    PE_AC_FormID                                  AC type
                    PE_AC_ByteCount                               Number of "characters" for that AC type
                    PE_Price_Group                                This group defines more attributes used in pricing
                                                                  formula
                    PE_Country                                    Country
                    PE_Monetary_Unit                              Monetary unit
                    PE_Min_Order_Price                            Minimum order price
                    PE_Base_Cust_Pub_Price                        Base price added to every custom publication
                    PE_Base_UPM_Fee                               Base price added when UPM is included
                    PE_Incr_UPM_Fee                               Additional price per UPM pricing block
                    PE_Source_Price_Per_Page                      Price per page for prepublished content included
                    PE_UPM_Bytes_Per_Page                    Number of UPM
     characters in a page
                    PE_Minimum_Page_Limit                    Minimum number of
     pages required
                                                             in a custom
     publication
                    PE_Volume_Page_Limit                     Maximum number of
     pages in a
                                                             volume
    Value           Value depends on specific attribute Ext. Alph
                    keyword above                [254]
    NextValueItem   Program Generated            Ext..Alpha  Unique internal ID
     of row within
                                                 [16]        auxiliary program
     index representing
                                                             the next value of
     a multi-valued
                                                             attribute.


CustomBookOutline Index Class

The CustomBookOutline Index Class defines a relational CBO Index Class that includes a row for each compilation of content created. Each row further includes a reference to a part containing a road map or outline of the compilation of content. The index class is used as a quick reference for obtaining attribute information about a compilation, as well as for locating the corresponding part numbers. Again, the attributes are a matter of design choice.
    Index Class
    Attribute Name    Source            Type              Description
    ProgramID         Web application   Ext. Alpha [4]    Program identifier
    CBOTitle          Web application   Alpha [120]       Custom book title
    ApprovalStatus    Program generated Alpha [1]         Approval status, i.e., active, submitted, approved, rejected or printed
    UPMCharCount      Program generated Alpha [8]         Character count of any user-provided content
    RightsFee         Program generated Alpha [8]         License fee
    SGMLPageEstimate  Program generated Alpha [4]         Estimated page count for SGML content
    TotalPageEstimate Program generated Alpha [4]         Estimated total page count
    PriceEstimate     Program generated Alpha [8]         Estimated price
    ISBN              Program generated Alpha [10]        Unique ISBN assigned to the custom book at submission time.
    CreatorID         Program generated Alpha [20]        Creator's unique identifier
    CreatorTS         Program generated Alpha [14]        Timestamp representing time of current edit
    LastModifiedTS    Program generated Alpha [14]        Timestamp representing time last modified
    CBOTerms          Program generated Ext. Alpha [32]   Name of file containing terms and conditions that will apply to custom book?
    ActiveCBOPartID   Program generated Alpha [3]         Part number of active custom book
    LastUPMPartID     Program generated Alpha [3]         Part number of the last user-provided material added


Part Structures & Text Indices

The part definition describes the parts associated with each compilation. In the present example, three parts are defined: part 1 initially containing the custom book outline, part 2 initially containing a backup copy of the custom book outline for use in undo operations, and parts numbered 50 or higher containing user-provided material (UPM). (Note: After undo, part 2 becomes the active CBO, and part 1 is the backup. The attribute value of "ActiveCBOPartID" indicates which of these is currently the active part.) The first UPM added to a custom book is assigned to part 50, the second UPM added is assigned to part 51, and so on. The last UPM part number assigned is stored in the CBO Index class defined above and serves two functions. It is a value that is retrieved and incremented each time new UPM is added. In addition, it serves as an indicator of how many parts the custom book currently contains.
          Part No.    Description                               Text Index
           1          Part number for Active/Inactive CBO       None
           2          Part number for Active/Inactive CBO       None
          50+         Part numbers for user-provided content    None
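
The part-numbering rules above can be illustrated with a minimal Python sketch. The class `CboRow` and its method names are hypothetical stand-ins for one row of the CustomBookOutline index class; they are not part of the described system. The sketch also assumes that LastUPMPartID is empty until the first UPM is added, which the text does not state.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_UPM_PART = 50  # per the text: the first UPM added is assigned to part 50


@dataclass
class CboRow:
    """Hypothetical in-memory stand-in for one CustomBookOutline index row."""
    active_cbo_part_id: int = 1              # ActiveCBOPartID: part 1 or part 2
    last_upm_part_id: Optional[int] = None   # LastUPMPartID (assumed empty at first)

    def next_upm_part(self) -> int:
        """Retrieve, increment and store the last UPM part number."""
        if self.last_upm_part_id is None:
            self.last_upm_part_id = FIRST_UPM_PART
        else:
            self.last_upm_part_id += 1
        return self.last_upm_part_id

    def undo(self) -> None:
        """Swap the active and backup outline parts (1 <-> 2)."""
        self.active_cbo_part_id = 2 if self.active_cbo_part_id == 1 else 1

    def upm_count(self) -> int:
        """Number of UPM parts implied by the last assigned part number."""
        if self.last_upm_part_id is None:
            return 0
        return self.last_upm_part_id - FIRST_UPM_PART + 1
```

Because the stored value is both the allocation counter and the size indicator, no separate count of UPM parts needs to be maintained.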


Request Entity Group

Whenever a compilation of content is submitted for publication, the Request Entity Group is used to generate an entry in a Request index class corresponding to the submission request. A unique ISBN is assigned to the CBO once it has been approved for publishing. Attributes are a matter of design choice. In the present example, they describe useful information about the custom book such as its unique identifier, author, approval status, price, etc.
                                      Request Index Class
    Index Class
    Attribute Name    Source            Type              Description
    CBOID             Program generated Ext. Alpha [20]   Unique CBO identifier assigned at submission time
    Userid            Program generated Ext. Alpha [20]   Author
    ApprovalStatus    Program generated Alpha [1]         CBO state in the process
                                                          0 - Active
                                                          1 - Submitted
                                                          2 - Approved
                                                          3 - Rejected
                                                          4 - Printed
    TotalPrice        Program generated Numeric [9]       Price of custom book
    QtyStudentCopies  Web application   Numeric [4]       Number of student copies requested
    QtyDeskCopies     Web application   Numeric [2]       Number of desk copies requested
    QtySupplements    Web application   Numeric [2]       Number of books to be used as supplements
    NeedByDate        Web application   Ext. Alpha [10]   Date needed by
    TermStartDate     Web application   Ext. Alpha [10]   Start date of the school term for which this CBO is created
    TermName          Web application   Ext. Alpha [20]   E.g., Spring, Fall
    University        Web application   Ext. Alpha [100]  University name, e.g., Stanford University
    Department        Web application   Ext. Alpha [100]  Department name, e.g., Electrical Engineering
    ClassName         Web application   Ext. Alpha [128]  Class name, e.g., Engineering Basics
    ClassNumber       Web application   Ext. Alpha [12]   Class number
    CourseNumber      Web application   Ext. Alpha [12]   Course number, e.g., 101
    ShipToNameTitle   Web application   Ext. Alpha [12]   *
    ShipToFirstName   Web application   Ext. Alpha [40]   *
    ShipToLastName    Web application   Ext. Alpha [40]   *
    ShipToAddrLine1   Web application   Ext. Alpha [40]   *
    ShipToAddrLine2   Web application   Ext. Alpha [40]   *
    ShipToAddrLine3   Web application   Ext. Alpha [40]   *
    ShipToCity        Web application   Ext. Alpha [40]   *
    ShipToState       Web application   Ext. Alpha [20]   *
    ShipToCountry     Web application   Ext. Alpha [20]   *
    ShipToPostalCode  Web application   Ext. Alpha [20]   *
    PackageISBN       Program generated Alpha [10]        The ISBN assigned to the entire book. This may be different from the ISBN's assigned to volumes within the book.
    CreateTS          Program generated Alpha [14]        Time entry created
    RequestID         Program generated Ext. Alpha [16]   Unique request identifier
    *Self-explanatory


RequestAux Index Class

The RequestAux Index Class is used in the present example to add additional columns to designated rows of the Request Index class when a CBO contains more than one volume. More specifically, if more than one volume exists, the CBO and each volume it contains are each assigned a unique ISBN, and the Volume, VolumeISBN and VolumeID columns are added to the row representing the submission request. The RequestItem, ParentItem, SiblingItem and ChildItem attributes are used to identify the row to which these columns are added.
    Index Class
    Attribute Name  Source            Type              Description
    RequestItem     Program generated Ext. Alpha [16]   Unique internal ID of row within request index class of entity that this attribute belongs to
    ParentItem      Program generated Ext. Alpha [16]   Unique internal ID of row within auxiliary request index class of entity that is this attribute's parent
    SiblingItem     Program generated Ext. Alpha [16]   Unique internal ID of row within auxiliary request index class of entity that is this attribute's next sibling (siblings are ordered)
    ChildItem       Program generated Ext. Alpha [16]   Unique internal ID of row within auxiliary request index class of entity that is this attribute's first child (children are ordered).
    Keyword         Volume            Alpha [32]        The parent attribute of the volume information.
                    VolumeISBN                          The child attribute of Volume which stores the ISBN of the volume.
                    VolumeID                            Unique internal ID of row within request index class of volume entity corresponding to this volume.
    Value           Program generated Ext. Alpha [254]
    NextValueItem   Program generated Ext. Alpha [16]   Unique internal ID of row within auxiliary request index representing the next value of a multi-valued attribute.
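
The ParentItem, SiblingItem and ChildItem attributes describe a first-child/next-sibling linkage between auxiliary rows. The Python sketch below shows one way such rows could be walked in order. The `AuxRow` class, the `walk` function, the row IDs and the attribute values are all hypothetical and chosen only for illustration, and the sketch assumes VolumeID is stored as a sibling of VolumeISBN under Volume.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Tuple


@dataclass
class AuxRow:
    """Hypothetical stand-in for one auxiliary index row."""
    item_id: str
    keyword: str
    value: str = ""
    parent_item: Optional[str] = None
    sibling_item: Optional[str] = None   # next sibling (siblings are ordered)
    child_item: Optional[str] = None     # first child (children are ordered)


def walk(rows: Dict[str, AuxRow], start: Optional[str],
         depth: int = 0) -> Iterator[Tuple[int, AuxRow]]:
    """Yield (depth, row) by following ChildItem first, then SiblingItem."""
    current = start
    while current is not None:
        row = rows[current]
        yield depth, row
        yield from walk(rows, row.child_item, depth + 1)
        current = row.sibling_item


# Illustrative data only: two volumes attached to one request.
rows = {
    "A1": AuxRow("A1", "Volume", child_item="A2", sibling_item="A4"),
    "A2": AuxRow("A2", "VolumeISBN", "0000000001", parent_item="A1", sibling_item="A3"),
    "A3": AuxRow("A3", "VolumeID", "R100", parent_item="A1"),
    "A4": AuxRow("A4", "Volume", child_item="A5"),
    "A5": AuxRow("A5", "VolumeISBN", "0000000002", parent_item="A4"),
}
```

Storing only the first child and the next sibling keeps every row at a fixed width regardless of how many children an attribute has.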


Login/Registration Database Model

The Users Table simply defines a relational table for storing user information. The fourth column indicates whether the field is a primary key, and the fifth column indicates whether it is a foreign key.
                                    USERS Table
    Table Column       Table Column      Table Column  Table Column  Table Column
    Name               Datatype          Null Option   Is PK         Is FK
    USER_ID            VARCHAR2(30)      NOT NULL      Yes           No
    DEPT_UD_ID         NUMBER(8)         NULL          No            No
    UNIV_UD_ID         NUMBER(8)         NULL          No            No
    DEPARTMENT_ID      NUMBER(8)         NULL          No            No
    UNIV_ID            NUMBER(8)         NULL          No            No
    USERNAME           VARCHAR2(30)      NOT NULL      No            No
    PASSWORD           VARCHAR2(30)      NOT NULL      No            No
    TITLE              VARCHAR2(100)     NULL          No            No
    FIRST_NAME         VARCHAR2(30)      NULL          No            No
    LAST_NAME          VARCHAR2(30)      NULL          No            No
    ADDRESS1           VARCHAR2(80)      NULL          No            No
    ADDRESS2           VARCHAR2(80)      NULL          No            No
    ADDRESS3           VARCHAR2(80)      NULL          No            No
    CITY               VARCHAR2(50)      NULL          No            No
    STATE              VARCHAR2(2)       NULL          No            No
    ZIP                VARCHAR2(10)      NULL          No            No
    COUNTRY            VARCHAR2(50)      NULL          No            No
    PHONE              VARCHAR2(15)      NULL          No            No
    EMAIL              VARCHAR2(80)      NOT NULL      No            No
    CHALLENGE          VARCHAR2(255)     NOT NULL      No            No
    RESPONSE           VARCHAR2(255)     NOT NULL      No            No
    SECURITY           CHAR(1)           NOT NULL      No            No
    TIMESTAMP          DATE              NULL          No            No
    CBO_ID             VARCHAR2(64)      NULL          No            No


USER_CBOS Table

The USER_CBOS table enables a user to have more than one active CBO at a time.
    Table Column       Table Column      Table Column  Table Column  Table Column
    Name               Datatype          Null Option   Is PK         Is FK
    USER_ID            NUMBER(8)         NOT NULL      Yes           No
    CBO                VARCHAR2(100)     NULL          No            No
    TIMESTAMP          DATE              NULL          No            No


3. Input Interface

An interface 8 is provided to the user for entering information to be stored in digital library 20. Information includes Program categories and prepublished content. The interface can take a variety of forms, but it must be able to communicate with an OO API layer 30, which in the present embodiment comprises a C DLL. The interface 8 of the present embodiment is a web-based solution consisting of 22, 24, 26 and 28. Alternatively, application code 28 may provide the same function.

In the present example, each prepublished content product is input as one SGML file. The hierarchical levels within that SGML file are discernible by their delimiting tag types. Program information is provided as a field identifying the program for each prepublished content product. The program configuration information is defined in PSF/ATR files and loaded into the datastore using the loader 10.

4. Converter

Converter 10 receives the SGML files and uses the delimiting tags to separate the product entities and associated components. It also builds a file defining the hierarchical relationships of these entities and components, and extracts relevant product attributes. In the present example, the resulting files include four possible file types: a Product Structure File (PSF), Attribute Files (ATR), Content Component Files and Associated Component Files.

Product Structure Files (PSF). For content, the Product Structure File provides a hierarchical outline of the contents in a prepublished product. More specifically, it is a parsable formatted file listing all of the entities making up a content product (e.g., a book container, volume containers, chapter containers and sections, each identified by its sequence identifier). This file is used as a road map (i.e., a list or table of contents) defining the content, order and hierarchical structure of the prepublished product, thereby relating a product's separately stored content entities. It is stored as a part in digital library 20. An example of a PSF file for a content product is shown below:
    PRODUCT.C:0130808598.00.00.00
        FRONT_AND_BACK_ELEMENT:0130808598.01.01.00
        FRONT_AND_BACK_ELEMENT:0130808598.01.02.00
        FRONT_AND_BACK_ELEMENT:0130808598.01.03.00
        FRONT_AND_BACK_ELEMENT:0130808598.01.04.00
        CHAPTER.C:0130808598.02.00.00
            FRONT_AND_BACK_ELEMENT:0130808598.02.01.00
            SECTION:0130808598.02.02.00
            SECTION:0130808598.02.03.00
            SECTION:0130808598.02.04.00
            SECTION:0130808598.02.05.00
            SECTION:0130808598.02.06.00
            FRONT_AND_BACK_ELEMENT:0130808598.02.07.00
        CHAPTER.C:0130808598.03.00.00
            FRONT_AND_BACK_ELEMENT:0130808598.03.01.00
            SECTION:0130808598.03.02.00
            SECTION:0130808598.03.03.00
            SECTION:0130808598.03.04.00
            SECTION:0130808598.03.05.00
            SECTION:0130808598.03.06.00
            FRONT_AND_BACK_ELEMENT:0130808598.03.07.00
        CHAPTER.C:0130808598.04.00.00
            FRONT_AND_BACK_ELEMENT:0130808598.04.01.00
            SECTION:0130808598.04.02.00
            SECTION:0130808598.04.03.00
            SECTION:0130808598.04.04.00
            SECTION:0130808598.04.05.00
            FRONT_AND_BACK_ELEMENT:0130808598.04.06.00
        CHAPTER.C:0130808598.05.00.00
            FRONT_AND_BACK_ELEMENT:0130808598.05.01.00
            SECTION:0130808598.05.02.00
            SECTION:0130808598.05.03.00
            SECTION:0130808598.05.04.00
            SECTION:0130808598.05.05.00
            SECTION:0130808598.05.06.00
            FRONT_AND_BACK_ELEMENT:0130808598.05.07.00
        CHAPTER.C:0130808598.06.00.00
            FRONT_AND_BACK_ELEMENT:0130808598.06.01.00
            SECTION:0130808598.06.02.00
            SECTION:0130808598.06.03.00
            SECTION:0130808598.06.04.00
            SECTION:0130808598.06.05.00
            SECTION:0130808598.06.06.00
            SECTION:0130808598.06.07.00
            SECTION:0130808598.06.08.00
            FRONT_AND_BACK_ELEMENT:0130808598.06.09.00
        CHAPTER.C:0130808598.07.00.00
            FRONT_AND_BACK_ELEMENT:0130808598.07.01.00
            SECTION:0130808598.07.02.00
            SECTION:0130808598.07.03.00
            SECTION:0130808598.07.04.00
            SECTION:0130808598.07.05.00
            FRONT_AND_BACK_ELEMENT:0130808598.07.06.00
        CHAPTER.C:0130808598.08.00.00
            FRONT_AND_BACK_ELEMENT:0130808598.08.01.00
            SECTION:0130808598.08.02.00
            SECTION:0130808598.08.03.00
            FRONT_AND_BACK_ELEMENT:0130808598.08.04.00
        CHAPTER.C:0130808598.09.00.00
            FRONT_AND_BACK_ELEMENT:0130808598.09.01.00
            SECTION:0130808598.09.02.00
            FRONT_AND_BACK_ELEMENT:0130808598.09.03.00
        CHAPTER.C:0130808598.10.00.00
            FRONT_AND_BACK_ELEMENT:0130808598.10.01.00
            SECTION:0130808598.10.02.00
            SECTION:0130808598.10.03.00
            FRONT_AND_BACK_ELEMENT:0130808598.10.04.00
        CHAPTER.C:0130808598.11.00.00
            FRONT_AND_BACK_ELEMENT:0130808598.11.01.00
            SECTION:0130808598.11.02.00
            SECTION:0130808598.11.03.00
            FRONT_AND_BACK_ELEMENT:0130808598.11.04.00
        FRONT_AND_BACK_ELEMENT:0130808598.12.01.00
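
As an illustration of how such a file can serve as a road map, the following Python sketch parses a PSF listing into a tree. It is not the loader's actual parser. It assumes, based only on the example above, that nesting is expressed by indentation and that a ".C" suffix on the entity type marks a container; the names `Entity` and `parse_psf` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entity:
    kind: str                 # e.g. PRODUCT, CHAPTER, SECTION
    seq_id: str               # sequence identifier
    is_container: bool        # assumed: ".C" suffix marks a container
    children: List["Entity"] = field(default_factory=list)


def parse_psf(text: str) -> Entity:
    """Build an entity tree from an indented PSF listing."""
    root: Optional[Entity] = None
    stack: List[Tuple[int, Entity]] = []   # (indent width, entity) of open ancestors
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip())
        kind, seq_id = raw.strip().split(":", 1)
        is_container = kind.endswith(".C")
        node = Entity(kind[:-2] if is_container else kind, seq_id, is_container)
        # Close every entity that is not an ancestor of this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            stack[-1][1].children.append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("PSF has more than one top-level entity")
        stack.append((indent, node))
    if root is None:
        raise ValueError("empty PSF")
    return root


SAMPLE = """\
PRODUCT.C:0130808598.00.00.00
    FRONT_AND_BACK_ELEMENT:0130808598.01.01.00
    CHAPTER.C:0130808598.09.00.00
        FRONT_AND_BACK_ELEMENT:0130808598.09.01.00
        SECTION:0130808598.09.02.00
        FRONT_AND_BACK_ELEMENT:0130808598.09.03.00
    FRONT_AND_BACK_ELEMENT:0130808598.12.01.00
"""
```

Calling `parse_psf(SAMPLE)` returns the product entity with three children, the second of which is the chapter 9 container holding its own three entities.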


For program categories, the PSF file contains the unique program identifier. As an example, the contents of a PSF file for the "Freshman Engineering" program are shown below:

PROGRAM:fe

Attribute Files (ATR). Attribute files contain metadata about each program or product entity input. This information must be extracted by converter 10. These files are mapped to the Program and Product index classes (using the ELOADER.INI file described below) and stored in digital library 20. There is one attribute file for each program and for each product entity to be stored. Examples of ATR files are shown below. The first is an ATR file for a "book":
    ;;
    ;;PRODUCT.C - ATR file - Created: 29 October 1999 21:55:06
    ;;
    ;;Seq_ID: 0130808598.00.00.00
    ;;
    !SKU:0000000014595
    !!ISBN:0130808598
    !Title:Engineering Success
    !Contrib_Group
    !   Contrib_First_Name:Peter
    !   Contrib_Last_Name:Schiavone
    !   Contrib_Affiliation:University of Alberta
    !PE_ID:FE
    !Status:0
    !Page_Count:0
    !Use_Actuals:1
    !Yr_of_Pub:1999
    !Edition:01
    !Revision:00
    !Version:01.00
    !Created_By:BARKER
    !LastModified_By:BARKER


The ATR for chapter 3 of the preceding book is shown below:
    ;;
    ;; CHAPTER.C - ATR file - Created: 29 October 1999 21:55:09
    ;;
    ;; Seq_ID: 0130808598.03.00.00
    !SKU:0000000014618
    !Title:Introduction to Engineering and Engineering Study
    !Authored_Abstract:&ldquo;How much do you know about engineering? Why did
     you choose to study
    engineering?What reasons lead you to believe that you are ready and
     equipped to study
    engineering?What are the main differences between studying at a university
     and studying in high
    school?What new success skills do you need to succeed in engineering
     study?Can you write down 10
    answers to each question I have asked you? Go ahead and try."
    !Authored_Abstract:This is often how I begin my lecture to freshman
     engineering students enrolled in an
    introductory engineering class. After a little thought, most of them
     realize just how little they know about
    this subject called engineering and (often despite excellent high school
     averages) how ill equipped they
    are to study engineering.
    !Authored_Abstract:In this chapter, we address both issues. First, we ask
     the following questions:What is
    engineering?What do engineers do?Why choose to study engineering?
    !Authored_Abstract:The answers to these questions are not only interesting
     and informative, but will help
    keep you motivated along the long, hard road to an engineering degree.
    !Authored_Abstract:In , we address the question, &ldquo;Are you prepared
     and equipped for engineering
    study?"In doing so, we examine the study skills required to succeed in the
     university environment.
    For many students, the university is the next logical step after high
     school, the next academic challenge.
    Consequently, they expect their freshman year in engineering to be much
     like another year of high
    school-which, of course, it isn't. In engineering, such an exception often
     manifests itself in
    unacceptably high first-year attrition rates. We address this issue by
     focusing on what you need to do to
    ensure the best possible start to earning your engineering degree.
     Essentially, you must develop the
    necessary:Work strategiesStudy strategiesAttitudesCommunication
     skillsAbility to work as part of a
    teamTime management skills


The ATR for section 3.2 of the same book is shown below:
    ;;
    ;;SECTION - ATR file - Created: 29 October 1999 21:55:09
    ;;
    ;;Seq_ID: 0130808598.03.02.00
    ;;
    !Filename:0130808598.03.02.00.sgm
    !CDAOID:AABQHDS0
    !Index_Term:engineering
    !Index_Term:defined
    !Index_Term:engineering, study
    !Index_Term:introduction to
    !Index_Term:engineering, defined
    !Title:What Is Engineering?
    !SGML_Char_Cnt:2370
    !AC_Counts
        ACFORMID:2
        NUMBERAC:1
    !Associated_Component
    !   AC_PE_ID:FE
    !   AC_CDAOID:AABQHDT0
    !   AC_Title:FIG1
    !   AC_Image Type:TIFF
    !   AC_Graphic_Filename:HiRes\AABQHDT0.TIF
    !   AC_Authored_Abstract:None


The ATR file for the "Freshman Engineering" program is shown below:
    ;;
    ;Program ID for Freshman Engineering set to "FE"
    !PE_Program_ID:FE
    !PE_Title:Freshman Engineering
    !PE_Subtitle:
    !PE_Req_Count:ESOU002300
    ;!PE_Related_Material:<value>
    !PE_AC_Group
    !   PE_AC_FormID:1
    !   PE_AC_ByteCount:2
    !   PE_AC_FormDesc:Inline Graphic
    !PE_AC_Group
    !   PE_AC_FormID:2
    !   PE_AC_ByteCount:1000
    !   PE_AC_FormDesc:Display Graphic
    !PE_AC_Group
    !   PE_AC_FormID:3
    !   PE_AC_ByteCount:68
    !   PE_AC_FormDesc:Inline Equation
    !PE_AC_Group
    !   PE_AC_FormID:4
    !   PE_AC_ByteCount:180
    !   PE_AC_FormDesc:Display Equation
    !PE_AC_Group
    !   PE_AC_FormID:5
    !   PE_AC_ByteCount:2000
    !   PE_AC_FormDesc:SGML
    !PE_AC_Av_Image_Bytes:0
    !PE_AC_Avg_SGMLBytes:0
    !PE_Chars_Per_UPM_Tier:2000
    !PE_Price_Group
    !   PE_Country:0
    !   PE_Monetary_Unit:USD
    !   PE_Min_Order_Price:1000
    !   PE_Base_Cust_Pub_Price:1000
    !   PE_Base_UPM_Fee:0
    !   PE_Incr_UPM_Fee:10
    ;JDR add 1/21/99
    !   PE_Source_Price_Per_Page:10
    !   PE_Minimum_Page_Limit:80
    !   PE_Volume_Page_Limit:480
    !   PE_UPM_Bytes_Per_Page:1000
    !PE_Status:F
    !PE_CreateDate:1998-12-07
    !PE_CreatedBy:UHANAED
    !PE_LastModifiedDate: 1999-1-19
    !PE_LastModifiedBy:UHANAED
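
The ATR examples above follow a small line-oriented grammar. The Python sketch below reads a subset of it and is offered only as an illustration, not as the converter's or loader's actual parser. It assumes three line shapes inferred from the examples: a line starting with ";" is a comment, "!Name:value" is a top-level attribute, and "!Name" with no colon opens a group whose members follow as "!" plus indentation plus "Sub:value". Other shapes seen in the examples (such as "!!ISBN" or wrapped abstract text) are deliberately not handled, and the function name `parse_atr` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

Value = Union[str, Dict[str, str]]


def parse_atr(text: str) -> List[Tuple[str, Value]]:
    """Return (name, value) pairs in file order; a group's value is a dict."""
    entries: List[Tuple[str, Value]] = []
    current_group: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue                      # blank line or comment
        if not line.startswith("!"):
            raise ValueError(f"unsupported line: {raw!r}")
        body = line[1:]
        indented = body[:1].isspace()     # "!   Sub:value" belongs to a group
        name, sep, value = body.strip().partition(":")
        if indented:
            if current_group is None:
                raise ValueError(f"sub-attribute outside a group: {raw!r}")
            current_group[name] = value.strip()
        elif sep:
            entries.append((name, value.strip()))
            current_group = None
        else:
            current_group = {}
            entries.append((name, current_group))
    return entries


SAMPLE = """\
;Program ID for Freshman Engineering set to "FE"
!PE_Program_ID:FE
!PE_Title:Freshman Engineering
!PE_AC_Group
!   PE_AC_FormID:1
!   PE_AC_ByteCount:2
!PE_AC_Group
!   PE_AC_FormID:2
!   PE_AC_ByteCount:1000
!PE_Status:F
"""
```

A list of pairs is used rather than a dictionary because attribute names such as PE_AC_Group and Index_Term repeat within one file.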


Content Component Files (SGML). Content component files contain the product entities' actual ASCII or binary content that will be stored as parts in digital library 20. In the present example, these files comprise SGML files containing the ASCII text of chapter sections.

Associated Component Files. Associated Component (AC) Files contain any non-SGML content associated with the product entities. The content in the associated component files is stored as parts in digital library 20.

Both prepublished content and custom book outlines (CBO's), described below, are represented in the described file format. A feature of this format is that content objects such as a prepublished book or CBO are defined by the PSF file. Thus the PSF file may be used to redefine the content, order and structure of the content object without having to access the content itself. This feature proves useful in creating compilations of content, by simplifying the process for adding, moving and deleting content.

Composite PSF & ATR Files. Out of the PSF and ATR formats comes a third file format that is a composite of the two. For simplicity, this type is also referred to as the PSF+ATR format. One can think of this file as a merge of PSF and ATR files, where attributes from an entity's ATR file have been inserted after that entity in the PSF. For example, it may be desirable to add certain attributes (e.g., author and price) to the product structure file when it is stored in the digital library. Accordingly, in the present embodiment, what is stored as the "Entity Structure Part" described earlier is actually a composite form of PSF and ATR. For a book or product level entity, this file includes all entities in the book (including the book itself) and their attributes. For a chapter level entity, this file includes all entities in the chapter (including the chapter itself) and their attributes. For a section level entity, this file includes the section entity and its attributes. Attributes are also added to PSF files containing custom compilation outlines created by system users, and to Equery result files. In the Equery result files, all of the entities returned are treated as flat, namely at the same hierarchical level.

An example of a composite file format is shown below:
    Top_Entity1: sequence_ID
    !Attribute1: value
    !Attribute2: value
    !Attribute3:
    !   Subattribute1: value
    !   Subattribute2: value
    !Attribute4: Value4
        Sub_Entity1: Sequence_ID
        !Attribute1: value
        !Attribute2: value
            Sub_Sub_Entity1: Sequence_ID
            !Attribute1: value
            !Attribute2: value
        Sub_Entity2: Sequence_ID
        !Attribute1: value
        !Attribute2: value
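
The merge that produces this composite form can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the described system's code: the names `Entity` and `render_composite` are hypothetical, only flat attributes are handled (no sub-attribute groups), and the output omits the space after the colon that appears in the generic sample above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    kind: str
    seq_id: str
    children: List["Entity"] = field(default_factory=list)


def render_composite(entity: Entity, attrs: Dict[str, Dict[str, str]],
                     depth: int = 0, indent: str = "    ") -> List[str]:
    """Emit each entity line followed by its attributes, then its children."""
    pad = indent * depth
    lines = [f"{pad}{entity.kind}:{entity.seq_id}"]
    for name, value in attrs.get(entity.seq_id, {}).items():
        lines.append(f"{pad}!{name}:{value}")
    for child in entity.children:
        lines.extend(render_composite(child, attrs, depth + 1, indent))
    return lines


# Titles taken from the ATR examples shown earlier in this section.
section = Entity("SECTION", "0130808598.03.02.00")
chapter = Entity("CHAPTER.C", "0130808598.03.00.00", [section])
book = Entity("PRODUCT.C", "0130808598.00.00.00", [chapter])
attrs = {
    "0130808598.00.00.00": {"Title": "Engineering Success"},
    "0130808598.03.00.00": {"Title": "Introduction to Engineering and Engineering Study"},
    "0130808598.03.02.00": {"Title": "What Is Engineering?"},
}
```

Rendering from the chapter instead of the book yields the chapter-level composite, matching the per-level behavior described above.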


An exemplary entity structure part stored in the digital library is provided in Appendix A.

In the present example, converter 10 is preferably Active System's Attribute Extractor (i.e. AE). Converter 10 creates a load directory for each prepublished content product, identified by that product's ISBN, which contains the product's corresponding Product Structure File (PSF), Attribute Files, Content Component Files and Associated Component Files. It also creates a load directory for each program category, identified by the program identifier and containing the program's corresponding PSF and ATR files. These directories are provided as input to content loader 14.

4. Content Loader

Content loader 14 is a software application for loading the program and prepublished content files described above into the digital library 20. It receives the load directories as input from converter 10, then loads this information into the digital library 20 according to a content configuration model 12 defined in the ELOADER.INI configuration file (described below). Content loader 14 interfaces with the digital library content server(s) 18 through the OO API layer 16.

The content loader 14 has three modes of operation: load, delete and purge.

Load. The purpose of this mode of operation is to load or reload the Content Entities, Associated Components and Attributes into the digital library 20. All Content Component Files are stored as binary large objects or BLOBs in the digital library object server 48. All Attribute Files are parsed and the resultant parametric data is stored in the digital library server 44.

As previously noted, the input files to the content loader 14 are a Product Structure File (i.e., a sequence-id.psf), an Attribute file for each product entity loaded (i.e., sequence-id.atr), a file for each Content Component (i.e., sequence-id.sgm) and a file for each Associated Component (i.e., sequence-id.cdaOID.gif).

The output of the ELoader will be placed into the appropriate index class in the digital library 20 as specified by the configuration model contained in the ELOADER.INI file.

Syntax: ELoader --load <sequence-id>

Example #1: ELoader --load 012345678

This load command launches loader 14 into load mode. It looks in the load directory identified by an ISBN="012345678" for all of the Attribute Files, Content Components and Associated Components stored therein, and processes these files.

Example #2: ELoader 012345678.02.00.00

This load command launches loader 14 into load mode. It looks in the load directory identified by an ISBN="012345678" for all Attribute Files, Content Components, and Associated Components associated with chapter container "012345678.02.00.00", and processes these files.

Delete. The purpose of this mode of operation is to delete selected Content Entities, Associated Components and Attributes from the Digital Library. The ELoader will delete all content, attributes, and text index entries from digital library 20 for the ISBN/sequence number specified, as well as all child content and attributes associated with that ISBN/sequence number.

Syntax: ELoader --delete <sequence-id>

Example #1: ELoader --delete 012345678

This command launches loader 14 into delete mode and deletes all content and attributes for the prepublished content product whose ISBN="012345678".

Example #2: ELoader --delete 012345678.02.00.00

This command launches loader 14 into delete mode and deletes all entities and attributes for the entity whose sequence number="012345678.02.00.00" as well as all of its children. The rest of the content product remains untouched.

Purge. The purpose of this mode of operation is to purge Content Entities, Associated Components and Attributes from the Digital Library after a Load that did not complete successfully. Loader 14 deletes all content, attributes, and text index entries from the digital library 20 even though it is in a partially loaded state.

Syntax: ELoader --purge <sequence-id>

Example: ELoader --purge 012345678

This command launches loader 14 into purge mode and deletes all content and attributes for the prepublished content product whose ISBN="012345678".
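
The three modes above share one command shape: a mode switch followed by a sequence-id whose leading dot-separated field is the ISBN. The following Python sketch illustrates that parsing only. The names `LoaderRequest` and `parse_eloader_args` are hypothetical, and the `--` spelling of the switch is an assumption taken from the syntax lines; the actual ELoader argument handling is not described in this document.

```python
from typing import List, NamedTuple

MODES = ("load", "delete", "purge")  # the three modes named in the text


class LoaderRequest(NamedTuple):
    mode: str            # one of MODES
    isbn: str            # leading field of the sequence-id
    sequence_id: str     # sequence-id exactly as typed
    whole_product: bool  # True when only the ISBN was given


def parse_eloader_args(argv: List[str]) -> LoaderRequest:
    """Split e.g. ['--delete', '012345678.02.00.00'] into its parts."""
    if len(argv) != 2 or not argv[0].startswith("--"):
        raise ValueError("usage: ELoader --<load|delete|purge> <sequence-id>")
    mode = argv[0][2:]
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    sequence_id = argv[1]
    # The ISBN is everything before the first '.' (hypothetical reading
    # of the examples, where ISBN="012345678" for "012345678.02.00.00").
    isbn = sequence_id.split(".", 1)[0]
    return LoaderRequest(mode, isbn, sequence_id,
                         whole_product=(sequence_id == isbn))
```

Under this reading, `parse_eloader_args(["--delete", "012345678.02.00.00"])` targets a single container and its children, while passing the bare ISBN targets the whole content product.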

5. Configuration Model

Configuration model 12 is embodied in a configuration file called ELOADER.INI, and associated configuration files that it calls. The configuration files contain all of the switches and parameters necessary to customize the operation of loader 14 to the data model defined above. The primary objective of these files is to minimize the need to change loader 14 program source code if the data model is modified.

The ELOADER.INI file is organized into several sections with multiple keywords and values in each section. The LOGON and DEBUG sections describe parameters that govern the overall loader operation. The ELOADER section and the Individual Group Sections describe the entity types that have been defined in the exemplary data model (i.e., the Program, Product, CBO and Request entity groups). The Individual Entity Sections describe each entity type that belongs to a given entity group. The ATTRIBUTES section and the Individual Attribute Sections describe the set of attributes that may be loaded for the entities in the data model.

The ELOADER.INI file, the data model file, and each of the individual GROUP attribute files are in the same format as an Attribute file as shown in the examples. The GROUP file is in the PSF format.

a. Structure

LOGON Section. This section specifies the digital library USERID and server names for all operations between content loader 14 and digital library 20.
    KEYWORD       VALUE   MEANING
    LIBRARY       name    The name of the DIGITAL LIBRARY Library
                          Server to be used.
    USERID        name    The USERID that will be used to logon to
                          DIGITAL LIBRARY.
    PASSWORD      name    The PASSWORD of the USERID.
    TEXT          name    The name of the client instance of the Text
    SERVER                Miner server.
    MAX HITS      number  The maximum number of hits to be returned by
                          EQuery (described subsequently).


DEBUG Section. This section specifies internal switch settings that are only used for debugging, testing, and performance analysis.
    KEYWORD       VALUE   MEANING
    TRACE         0       No debug trace will be created.
                  1       Create trace of internal activity for debugging.
                          This is not a log file.


LOG FILES Section. This section specifies the names of the files to be used for logging and debugging.
        KEYWORD           VALUE       MEANING
        LOADER            name        Filename for Loader log.
        TRACE             name        Filename for debug trace.


ELOADER Section. This section specifies the name of the initialization file containing the full data model with all of its data groups. In other words, it is a pointer to a meta-metadata file.
    KEYWORD     VALUE   MEANING
    DATA        Name    Filename of a file containing each Group name
    MODEL               and the name of the Group File.
    DEFAULT     Name    Name of the default Group.
    GROUP
    ROOT        Name    String to be concatenated to the unique ISBN of a
    ENTITY              content object to obtain the root sequence-id.
    SID


CONTENT CLASSES Section. This section specifies the digital library content class for each of the possible file extensions of associated component files.
    KEYWORD       VALUE   MEANING
    DEFAULT       Name    BINARY if the component contains non-human
    CONTENT               readable data.
    CLASS                 ASCII if the component contains human
                          readable data.
    File          Name    BINARY if the component contains non-human
    extension             readable data.
                          ASCII if the component contains human
                          readable data.


Groups File. This section lists the names of all hierarchical groups of entities within the data model. All entity instances that belong to the same group will be stored in the same set of digital library index classes. This is a convenient way to manage product-related entities separately from other business-related entities. There may be one or more GROUPS in a Groups File.
    KEYWORD     VALUE   MEANING
    GROUP       name    All instances of entities within this Entity group
                        will be stored in the same set of digital library
                        index classes. The Entity types that belong to this
                        group may be specified via ENTITY keywords in
                        an Individual Group Section.


Individual Group Attribute Files. Each filename in the group attribute files is the value of one of the GROUP keywords in the Groups File. It identifies the data model entities that will be stored together as a related group and the digital library index class names that will be used to store them. There is one Individual Group Section for each GROUP keyword in the Groups File.
    KEYWORD       VALUE   MEANING
    ENTITIES      name    The digital library index class name that will be
    CLASS                 used to store all instances of entities that belong
                          to this group.
    ENTITY ID     name    The digital library attribute name in the
                          ENTITIES Index Class for a unique identifier
                          for the entity. It is assumed to be unique and an
                          index.
    ENTITY TYPE   name    The digital library attribute name in the
                          ENTITIES Index Class for the digital library
                          Type ID of the entity.
    ENTITY        name    The digital library attribute name in the
    PARENT ID             ENTITIES Index Class for the digital library
                          Item ID of the parent container of this entity.
    ENTITY        name    The DIGITAL LIBRARY Attribute Name in
    CHILD ID              the ENTITIES Index Class for the DIGITAL
                          LIBRARY Item ID of the first child of this
                          entity.
    ENTITY        name    The DIGITAL LIBRARY Attribute Name in
    SIBLING ID            the ENTITIES Index Class for the DIGITAL
                          LIBRARY Item ID of the first sibling of this
                          entity.
    ENTITY AUX    name    The DIGITAL LIBRARY Attribute Name in
    ID                    the ENTITIES Index Class for the first
                          auxiliary attribute of this entity.
    ENTITY        name    The DIGITAL LIBRARY Attribute Name in
    SUBCOMP ID            the ENTITIES Index Class for the first
                          associated component of this entity.
    ASSOC COMP    name    The DIGITAL LIBRARY Attribute Name in
    ATTR NAME             the COMPONENTS Index Class for the
                          DIGITAL LIBRARY Item ID of the
                          ENTITIES Index Class item that owns
                          the COMPONENT item.
    ASSOC COMP    name    The DIGITAL LIBRARY Attribute Name in
    ID ATTR               the COMPONENTS Index Class for the
    NAME                  DIGITAL LIBRARY Item ID of the
                          ENTITIES Index Class item that owns the
                          COMPONENT item.
    ATTRIBUTES    name    The DIGITAL LIBRARY Index Class name
    CLASS                 that will be used to store all attributes that are
                          hierarchical or have multiple instances.
    ATTR          name    The DIGITAL LIBRARY Attribute Name in
    ENTITY ID             the ATTRIBUTES Index Class for the
                          DIGITAL LIBRARY Item ID of the
                          ENTITIES Index Class item that owns
                          the ATTRIBUTE item.
    ATTR          name    The DIGITAL LIBRARY Attribute Name in
    PARENT ID             the ATTRIBUTES Index Class for the
                          DIGITAL LIBRARY Item ID of the parent
                          container of this entity.
    ATTR CHILD    name    The DIGITAL LIBRARY Attribute Name in
    ID                    the ATTRIBUTES Index Class for the
                          DIGITAL LIBRARY Item ID of the first
                          child container of this entity.
    ATTR          name    The DIGITAL LIBRARY Attribute Name in
    SIBLING ID            the ATTRIBUTES Index Class for the
                          DIGITAL LIBRARY Item ID of the next
                          sibling container to this entity.
    ATTR NEXT     name    The DIGITAL LIBRARY Attribute Name in
    VALUE                 the ATTRIBUTES Index Class for the
                          DIGITAL LIBRARY Item ID of the next
                          value of this entity.
    ATTR          name    The DIGITAL LIBRARY Attribute Name in
    KEYWORD               the ATTRIBUTES Index Class for the attribute
                          containing the Attribute's Keyword.
    ATTR VALUE    name    The DIGITAL LIBRARY Attribute Name in
                          the ATTRIBUTES Index Class for the attribute
                          containing the Attribute's Value.
    ENTITY        name    Name of the entity type of the root entity.
    ENTITY        Part    Specifies the DIGITAL LIBRARY Part
    STRUCTURE     Number  Number where an internally generated
    PART                  summary of attribute values for this
                          entity and all of its descendants will
                          be stored.


ATTRIBUTE Definitions. Each top-level attribute name that may be present in a Group Attribute file must have keywords defined. Attributes that are part of an attribute hierarchy (i.e. sibling attributes with a parent attribute) are defined by an ATTRIBUTE keyword within the parent's attribute definition.
    KEYWORD       VALUE     MEANING
    ATTRIBUTE               Defines the beginning of a single attribute.
                            There may be one or more ATTRIBUTE
                            definitions in a Group File. Each attribute
                            name that may be present in an attribute
                            file must have keywords defined.
    NAME          name      The attribute name that will be used to
                            identify this attribute.
    TYPE          ENTITY    The value of this attribute will be stored as
                            a Primary attribute in the Entities Index
                            Class of the appropriate Entity Group. It
                            uses the DIGITAL LIBRARY attribute
                            specified by the DL NAME keyword.
                  COMP      The value of this attribute will be stored as
                            a Primary attribute in the Components Index
                            Class of the appropriate Entity Group. It
                            uses the DIGITAL LIBRARY attribute
                            specified by the DL NAME keyword.
                  AUX       The value of this attribute will be stored
                            as an Auxiliary attribute along with its
                            keyword. Depending on its position in the
                            attribute file, it will also contain the
                            DIGITAL LIBRARY Item ID of either an
                            Entities Index Class or a Components
                            Index Class item of the appropriate Entity
                            Group.
                  SYS       The value of this attribute may not be
                            loaded via ELoader and it is not
                            explicitly stored in the Digital Library.
                            The value of this attribute is generated by
                            the DIGITAL LIBRARY query engine and is
                            available for output by EQuery.
    VALUES        1 (default) This attribute may only have one value. The
                            attribute may be either PRIM or COMP.
                  *         This attribute may have zero or more values.
                            The attribute type may only be AUX. The
                            values will always be retrieved in the same
                            order that they are stored.
    DEFAULT       INHERIT   If a value is not explicitly specified for
                            this attribute, the current value of the
                            same attribute of the parent entity is used
                            when the entity is stored (i.e. early
                            binding).
                  LINK      If a value is not explicitly specified for
                            this attribute, the current value of the
                            same attribute of the parent entity is used
                            when the entity is retrieved (i.e. late
                            binding).
                  NULL      If a value is not explicitly specified for
                            this attribute, the value is assumed to be
                            a NULL string (i.e. no binding).
    FILE          0 (default) This attribute has a normal text value and
                            is not a file name.
                  1         The value of this attribute is a file name.
                            In addition to storing the file name as the
                            attribute value, the content of the file is
                            stored as a DIGITAL LIBRARY part in the
                            Part number specified by the PART keyword.
    PART          n         If the PART keyword is specified, the
                            value of the attribute is either a long
                            string or the contents of a file (based on
                            the value of the FILE keyword). The value
                            of the PART keyword specifies the
                            DIGITAL LIBRARY Part Number where the
                            value of the attribute will be stored. The part
                            will be stored with the item that contains
                            the attribute value. This type of attribute
                            may be searched with Text search, but not
                            parametric search.
    DL NAME       name      The DIGITAL LIBRARY attribute name
                            that will be used to store this attribute.
                            The attribute type must be ENTITY or
                            COMP.
    SEARCH        P         Allow parametric search. The attribute
                            type must be ENTITY.
                  T         Allow text search.
                  PT        Allow parametric and/or text search. The
                            attribute type must be ENTITY.
    TEXT INDEX    name      The name of the Text Miner index that
                            will be used to index this attribute value.
    ATTRIBUTE               The presence of an ATTRIBUTE keyword
                            indicates that the parent attribute has child
                            values. The top-level attribute type must
                            be AUX. The value of this attribute is the
                            aggregation of all of the values of the
                            attributes that are defined by all of the
                            attribute values that it contains.
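
The three DEFAULT settings in the table differ only in when a missing value is resolved against the parent entity. The Python sketch below illustrates that distinction under simplifying assumptions: the `Entity` class and the `store` and `read` helpers are hypothetical names, and values are held in a plain dictionary rather than in digital library index classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entity:
    parent: Optional["Entity"] = None
    values: Dict[str, str] = field(default_factory=dict)


def read(entity: Entity, name: str, policy: str) -> str:
    """Return the attribute value as seen at retrieval time."""
    if name in entity.values:
        return entity.values[name]
    if policy == "LINK" and entity.parent is not None:
        # Late binding: look at the parent's value now, at retrieval.
        return read(entity.parent, name, policy)
    return ""  # nothing stored and nothing to link to


def store(entity: Entity, name: str, value: Optional[str], policy: str) -> None:
    """Store an attribute, applying the DEFAULT policy when value is None."""
    if value is not None:
        entity.values[name] = value
    elif policy == "INHERIT" and entity.parent is not None:
        # Early binding: copy the parent's current value at store time.
        entity.values[name] = read(entity.parent, name, policy)
    elif policy == "NULL":
        entity.values[name] = ""  # no binding
    # LINK stores nothing; read() resolves it later.
```

With INHERIT, a later change to the parent's value does not reach a child that was already stored; with LINK, it does.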


Example of an ELOADER.INI File:
    !LOGON
    !   LIBRARY:LIBSRVRX
    !   USERID:chuck
    !   PASSWORD:chuck
    !   TEXT SERVER:TM
    !DEBUG
    !   TRACE:1
    !LOG FILES
    !   LOADER:Emissary.log
    !   TRACE:ETrace.log
    !ELOADER
    !   DATA MODEL:EMISSARY.GROUPS
    !   DEFAULT GROUP:PRODUCTS
    !   ROOT ENTITY SID:.00.00.00
    !CONTENT CLASSES
    !   DEFAULT CONTENT CLASS:BINARY
    !   tiff:BINARY
    !   gif:BINARY
    !   jpg:BINARY
    !   eps:ASCII
    !   sgm:ASCII
    !   txt:ASCII
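
The listing above follows a simple layout: a line beginning with "!" and no further indentation names a section, and an indented "!" line carries a KEYWORD:VALUE pair for that section. The Python sketch below reads that layout and looks up a content class by file extension. It is a minimal illustration, not the loader's own parser; the function names `parse_eloader_ini` and `content_class_for` are hypothetical, and the indentation rule is an assumption inferred from the example.

```python
from typing import Dict, Optional

SAMPLE = """\
    !LOGON
    !   LIBRARY:LIBSRVRX
    !CONTENT CLASSES
    !   DEFAULT CONTENT CLASS:BINARY
    !   gif:BINARY
    !   sgm:ASCII
"""


def parse_eloader_ini(text: str) -> Dict[str, Dict[str, str]]:
    """Return {section name: {keyword: value}} for '!'-prefixed lines."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("!"):
            continue  # skip blank lines and ';' comment lines
        body = line[1:]
        if not body.strip():
            continue
        if body[0].isspace():
            # Indented after '!': a KEYWORD:VALUE pair in the current section.
            if current is None:
                raise ValueError(f"keyword outside any section: {raw!r}")
            key, sep, value = body.strip().partition(":")
            if not sep:
                raise ValueError(f"missing ':' in {raw!r}")
            sections[current][key.strip()] = value.strip()
        else:
            current = body.strip()
            sections[current] = {}
    return sections


def content_class_for(sections: Dict[str, Dict[str, str]], filename: str) -> str:
    """Pick the content class by extension, else the section default."""
    classes = sections["CONTENT CLASSES"]
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return classes.get(ext, classes["DEFAULT CONTENT CLASS"])
```

For the sample text, `content_class_for(parse_eloader_ini(SAMPLE), "chapter2.sgm")` selects the ASCII class, and an extension with no entry falls back to the DEFAULT CONTENT CLASS value.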


EMISSARY.GROUPS file:

GROUP:PRODUCTS

GROUP:ProgramGroup

ProgramGroup.ATR file for the group ProgramGroup:
    !NAME:PROGRAM
    ;Index Class control words
    !ENTITIES CLASS:E_Program
    !ENTITY ID:E_SeqID
    !ENTITY TYPE:E_EntityType
    !ENTITY PARENT ID:S_ParentItem
    !ENTITY CHILD ID:S_ChildItem
    !ENTITY SIBLING ID:S_SiblingItem
    !ENTITY AUX ID:S_AuxItem
    !ENTITY STRUCTURE PART:9
    !ASSOC COMP ATTR NAME:Associated_Component
    !ASSOC COMP ID ATTR NAME:AC_CDAOID
    !ATTRIBUTES CLASS:E_ProgramAux
    !ATTR SEQUENCE ID:E_SeqID
    !ATTR ENTITY ID:S_ProgramItem
    !ATTR PARENT ID:S_ParentItem
    !ATTR CHILD ID:S_ChildItem
    !ATTR SIBLING ID:S_SiblingItem
    !ATTR NEXT VALUE:S_NextValueItem
    !ATTR KEYWORD:S_Keyword
    !ATTR VALUE:S_Value
    ;// Data Model "Entity types"
    !ENTITY:PROGRAM
    ;// System attributes
    !ATTRIBUTE
    !NAME:Associated_Component
    !TYPE:SYS
    !ATTRIBUTE
    !NAME:AC_CDAOID
    !TYPE:SYS
    !ATTRIBUTE
    !NAME:Hits
    !TYPE:SYS
    !ATTRIBUTE
    !NAME:HitWords
    !TYPE:SYS
    !ATTRIBUTE
    !NAME:Rank
    !TYPE:SYS
    !DL NAME:DKRANK
    ;// Data Model "Entity attributes"
    !ATTRIBUTE
    !NAME:PE_Program_ID
    !TYPE:ENTITY
    !DL NAME:E_ProgramID
    !ATTRIBUTE
    !NAME:PE_Title
    !TYPE:ENTITY
    !DL NAME:E_Title
    !ATTRIBUTE
    !NAME:PE_Subtitle
    !TYPE:ENTITY
    !DL NAME:E_Subtitle
    !ATTRIBUTE
    !NAME:PE_AC_Avg_Image_Bytes
    !TYPE:ENTITY
    !DL NAME:E_AvgChrPerImage
    !ATTRIBUTE
    !NAME:PE_AC_Avg_SGML_Bytes
    !TYPE:ENTITY
    !DL NAME:E_AVgChrPerSGMLAC
    !ATTRIBUTE
    !NAME:PE_Chars_Per_UPM_Tier
    !TYPE:ENTITY
    !DL NAME:E_MaxChrPerUpmTier
    !ATTRIBUTE
    !NAME:PE_Req_Count
    !TYPE:AUX
    !VALUES:*
    !ATTRIBUTE
    !NAME:PE_Related_Material
    !TYPE:AUX
    !VALUES:*
    !ATTRIBUTE
    !NAME:PE_AC_Group
    !TYPE:AUX
    !VALUES:*
    !ATTRIBUTE
    !   NAME:PE_AC_FormID
    !   TYPE:AUX
    !ATTRIBUTE
    !   NAME:PE_AC_ByteCount
    !   TYPE:AUX
    !ATTRIBUTE
    !   NAME:PE_AC_FormDesc
    !   TYPE:AUX
    !ATTRIBUTE
    !NAME:PE_Price_Group
    !TYPE:AUX
    !VALUES:*
    !ATTRIBUTE
    !   NAME:PE_Country
    !   TYPE:AUX
    !ATTRIBUTE
    !   NAME:PE_Monetary_Unit
    !   TYPE:AUX
    !ATTRIBUTE
    !   NAME:PE_Min_Order_Price
    !   TYPE:AUX
    !ATTRIBUTE
    !   NAME:PE_Base_Cust_Pub_Price
    !   TYPE:AUX
    !ATTRIBUTE
    !   NAME:PE_Base_UPM_Fee
    !   TYPE:AUX
    !ATTRIBUTE
    !   NAME:PE_Incr_UPM_Fee
    !   TYPE:AUX
    !ATTRIBUTE
    !   NAME:PE_Source_Price_Per_Page
    !   TYPE:AUX
    !ATTRIBUTE
    !   NAME:PE_UPM_Bytes_Per_Page
    !   TYPE:AUX
    !ATTRIBUTE
    !   NAME:PE_Minimum_Page_Limit
    !   TYPE:AUX
    !ATTRIBUTE
    !   NAME:PE_Volume_Page_Limit
    !   TYPE:AUX
    !ATTRIBUTE
    !   NAME:PE_Status
    !   TYPE:ENTITY
    !   DL NAME:E_Status
    !ATTRIBUTE
    !   NAME:PE_CreateDate
    !   TYPE:ENTITY
    !   DL NAME:E_CreateDate
    !ATTRIBUTE
    !   NAME:PE_CreatedBy
    !   TYPE:ENTITY
    !   DL NAME:E_CreatedBy
    !ATTRIBUTE
    !   NAME:PE_LastModifiedDate
    !   TYPE:ENTITY
    !   DL NAME:E_LastModifiedDate
    !ATTRIBUTE
    !   NAME:PE_LastModifiedBy
    !   TYPE:ENTITY
    !   DL NAME:E_LastModifiedBy


PRODUCTS.ATR file for the group PRODUCTS:
    ; Index class info for Products
    !ENTITIES CLASS:tmpResource2
    !ENTITY ID:SeqID
    !ENTITY TYPE:EntityType
    !ENTITY PARENT ID:ContainerItem
    !ENTITY CHILD ID:ChildItem
    !ENTITY SIBLING ID:SiblingItem
    !ENTITY SUBCOMP ID:SubcompItem
    !ENTITY AUX ID:AuxItem
    !ASSOC COMP ATTR NAME:Associated_Component
    !ASSOC COMP ID ATTR NAME:OID
    !ATTRIBUTES CLASS:tmpAux2
    !ATTR ENTITY ID:EResourceItem
    !ATTR PARENT ID:ContainerItem
    !ATTR CHILD ID:ChildItem
    !ATTR SIBLING ID:SiblingItem
    !ATTR NEXT VALUE:NextValueItem
    !ATTR KEYWORD:EKeyword
    !ATTR VALUE:EValue
    ; Data Model Root Entity Types for Products Group
    ; ENTITY:Product.c
    ; Data Model Attributes for Products-i.e., mapping of metadata properties
    ; from PSF attribute files to data locations in the DL data repository
    !ATTRIBUTE
    !   NAME:TITLE
    !   TYPE:ENTITY
    !   DL NAME:Title
    !ATTRIBUTE
    !   NAME:PRICE
    !   TYPE:ENTITY
    !   DEFAULT:INHERIT
    !   DL NAME:CharCount
    !ATTRIBUTE
    !   NAME:FILE NAME
    !   TYPE:ENTITY
    !   DL NAME:ContentFileName
    !   FILE:1
    !   PART:1
    !ATTRIBUTE
    !   NAME:INDEX TERM
    !   TYPE:AUX
    !   PART:5
    !   TEXT INDEX:TIXTERM
    !ATTRIBUTE
    !   NAME:ITEM INDEX
    !   TYPE:AUX
    !   PART:5
    !   TEXT INDEX:TIXTERM
    !ATTRIBUTE
    !   NAME:AUTHOR
    !   TYPE:AUX
    !   DEFAULT:INHERIT
    !   ATTRIBUTE
    !       NAME:NAME
    !       TYPE:AUX
    !       PART:6
    !       TEXT INDEX:TIXTERM
    !   ATTRIBUTE
    !       NAME:SCHOOL
    !       TYPE:AUX
    !ATTRIBUTE
    !   NAME:Associated_Component
    !   TYPE:COMP
    !   ATTRIBUTE
    !       NAME:OID
    !       TYPE:COMP
    !       DL NAME:ObjectId
    !   ATTRIBUTE
    !       NAME:TITLE
    !       TYPE:COMP
    !       DL NAME:Title
    !   ATTRIBUTE
    !       NAME:SIZE
    !       TYPE:AUX
    !   ATTRIBUTE
    !       NAME:COMP FILE NAME
    !       TYPE:COMP
    !       DL NAME:ContentFileName
    !       FILE:1
    !       PART:1
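
In the PRODUCTS.ATR listing above, a child attribute is written as an ATTRIBUTE keyword indented inside its parent's definition. The Python sketch below rebuilds that hierarchy from the indentation that follows each "!". It is a hedged illustration: `AttrDef` and `parse_group_file` are hypothetical names, and the rule that deeper indentation means "belongs to the nearest shallower ATTRIBUTE" is an assumption inferred from this listing's layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

SAMPLE = """\
!ENTITIES CLASS:tmpResource2
!ATTRIBUTE
!   NAME:AUTHOR
!   TYPE:AUX
!   DEFAULT:INHERIT
!   ATTRIBUTE
!       NAME:NAME
!       TYPE:AUX
!       PART:6
!       TEXT INDEX:TIXTERM
!   ATTRIBUTE
!       NAME:SCHOOL
!       TYPE:AUX
"""


@dataclass
class AttrDef:
    props: Dict[str, str] = field(default_factory=dict)
    children: List["AttrDef"] = field(default_factory=list)


def parse_group_file(text: str) -> Tuple[Dict[str, str], List[AttrDef]]:
    """Return (group-level keywords, top-level attribute definitions)."""
    settings: Dict[str, str] = {}
    roots: List[AttrDef] = []
    stack: List[Tuple[int, AttrDef]] = []  # (indent, open definition)
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("!"):
            continue  # skip blank lines and ';' comment lines
        body = line[1:]
        content = body.strip()
        if not content:
            continue
        indent = len(body) - len(body.lstrip())
        # Close every definition that is not shallower than this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if content == "ATTRIBUTE":
            node = AttrDef()
            (stack[-1][1].children if stack else roots).append(node)
            stack.append((indent, node))
        else:
            key, _, value = content.partition(":")
            target = stack[-1][1].props if stack else settings
            target[key.strip()] = value.strip()
    return settings, roots
```

Applied to the sample, the AUTHOR definition ends up with two children, NAME and SCHOOL, and the ENTITIES CLASS keyword lands in the group-level settings.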


B. Selecting Content for a Compilation of Content

The selection path for creating a compilation of content is shown in FIG. 6. This path allows a user to interface with the digital library 20 to retrieve and view content objects stored therein, select objects for inclusion in a compilation of content, create new objects for inclusion in the compilation and for storage in the digital library 20, and submit the completed compilation for approval.

In brief, block 22 represents a user-interface application, which preferably runs within a standard web browser. It comprises HTML and JavaScript applications that provide a user interface and some application functions such as searching, viewing, selecting, creating, editing, and organizing content accessed from the content server(s). The user creates a compilation in the form of a custom content outline (CCO), which is essentially a formatted text document that includes pointers to the actual content to be included in the final compilation. In the current example for creating custom textbooks, the CCO is called a custom book outline or CBO.

The user-interface application 22 communicates through a web server 26 to an application layer 28. Application layer 28 preferably comprises a set of PERL applications that control some user interface transactions (e.g., login procedures), retrieve data for presentation to the user, perform CCO manipulation and submission, and forward commands to the API Layer 30 to communicate actions requested by the user.

Application layer 28 accesses the content server(s) 18 via API layer 30. The API layer 30 preferably consists of a collection of C++ routines that perform discrete functions such as the actual CCO manipulation functions and digital library 20 functions (e.g., search and retrieve). It also includes a PERL/C++ glue layer between the C++ routines and application layer 28 for bridging parameter lists between C++ and PERL. The API layer 30 is provided to map digital library 20 more closely to the customer's website and application program workflow. Underneath, this API 30 makes use of the digital library API 16 to query/update/delete and retrieve data from digital library 20.

1. Custom Book Outline

Prior to submission, a custom book outline is preferably an abstract representation of the compilation of content being created. For example, the CBO may be a hierarchical outline of the contents to be included in a compilation of content. At this point, it contains only references to the actual content to be included in the final work. This format is more efficient than pulling in content at creation time because it avoids retrieval and manipulation of large BLOBs of information until the CBO is in its final form.

In the present example, the CBO at creation time is a formatted text document comprising a parsable formatted file like the "PSF" filetype previously described. Like the content product PSF files, the CBO is merely an outline with references to the content entities to be included therein. One difference is that a CBO may be a composite PSF+ATR filetype, including attributes particular to the CBO such as author and price. This is the case in the present embodiment.

"Entities" once again refers to the content hierarchy definition. For example, the hierarchy definition of a textbook includes containers representing the higher levels of the hierarchy (cbo.c, volume.c and chapter.c). The smallest entity of the hierarchy is a section. Each entity in the CBO is represented by a sequence ID in the same format as previously described with reference to product entities. The sequence ID of a container entity is used to identify all subentities of that container. The sequence ID of a leaf node is used to reference the actual content associated with that node.
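
As a rough illustration of how a container's sequence ID can identify its subentities, the Python sketch below tests whether one sequence ID falls under another. It rests on assumptions drawn only from the examples in this description: fields are dot-separated, the first field is the ISBN, and trailing "00" fields mark the levels below a container (so "012345678.02.00.00" is a chapter container and the ISBN followed by ".00.00.00" is the root). The function names are hypothetical.

```python
from typing import List


def container_prefix(sequence_id: str) -> List[str]:
    """Drop trailing '00' fields, keeping at least the ISBN field."""
    fields = sequence_id.split(".")
    while len(fields) > 1 and fields[-1] == "00":
        fields.pop()
    return fields


def is_subentity(candidate: str, container: str) -> bool:
    """True when candidate lies below container in the assumed hierarchy."""
    prefix = container_prefix(container)
    fields = candidate.split(".")
    # A subentity shares the container's significant leading fields
    # and is not the container itself.
    return candidate != container and fields[:len(prefix)] == prefix
```

Under these assumptions, section "0137842244.02.02.00" is a subentity of chapter "0137842244.02.00.00" but not of chapter "0137842244.03.00.00".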

A CBO according to the present example is stored as a digital library part. Its attributes are also contained in a row of a relational CBO index class defined by the Custom Book Outline Index Class. In this particular implementation its unique identifier is stored in the User Table, although it could also be stored in the CBO index class. The User Table contains this reference to identify the CBO a user is currently working with, which allows the user to log off, log back in, and return to the previous CBO "work in progress". The row in the CBO index class includes references to the CBO part number, as well as any associated parts.

FIG. 7 depicts a row 82 representing a CBO. It includes a CBO identifier, CBO attributes, and pointers to one or more PSF files or "parts" associated with the CBO. The first part contains the parsable formatted text outline representing the compilation of content, which in turn includes references to actual product content making up the CBO. A second part comprises a backup downlevel copy of the first part that is used to undo previous transactions. A third part, designated with the number 50 or higher, represents any user-provided content that has been added to the CBO. Each part of this type includes pointers to the actual user-provided content, which is stored in digital library 20.
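
The row layout just described can be pictured with a small Python sketch. It is only an in-memory illustration under stated assumptions: the class name `CboRow` and its methods are hypothetical, the backup is modeled as a single-level undo, and user-provided parts are numbered from 50 upward as the text states. Real storage would go through the digital library rather than Python fields.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

FIRST_USER_PART = 50  # per the text: user-provided parts are 50 or higher


@dataclass
class CboRow:
    cbo_id: str
    attributes: Dict[str, str] = field(default_factory=dict)
    outline: str = ""             # first part: the formatted text outline
    backup: Optional[str] = None  # second part: downlevel copy for undo
    user_parts: Dict[int, str] = field(default_factory=dict)

    def update_outline(self, new_outline: str) -> None:
        """Keep the previous outline as the backup, then replace it."""
        self.backup = self.outline
        self.outline = new_outline

    def undo(self) -> bool:
        """Restore the backup copy; returns False if none is available."""
        if self.backup is None:
            return False
        self.outline, self.backup = self.backup, None
        return True

    def add_user_content(self, reference: str) -> int:
        """Record a pointer to user-provided content in the next free part."""
        part = max(self.user_parts, default=FIRST_USER_PART - 1) + 1
        self.user_parts[part] = reference
        return part
```

Keeping only references in `user_parts` mirrors the description: each such part points to user-provided content that is itself stored in digital library 20.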

An example of a CBO is shown below.
    CBO.C:OW1T8$UEB4H3@SE7
    !PE_ID:FE
    !Title:Student Loans
    !Userid:DaveBaer
    !Undoable:FALSE
    !Product_Type:CBO
    !Create_Date:20001209203630
    !Last_Modified_Date:20001214113615
    !Status:0
    !CBO_State_Changed_Date:20001209203630
    !UPM_Terms_And_Conditions_Date:20001214 11:36:13
    !Acknowledgement:
    !Contrib_Group:
    !Price:2216
    !UPM_Price:1000
    !Page_Count:21.8
    !Char_Count:186
    !Nextchapter:2
    !ECtlSGMLChrPerPage:3800
    !ECtlAvgChrPerImage:0
    !ECtlAVgChrPerSMGLAC:0
    !ECtlMaxChrPerUPMTier:2000
    !ECtlSourcePricePerPage:10
    !ECtlUPMBasePrice:1000
    !ECtlUPMIncrPrice:10
    !Country_Code:US
    !PE_Volume_Page_Limit:480
    !PE_Minimum_Page_Limit:80
    !PE_Min_Order_Price:1000
    !UPM_Bytes_Per_Page:1000
    !Base_Cost:1000
        VOLUME.C:V1
        !UPM_price:0
        !Title:My New ESource Book created on 12/09/2000 at 20:36:28
        Volume Number 1
        !Price:216
        !Product_Type:
        !Publication_Media_Type:
        !Page_Count:21.8
            FRONT_AND_BACK_ELEMENT:
            !Title:Table of Contents
            !Page_Count:6
            !Price:60
            FRONT_AND_BACK_ELEMENT:
            !Title:Preface
            !Page_Count:9
            !Price:90
            CHAPTER.C:C1
            !Title:New Chapter
            !Price:16
            !Page_Count:1.8
                SECTION:0137842244.02.02.00
                !Title: Background Ideas
                !SGML_Char_Cnt:2111
                !PE_ID:FE
                !Page_Count:0.6
                !Info_Generated:1
                !Price:6
                SECTION:0137842244.02.03.00
                !Title:Why Study Engineering Ethics?
                !SGML_Char_Cnt:3905
                !PE_ID:FE
                !Page_Count:1.0
                !Info_Generated:1
                !Price:10
                UPM SECTION:50
                !Title:My New UPM Title
                !SGML_Char_Cnt:186
                !AC_Subdoc_Cnt:0
                !AC_Image_Cnt:0
                !Page_Count:0.2