System and method for XML based content management
6910040
Abstract
A system and method for a content management system are described. The content management system takes complex, hierarchically represented content structures and represents the hierarchical model by way of a relational model that creates node tables and edge tables to represent various content structures. Moreover, the content is separated from the structure, so that the same content units may be used by multiple content structures.
Claims
1. A computer readable medium bearing computer readable instructions for carrying out the process of
Description
FIELD OF THE INVENTION
The graphs are converted for physical instantiation in a SQL database manager by moving into set theory and defining the nodes and relationships as entities. The Dublin Core forms the base group of data model entity attribution, which provides a quick physical index scheme for content search and use. The Dublin Core Metadata Initiative serves as the definition of commonly used metadata and forms the core metadata of the CMS Store. The invention facilitates the creation of consumable content based on an expandable razor/blade approach: content is targeted for multiple consumption methods, with the editorial voice targeted by platform and audience. Different devices such as digital phones, tablets and PCs require different presentation because of device limits and constraints. Services can be targeted using SOAP for application consumption. Audiences can be targeted with separate navigation, editorial voice and content.
The following definitions are useful guides in understanding the present invention. Note, however, that these definitions are known to those of ordinary skill in the art and are presented here for the convenience of the reader.
XML
XML stands for eXtensible Markup Language. It is not a language per se but rather a meta language: a markup language for creating other markup languages. XML provides structure to a document by using tags. XML was designed to be extensible and simple to implement and is based on SGML. A document that follows the XML syntax rules is said to be well-formed. A document can be invalid but still be well-formed; a valid XML document is one that also conforms to a DTD or XML-Schema. Because XML is text-based, it is lightweight and compact, making it an efficient format for transporting dynamic and consumable content. XML allows the efficient exchange of data between applications, making it useful for distributed applications.
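The distinction between well-formed and valid documents can be illustrated with Python's standard library. This is a minimal sketch: `xml.etree.ElementTree` checks well-formedness only, so validating against a DTD or XML-Schema would need a validating parser, which is not shown. The function name `is_well_formed` is illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML.

    Only well-formedness is checked; validity against a DTD or
    XML-Schema requires a validating parser.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Matching start and end tags under a single root element.
print(is_well_formed("<book><title>Intro</title></book>"))  # True
# Mismatched end tag.
print(is_well_formed("<book><title>Intro</book>"))  # False
```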
XML schemas are an agreed-upon, industry-wide initiative to share common application languages based upon XML. The Organization for the Advancement of Structured Information Standards (OASIS) sponsors the www.xml.org site, which hosts industry group schemas.
XSL
XSL stands for eXtensible Stylesheet Language. It is a language for expressing stylesheets and provides display semantics for XML. XSL maps XML elements into HTML or any other formatting language. It is similar in functionality to Cascading Style Sheets (CSS).
XML-Schema
XML-Schema, a current proposal, is a replacement for DTD. A DTD (Document Type Definition) was designed for legacy text and is not itself written in XML. DTD does not support data type validation and supports only one document. A Schema Working Group was established to propose the standard to the W3C (see http://www.w3.org/1999/05/06-xmlschema-1/). Schemas use XML syntax and support data typing. XML-Schema is an open content model that supports inheritance, constraints and namespaces. Namespaces are a way to share data between organizations and to avoid collisions between element definitions. Schemas are built in XML and can be used via DOM in Visual Basic or VBScript in ASP. Schemas provide data types such as float and currency, as well as relationships between elements. Schemas are extensible and allow for user-defined data types.
SOAP
SOAP stands for Simple Object Access Protocol (sometimes seen as XOAP). SOAP is an XML-based programming interface that is machine and language independent. Because it is typically carried over HTTP, it can pass through firewalls. It is extensible and loosely coupled. SOAP uses XML for remote invocation of object methods and can interact with COM, CORBA or EJB. SOAP's goals are to create a standard object invocation protocol built on Internet standards (XML and HTTP) that is extensible with an evolving payload format.
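As a rough sketch of how SOAP wraps a method invocation in XML, the snippet below builds a SOAP 1.1 envelope with Python's standard library. The envelope namespace is the one defined by SOAP 1.1; the application namespace `urn:example:cms`, the method `GetContent`, and its `ContentID` parameter are hypothetical placeholders, not an interface defined by the CMS.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical application namespace, for illustration only.
APP_NS = "urn:example:cms"

# Choose readable prefixes for serialization.
ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("app", APP_NS)


def build_soap_request(method: str, params: dict[str, str]) -> str:
    """Wrap a method call and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="unicode")


request = build_soap_request("GetContent", {"ContentID": "42"})
print(request)
```

In practice the resulting string would be sent as the body of an HTTP POST; the transport step is omitted here.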
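Dublin Core Metadata Initiative
A recognized external standards initiative built around library science, the Dublin Core specifies the following fifteen (15) tags for building card catalogs and metadata. These tags could form the basis for classifying and tagging: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage and Rights.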
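As a minimal sketch, assuming the metadata for a content unit arrives as a dictionary of key/value pairs, the fifteen Dublin Core element names can be used to separate core metadata from other attribution. The function name `split_metadata` and the lowercase key convention are illustrative assumptions, not part of the CMS.

```python
# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def split_metadata(
    metadata: dict[str, str],
) -> tuple[dict[str, str], dict[str, str]]:
    """Separate Dublin Core properties from other key/value attributes.

    Keys are matched case-insensitively; Dublin Core keys are
    normalized to lowercase, other keys are kept as given.
    """
    core: dict[str, str] = {}
    extra: dict[str, str] = {}
    for key, value in metadata.items():
        if key.lower() in DUBLIN_CORE_ELEMENTS:
            core[key.lower()] = value
        else:
            extra[key] = value
    return core, extra


core, extra = split_metadata(
    {"Title": "MSDN Library", "Language": "en", "Audience": "developers"})
print(core)   # {'title': 'MSDN Library', 'language': 'en'}
print(extra)  # {'Audience': 'developers'}
```

Keeping the Dublin Core properties apart from free-form pairs mirrors the split between core metadata and additional key/value attribution described for the CMS Store.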
FIG. 1 illustrates an overview of the architecture of CMS 10 in accordance with an aspect of the system. CMS 10 provides an application program interface (API) 10a through which a user, working with a CMS Tool UI 14 on a client computer 20b, interfaces with CMS 10. The user may have an application that operates on content in Application DB 13. The user may also use a web-based interface 16 to access content maintained by CMS 10. Build process 18 uses content from CMS 10 to generate a document, web page, etc. for storage in Application DB 13, and then outputs web pages 11, etc. Stored procedures 17 store and retrieve content in Relational DB Management System 12a, which manages the structure of the content in accordance with aspects of the invention. NTFS 15 stores various portions of content accessible to CMS 10.
Illustrative Computer Network Environment
FIG. 2 illustrates how the system of FIG. 1 may be configured to communicate in a network environment. Here, computers 20a-20c and 21a-21b may host various databases, such as the databases used in CMS 10 and Application DB 13, in accordance with aspects of the present invention. Although the physical environment shows the connected devices as computers, this illustration is merely exemplary; the devices may comprise various digital devices such as PDAs, network appliances, notebook computers, etc. A variety of systems, components, and network configurations support distributed computing environments. For example, computing systems may be connected together by wireline or wireless systems, by local networks or by widely distributed networks. Currently, many networks are coupled to the Internet, which provides the infrastructure for widely distributed computing and encompasses many different networks. The Internet commonly refers to the collection of networks and gateways that use the TCP/IP suite of protocols, which are well known in the art of computer networking.
TCP/IP is an acronym for "Transmission Control Protocol/Internet Protocol." The Internet can be described as a system of geographically distributed remote computer networks interconnected by computers executing networking protocols that allow users to interact and share information over the networks. Because of such widespread information sharing, remote networks such as the Internet have thus far generally evolved into an "open" system for which developers can design software applications for performing specialized operations or services, essentially without restriction. The network infrastructure enables a host of network topologies such as client/server, peer-to-peer, or hybrid architectures. The "client" is a member of a class or group that uses the services of another class or group to which it is not related. Thus, in computing, a client is a process (i.e., roughly a set of instructions or tasks) that requests a service provided by another program. The client process uses the requested service without having to "know" any working details about the other program or the service itself. In a client/server architecture, particularly a networked system, a client is usually a computer that accesses shared network resources provided by another computer (i.e., a server). In the example of FIG. 2, computers 20a-20c can be thought of as clients and computers 21a, 21b can be thought of as servers, where server 21a maintains the data that is then exported for use by client computers 20a-20c. A server is typically a remote computer system accessible over a remote network such as the Internet. The client process may be active in a first computer system and the server process may be active in a second computer system, communicating with one another over a communications medium, thus providing distributed functionality and allowing multiple clients to take advantage of the information-gathering capabilities of the server.
Client and server communicate with one another using the functionality provided by a protocol layer. For example, Hypertext Transfer Protocol (HTTP) is a common protocol that is used in conjunction with the World Wide Web (WWW) or, simply, the "Web." Typically, a computer network address such as a Uniform Resource Locator (URL) or an Internet Protocol (IP) address is used to identify the server or client computers to each other. Communication is provided over a communications medium; in particular, the client and server may be coupled to one another via TCP/IP connections for high-capacity communication. In general, the computer network may comprise both server devices and client devices deployed in a network environment (in a peer-to-peer environment, devices may be both clients and servers). FIG. 2 illustrates an exemplary network environment, with server computers in communication with client computers via a network, in which the present invention may be employed. As shown in FIG. 2, a number of servers 21a, 21b, etc., are interconnected via a communications network 160 (which may be a LAN, WAN, intranet or the Internet, or a combination of any of these) with a number of client computers 20a, 20b, 20c, etc. Moreover, communication network 160 may comprise wireless, wireline, or combined wireless and wireline connections. Thus, the present invention can be used in a computer network environment having client computers for accessing and interacting with the network and a server computer for interacting with client computers. However, the systems and methods of the present invention can be implemented with a variety of network-based architectures and thus should not be limited to the example shown. The present invention will now be described in more detail with reference to an illustrative implementation.
FIG. 3 illustrates how the content management system applies content units to build various web pages. Web page 30 comprises a variety of content units that may be reassembled from CMS 10 in a variety of ways to create web pages for different applications. Web page 30, for example, comprises a table of contents 302, subtitles and abstracts 304, news titles 306, and graphics 308.
Data Model
Content is normally expressed mathematically following Named Edge Graph Theory (e.g., title, chapter, paragraph, and so on). Graph Theory is used to express XML and has corollaries in directory structures and content objects, i.e., the directory tree for storing content in computers. Graph Theory is much more extensible, allowing more than hierarchical relationships (called polyhierarchy), but the analogy is easily grasped. Relational databases queried with SQL are based on Set Theory (unions, intersections, etc.), and SQL is a highly scalable storage and retrieval mechanism for set-based information. Graph Theory can be expressed in Set Theory by modeling graph nodes as collections and elements, and graph edges as relationships (relations between collections, relations between elements, and relations between collections and elements). The edges (relationships) between nodes (collections and elements) are named. Thus there are named relationships between elements and elements (such as synonym), collections and collections (related vocabularies), and collections and elements (is member). This is also known as a labeled-edge graph. In accordance with an aspect of the invention, a distinction is made between leaf nodes containing content and collections containing only structure. CMS 10 is implemented with separate entities for content-containing elements and for structure collections, primarily for performance. Preferably, the edge relationships are named but not enumerated.
FIG. 4 illustrates how the present invention converts a graph structure (a library, book, or document), e.g., represented by table of contents 45, into relational tables, e.g., 41, 42. The table of contents 45 can be represented by graph 48, which is a collection of nodes, e.g., 48a, 48b, etc. The nodes correspond to the content and structure of the structured table of contents 45. Here, the root node of graph 48 corresponds to MSDN Library in table of contents 45. The other titles in table of contents 45 are also represented by nodes in graph 48, and the relationships among those titles are represented by edges in graph 48, e.g., edges 47a, 47b.
Simple Content Subject Graph
FIG. 5A presents a simple subject graph 502 that further illustrates aspects of the invention. Rendered in XML, this simple subject graph is:
Well-formed XML has matching markup and follows the rules of XML creation. A valid XML document has an XML-Schema (XSD) and conforms to that schema. The XML-Schema for the example is:
The properties of an XML element can be expressed either as child XML elements or as XML attributes of the element. The choice depends on the anticipated future use of the node: it is easier to break elements down into further element structures than to convert XML attributes into XML elements later. If no further breakdown is anticipated, attribution is acceptable. XML elements can also be used exclusively, which is potentially more flexible in the future. Expressing the XML graph in a relational data model maps the XML attributes to data model attributes (sometimes called properties, and physically modeled as the columns of the SQL table) of the data model entities (set theory). CMS 10 is preferably implemented with a distinction between leaf nodes containing information and structure nodes (collections).
Container/Leaf Structure
As illustrated in FIG. 5B, content 504 is attributed with various terms to indicate content subject 506. Content carries metadata and relationships as well as key-value pairs 508 for attribution. As shown in the entity relationship diagram of FIG. 5C, the content or resource instance is attributed by converting the resource graph into set theory, placing the resource instance (DR_Content) 52 and the attribute instance (DR_Attribute) 56 into SQL tables. The labeled edge becomes a many-to-many relationship table (DR_ContentAttribute) 54. The content associations can be extended to include the tree structure that forms the basis of the Table of Contents (TOC). The structure or tree (root, node and leaf) is represented in CMS 10 as tables separate and distinct from the content entity. The tree structure begins with a root element and constructs labeled-edge relationships with nodes. The end node, or leaf, is a relationship to content. Entities can be thought of as "files" and structure can be thought of as "directories". Structure can also occur within documents, such as an introductory paragraph, body paragraphs, etc. The structure node is typed as "document" to mark both the leaf node of the TOC navigation structure and the "root" node of the document structure. This makes explicit the distinction between bounded "internal" structure and unbounded "external" structure.
Structure Entity Relationship Model
As shown in FIG. 5D, the tree structure graph nodes are represented as Nodes. The labeled edge forms a many-to-many relationship entity (DR_Branch) 51 and is labeled with a Branch Type 53. Note that the Node entity appears twice (50a, 50b) and thus collapses into a single entity used twice in the relationship (Branch), as shown in FIG. 5E. The Node entity is used for both nodes of the graph.
Content Ownership
Content also has a notion of ownership. The ownership of content can be represented in graph form as shown in FIG. 5F.
For example, certain content in the directed graph 550, including TechNet 556, MSDN 554, and BDM 552, is owned by slevy 546, amyi 544, and tpetras 542. All of this is owned by RPU 532, which is in turn owned by kimsau 522.
Entity Relationship Model
The three graphs (Owner, Structure and Content) have different properties. The owner entity does not carry the Dublin Core properties. Structure and Content have different relationships and some properties not in common: Content carries a relationship to content data, format and status that Structure does not. All three graphs can be represented by a single ER model, as shown in FIG. 5G. Attributes (DR_Attributes) 56 are grouped in sets (DR_AttributeSet) 562, and sets are grouped in attribute groups (DR_AttributeGroup) 564. The content entity is extended by recognizing content versions. The actual content data is related to the content version. A content type is extended and relates to structure as well. A status is applied to each version. The Content entity (DR_Content) 52 is related to an Owner (DR_Owner) 533 and to a status (DR_ContentStatus).
Entity Model Glossary
The entities and relationships of the ER diagram of FIG. 5G are further described below:
Resource Instance Cluster
Resource: A resource is a unit of Knowledge Management, a content item. Download Overviews, KB Articles and Book Overviews are all examples of resources.
DR_Content 52
Resource Attribution Cluster
DR_Attributes 56
DR_AttributeSet 562
DR_AttributeGroup 564
DR_ContentAttribute 54
DR_NodeAttribute 512
Resource Tree Structure Cluster
DR_Node 50
DR_Branch 51
DR_BranchType 53
DR_NodeNode 505
DR_NodeContent 510
Resource Owner Cluster
DR_Owner 533
DR_OwnerOwner 523
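A minimal SQLite sketch of part of this ER model follows. Only the table names come from the glossary; every column, the 'child' branch type, and the node labels other than MSDN Library are hypothetical. It shows the content/attribute many-to-many table, DR_Node used on both ends of DR_Branch, one content unit shared by two leaf nodes, and a recursive query (SQLite 3.8.3 or later) that rebuilds a table of contents from the edge table.

```python
import sqlite3

# Table names follow the ER glossary; all columns are hypothetical.
SCHEMA = """
CREATE TABLE DR_Content (ContentID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE DR_Attribute (AttributeID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- The labeled edge between content and attribute becomes a many-to-many table.
CREATE TABLE DR_ContentAttribute (
    ContentID INTEGER NOT NULL REFERENCES DR_Content,
    AttributeID INTEGER NOT NULL REFERENCES DR_Attribute,
    PRIMARY KEY (ContentID, AttributeID));
CREATE TABLE DR_Node (NodeID INTEGER PRIMARY KEY, Label TEXT NOT NULL);
CREATE TABLE DR_BranchType (BranchTypeID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- DR_Node appears on both ends of the edge: parent and child.
CREATE TABLE DR_Branch (
    ParentNodeID INTEGER NOT NULL REFERENCES DR_Node,
    ChildNodeID INTEGER NOT NULL REFERENCES DR_Node,
    BranchTypeID INTEGER NOT NULL REFERENCES DR_BranchType,
    PRIMARY KEY (ParentNodeID, ChildNodeID));
-- A leaf of the structure tree is a relationship to a content unit.
CREATE TABLE DR_NodeContent (
    NodeID INTEGER NOT NULL REFERENCES DR_Node,
    ContentID INTEGER NOT NULL REFERENCES DR_Content,
    PRIMARY KEY (NodeID, ContentID));
"""

# Walk DR_Branch from a root node; zero-padded NodeIDs give a sortable path.
TOC_QUERY = """
WITH RECURSIVE toc(NodeID, Label, Depth, Path) AS (
    SELECT NodeID, Label, 0, '' FROM DR_Node WHERE NodeID = ?
    UNION ALL
    SELECT c.NodeID, c.Label, t.Depth + 1, t.Path || printf('/%08d', c.NodeID)
    FROM toc AS t
    JOIN DR_Branch AS b ON b.ParentNodeID = t.NodeID
    JOIN DR_Node AS c ON c.NodeID = b.ChildNodeID
)
SELECT Label, Depth FROM toc ORDER BY Path;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory store holding a small, hypothetical TOC."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO DR_Node VALUES (?, ?)", [
        (1, "MSDN Library"), (2, "Getting Started"), (3, "Reference"),
        (4, "Introduction"), (5, "Introduction")])
    conn.execute("INSERT INTO DR_BranchType VALUES (1, 'child')")
    # (parent, child); every edge carries branch type 1.
    conn.executemany("INSERT INTO DR_Branch VALUES (?, ?, 1)", [
        (1, 2), (1, 3), (2, 4), (3, 5)])
    conn.execute("INSERT INTO DR_Content VALUES (1, 'Introduction to the library')")
    # One content unit is reused by two different leaf nodes.
    conn.executemany("INSERT INTO DR_NodeContent VALUES (?, 1)", [(4,), (5,)])
    conn.execute("INSERT INTO DR_Attribute VALUES (1, 'XML')")
    conn.execute("INSERT INTO DR_ContentAttribute VALUES (1, 1)")
    return conn


def table_of_contents(conn: sqlite3.Connection, root_id: int) -> list[str]:
    """Rebuild an indented outline from the node and edge tables."""
    rows = conn.execute(TOC_QUERY, (root_id,)).fetchall()
    return ["  " * depth + label for label, depth in rows]


conn = build_demo_db()
print("\n".join(table_of_contents(conn, 1)))
```

Because leaf nodes point at content through DR_NodeContent rather than embedding it, the same content unit sits under both "Getting Started" and "Reference". Sibling order here simply follows NodeID, a hypothetical convention chosen so that the edge table carries only a branch type name and no sequence number.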
FIG. 6 further illustrates the data model in accordance with the present invention by providing a database schema wherein content in the form of a graph structure is converted into and stored as a relational model, where it can be accessed, searched and manipulated by a database management system. With reference to FIG. 7, an exemplary system for implementing the invention includes a general purpose computing device in the form of a computer 110 that could form a portion of client computers 20a-20c or server computers 21a, 21b (see FIG. 2). Components of computer 110 may include, but are not limited to, a processing unit 120, a system memory 130, and a system bus 121 that couples various system components including the system memory to the processing unit 120. The system bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus (also known as Mezzanine bus). Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CDROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media. The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation, FIG. 7 illustrates operating system 134, application programs 135, other program modules 136, and program data 137. Computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 7 illustrates a hard disk drive 141 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156, such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150. The drives and their associated computer storage media discussed above and illustrated in FIG. 7 provide storage of computer readable instructions, data structures, program modules and other data for the computer 110. In FIG. 7, for example, hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 110 through input devices such as a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball or touch pad. Other input devices may include a microphone 163, joystick, game pad, satellite dish, scanner, or the like (not shown).
These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 195. The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in FIG. 7. The logical connections depicted in FIG. 7 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet. When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 7 illustrates remote application programs 185 as residing on memory device 181. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used. The various techniques described herein may be implemented with hardware or software or, where appropriate, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. In the case of program code execution on programmable computers, the computer will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs are preferably implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language, and combined with hardware implementations. The methods and apparatus of the present invention may also be embodied in the form of program code that is transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via any other form of transmission, wherein, when the program code is received and loaded into and executed by a machine, such as an EPROM, a gate array, a programmable logic device (PLD), a client computer, a video recorder or the like, the machine becomes an apparatus for practicing the invention.
When implemented on a general-purpose processor, the program code combines with the processor to provide a unique apparatus that operates to perform the indexing functionality of the present invention. While the present invention has been described in connection with the preferred embodiments of the various figures, it is to be understood that other similar embodiments may be used or modifications and additions may be made to the described embodiment for performing the same function of the present invention without deviating therefrom. Therefore, the present invention should not be limited to any single embodiment, but rather construed in breadth and scope in accordance with the appended claims.
