Method and apparatus for facilitating use of hypertext links on the World Wide Web
6772139
Abstract
A database server contains pointers to useful information, such as on the World Wide Web. Users of the server may have hypertext links added automatically into documents they submit. Users may additionally contribute to the link database, thereby extending it, and may add additional qualifying information pertaining to the links.
Claims
What is claimed is:
1. A system for organizing competing explanatory information, the system comprising:
hierarchical database means, including hierarchically organized nodes of information;
key-phrase association means, wherein a node of said hierarchical database is associated with a set of one or more key phrases, wherein the key phrases in the set of key phrases share a substantially common meaning;
competing definitions means, wherein a plurality of competing definitions is associated with said set of key phrases, wherein each said competing definition substantially explains the common meaning shared by the key phrases;
property association means, wherein the property association means associates two or more of the plurality of competing definitions with two or more definition properties to produce a set of associated definition properties, wherein at least one of said definition properties is selected from the group consisting of educational level, language, rating, viewer suitability, resource type, context, number of hits, number of installs, and date;
rank-ordering means, wherein the rank ordering means rank orders the plurality of competing definitions based on at least one of said associated definition properties.
2. The system of claim 1, further comprising
search means for specifying a subset of said information in said hierarchical database by means of a keyword-based search;
output means for returning said information subset as a search result.
3. The system of claim 2, wherein said search means further comprises means for specifying a subset of said information in said hierarchical database based on an assigned context for said information.
4. The system of claim 2, wherein said search means further comprises means for specifying a subset of said information in said hierarchical database based on at least one of said definition properties.
5. The system of claim 4, wherein at least two of said associated definition properties correspond to a rating.
6. The system of claim 4, wherein at least one of said associated definition properties corresponds to an owner.
7. The system of claim 1, further comprising
browsing means supporting navigation within said hierarchical database;
output means for returning information corresponding to a current position within said hierarchical database.
8. The system of claim 7, further comprising node selection means based on assigned contexts, and wherein said output means returns information in accordance with said node selection.
9. The system of claim 7, further comprising definition selection means, wherein said competing definitions are selected based on at least one of said associated definition properties, and wherein said output means returns information in accordance with said definition selection.
10. The system of claim 1, wherein at least two of said associated definition properties correspond to a rating.
11. The system of claim 1, wherein at least two of said associated definition properties correspond to an education level.
12. The system of claim 1, wherein paths to nodes of said hierarchical database correspond to semantic contexts.
13. The system of claim 12, further comprising
input means for specifying a set of one or more input key phrases;
ordered search means for searching said semantic contexts in a predetermined order for occurrences of key phrases in said set of input key phrases associated with said nodes of information;
search retrieval means for returning one or more competing definitions from said nodes of information associated with said set of input key phrases.
14. The system of claim 13, wherein said ordered search means further includes means for using a semantic context to limit said searching to a subset of said nodes of information.
15. The system of claim 13, wherein the search retrieval means further includes means for returning one or more competing definitions based on a property list of at least one of said competing definitions.
16. The system of claim 13, wherein the search retrieval means further includes means for returning one or more competing definitions based on an order in which the contexts are ordered by said ordered search means.
17. The system of claim 13, further comprising
link installation means for automatically linking one or more occurrences in a document of an input key phrase to one or more of said competing definitions returned by said search retrieval means.
18. The system of claim 13, wherein the ordered search means terminates upon a first occurrence of a matching key phrase matching any of said input key phrases.
19. The system of claim 18, wherein the search retrieval means returns one or more competing definitions ordered according to one or more properties of said property list.
20. The system of claim 18, wherein the search retrieval means returns a highest ranked competing definition for the matching key phrase.
21. The system of claim 20, further comprising
link installation means, wherein one or more occurrences in a document of an input key phrase is automatically linked based on the highest ranked competing definitions for the matching key phrase.
22. A computer-implemented method for providing a database of competing definitions, the method comprising:
a) storing in a database a hierarchically organized set of nodes;
b) associating with a node in the hierarchically organized set of nodes a set of one or more key phrases, wherein the key phrases in the set of key phrases share a substantially common meaning;
c) receiving and storing a plurality of competing definitions for the node, wherein each of the competing definitions provides a different explanation of the substantially common meaning shared by the key phrases associated with the node;
d) associating properties with the competing definitions to produce a set of associated definition properties, wherein at least one of said properties is selected from the group consisting of educational level, language, rating, viewer suitability, resource type, context, number of hits, number of installs, and date; and
e) rank ordering the competing definitions using at least one of the associated definition properties.
23. The method of claim 22, wherein said property list includes a property corresponding to an educational level.
24. The method of claim 22, wherein said property list includes a property corresponding to a rating.
25. The method of claim 22, wherein paths to nodes in the hierarchically organized set of nodes correspond to semantic contexts.
26. The method of claim 25, further comprising:
receiving input key phrases;
searching the database nodes in a predetermined order for occurrences of the input key phrases in matching database nodes;
retrieving one or more competing definitions from the matching database nodes.
27. The method of claim 26, wherein the searching is limited based on a semantic context to a subset of said database nodes.
28. The method of claim 26, wherein the retrieving comprises using one or more properties in the property list associated with the competing definitions in the matching database nodes.
29. The method of claim 26, wherein the searching terminates upon a first occurrence of a matching key phrase matching any of said input key phrases.
30. The method of claim 29, further including
linking one or more occurrences of an input key phrase in a document based on the retrieved competing definitions.
31. The method of claim 29, wherein the step of retrieving returns a highest ranked competing definition for the matching key phrase.
32. The method of claim 31, further including
linking one or more occurrences of an input key phrase in a document to the highest ranked competing definition for the matching key phrase.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to facilitating access to information over a computer network such as the Internet. More particularly, the present invention relates to technology for partially automating the linking of documents on the World Wide Web by authors of Web content. Such techniques are particularly useful for more easily creating richly interconnected information on the Web.
2. Description of Related Art
The World Wide Web provides an enormous distributed database of information interconnected physically by the Internet. One of the main difficulties for users of the Web is finding needed information out of the tremendous quantity of information that is available. Various mechanisms have been developed to address this problem.
One mechanism for facilitating access to information on the Web is the index website (or "search engine"). An index website is typically a server computer connected to the World Wide Web which maintains an index of Web content that can be searched in various ways by users (clients) connected to the server over the Internet. Indexes are often updated automatically by means of "spiders" which systematically explore the Web looking for new or updated content. Most search engines also provide means for users to install information to be indexed, so that such information may be indexed immediately without waiting for a spider to find it. An example of a premier search engine is the "Alta Vista" website, accessible on the Web at the Uniform Resource Locator (URL) address http://www.altavista.com.
A difficulty with search engines is that search results typically contain too much undesired information as well as the desired information. This occurs because the information content of the Web is vast, and because it is difficult for users to construct search parameters in such a way as to pass most desired content while rejecting most undesired content. As a result, users typically must spend a lot of time sifting through search-engine results and/or refining their searches with additional restrictions in the search parameters. Additionally, the information stored in the index is not organized in a form suitable for browsing in a logical order.
Another mechanism developed to facilitate access to information on the World Wide Web is the directory website which presents a hierarchical directory of information that can be browsed by the user. Premier sites of this nature include Yahoo (http://www.yahoo.com), Netscape (http://www.netscape.com), and Excite (http://www.excite.com). A visitor to such a site is first presented with a top-level list of topics. Choosing a topic by clicking on a topic's hypertext link with the mouse produces a list of subtopics, and so on, until a final level is reached at which useful information is displayed about the topic, or else a remote website pertaining to that topic is visited. Directory companies such as Yahoo typically have teams of editors who explore the Web looking for content suitable for reference at their site, and these workers perform a function analogous to the automatic "spiders" used by automated index websites. Like the search engines, directory websites normally support searching within the directory site, thus producing search results of generally higher quality and less "clutter" than typically encountered on an index site. Also like index websites, directory websites typically allow submission of content for reference, subject to editorial consideration. Thus, directory websites improve over index websites by providing editorial selection, logical organization, and browsing capability, all of which are absent in typical index websites.
A first difficulty, however, with directory websites is that they cannot reasonably keep up with the vastness of the information on the World Wide Web by means of manual editorial selection. As a result, directory websites tend to offer far less information relative to index websites. A second difficulty with directory websites is that their content is proprietary and controlled by a team of editors at one company. This editorial control, while ensuring consistently high quality on the site, makes it difficult and sometimes even infeasible for an information provider to obtain a desired listing in the hierarchical directory. One directory site that addresses this difficulty is the Open Directory project (http://dmoz.org/). The Open Directory allows any user on the Internet to become an "editor" for a particular topic at the site. A third difficulty, related to the first, is that typical directory sites are extremely broad in scope, contributing to the absence of specialized information that is not of interest to a wide general audience.
A difficulty with both index and directory websites is that information is presented without regard to the user's level of education. It is therefore often possible for a high-school senior working on a book report, for example, to encounter information understandable only by a graduate student in a specialized field. Similarly, there is normally no means for selecting information according to its type or source or other potentially desirable criteria.
To assist users in selecting sources of information, some websites provide a user rating system (or "scoring system") to which any user may contribute. An example of this mechanism is seen in the online book-store website http://www.amazon.com/. Amazon allows any user to contribute a "book review" and an overall rating on a five-star scale. The average rating is displayed for each book, and books which match the user's search criteria are displayed sorted according to decreasing score (and possibly other criteria such as the number sold). An interesting feature of the Amazon rating system is that it is democratic, allowing the vast quantity of World Wide Web users to jointly develop a ranking of the information sources (in this case books). Such a scheme addresses the difficulty of sorting through enormous quantities of information by harnessing a potentially enormous base of users as contributing editors, in effect. A difficulty with rating systems is that they are generally used only at the site where the ratings are collected, and no mechanism is provided for making use of the ratings elsewhere, such as in other documents on the Web linking to the same information.
An important mechanism integral to the function of the World Wide Web is the HyperText Markup Language (HTML) which is a text format supported by Web browser programs (such as Netscape Navigator or Microsoft Internet Explorer). A more recent variant called XML is now gaining support, and its function is similar to that of HTML for present purposes. HTML provides for the specification of hypertext links in Web-page text displayed by the browser. At a minimum, a hypertext link consists of text to be displayed by the browser and a link target which is usually not displayed. For example, the HTML code
<a href="http://www.w3k.org">W3K website</a>
contains the text (also known as the anchor) "W3K website", while the link target is http://www.w3k.org which is a URL pointing to the W3K website. Thus, the link target is normally addressed by a URL pointing to information on the Web about the displayed word or phrase. (The complete HTML format specification may be found online at the URL http://www.w3.org/.) To the browser user, the anchor text of a hypertext link as above appears in a Web-page display as an underlined word or phrase, e.g.,
Visit the W3K website for more information regarding automatic link installation,
and usually in a different color than normal, unlinked text. By clicking on the hypertext link with the mouse, the user directs the browser program to "follow the link" by "navigating" to the URL associated with the link. The link-target URL may point to another Web page anywhere on the World Wide Web, or it may simply point to another location within the same electronic document. Hypertext links in HTML documents make it much easier for the user to explore the World Wide Web by visiting Web pages and clicking on the links found therein. Web browsers further make it easy to return to the page containing the link by using the "back" button, or the "history" list of visited pages maintained by the browser.
A difficulty with hypertext links is that they must be laboriously added by Web content providers. Typical HTML editors merely provide a data-entry form in which the URL for the link target can be typed. A second shortcoming of HTML and Web browsers is that there is no standard mechanism for specifying link properties such as educational level, type of resource, information source, or the like, which could be supported by Web browsers to give the user finer control of link display based on link properties. After the links are typed in, they must be maintained as their URLs change, and as new and better link-targets become available. There is therefore a need for automated assistance with entering, maintaining, and improving hypertext links in documents intended for a hypertext document environment such as the Web.
SUMMARY OF THE INVENTION
It is a primary object of the present invention to facilitate the addition of hypertext links (also called "hyperlinks," "links," or "definitions") to documents intended for access on the Internet via the World Wide Web. Accordingly, the present invention is designed to provide a link installation service which automatically installs hyperlinks within information submitted to the service by hypertext authors. Submissions may be in HTML format, plain ASCII format, LaTeX source format, or a variety of additional formats to be added in the future. The output returned to the user may be in either HTML or LaTeX source format (which may be compiled into HTML format). Criteria can optionally be specified which govern the installation of hyperlinks.
The invention further provides selectable databases of hyperlinks, organized by category (or "context"), which can be optionally selected for automatic link installation. It is further provided that content developers may add their own links to the existing link databases, and they may additionally create new link databases and specify their relation to the existing link databases. Contributing users are preferably required to have a known, verified email address. A user with a verified email address is called a "known user". The invention further provides means for browsing the link databases in a logically organized, hierarchical tree structure, wherein higher-level nodes correspond to more general contexts, and lower-level nodes correspond to more specialized contexts. The link databases can additionally be searched for keyword matches within component fields. Users may provide ratings and/or reviews for individual links in the link databases.
The hyperlink databases of the present invention support various optional "properties" associated with each hyperlink. One such property, useful in the development of educational content, is a level designation which indicates the educational level required for best understanding of the link-target information. Additional optional properties include the language of the content (such as English), a viewer suitability rating such as exists for movies (PG-13, R, etc.), and properties defined by the user. Link properties can be specified by users to control the automatic installation of links, and/or to control what is displayed while browsing the link databases.
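As a minimal sketch of how link properties might control which links are selected, the Python fragment below stores each link with an optional property list and filters on it. The names `Link` and `select_links`, the property names `level` and `language`, and the example.org URLs are illustrative assumptions and do not appear in the original description.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Link:
    """A hypothetical link record: key phrase, target URL, optional properties."""
    key: str
    url: str
    properties: Dict[str, str] = field(default_factory=dict)


def select_links(links: List[Link], **criteria: str) -> List[Link]:
    """Keep only the links whose properties match every given criterion."""
    return [
        link for link in links
        if all(link.properties.get(name) == value
               for name, value in criteria.items())
    ]


# Illustrative data only; the URLs are placeholders.
links = [
    Link("Taylor Series", "http://example.org/intro",
         {"level": "high school", "language": "English"}),
    Link("Taylor Series", "http://example.org/advanced",
         {"level": "graduate", "language": "English"}),
]

graduate_links = select_links(links, level="graduate")
english_links = select_links(links, language="English")
```

A link with no value for a requested property is simply excluded here; a real service could instead fall back to an estimated value.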
Educational levels not specified on submission are estimated based on the level of links found within the link target document. As a result, every link in the link database is assigned an educational level, either manually or automatically. Determining levels automatically also detects any "cycles" in the link database. (A "cycle" occurs when document A links either directly or indirectly to document B, and document B links either directly or indirectly to document A.) Cycle detection can help content providers eliminate inadvertent "forward references." Means are provided for marking forward-reference links in submitted documents so that educational level will not be affected. Cycle-free systems of links can be more effectively used as a basis for online course materials.
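Cycle detection of the kind described above can be sketched as a depth-first search over a graph of documents and the documents they link to. This is one standard technique, not necessarily the procedure used by the invention; the function name `find_cycle` and the dictionary-of-lists graph representation are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def find_cycle(links: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of documents (first == last), or None.

    `links` maps each document to the documents it links to (hypothetical format).
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {doc: UNVISITED for doc in links}
    path: List[str] = []  # documents on the current search path

    def visit(doc: str) -> Optional[List[str]]:
        state[doc] = IN_PROGRESS
        path.append(doc)
        for target in links.get(doc, []):
            target_state = state.get(target, UNVISITED)
            if target_state == IN_PROGRESS:
                # The target is already on the current path, so the path
                # from it back to itself is a cycle.
                return path[path.index(target):] + [target]
            if target_state == UNVISITED:
                found = visit(target)
                if found:
                    return found
        path.pop()
        state[doc] = DONE
        return None

    for doc in list(links):
        if state[doc] == UNVISITED:
            cycle = visit(doc)
            if cycle:
                return cycle
    return None


cyclic = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
acyclic = {"A": ["B", "C"], "B": ["C"], "C": []}
```

Here `find_cycle(cyclic)` reports the path A, B, C, A, while `find_cycle(acyclic)` returns `None`. A link marked as a forward reference could be omitted from `links` before the search so that it does not count toward a cycle.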
Another feature of the present invention is the ability for users to rate (or score) the quality of any link in the database and/or to submit a written review of any link. The quality ratings may be averaged together and used to determine the relative ordering of the links when there are multiple link targets for the same word or phrase ("competing definitions"). In the typical case of HTML format, features of the JavaScript scripting language may be used to provide convenient access to multiple link targets, ranked according to score. Alternatively, the latest ranked list of competing definitions may be maintained on a central server on the Web, with the installed link pointing there, instead of containing only a snapshot at the time of link installation, which may rapidly go out of date. Alternatively, the currently highest rated link may be installed in the user's Web document for each recognized topic.
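The averaging and ranking of competing definitions can be illustrated with a short Python sketch. The function name `rank_definitions`, the rating scale, the treatment of unrated links (sorted last), and the example.org URLs are assumptions for illustration only.

```python
from statistics import mean
from typing import Dict, List


def rank_definitions(ratings: Dict[str, List[float]]) -> List[str]:
    """Order link targets by average user rating, highest first.

    Links with no ratings yet are placed last (an assumption of this sketch).
    """
    def average(url: str) -> float:
        scores = ratings[url]
        return mean(scores) if scores else float("-inf")

    return sorted(ratings, key=average, reverse=True)


# Hypothetical competing definitions for one key phrase, with user ratings.
ratings = {
    "http://example.org/a": [3, 4],
    "http://example.org/b": [5, 5, 4],
    "http://example.org/c": [],
}

ranked = rank_definitions(ratings)
highest_rated = ranked[0]
```

The full `ranked` list corresponds to offering all competing definitions in order, while `highest_rated` corresponds to installing only the currently highest rated link.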
BRIEF DESCRIPTION OF THE DRAWING FIGURES
FIG. 1 shows an example initial Web page seen by a visitor using a Web browser to access the online version of the service.
FIG. 2 shows a Web page giving an overview of the capabilities of the online service.
FIG. 3 shows an example top-level Web page seen while browsing the hyperlink databases.
FIG. 4 shows an example lower-level page seen while browsing the hyperlink databases, in which the context has been narrowed considerably.
FIG. 5 shows an example browsing view at the level of a key phrase in which all displayed links are interpreted as "definitions" for the key phrase.
FIG. 6 shows an example form for adding a new link (definition) to the link database for the current key phrase.
FIG. 7 shows a Web page for submitting text for link installation.
FIG. 8 depicts the tree structure of the hierarchical link database.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The following is a description of the best presently contemplated modes of carrying out the invention. The descriptions are not to be taken in a limiting sense but are made for the purpose of illustrating the general principles of the invention. It is particularly noted that the invention may be implemented in a variety of different file formats, database technologies, search and replace methods, computer processors and system architectures, host operating systems, network protocols, user-interface frameworks, and the like.
Client-Server Architecture on the World Wide Web
FIG. 1 illustrates how a World Wide Web "home page" might appear on a website embodying the principles of the present invention. The user has several choices of where to "navigate" next:
The first choice 101 is a hypertext link entitled "Learn about the W3K," where in this example, "W3K" is an acronym standing for the "World Wide Web of Knowledge." If this choice is selected by clicking the mouse on the underlined text, the visitor "navigates" to the Web page shown in FIG. 2 which provides an overview of the online service provided by the W3K. In particular, it is explained how submitting plain text 110 to the W3K server will result in hyperlinked text 111 being returned to the user. A summary 115 of high-level functions is also provided in FIG. 2.
The second choice in FIG. 1 is a hypertext link 102 entitled "Browse the W3K." If this choice is selected by clicking the mouse on the underlined text, the visitor "navigates" to the Web page shown in FIG. 3 supporting browsing of the hyperlink databases, as described further below.
The third choice in FIG. 1 is a hypertext link 103 in which the text displayed by the Web browser is "Install W3K links in a Web document." If this choice is selected, the visitor is taken to the Web page of FIG. 7 where the user can submit text for link installation in a variety of formats. The text is returned to the user by the server with hypertext links installed according to the user's specifications. Link databases to be searched can be collected into a list during the browsing operation.
The fourth choice, "Add to or Edit the W3K," is a link 104 to a Web page for editing the link databases. Editing operations include submitting new links, creating new link categories, and changing previously submitted links or link properties. These editing functions are also available while browsing the databases.
The fifth and final choice, "Search the W3K Dictionaries," is a link 105 to a Web page for specifying search criteria in terms of link properties. The search collects together all links in the link databases matching the search criteria, and displays them organized by properties according to user specifications. The search feature is useful for collecting various link subsets together for various purposes including link installation, editing link properties, and other functions involving groups of links. As an alternative to a list display format, a sparse context hierarchy can be generated, containing only the database information matching the search criteria; the sparse hierarchy can then be conveniently browsed by the user.
These functions are described in further detail below.
Overview and Terminology
This section introduces the main terms which will be used hereafter.
Hierarchical Contexts
The link databases are organized hierarchically according to category, somewhat like the Dewey decimal system for library organization. Each category (or "directory") is interpreted as a context analogous to a field of study. Each context may itself contain any number of contexts ("subcontexts," or "subdirectories"), and it may additionally contain a database of information pertaining to that context (which may be implemented in a file in that directory).
The particular sequence of directories obtained by visiting one subdirectory after another is called a path. Every context may be identified by the directory path that reaches it from the top level. Thus, the set of all contexts forms a "tree structure" analogous to the hierarchical file systems used by all major computer operating systems at the present time.
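The relationship between contexts and paths can be sketched in Python as a small in-memory tree. The class `Context` and the helpers `add`, `path`, and `lookup` are hypothetical names introduced for this illustration; the description itself contemplates a directory tree in a hierarchical file system.

```python
from typing import Dict, Optional


class Context:
    """A node in the context tree; the root has no parent."""

    def __init__(self, name: str, parent: Optional["Context"] = None):
        self.name = name
        self.parent = parent
        self.subcontexts: Dict[str, "Context"] = {}

    def add(self, name: str) -> "Context":
        """Create a subcontext and return it."""
        child = Context(name, parent=self)
        self.subcontexts[name] = child
        return child

    def path(self) -> str:
        """Return the path that reaches this context from the top level."""
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


def lookup(root: Context, path: str) -> Optional[Context]:
    """Follow a path from the root one subcontext at a time."""
    node = root
    for part in path.strip("/").split("/"):
        if not part:
            continue
        node = node.subcontexts.get(part)
        if node is None:
            return None
    return node


root = Context("")
music = root.add("Humanities").add("Music")
```

With this sketch, `music.path()` yields `/Humanities/Music`, and `lookup(root, "/Humanities/Music")` returns the same node, so each context is identified by exactly one path from the top level.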
Dictionaries
A link database (or "dictionary") preferably comprises a list of (key,URL) pairs. A key (or "key phrase" or sometimes "word") identifies a topic or concept, and the URL points to information about that topic on the Internet. In a loose analogy with an ordinary dictionary, the key is the "word being looked up", and the URL points to its "definition". However, unlike an ordinary dictionary, the (key,URL) pairs in the link database are interpreted within the particular context associated with the directory containing that dictionary. In a somewhat better analogy with a technical encyclopedia in a particular field, the key corresponds to the noun phrase identifying a technical topic for which an article exists in the encyclopedia, the URL may correspond to the page number on which the article begins, and the context may correspond to the technical field for which the encyclopedia was written.
Because dictionaries are interpreted in a particular context, alternate definitions are not allowed. In other words, a context is preferably sufficiently narrow such that all terms (words or key phrases) in that context have a unique meaning. Ordinary "flat" dictionaries must accommodate alternate definitions for a single word, while "hierarchical dictionaries" need not. Thus, if a term is found to have a second meaning in a particular context, it is time to create one or more subcontexts in which that term is disambiguated.
Synonyms
A single URL can provide only one "definition". However, a single URL can be used to "define" any number of key phrases, which are then regarded as synonyms. Often the title of the addressed HTML page on the Web is the "key phrase" that is "defined" by the URL. When there are several (key,URL) pairs having the same URL, the different keys are treated as alternate phrasings for the same concept or topic, and are said to form a synonym group. The following dictionary entries illustrate a synonym group:
KEY=Taylor Series Expansion
URL=http://www.mathworld.org/analysis/TaylorSeries.html
KEY=Taylor Expansion
URL=http://www.mathworld.org/analysis/TaylorSeries.html
KEY=Taylor Series
URL=http://www.mathworld.org/analysis/TaylorSeries.html
Order is important in the dictionary because "the first match wins" during automatic link installation. For example, with the above ordering, the key phrase "Taylor Series Expansion" will match before checking for "Taylor Expansion" or "Taylor Series". Ordering equivalent key phrases from longest to shortest ensures that the longest possible match will occur in documents submitted for link installation.
Synonyms can be listed in a link's properties, or they can simply be entered as additional link entries pointing to the same link target (URL), since links take up relatively little space.
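The "first match wins" rule and the longest-to-shortest ordering can be sketched in Python as follows. This sketch assumes plain-text input and HTML output, and relies on the fact that alternatives in a Python regular expression are tried from left to right, so dictionary order decides which key phrase matches at a given position. The function name `install_links` and the whole-word matching rule are assumptions, not details taken from the description.

```python
import re
from html import escape
from typing import List, Tuple


def install_links(text: str, dictionary: List[Tuple[str, str]]) -> str:
    """Wrap occurrences of key phrases in HTML anchors; the first match wins."""
    if not dictionary:
        return escape(text)
    urls = {}
    for key, url in dictionary:
        urls.setdefault(key, url)  # an earlier entry overrides a later duplicate
    # Alternatives are tried in dictionary order at each position in the text.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(key) for key, _ in dictionary) + r")\b"
    )
    pieces = []
    position = 0
    for match in pattern.finditer(text):
        pieces.append(escape(text[position:match.start()]))
        pieces.append('<a href="%s">%s</a>'
                      % (escape(urls[match.group(0)]), escape(match.group(0))))
        position = match.end()
    pieces.append(escape(text[position:]))
    return "".join(pieces)


URL = "http://www.mathworld.org/analysis/TaylorSeries.html"
entries = [
    ("Taylor Series Expansion", URL),
    ("Taylor Expansion", URL),
    ("Taylor Series", URL),
]
text = "The Taylor Series Expansion converges."

longest_first = install_links(text, entries)
shortest_first = install_links(text, list(reversed(entries)))
```

With the longest-first ordering the whole phrase "Taylor Series Expansion" becomes the anchor text. With the reversed ordering only "Taylor Series" is linked and "Expansion" is left as plain text, which is why equivalent key phrases are ordered from longest to shortest.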
"Key Phrase" Directories
A "key phrase" may be understood as a bottom-level subdirectory of the context tree. A key-phrase directory holds a dictionary (link database) containing at least one link. This database may be implemented as a file residing in a directory having a name derived from the key phrase. Preferably, however, all key phrases in a particular context (together with their links), plus perhaps additional contexts, are implemented in a single larger database file in the parent context directory. For simplicity, however, a key phrase will nevertheless be considered logically to be a bottom-level directory (leaf node) in the hierarchical context directory, irrespective of implementation details associated with the use of a hierarchical file system.
All of the links in the key-phrase directory are interpreted as competing sources of information on the one topic identified by the key phrase. The tangible difference between a key-phrase directory and a context directory is that the key-phrase directory has no subcontexts, only links. Thus, a bottom-level directory in the context tree hierarchy (a "leaf node" of the context tree) corresponds to a single concept or topic, and all of the (key,URL) pairs in its dictionary pertain to that one topic. The number of distinct URLs present is the number of competing sources of information.
Perhaps the simplest means for handling synonyms is to add a key-phrase directory for each alternate phrasing of each topic. Because order is important when traversing a synonym group, the context-tree implementation must provide a means for ordering sub-directories, at least when those sub-directories correspond to key phrases. Alternatively, an ordered link database file may reside in the context directory containing the key phrase, and include all other key phrases in that context as well (including synonyms); the key phrase(s) corresponding to each link may be stored as link properties.
In the preferred embodiment, synonyms are not handled as separate key-phrase directories. Instead, a single representative is selected (usually the most descriptive or canonical), and all other equivalent phrasings (synonyms) are listed in a separate synonym file in the key-phrase directory. (Order is carefully preserved.) During browsing, synonyms are displayed at the bottom of the key-phrase page.
Context Synonyms
The preferred embodiment also supports context synonyms, as opposed to key-phrase synonyms just described. Context synonyms are presently implemented using symbolic links in a UNIX file system implementation of the context tree. As an example, the context hierarchy below illustrates two different paths to the subcontext (directory) "Sound_Synthesis", where the notation "-->" indicates a symbolic link, as is typically done when listing files in a UNIX file system:
Humanities
    Music
        Computer_Music
            Sound_Synthesis
Engineering
    Electrical
        Signal_Processing
            Sound_Synthesis --> /Humanities/Music/Computer_Music/Sound_Synthesis
In this example, the "true parent" of the node Sound_Synthesis is Computer_Music, while the parent Signal_Processing is a "linked parent". There can be any number of linked parents, but only one true parent.
Symbolic links provide a means for reaching multidisciplinary fields by browsing the constituent fields in a top-down way. At any time, a symbolic link may be deleted and replaced with a copy of some or all of the directory which was formerly linked (possibly utilizing symbolic links at a lower level). In this way, closely related contexts may start out as identical, but later may evolve into separate collections, as the maintainers see fit.
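By way of illustration only, the example hierarchy above can be reproduced on a UNIX file system with a few lines of Python. This is a minimal sketch rather than the implementation of the preferred embodiment; the helper names `build_example_tree`, `is_context_synonym`, and `true_parent` are hypothetical.

```python
import os
import tempfile


def build_example_tree(root):
    """Create the example hierarchy; return (true path, symbolic-link path)."""
    true_path = os.path.join(
        root, "Humanities", "Music", "Computer_Music", "Sound_Synthesis")
    linked_parent = os.path.join(
        root, "Engineering", "Electrical", "Signal_Processing")
    os.makedirs(true_path)
    os.makedirs(linked_parent)
    link_path = os.path.join(linked_parent, "Sound_Synthesis")
    os.symlink(true_path, link_path)  # the context synonym
    return true_path, link_path


def is_context_synonym(path):
    """A context reached through a symbolic link is a context synonym."""
    return os.path.islink(path)


def true_parent(path):
    """Name of the true parent, found by resolving any symbolic links."""
    return os.path.basename(os.path.dirname(os.path.realpath(path)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        true_path, link_path = build_example_tree(root)
        print(is_context_synonym(link_path), true_parent(link_path))
```

Resolving the link before taking the parent directory is what distinguishes the single true parent from any number of linked parents.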
Context Dictionaries
The dictionary corresponding to a particular context is defined as the union of all key-phrase dictionaries in that context. In other words, the dictionaries belonging to all key-phrase subcategories in the particular context are logically concatenated together into one large dictionary, with additional dictionary entries added for all synonyms. This dictionary is referred to as the "context dictionary". The order in which the key phrases are concatenated is prescribed.
When installing links in submitted documents, several context dictionaries are normally combined together to form a larger "aggregate dictionary" which is what is used for link installation. A typical aggregate dictionary consists of the context dictionary for the "current context" (established, e.g., by browsing), followed by the context-dictionaries of all subcontexts (usually not in any particular order, unless explicitly listed by the user), then followed by the context dictionary of the "true parent" context, followed by the context dictionary of the true parent's true parent, and so on, until the context dictionary of the top-level directory is appended (which contains extremely generic terms). Linked parent dictionaries may also be added in where desired. Since order is respected during link installation, definitions provided in the "current context" will receive first precedence, followed by definitions occurring in subcontexts (which are considered within the current context), followed by the more generic definitions of parent contexts. Since "the first match wins" in link installation, generic terms defined in parent contexts are "overridden" by more specialized definitions of the same terms of art in the current context. For example, the word "resolution" might be defined at the top level as the first definition appearing in an ordinary dictionary of the English language, while in the context of " . . . /Signal_Processing/Spectrum_Analysis" it would be given its more arcane definition regarding the resolving power of a short-time Fourier transform.
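The precedence rule just described can be sketched as follows. This is an illustrative Python sketch under stated assumptions: each context dictionary is modeled as an ordered list of (key phrase, URL) pairs, matching is case-insensitive, and the function names and `example.org` URLs are hypothetical.

```python
def build_aggregate(current, subcontexts, true_ancestors):
    """Concatenate context dictionaries in precedence order.

    current:        context dictionary of the current context
    subcontexts:    list of subcontext dictionaries
    true_ancestors: dictionaries of the true parent, its true parent,
                    and so on up to the top-level context
    """
    aggregate = []
    for context_dictionary in [current, *subcontexts, *true_ancestors]:
        aggregate.extend(context_dictionary)
    return aggregate


def first_match(aggregate, phrase):
    """Return the URL of the first entry whose key phrase matches."""
    wanted = phrase.lower()  # case-insensitive matching is an assumption
    for key_phrase, url in aggregate:
        if key_phrase.lower() == wanted:
            return url
    return None


if __name__ == "__main__":
    spectrum_analysis = [("resolution", "http://example.org/stft-resolution")]
    top_level = [("resolution", "http://example.org/english-resolution")]
    aggregate = build_aggregate(spectrum_analysis, [], [top_level])
    print(first_match(aggregate, "resolution"))
```

Because the current context is placed first, its specialized definition of "resolution" is found before the generic top-level definition.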
Browsing the Link Databases
There are several benefits to providing browsing of the link databases:
it provides a unique educational resource which organizes valuable information on the Web in a manner especially well suited for educational purposes;
it provides a convenient means for learning what links are available for installation in documents;
it provides a convenient means for collecting context dictionaries for subsequent use in automatic link installation. While browsing, links and/or entire context subtrees can be marked for inclusion or exclusion in subsequent automatic link installations;
it provides a convenient means for navigating to contexts in which subcontexts and/or key-phrases can be added and/or edited by the user, or to key-phrase directories in which links can be added and/or edited and/or rated; and
by displaying links selectively according to various link properties, browsing provides a means for viewing useful link subsets, such as all links entered by the user or user's group.
Link database browsing support on the server may be implemented in a variety of ways. As an example, there are commercially available scripts which implement directory websites, such as the links-2.0 scripts from Gossamer Threads, Inc., and such scripts can be adapted to implement the hierarchical dictionary of the present invention. FIGS. 3 and 5 illustrate the appearance of such a browsing system. Alternatively, one may use HTML SELECT pop-up menus, which are dynamically generated from the current directory contents. However, for performance reasons, static HTML pages are preferable over dynamic HTML generated by the server, when feasible. To provide more context and ease of navigation, the database directory structure may additionally be displayed in a fixed HTML frame on the left, as is currently done on many websites. For example, the way directory trees are displayed on the left in Microsoft Windows Explorer is a good model.
FIG. 3 illustrates a Web page display at the start of browsing. The top-level context is displayed. In this simplified example, only four top-level subcontexts are offered (Computing 131, Education 132, Legal 133, and Music 134). Each of these words is a hypertext link, which can be clicked with the mouse to navigate to the associated sub-context. For example, clicking on Music 134, then on "Computer Music" (which is available in the Music context), then on "Signal Processing", then finally on "Sound Synthesis" produces the page shown in FIG. 4.
The Standard Browsing Menu
Near the top of each page during browsing is a set of hypertext links 130 separated by a vertical bar `|`. This is the "standard menu" appearing at the top of every page while browsing the W3K website and at other times as well. Each of these links allows the user to carry out some available function.
The "W3K Home" link in the standard menu 130 takes the browser back to the initial W3K home page illustrated in FIG. 1, as does clicking on the W3K logo.
The "Browse from Top" link in the standard menu 130 navigates to the top-level browsing page shown in FIG. 3.
The "Select Hierarchy" link navigates to a page where a different context hierarchy can be selected for browsing. There is only one primary public context hierarchy (the one reached from the second choice 102 in FIG. 1). However, individual users and groups of users can set up context hierarchies for their own purposes, without having to worry about fitting into the ever-expanding primary public context hierarchy. If well known "language localization" methods are not available, as preferred, to provide alternate language selection for each Web page in the public hierarchy, alternate hierarchies can be used to support alternate languages. Alternate hierarchies can be designated by their creators as public (anyone can add to it), restricted (anyone can read it, but only the owner(s) can write it), or private (only the owner(s) can read or write it). The owners include the creator and members of any groups listed by the creator as being co-owners.
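The three designations for alternate hierarchies may be sketched as a simple permission check. The following Python fragment is illustrative only; the `Hierarchy` class and the `can_read` and `can_write` functions are hypothetical, and the owners (the creator plus members of co-owning groups) are assumed to have been flattened into a single set of email addresses.

```python
from dataclasses import dataclass, field


@dataclass
class Hierarchy:
    name: str
    mode: str  # "public", "restricted", or "private"
    owners: set = field(default_factory=set)  # creator and co-owners


def can_read(hierarchy, user):
    """Anyone can read unless the hierarchy is private."""
    return hierarchy.mode != "private" or user in hierarchy.owners


def can_write(hierarchy, user):
    """Anyone can write a public hierarchy; otherwise only the owner(s)."""
    return hierarchy.mode == "public" or user in hierarchy.owners
```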
The "Install Links" link in the standard menu 130 navigates to the form provided for submitting documents for link installation, which will be described further below.
The "Add Subcontext" link navigates to the form provided for creating a new subcontext within the current context. Since FIG. 3 is at the top-level context, this operation is only allowed in a public hierarchy for a select group of "trusted" users.
The "Add Key-Phrase" link leads to the form for adding a new key phrase in the current context. At least one definition link is required when adding a new key phrase. At the top level of the public hierarchy, this operation is restricted to trusted users since any key phrases appearing at the top of the hierarchy are "generic terms" having definitions which are independent of context. Truly global key phrases such as domain names and trademarks are appropriate at the top level.
The "Add Definition" link is for adding a new definition for a key phrase. This entails supplying a URL which points to information about the key phrase, together with some other information, as will be described later. Thus, the number of distinct URLs in the set of URLs associated with a particular key phrase can be increased from 1 (its usual initial value) to any number by adding more definitions. The "Add Synonym" link in the standard menu 130 allows the addition of a key phrase to a list of "synonyms" for an existing key phrase. A synonym can also be constructed for a context. Synonyms will be described further below.
"Submit Dictionary File" provides convenient submission of large numbers of links (key phrases and definitions) as well as the ability to specify a context path for each one, as will be described. It is additionally possible to display specified contexts and contexts selected for link installation in the form of a dictionary file. For example, a user can perform a search in order to collect all links contributed by that user, display the results as a dictionary file, save the dictionary file on his or her local computer, perform any desired editing operations, and submit the edited dictionary file back to the server to update his or her links on the server.
"Modify Additions" allows the user to edit (modify or delete) any information he or she submitted to the W3K site. In particular, it is possible to modify link properties, delete a link, delete a context or key-phrase directory wholly owned by the user or user's group, and so on. A user belonging to one or more groups may edit any information submitted by anyone in any of those groups. A set of records to be edited can be created by means of the search facility. A record may hold the information associated with a link, key-phrase directory, or context directory.
"Select Context" selects the "dictionary" associated with the current context for inclusion in subsequent "link installation". The context dictionary normally includes each key phrase in the current context together with at least one definition for each key phrase. It may also include similar information from parent contexts and subcontexts, as will be discussed. Thus, the aggregate dictionary used in link installation is like a kind of "shopping cart" that can be filled with component dictionaries found while browsing around the context hierarchy; in this analogy, "items to be purchased" correspond to the dictionaries to be used in link installation.
It is also possible to assemble various context directories into an aggregate dictionary for link installation without browsing by simply providing a context dictionary list, or by selecting contexts from a number of SELECT menus in HTML listing all available contexts. After the current context is selected, the "Select Context" link changes to "Deselect Context", so that clicking on it takes the current context out of the aggregate link installation dictionary.
Selection configuration information lower in the hierarchy is not modified when excluding a context, so that re-selecting the context allows the contained selection configuration to become active once again. During link-installation (FIG. 7), it is possible to override all such selection information by simply specifying an explicit list 75 of context dictionaries, or selecting "All W3K contexts" in the form entry for contexts 179.
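One possible way to preserve lower-level selection information is to record exclusions separately from selections. The following Python sketch assumes that representation, which is not stated in the description; the function name `active_contexts` is hypothetical, and contexts are identified by slash-separated paths.

```python
def active_contexts(selected, excluded):
    """Return the selected contexts not at or under any excluded context.

    Neither input set is modified, so clearing an exclusion restores
    whatever selections were recorded beneath it.
    """
    def under(path, ancestor):
        prefix = ancestor.rstrip("/") + "/"
        return path == ancestor or path.startswith(prefix)

    return {path for path in selected
            if not any(under(path, ex) for ex in excluded)}
```

For example, with "/Music/Computer_Music" selected and "/Music" excluded, nothing is active; removing the exclusion makes the recorded selection active again without re-entering it.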
Browser "cookies" are very useful for storing the context search preferences for the user across sessions; since many tend to work in one or a few fields, it is often the case that the contexts used for link installation do not change very often. Browser cookies are simply information stored on the user's computer (the client computer) by the server; cookie files are supported by the major Web browsers such as Netscape Navigator and Microsoft Internet Explorer. If cookie files are not available for any reason (they can be disabled by the user), preference information can be stored on the server indexed by the user's email address, which is unique among users.
"Browse All Selected" places hierarchy browsing in a special mode in which only the currently Selected contexts and links are visible. This can also be reversed so that only deselected contexts are visible. (Sometimes it is helpful to go back and forth.) This feature can help the user more quickly review what link databases ("key-phrases" and "definitions") have been selected for link installation.
"Edit All Selected" is similar to "Modify Additions" except that instead of determining the list of database elements to be edited by using a search (or direct specification), it is initialized from the set of selected links owned by the user and/or groups to which the user belongs.
"What's New" creates a list of all contexts, key phrases, or definitions which have been added recently to the system.
"What's Cool" creates a list of all contexts, key phrases, or definitions which have been receiving relatively high traffic ("hits") recently.
"Top Rated" creates a list of highest ranked links in the database. These are generally excellent home pages, tutorials and the like on various topics.
"Email Updates" allows the user to subscribe to the W3K newsletter.
"Random Link" takes the user to a randomly chosen definition link.
"Search" supports general search for information within the current context and beyond.
Context Path Display
While browsing, the "context path" 140 (FIG. 4) is displayed just below the standard menu 130, with each path element separated by a colon `:`. In FIG. 4, for example, the context path is displayed as "Top: Music: Computer Music: Signal Processing: Sound Synthesis." Clicking on the "Lagrange Interpolation" key-phrase 144 in this context takes the browser to the definition page for Lagrange Interpolation shown in FIG. 5.
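A displayed context path of this form can be derived from the directory path in a single step. The following Python sketch is illustrative; the function name is hypothetical, and the mapping of underscores in directory names to spaces in the display is an assumption inferred from the figures.

```python
def display_context_path(context_path):
    """Format a directory-style context path for display while browsing."""
    elements = [element.replace("_", " ")
                for element in context_path.strip("/").split("/")
                if element]
    return ": ".join(["Top", *elements])


if __name__ == "__main__":
    print(display_context_path(
        "/Music/Computer_Music/Signal_Processing/Sound_Synthesis"))
```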
Search Form
Below the horizontal line in FIG. 3 is a search form. Typing text into the field and clicking on the "Search!" button results in a dynamically generated web page listing all links (in all contexts) matching the search criteria. More refined searches can be carried out by first selecting the "More search options" link. Since links have quite a few properties (to be discussed), searches can be honed rather finely without relying entirely on typical means for selecting a subset of all names and phrases within contexts, key-phrases, and definitions.
Topics under a Context
FIG. 4 displays the contents of the context-path
/Music/Computer_Music/Signal_Processing/Sound_Synthesis.
We see that the "Sound_Synthesis" context contains two subcontexts "Acoustic Instruments" 148 and "Vintage Methods" 149.
In addition to subcontexts, there is a list labeled "Words and phrases defined in context Sound Synthesis" 141. (For greater convenience when browsing contexts, browsing can be configured to show only a single link to the key-phrase list on a separate page.) The phrases listed include "Commuted Synthesis" 142, "Physical Modeling" 145, and "Lagrange Interpolation" 144.
Technically, as far as the browsing function is concerned, "words and phrases" (key phrases) are similar to "subcontexts". However, key phrases are browser categories with no subcategories, only links, while contexts are browser categories containing subcategories (either subcontexts or key phrases). The links under a key-phrase are treated as "competing definitions" for that key phrase.
FIG. 8 illustrates the relationships among contexts, key phrases, and definitions. The top level context 180 is the root node of the tree structure defined by the hierarchical link database. There can be any number of subcontexts or key phrases under the top level context 180. In the example of FIG. 8, there are two subcontexts, "Intermediate Context 1" 181 and "Intermediate Context 2" 182. Since these are context directories, they each may contain any number of subcontexts and/or key phrases. In the present example, there are two key phrases 183 and 184 in the first subcontext 181 and one synonym group 185 (two equivalent key phrases) in the second context 182. A key phrase must have at least one definition (link) associated with it. In the present example, "Key Phrase 1" 183 contains three competing definitions 186, "Key Phrase 2" 184 contains four competing definitions 187, while the synonym group 185 consisting of "Key Phrase 3" and "Key Phrase 4" contains two definitions 188 to choose from for that synonym group. Since order is important, we may choose a consistent ordering convention for tree diagrams in which the ordering of all subnodes of a node is defined as left to right in a diagram as in FIG. 8.
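The relationships of FIG. 8 can be modeled with three small record types. The Python sketch below is illustrative only; the class names, the helper `fig8_example`, and the `example.org` URLs are hypothetical. List order stands in for the left-to-right ordering convention.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Definition:
    url: str


@dataclass
class KeyPhrase:
    phrases: List[str]             # more than one entry forms a synonym group
    definitions: List[Definition]  # competing definitions, in order

    def __post_init__(self):
        if not self.phrases:
            raise ValueError("at least one phrase is required")
        if not self.definitions:
            raise ValueError("a key phrase needs at least one definition")


@dataclass
class Context:
    name: str
    subcontexts: List["Context"] = field(default_factory=list)
    key_phrases: List[KeyPhrase] = field(default_factory=list)


def count_definitions(context):
    """Count all competing definitions at or below a context."""
    total = sum(len(kp.definitions) for kp in context.key_phrases)
    return total + sum(count_definitions(sub) for sub in context.subcontexts)


def fig8_example():
    """Build the tree of FIG. 8 with placeholder URLs."""
    def defs(tag, n):
        return [Definition(f"http://example.org/{tag}/{i}") for i in range(n)]

    first = Context("Intermediate Context 1", key_phrases=[
        KeyPhrase(["Key Phrase 1"], defs("kp1", 3)),
        KeyPhrase(["Key Phrase 2"], defs("kp2", 4)),
    ])
    second = Context("Intermediate Context 2", key_phrases=[
        KeyPhrase(["Key Phrase 3", "Key Phrase 4"], defs("kp34", 2)),
    ])
    return Context("Top Level Context", subcontexts=[first, second])
```

The constructor check enforces the rule that a key phrase must have at least one definition associated with it.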
Note also in FIG. 4 that the "Sound Synthesis" 141 context includes one synonym 143. This is a context synonym identified by the path
Engineering: Signal Processing: Sound Synthesis
which can be thought of as a different context path to the same place. A context synonym can be thought of as a "symbolic link," in the sense of a UNIX file system, from one "context directory" to another. It is often appropriate for multidisciplinary fields, such as the field of sound synthesis, which belong as a subcontext of more than one high-level context. In link installation, context synonyms can provide an analog of what is known in computer science as "multiple inheritance", i.e., the dictionaries of multiple parents ("Music" and "Engineering" in this example) can optionally be included automatically in the formation of the aggregate dictionary for link installation, while only the one main context ("Sound Synthesis" in this example) has to be selected for link installation.
To illustrate a "context synonym" in FIG. 8, we could add a third subcontext box under the "Top Level Context" box 180 entitled "Intermediate Context 3" which could have a different kind of border to indicate that it is a symbolic link to some other context. We could then draw an arrow from the "Intermediate Context 3" box to its equivalent, such as either "Intermediate Context 1" 181 or "Intermediate Context 2" 182.
FIG. 5 shows a display of two "competing definitions" for the phrase "Lagrange Interpolation" 151. Either of the two links 152 or 153 may be installed in a document containing the phrase "Lagrange Interpolation". They are both named "Lagrange_Interpolation" because that happens to be the title of both Web documents. However, the links point to two different targets on the Web written by two different authors.
The "new" superscript after a link 152 or 153 means it was added relatively recently. In this example, both links for "Lagrange Interpolation" were added on the same day.
The "popular" superscript for a link 152 or 153 means it has been receiving relatively frequent visits (or "hits") via the W3K site. The number of hits displayed in this case is 0.
Also displayed in FIG. 5 for each link 152 or 153 is the date 155 the link was submitted, the number of hits 156 (number of times anyone has clicked on the link at this site), a rating 157 for each link (which is 0 since the links were just added), and the number of votes included in each rating (also 0 at the moment). Available elsewhere on the website also is the number of times a link has been installed in Web documents. Finally, there are three links 154 for rating each link (assigning a quality score from 1 to 10 and optionally submitting a more detailed written review), reading the reviews written by others, and viewing all of the link's properties in tabular form. After the rating display is a hyperlink which a user can select in order to contribute a rating or a review of the link.
This completes a first-pass overview of the main pages and selections seen by the user while browsing the link databases. Functions available while browsing will be described further in the following sections.
Adding or Modifying Definitions or Categories
In FIG. 5, the "Add a Definition" link 158 navigates to the form shown in FIG. 6 for adding another definition link for Lagrange Interpolation. The current key phrase "Lagrange Interpolation" is filled into the "Topic" field 161, and the context path leading to the key phrase is filled into the "Context" field 160. This makes it convenient to enter a new source of information (definition) on a topic (key phrase) while browsing.
When "Add a Category" or "Add a Key Phrase" is selected from the top-level context (or "Add . . . " is selected on the main website home page), the "current context" field of the form becomes instead a pop-up HTML "select" list containing all of the contexts presently in the database, making it convenient to quickly select any context in which a new subcontext or key-phrase is to be added.
The only required fields on the add-definition form (FIG. 6) are the URL 162 and user's email address 169. All others are optional.
The URL is the new definition, and it is tested by the server to make sure it is responding. If the Site Title field 163 was left blank, the title of the Web page addressed by the URL, which is automatically retrieved by the server (using the Perl LWP module), is filled in automatically as the link title.
The contributor's email address is required because all submissions to the server in the preferred embodiment are associated with the contributor's email address. However, there are alternative means for identifying users known in the art, such as a more conventional registration procedure in which the user chooses a login name and password. The preferred embodiment ensures that the email address given really reaches the user. If the user is new, an authorization process, described in §5.1.9, is initiated which tests the user's email address.
While not required, the link contributor is invited to write a short description 164 of the website, specify the minimum 165 and maximum 166 educational level covered at the site (usually done by the author of the site), and specify the type of resource 167 (home page, conference paper, book chapter, or the like). The user may also type in his or her name 168.
Fields such as educational level 165 that are potentially confusing tend to have a "Help" link 62 next to them. For example, the educational level help 62 explains that the numerical value is in units (loosely) of "years of education likely required to understand the material". A minimum level with no maximum level corresponds to setting one level rather than a range of levels. When no educational level at all is provided with the definition, the link server will attempt to compute it automatically based on the level of the links it contains, as will be described. In a script-based submission, finer control is possible using additional level-related properties.
Things like "educational level" and "resource type" are examples of link properties. The context path leading to a link is also one of its properties, as is its URL, title, description, and so on. A link can have more properties than these, some of which will be described below. The "Specify Additional Properties" link 61 takes the user to a larger form where the additional properties can be specified.
When the user is satisfied with the filled in definition-submission form, the submit button 60 can be pressed to send the form to the link database server (a computer at w3k.org in this case). At that point, the server tests the URL by retrieving the first page, checks that the user's email address is known and that the user's IP address and cookie information match information previously stored on the server (otherwise authorization is carried out), checks for duplication of the key phrase and URL in the given context, possibly checks the URL target for "inappropriate content", assigns an automatic educational level if none was provided (unless automatic level assignment is already scheduled at regular intervals), and adds the new definition to the link database for the current key phrase (and context path, if the database file holds links for multiple key-phrase contexts). If the addition was successful, the user is navigated to a dynamically generated Web page summarizing the information added to the database. If there were any problems, an error page is generated listing the reason(s) for failure to accept the page.
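A subset of the server-side checks above (required fields, URL responsiveness, and duplicate detection) may be sketched as follows. This Python fragment is illustrative and omits authorization, content screening, and level assignment; the function name, the form field names, and the injected `url_responds` callable are all hypothetical.

```python
def check_submission(form, existing_links, url_responds):
    """Return a list of failure reasons; an empty list means acceptance.

    form:           dict with 'url', 'email', 'context', 'key_phrase'
    existing_links: set of (context, key_phrase, url) tuples already stored
    url_responds:   callable taking a URL and returning True if it responds
    """
    errors = []
    url = form.get("url", "").strip()
    email = form.get("email", "").strip()
    if not url:
        errors.append("URL is required")
    elif not url_responds(url):
        errors.append("URL is not responding")
    if not email:
        errors.append("email address is required")
    key = (form.get("context"), form.get("key_phrase"), url)
    if url and key in existing_links:
        errors.append("duplicate key phrase and URL in this context")
    return errors
```

Collecting every reason before returning mirrors the error page, which lists the reason(s) for failure rather than stopping at the first.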
A far quicker means of entering definitions is by means of dictionary file submission which can be regarded as a script-based replacement of the above browser-based interface. An example of such a dictionary file is given in a later section. The form for submitting such a file may be reached via the "Submit Dictionary File" link in the standard menu, or as an option under the "Add to or Edit the W3K" option on the server home page.
Private Context Trees
As mentioned when describing the standard menu 130, known users may optionally create a new top-level context tree which is private to that user or to one or more groups identified by the user. This mode of usage is advantageous for private usage without incurring collisions with links in the main "global" context tree shown in FIG. 3. It is further the only way a known user can submit large quantities of contexts, key phrases, and links by means of a dictionary file submission, since that operation is not permitted in the global public context hierarchy. Further details will be described.
User Authorization
Whenever a user requests an operation on the server requiring information to be stored on the server (any "editing operation"), the user must be "known." Being known means the email address of the user has been given by the user to the server, and the email address has been verified by the server to work (reach the user). When an editing operation of any kind is requested (including the simplest form of link submission, or even a link rating from 1 to 10), if the user is not yet known, an "authorization process" is carried out as a preliminary step in the desired editing operation.
In the authorization process, the user submits his or her email address in a simple Web-page form, and the server (1) emails a randomly generated ASCII string to that email address, and (2) navigates the user to a Web page containing a form for receiving that random string from the user. The form also instructs the user to receive the email and to paste the random string into the second authorization form and submit it. This process verifies that the email address in fact reaches the user.
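The exchange of the random string may be sketched as follows. This Python fragment is a minimal illustration, not the server's implementation; the `Authorizer` class, its method names, and the 16-character string length are assumptions, and actual delivery of the email is left out.

```python
import hmac
import secrets
import string


class Authorizer:
    """Tracks random strings awaiting confirmation, keyed by email address."""

    def __init__(self):
        self._pending = {}

    def start(self, email, length=16):
        """Generate the random ASCII string that would be emailed to the user."""
        alphabet = string.ascii_letters + string.digits
        token = "".join(secrets.choice(alphabet) for _ in range(length))
        self._pending[email] = token
        return token

    def finish(self, email, submitted):
        """Check the string pasted into the second authorization form."""
        expected = self._pending.get(email)
        if expected is None:
            return False
        pasted = submitted.strip()  # tolerate stray whitespace from pasting
        if not hmac.compare_digest(expected.encode(), pasted.encode()):
            return False
        del self._pending[email]  # each string is accepted only once
        return True
```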
The email address and IP address of the user are then saved on the server. Additionally, the same information is written on the user's computer using a browser cookie. If the cookie goes away for any reason, or if the user later comes in from a different IP address for which authorization has never occurred (e.g., due to receiving a new dynamically assigned IP address from an ISP, or using for the first time a different home computer connected directly to the Internet), authorization is triggered once again when any editing operation is requested. Users coming in over dynamically assigned IP addresses generally have to be authorized for each session until all such IP addresses have been seen and logged on the server along with the user's email address.
After a successful authorization, the user may use the "Back" button in his or her Web browser to find the page which triggered the authorization process, and resubmit the form successfully.
Link Properties
Many other properties can be specified for a link besides the URL 162 and email address 169. One of the most important properties, brought out in the main form, is educational level 165. Both a minimum 165 and maximum level 166 can be set. When the link-target document is written at a single well-defined educational level, such as "10th grade", the min and max can be set to the same value (such as 10), or the max can be left unset (which defaults to level 100, meaning no maximum). When the document spans a wide range of educational levels, such as a well designed "topic home page" might do, the min and max can be set appropriately to cover the estimated range. The minimum level still sets the official "level" used in automatic level assignment for other documents, but the maximum level, if specified, may affect link installation when a specific level range is specified for that. An educational level is implemented as a floating-point number, so that a level of 10.5 can be specified, e.g., in the form 165 or 166.
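The handling of minimum and maximum levels may be sketched as follows. This Python fragment is illustrative only; the function names are hypothetical, and the overlap test is one assumed way a requested level range could affect link installation, since the exact rule is not spelled out here.

```python
NO_MAXIMUM = 100.0  # default used when the maximum level is left unset


def level_range(minimum, maximum=None):
    """Return (min, max) as floating-point educational levels."""
    lower = float(minimum)
    upper = NO_MAXIMUM if maximum is None else float(maximum)
    if upper < lower:
        raise ValueError("maximum level is below minimum level")
    return (lower, upper)


def overlaps(link_range, requested_range):
    """True if a link's level range intersects the requested range."""
    return (link_range[0] <= requested_range[1]
            and requested_range[0] <= link_range[1])
```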
Another important link property, also on the main form, is resource type 167. Resource types include dictionary definition, encyclopedia article, unpublished article, conference paper, talk overheads, refereed journal article, book chapter, book, tutorial, lecture notes, course readers, and the like. Sometimes authors may wish to screen out non-refereed sources such as conference papers or unpublished works. Of course, refereed publications and books will typically be hosted on the website of a publisher, requiring some form of payment for access, such as a site subscription or, preferably, a per-page "micropayment" such as the well known Millicent system provides.
Additional optional properties may be specified on a second form by selecting the "Specify Additional Properties" link 61. Additional properties include source type (individual, educational institution, company, non-profit organization, etc.), geographical location, language (English is assumed by default), "viewer suitability" analogous to `PG-13`, `R`, etc., for movies, a list of groups to be granted editing access, and so on.
Link properties added automatically by the server when installing a link in a database include a unique integer ID, the email address and IP address of the link contributor, the date of submission, an initial rating of zero, an initial zero number of "hits", an initial zero number of "installs" in documents, and the like.
Link properties make it convenient to specify "virtual link database directories" which include only the links satisfying certain criteria specifiable in terms of link properties. For example, a user may ask to see only tutorials and books in a certain educational level range. Alternatively, an author may specify seeing only links belonging to that author's email address, or group. Thus, properties enable selective browsing (or listing) as well as more selective link installation. Such selective browsing may be specified using the Search feature on the site home page 105, standard menu 130, or at the bottom of any Web page seen while browsing.
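A "virtual link database directory" amounts to filtering links on their properties. The Python sketch below is illustrative; links are modeled as dictionaries, and the function name and the property names `resource_type`, `level`, and `email` are hypothetical stand-ins for the properties described above.

```python
def virtual_directory(links, resource_types=None, min_level=None,
                      max_level=None, contributor=None):
    """Return the links satisfying every criterion that was supplied."""
    result = []
    for link in links:
        if (resource_types is not None
                and link.get("resource_type") not in resource_types):
            continue
        level = link.get("level")
        if min_level is not None and (level is None or level < min_level):
            continue
        if max_level is not None and (level is None or level > max_level):
            continue
        if contributor is not None and link.get("email") != contributor:
            continue
        result.append(link)  # original order is preserved
    return result
```

For instance, passing `resource_types={"tutorial", "book"}` with a level range yields only tutorials and books in that range, while passing only `contributor` yields all links belonging to one email address.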
Link properties may also be usefully included in installed links (within HTML "comments" or in specially defined XML tags) when indirect links are being installed (that is, when the installed link points to a centralized link server which forwards the user's browser to the ultimate destination). Installed link properties may be interpreted by the link server to provide additional control over link behavior. For example, a teacher using Internet documents for a 9th grade class could configure the link server to suppress all links having an educational level greater than 10. That way, when educational level properties are available for all links, as the present invention provides, documents may be populated with hyperlinks which can be configured not to refer a student to information at a more advanced level than the teacher desires. The teacher may further suppress any links with a viewer suitability rating below a certain value. In summary, installed link properties enable dynamically configurable link behavior based on link property values.
In another use of installed link properties (which requires either browser support and/or local editing of the HTML containing the installed links), link properties can be associated with "classes" in "cascading style sheets" (an add-on to HTML) in order to display links to dictionary definitions in one manner, encyclopedia articles in another manner, and home pages in another, etc.
Restricted Directories
When a subcontext is created, it can be marked as "restricted" to the owner (creating user) or to groups specified by the owner. Restricting a directory prevents anyone but the owner or specified groups from modifying the subdirectory. The restricted directory can optionally be made "invisible" to users other than those having modification rights, in which case the restricted directory is said to be "private". An unrestricted directory is said to be "public". A restricted directory can be deleted or renamed or otherwise reorganized no matter what it contains. Typical uses of restricted directories include
Retaining the ability to delete the entire directory and rebuild it with a dictionary file submission.
Supporting a private dictionary corresponding to a particular project, such as a book, in which it is desired to have complete control over all links used in link installation.
The name of a restricted directory has the name of its first group (or owner, if no access groups are defined) automatically appended as a suffix to the name chosen by the owner in order to prevent conflicts with public directories and other restricted directories on the same topic. With this convention, any number of users may have restricted subdirectories on the same topic. For example, in the subdirectory "/Music/Computer_Music/Synthesis/" there could be
Commuted_Synthesis_by_mak@vipunen.hut.fi/
Commuted_Synthesis_by_jos@ccrma.stanford.edu/
In this way, any number of experts may provide their own "packages" of links on the same topic.
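The naming convention can be sketched as below. The function name is hypothetical, and the "_by_" separator is inferred from the two example directory names above rather than stated as a rule.

```python
def restricted_directory_name(chosen_name, owner, groups=()):
    """Append the first access group, or the owner if no groups are defined."""
    suffix = groups[0] if groups else owner
    # Assumption: "_by_" joins the chosen name and the suffix, as in the examples.
    return f"{chosen_name}_by_{suffix}"
```

For example, restricted_directory_name("Commuted_Synthesis", "jos@ccrma.stanford.edu") yields the second directory name listed above, without the trailing slash.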
A known user may even create a new top-level hierarchy which may be designated public, restricted, or private. User- or group-owned hierarchies of this nature which lie outside the primary public hierarchy may be placed in a special standard menu item entitled "Alternate Universes", e.g., to indicate that they are not a part of the primary public context hierarchy.
Link Ratings and Reviews
When browsing reaches a key-phrase directory, as shown in FIG. 5, following each competing definition 152 or 153 is the hyperlink "Rate It" which navigates to a form where that definition (link) can be rated on a scale from 1 to 10, and/or a written review about that link can be submitted. If the user is not known, an attempt to submit a rating or review routes the user to the authorization page, and after a successful authorization, the rating or review is accepted by the server.
All ratings and reviews are stored on the server along with the email address (and IP address) of the contributor. Only one rating and review are allowed per item per email address, but the user owning the rating or review can modify either at any time. Certain "trusted" users, such as website editors or expert consultants enlisted to help with ratings and reviews, may be given higher weighting in the ratings, and the reviews may be organized by editors according to their quality. Otherwise, the rating system is straightforward and similar in functionality to the five-star rating and review system used at http://www.amazon.com for books.
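One possible way to compute an average rating with higher weighting for trusted users is sketched below. The weight values and the average_rating function are assumptions; the text does not specify how the weighting is applied.

```python
def average_rating(ratings, trusted_weights=None, default_weight=1.0):
    """Weighted mean of ratings on the 1-to-10 scale.

    ratings maps an email address to that user's rating, so the
    one-rating-per-address rule is enforced by the mapping itself.
    trusted_weights maps trusted email addresses to a larger weight.
    """
    trusted_weights = trusted_weights or {}
    total = 0.0
    weight_sum = 0.0
    for email, rating in ratings.items():
        if not 1 <= rating <= 10:
            raise ValueError(f"rating out of range for {email}: {rating}")
        weight = trusted_weights.get(email, default_weight)
        total += weight * rating
        weight_sum += weight
    # No ratings yet: report None rather than a misleading zero.
    return total / weight_sum if weight_sum else None
```

Because the ratings are keyed by email address, a user who modifies a rating simply overwrites the earlier value.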
Link Installation
A primary function of the invention is to facilitate the installation of hyperlinks in documents intended for the World Wide Web. This section provides a detailed description of link installation in the preferred embodiment.
Installed-Link Types
There are at least four alternative ways to install a link in a document.
In the first mode, a hypertext link is installed directly to the top-ranked source of information on the topic identified by the matching key phrase in the user's submitted text. This is the first choice presented in the "Link Type" radio-button-group 177 of the default link installation form (FIG. 7). A disadvantage of this approach is that links often become "stale" due to changing ISPs, changing filenames, etc., requiring the links to be re-installed from time to time. (The link installation server preferably tests all links in its databases periodically and eliminates them if they are unavailable for a prolonged period of time, such as more than a week. When all links containing a bad URL are automatically removed from the databases, all owners of the links are notified automatically by email and invited to submit an updated version of the link(s).)
The second approach is to install an indirect link which links via a centralized server (such as a website providing the link installation service). This choice is provided by the second radio button in the "Link Type" portion 177 of the default link installation form. Such an intermediate website acts as a so-called "proxy server" for the link. Indirect links may always point to the most up-to-date, top-ranked source of information on any given topic. An example URL syntax for this mode of operation is
http://www.w3k.org/jump.cgi?ID=35
where it is assumed that each link has a unique integer identifier on the proxy server, and jump.cgi is a CGI script which is passed the identifier as if it were a form submission in which the form contained a field named "ID" with the value 35. To avoid having to assign unique identifiers across all contexts, the context path can be included in the URL, e.g.,
http://www.w3k.org/jump.cgi?ID=3&PATH=Engineering+Signal_Processing
Context paths can similarly be assigned integer IDs in order to shorten indirect URLs.
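The indirect URL syntax can be illustrated with a short sketch that builds and decodes such URLs. The function names are hypothetical, and treating the "+" in the PATH value as an encoded space between context components is an assumption about the example above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

JUMP_BASE = "http://www.w3k.org/jump.cgi"  # example proxy script from the text


def make_indirect_url(link_id, context_path=None):
    """Build an indirect link URL, optionally qualified by a context path."""
    params = {"ID": link_id}
    if context_path:
        params["PATH"] = context_path
    return JUMP_BASE + "?" + urlencode(params)


def parse_indirect_url(url):
    """Recover the link identifier and optional context path from the URL."""
    query = parse_qs(urlparse(url).query)
    link_id = int(query["ID"][0])
    context_path = query.get("PATH", [None])[0]
    return link_id, context_path
```

A server-side script along the lines of jump.cgi would perform the decoding step and then forward the browser to the current top-ranked URL for that identifier.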
A third approach is to insert a link to the "key-phrase page" itself at the centralized server (the page on the server listing all "competing definitions" for that key phrase). This is the third and final choice in the "Link Type" radio group 177. In this case, an end user following such an installed link will see all competing definitions, in ranked order, instead of only one. The end user can then request that the definitions be reorganized according to various criteria such as educational level, document size, type of resource (article, book, etc.), type of source (.edu, .org, .com, individuals, etc.), and so on, by making requests of the server interactively, or by means of preferences registered with the server.
A refinement of the third approach is to build or generate a more helpful "key-phrase home page" on the link server. This page could provide, for example, a brief definition, followed by an organized presentation of all available sources of information, organized by type and ranked according to quality in each case. In this format, the casual user may be satisfied with a mere dictionary-style definition, while the serious scholar can more readily pursue a wider variety of sources beyond merely the top-ranked source. Providing interactive reorganization of the definition page according to end user preferences is preferable in this case as well.
A fourth approach is to use JavaScript features to install a snapshot of the key-phrase home page at the time of link installation. In this approach, a JavaScript pop-up menu may hold a list of all competing links for the linked topic.
Example Key-Phrase Home Page Format
Below is an example of how a very simple "key-phrase home page" might be laid out:
TABLE 1
Key Phrase: Dictionary-style definition
Link to highest-rated online encyclopedia-style article
Link to highest-rated online tutorial, if available
Link to highest-rated textbook covering this topic, if any
Link to educational resources (online courses, degree programs, etc.)
Highest rated related links ("See also" type information)
Rank-ordered list of encyclopedia-style links
Rank-ordered list of online tutorials
Rank-ordered list of other online information
Rank-ordered list of contributed links of unknown type
. . .
Last unrated contributed link of unknown type
The link database server preferably provides periodic link testing, average ratings computation, link reordering, automatic educational level assignment, and so on. It is also straightforward for the server to format the key-phrase home page dynamically according to user preferences based on link properties and other criteria. For full generality, it is desirable to customize and differentiate key-phrase home pages on the basis of language, educational level, and other properties. (They are already segregated according to context by the context hierarchy in which they reside.) To address the potential enormity of this task, a mechanism for allowing known and trusted users to submit key-phrase home pages for installation on the server can be provided. For this purpose, the server can provide a template document containing variables that are filled in by the server, in a manner often found in website construction tools.
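A template of this kind might be sketched as follows. The template text, the variable names, and the render_key_phrase_page function are all hypothetical; the sketch only shows the server filling in variables of a submitted template.

```python
from html import escape
from string import Template

# Hypothetical template; $-variables are filled in by the server.
KEY_PHRASE_PAGE = Template("""<h1>$key_phrase</h1>
<p>$definition</p>
<ul>
$link_items
</ul>""")


def render_key_phrase_page(key_phrase, definition, ranked_links):
    """Fill the template from a rank-ordered list of (title, url) pairs."""
    items = "\n".join(
        f'<li><a href="{escape(url)}">{escape(title)}</a></li>'
        for title, url in ranked_links
    )
    return KEY_PHRASE_PAGE.substitute(
        key_phrase=escape(key_phrase),
        definition=escape(definition),
        link_items=items,
    )
```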
Link Color
While copious linking makes a set of documents very convenient to navigate among, the high density of links can be distracting to the eye. For this reason, the link installation submission form provides a checkbox for requesting that the hypertext links be set to the same color as the surrounding text. This leaves only an underline to indicate each link. Presumably, future versions of HTML and browsers will allow finer control over the display modes of links, and it may in some cases be possible to offer turning off all visual indications that a link is a link. This is because when links are installed at very high density, such as this invention makes possible, the reader can assume that essentially all nontrivial words are linked. Links become the rule rather than the exception for all "uncommon" words in a document.
Avoiding Links Altogether
In an alternate mode of usage, any word or phrase can be selected in text displayed by the user's browser and "looked up" at a server website containing the link databases. A similar mechanism is currently available in Microsoft Internet Explorer 5: The right-click menu contains an entry "See more with Lycos!" which, when selected, causes the selected phrase (or word last clicked with the mouse) to be looked up in the search engine at the Lycos website (http://www.lycos.com).
In the case of the present invention, in which the database server may act in place of the Lycos website, if the word or phrase is found in the link database, the user may be taken to the page of "competing definitions" (all links) for that topic. If the topic is available in multiple contexts, a list of all distinct contexts can be first displayed, so that the user can select which one he or she had in mind, and then be taken to the definition page in the selected context. If the term is not in the link database but coincides with a context directory name, that directory can be displayed by the browser. As a last alternative, the unrecognized phrase may be forwarded to an ordinary online dictionary (for single words), encyclopedia, or Internet search engine.
The link-free look-up mode described in the previous paragraphs can be supported in any number of applications, not just Web browsers. For example, the word processor Microsoft Word already supports looking up an ordinary dictionary definition of a word by selecting the word and choosing the "Define" item in the right-click pop-up menu. Another item in that menu could be "Look it up at the W3K", for example. A link-free look-up service of this nature could be provided in any application which displays text and supports text selection by the end user. The service can be provided either over an Internet connection as described above, or, in the absence of an Internet connection (or supplementary to it), using the single-computer embodiment of the present invention described in §5.2.
In the preferred embodiment, end users of the link-free lookup service may optionally register with the database server in order to specify preferences such as whether a key-phrase lookup (sans link) should navigate to the key-phrase home page or more directly to the currently highest ranked definition for that key phrase. The user may also inform the server of his or her educational level, desired viewer suitability range, and the like.
To support link-free lookup mode, the database server may accept a URL containing a "virtual form submission" of a link-free lookup form. As a simple example, a lookup request for the phrase "Hubble constant" could be sent to the database server by "navigating" to the URL
http://www.w3k.org/linkfreelookup.cgi?TEXT=Hubble+constant
The CGI script linkfreelookup.cgi runs and may immediately issue a "navigation" output to the highest ranked link matching "Hubble constant", if any. The URL may also include a user name. If user preferences exist, the script may alternatively navigate to a key-phrase page of competing definitions for the Hubble constant, and so on. Additionally, any number of link properties may be specified in the URL as well.
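The link-free lookup request and the order of fallbacks described in this section can be sketched as follows. The resolve_lookup function, its return values, and the in-memory shape of the link database are assumptions made for illustration only.

```python
from urllib.parse import urlencode

LOOKUP_BASE = "http://www.w3k.org/linkfreelookup.cgi"  # example script from the text


def lookup_url(text):
    """Build the "virtual form submission" URL for a selected phrase."""
    return LOOKUP_BASE + "?" + urlencode({"TEXT": text})


def resolve_lookup(phrase, link_db, context_dirs):
    """Decide what the server shows for a looked-up phrase.

    link_db maps a key phrase to {context path: ranked list of URLs};
    context_dirs is the set of context directory names.
    """
    contexts = link_db.get(phrase)
    if contexts:
        if len(contexts) > 1:
            # Several contexts define the phrase: let the user choose one.
            return ("choose_context", sorted(contexts))
        (context,) = contexts
        return ("definitions", context)
    if phrase in context_dirs:
        # Not a key phrase, but it names a context directory.
        return ("directory", phrase)
    # Last alternative: forward to a dictionary, encyclopedia, or search engine.
    return ("fallback", phrase)
```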
Link Installation Form Operation
FIG. 7 shows the default web page for submitting documents to have hypertext links installed by the server. The user pastes text to be "linkified" directly into the "Text or URL" textfield 170. In this example, a URL 77 has been specified, indicating that an entire website is being submitted for link installation, as will be described further below.
Three input submission formats may be specified by the "Input" radio-button group 171: HTML, Plain ASCII, and LaTeX source. In addition, there is a "Help" link 174 which navigates the user to documentation on the relevant considerations for each choice.
In the example of FIG. 7, submission of HTML format is selected in the input-format radio group 171. In the case of "plain ASCII" submission, the output is also normally received in HTML format; this facilitates fast construction of Web pages from simple ASCII text files. It also can be used to quickly obtain a browsable Web directory from a list of keywords generated by other means. Since some HTML editors support "drag and drop" link installation from another document, an automatically generated list of HTML links can be very useful even for manual link entry in an HTML editor.
In the case of LaTeX source format, links are installed in the form of an invocation of the macro \htmladdnormallink{text}{target}, which is defined in the widely used html.sty LaTeX style file.
When the input format is HTML, it is parsed to prevent accidental replacement of HTML tag data with links. In particular, it is important not to install links within the anchor text of existing links. HTML parsing can be accomplished using the HTML Perl package (see, for example, page 716 of the Perl Cookbook by T. Christiansen and N. Torkington, O'Reilly, 1998). In a similar manner, LaTeX directives are avoided in the text matching algorithm within LaTeX source. (Perl for LaTeX parsing is available in the latex2html Perl script, freely available at http://ctan.tug.org/ctan/.)
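The preferred embodiment uses a Perl HTML parsing package for this step. The following sketch shows the same idea using Python's standard html.parser module instead: key phrases are replaced only in text nodes, and never inside tags or the anchor text of existing links. The Linkifier class and install_links function are hypothetical names, and skipping script and style content is an added assumption.

```python
import re
from html.parser import HTMLParser


class Linkifier(HTMLParser):
    """Re-emit HTML unchanged except for key phrases found in text nodes."""

    SKIP = {"a", "script", "style"}  # no links inside these elements

    def __init__(self, dictionary):
        super().__init__(convert_charrefs=False)
        self.dictionary = dictionary
        self.out = []
        self.skip_depth = 0
        phrases = sorted(dictionary, key=len, reverse=True)
        alternation = "|".join(re.escape(p) for p in phrases)
        self.pattern = re.compile(rf"\b({alternation})\b") if phrases else None

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth or self.pattern is None:
            self.out.append(data)
            return
        self.out.append(self.pattern.sub(
            lambda m: f'<a href="{self.dictionary[m.group(1)]}">{m.group(1)}</a>',
            data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def install_links(html, dictionary):
    """Return html with dictionary key phrases turned into hypertext links."""
    parser = Linkifier(dictionary)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```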
Linking is preferably suppressed when the recognized phrase coincides with the name of the current section or document, i.e., a phrase that results in a link to the current page.
When "Link only the first occurrence . . . " is selected in the first half 175 of the "Occurrences" section of the link-installation submission form (FIG. 7), only the first occurrence of the phrase is linked on each page (HTML) or in each section (LaTeX). Otherwise all occurrences are linked.
A second pair of radio buttons 176 exists for specifying that links be installed for either all emphasized words or phrases, or only emphasized words or phrases. Emphasized occurrences may appear as "\emph{. . .}" in LaTeX and as <I> . . . </I> or <B> . . . </B> in HTML.
The two radio-button-pairs 175 and 176 can be considered to specify "two bits" which select among the following cases:
TABLE 2
00 link all occurrences of a key phrase, whether emphasized or not;
01 link all occurrences of a key phrase, but only when emphasized;
10 link the first occurrence of a key phrase in each page (whether emphasized or not), and all emphasized occurrences; and
11 link only the first emphasized occurrence of a key phrase.
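The four cases of Table 2 can be expressed as a single decision function, sketched below. The parameter names are hypothetical; the first bit is taken to come from radio-button pair 175 and the second from pair 176.

```python
def should_link(first_only, only_emphasized, is_emphasized, is_first,
                is_first_emphasized):
    """Decide whether one occurrence of a key phrase is linked.

    first_only and only_emphasized are the two option bits; the other
    arguments describe the occurrence being examined.
    """
    if not first_only and not only_emphasized:    # case 00
        return True
    if not first_only and only_emphasized:        # case 01
        return is_emphasized
    if first_only and not only_emphasized:        # case 10
        return is_first or is_emphasized
    return is_emphasized and is_first_emphasized  # case 11
```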
As a further special case, any URLs found as plain text in the source are by default converted to links that display their own URLs as anchor text. Many email programs and word processors presently perform this transformation on URLs detected as plain text in received email.
The "Link Type" radio button group 177 selects among three of the basic installed link types discussed in the first subsection of this section.
The "Link Color" select pop-up list 178 provides for link color selection as discussed above. In addition to the standard color names, there is a "take default" selection which does not specify the link color, thereby leaving it to the HTML cascading style sheet or user's browser to choose link color.
The "Contexts" radio group 179 provides some high-level choices of context selection for link installation. The first choice, "All W3K contexts" corresponds to combining all context dictionaries in the entire context hierarchy. As the context hierarchy grows, this can become a computationally expensive option, even when the aggregate dictionary is maintained as an existing file at all times. When a "current context" exists (as a result of browsing or user preferences), it and its extensions are preferably listed first in the aggregate dictionary, as will be clarified further below.
The second radio button in the "Contexts" radio group 179 selects only the "current context" (/Music/Computer_Music). The current context is normally established by browsing or by standing user preferences. (When "Install Links" is selected in the standard menu 130 while browsing, the last context displayed in the browser becomes the default current context.) A browser cookie is preferably used to remember the most recent "current context" for each user across sessions.
Installing links from only the current context is not as narrow as it may seem at first since normally the context dictionaries for /Music and `/` (the top-level generic dictionary) are included, as well as all subcontexts of Computer_Music. The two "Extensions" checkboxes 70 provide all-or-nothing control over appending parent and subcontext dictionaries to the current-context dictionary. Additionally, if the subcontext /Music/Computer_Music/Sound_Synthesis is a synonym for /Engineering/Signal_Processing/Sound_Synthesis, say, and if "multiple inheritance" is enabled at all subcontext hierarchy levels (an advanced link installation option), then the context dictionary for all of Signal_Processing and /Engineering would be folded in, at a lower precedence level, of course, since they are listed after all subcontexts of /Music. In summary, the aggregate dictionary list built for link installation by the server can be rather large even when only the current context is selected for link installation.
The third and final option in the "Contexts" radio group 179 is to provide an explicit list of context dictionaries. A list of context dictionaries can be accumulated via browsing in the manner described above, or a list can be submitted in dictionary-file format. Additional "virtual context dictionaries" may be defined by means of the Search function, with the search results forming a link subset which can be assigned a name and treated as a dictionary. It is preferable to offer convenient hierarchical browsing of the selected portion of the context hierarchy represented by the dictionary list. Any search result may also be displayed as a dictionary file. Dictionary files are discussed in more detail below.
Dictionary lists may be stored on the server in a directory devoted to each user or in a file with the user's email address forming part of the filename, as shown in the example of FIG. 7. They may also be stored on the user's computer via browser cookies.
The "Min Level" 71 and "Max Level" 72 pop-up lists allow specification of a range of educational levels for link installation.
While any number of properties may be associated with links, the top-level default submission form of FIG. 7 for link installation invites link selection according to only a few properties such as context 179 (determined by dictionary selections) and educational level 71, 72. Installation specifications based on additional properties may be obtained by following the "Specify Additional Properties" link 174 and filling out a larger form allowing specification according to more criteria, using well known principles of database subset selection according to record properties.
When the user presses the "Submit" button 173 (or the submit button of a long-form submission form), the server receives the filled-out form specifying how links are to be installed, processes the submitted text in a CGI Perl script or other server-side software to install the links, and generates output consisting of the user's submitted text with all the new links embedded.
Link Installation on the Server
Actual link installation from an aggregate dictionary by the server, while one of the more complex and resource-demanding operations, is based on well known database technology and methods in computer science for string search and replacement. The Perl language is well suited for this task.
In the preferred embodiment, an aggregate dictionary file is prepared on the server based on the user's link-installation specifications and the current contents of the server's link database. This dictionary file is then "applied" to the user's submitted documents in order to replace key phrases by hypertext links. A Perl script illustrating link installation for HTML files is included in Appendix A.
The preferable details of the methods used depend on the relative sizes of the files involved. For example, if stringent conditions are specified on link properties for installation, and if a large file is submitted from the user, it may be the case that the aggregate link dictionary is much smaller than the combined size of the files submitted for link installation. In this case, it may be fastest to search the submitted file for each link in the aggregate dictionary.
If, on the other hand, the number of eligible links is large (e.g., "All W3K contexts" was selected in the Contexts section 179 of the link installation form), and if the submission itself is small, it may be preferable to search the aggregate dictionary file for each possible key phrase in the input file using well known "incremental search" techniques.
In either case, if the user has specified that only emphasized words or phrases are to be linked, then all phrase boundaries are known, and this can be used to greatly reduce the computational burden of the string-matching task.
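For the case of a small submission and a large dictionary, one simple alternative to the "incremental search" techniques mentioned above is to hold the key phrases in a hash set and probe it with word sequences taken from the input. The sketch below shows that substituted technique; the function name and the max_words bound on phrase length are assumptions.

```python
def find_key_phrases(words, key_phrases, max_words):
    """Scan the input words once, probing the dictionary for each candidate.

    words is the submitted text split into words, key_phrases is a set of
    known key phrases, and max_words bounds the phrase length tried.
    Returns (word index, phrase) pairs, preferring the longest match.
    """
    matches = []
    i = 0
    while i < len(words):
        # Try the longest candidate first so longer phrases win.
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in key_phrases:
                matches.append((i, phrase))
                i += n
                break
        else:
            i += 1  # no key phrase starts at this word
    return matches
```

When only emphasized phrases are to be linked, the candidate phrases are already delimited, so each one needs only a single set lookup.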
Single-Page Submission
For single-page text submitted using the HTML form of FIG. 7, the output HTML may be returned to the user in the form of a "dynamic Web page." That is, the user's browser immediately "navigates" to the automatically generated HTML page as if it were already somewhere on the Web. At that point, the user can select "Save As" in the Web browser in order to save the HTML in a local file, or "View Source" can be selected in the browser to enable copy/pasting of the generated HTML into a text editor for further editing.
Submission of an Entire Website
In an alternative mode of submission, shown in FIG. 7, the user specifies a URL pointing to the submitted document in place of the text of the submission itself. This mode of submission is more convenient for linking entire websites. In a typical configuration, the server processes the submitted file and all files reachable from the first via hyperlinks, provided that the reachable files reside somewhere on the same website (as defined by its URL). In other words, links are followed provided the first portion of the URL matches that of the submitted URL in its entirety.
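The rule for which files are processed can be sketched as a traversal that follows a hyperlink only when its URL begins with the submitted URL in its entirety. The reachable_pages function and the links_of callback, which stands in for fetching a page and extracting its hyperlinks, are hypothetical.

```python
from collections import deque


def reachable_pages(submitted_url, links_of):
    """Return the submitted page and all same-site pages reachable from it.

    links_of(url) returns the hyperlink targets found on that page.
    """
    seen = {submitted_url}
    queue = deque([submitted_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for target in links_of(url):
            # Follow only links whose first portion matches the submitted URL.
            if target.startswith(submitted_url) and target not in seen:
                seen.add(target)
                queue.append(target)
    return order
```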
In the case of URL submission, the processed document is not returned as dynamic HTML, but rather as a hyperlink to a single binary output file on the server containing all the processed files. This output file may be created by combining all processed files into one using the freely available tar program, and further compressed using the freely available gzip program. The tar and gzip programs are available from the GNU Free Software Foundation (http://www.gnu.org/). The output file can then be "downloaded" to the client computer by clicking on the hyperlink pointing to the output file in the dynamically generated HTML. The user then unpacks the file on his or her local computer using, e.g., gunzip and tar, or the shareware program WinZip. As a third alternative, preferred for large submissions or over slow internet connections, the user may specify the URL of a single composite file in "tarred and compressed" format, i.e., created using gnutar and gzip in the same way that the server's output is prepared in the case of multi-file submissions.
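The preferred embodiment invokes the tar and gzip programs. As an illustrative alternative, the same kind of combined, compressed output file can be produced with Python's standard tarfile module; the pack_processed_files function and its in-memory input are assumptions.

```python
import io
import tarfile


def pack_processed_files(files):
    """Combine processed files into one gzip-compressed tar archive.

    files maps a file name to its processed contents as bytes; the
    archive is returned as bytes, ready to be written to a .tar.gz file.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```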
The filename extension is used by the link installation server to distinguish between pointers to websites (.html or no extension, indicating a directory) and compressed tar files (.tgz or .tar.gz). If the text appearing in the "Text or URL" textfield of the link-installation submission form starts with "http:", "ftp:", or "gopher:", a URL is assumed.
The following sections will describe further details of the operations indicated above.
Dictionary Search Order
Link installation usually occurs within a "current context" or a list of contexts. In the example described above, the current context may be set according to the location of the browser when "Install Links" was selected by the user. Alternatively, one or more contexts may be set explicitly in a dictionary list provided by the user when filling in the link installation form of FIG. 7.
In the simplest mode, the current context dictionary is searched first for matches in the user-supplied text, and matches are transformed into links. The process is "idempotent" since matches will not occur within the link syntax itself (such as in HTML anchor specifications or LaTeX macro arguments). As a result, dictionary entries are ordered from longest to shortest phrasings, as discussed above.
As described above, the dictionary for the current context is optionally augmented by the union of all lower-level dictionaries within that context. Current-level definitions take precedence over lower-level definitions in any key-phrase collisions. Collisions among lower level dictionaries are not explicitly arbitrated (since that could be accomplished by listing them explicitly), so that the first occurrence of a lower-level definition will take precedence (when not defined at the main level). This follows simply from the convention that "the first match wins".
The purpose of adding in all lower level directories is to provide a reasonably complete dictionary at a high-level node without having to duplicate definitions from lower-level contexts. In principle, such duplication could be avoided by moving all lower-level definitions to the highest possible context. As a simple example, the term "idempotent" is a math term used in many technical fields, and it is not an English-language term (according to the Funk & Wagnalls Standard Desk Dictionary). Therefore, "idempotent" can be defined without conflict in the top-level dictionary for the English language. In practice, however, it works out better to define terms in their "most natural" subcontext, and let their definitions "float up" as far as they can go without collision. Positioning a term within its "most appropriate" context makes the hierarchical dictionary better organized and instructive when browsing.
When an undesired definition is encountered, it can be "fixed" (the first time) by defining the term in the current context, since that will take precedence over all subcontexts and parent contexts. A conflict cannot occur in the current context (in principle) because a context is by definition a name space in which every term has a unique definition. Another solution is to list a specific ordering of lower-level dictionaries so that the first match is the desired one.
After the current-level dictionary is "applied", including all subcontexts, the parent node is normally next in the aggregate dictionary. It is searched for further matches, so that more general terms in the higher context not "overridden" by the lower contexts will be linked to their definitions. This process continues until the top-level context node is reached in the aggregate dictionary.
Note that it is not necessary to create an explicit aggregate dictionary. It is equivalent to instead apply context dictionaries sequentially in the proper order.
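The search order described in this section, and the "first match wins" convention, can be sketched as follows. The function names and the in-memory representation of context dictionaries are assumptions, and subcontexts are simply taken in sorted order since collisions among them are not arbitrated.

```python
def search_order(current, all_contexts):
    """List context paths in search order for a current context.

    The current context comes first, then all of its subcontexts,
    then each parent in turn up to the top-level context "/".
    """
    current = current.rstrip("/") or "/"
    prefix = "/" if current == "/" else current + "/"
    order = [current]
    order += sorted(c for c in all_contexts
                    if c != current and c.startswith(prefix))
    parent = current
    while parent != "/":
        parent = parent.rsplit("/", 1)[0] or "/"
        order.append(parent)
    return order


def aggregate_dictionary(order, dictionaries):
    """Merge context dictionaries so that the first match wins."""
    merged = {}
    for context in order:
        for key_phrase, url in dictionaries.get(context, {}).items():
            merged.setdefault(key_phrase, url)  # keep the earlier definition
    return merged
```

Applying the dictionaries one after another in the order returned by search_order gives the same result as applying the merged dictionary.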
As mentioned above, a list of context dictionaries may be specified explicitly in a variety of ways. This is analogous to specifying multiple libraries when linking a computer program. The order of specification is important since the first match is taken. This feature may be used by specifying ancillary fields after the main field of the author. For example, a physics professor might include certain math contexts after the appropriate context(s) within the field of physics.
Maximizing Match Length in Key Phrase String Matching
As discussed above, there may be several forms of a key phrase ("synonyms") corresponding to the same URL. It is normally preferable to match the longest form present in the text so as to avoid multiple generic matches such as
Taylor Series Expansion
when there exists a longer match
Taylor Series Expansion
having a completely different meaning. Maximal-length matching is implemented in the preferred embodiment by maintaining the key phrases in order of longest to shortest and then traversing the dictionary in the prescribed order.
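Maximal-length matching by ordering can be sketched as below for plain text. The function name is hypothetical and the URLs in the usage note are placeholders; the sketch relies on the fact that a regular-expression alternation tries its alternatives in the order listed, so listing key phrases from longest to shortest makes the longest form win.

```python
import re


def install_longest_first(text, dictionary):
    """Replace key phrases by links, preferring the longest form present."""
    if not dictionary:
        return text
    # Order the key phrases from longest to shortest.
    phrases = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in phrases))
    # A single pass means inserted link syntax is never searched again.
    return pattern.sub(
        lambda m: f'<a href="{dictionary[m.group(0)]}">{m.group(0)}</a>',
        text)
```

With entries for both "Taylor Series" and "Taylor Series Expansion", the phrase "Taylor Series Expansion" in the text receives one link to the longer entry rather than a link on "Taylor Series" alone.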
Contributing Links
Link submission support on the server
enables all users to assist in the expansion of the "knowledge tree" represented by the link database dictionaries, and
enables individual users to augment the link installation system to meet their special needs.
For example, a known user can contribute his or her own link database, select only it for search during link installation, and thereby obtain full control over the links which may be installed.
A personal link database can be very useful to the author of a book typeset in LaTeX, for example. Since LaTeX supports the generation of an index file, and since the freely available latex2html Perl script will convert a book index into an HTML page, such an index can easily and automatically be converted (e.g., in the Emacs text editor) to a dictionary file format acceptable by the server. The entire book can then be processed by the server to install links pointing somewhere into the book for every occurrence of an indexed word in the book. Other links can of course also be included.
Another application of LaTeX index files is to merge the indexes of related books in order to generate a link database for a particular "field," spanning a specific set of resources.
Only known users can submit links and/or create subcontexts or key phrases. All submissions are "owned" by the submitting email address or groups defined by the submitting user. (Email addresses are verified by the authorization process described earlier.) Only the owner, group member, or server webmaster may make changes in submissions (except for their ratings and reviews, of course, which any known user can affect).
Since any number of users may be trying to submit link databases simultaneously, one of many known schemes for "file locking" is needed for the database files and directories during a submission. To avoid periods of database unavailability, submitted public databases can be first prepared in a temporary directory and extensively checked for correctness by the server, including owner checking, name-collision checking, URL validations, format checks, and so on. During this process, the eventual destination directory is preferably write-locked. Since final installation may be carried out by rapidly renaming the two directories, downtime for read access is minimized. Implementing link databases as many files distributed throughout a context directory tree makes database updates simpler, since updates in one context need not affect activities going on in other contexts.
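The final installation step, in which a checked database prepared in a temporary directory replaces the live directory by renaming, can be sketched as follows. The install_database function and the ".old" backup name are assumptions, and the write lock on the destination directory is taken to be held by the caller.

```python
import os
import shutil


def install_database(staged_dir, live_dir):
    """Swap a fully checked staging directory into place by renaming.

    The two renames are fast, so the time during which live_dir is
    missing is short; the old contents are removed afterwards.
    """
    backup = live_dir + ".old"  # hypothetical name for the displaced copy
    if os.path.exists(live_dir):
        os.rename(live_dir, backup)
    os.rename(staged_dir, live_dir)
    if os.path.exists(backup):
        shutil.rmtree(backup)
```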
Dictionary File Format
A link database (or dictionary list) may be submitted in a documented ASCII format supported by the server. Since all properties are optional, the submitted file can be as simple as a list of key phrases and their corresponding URLs. Below is a "dictionary file" which can be used to initialize a context hierarchy for the examples seen in FIGS. 3-5:
GROUPS = CM_DSP
PATH = /Education/Technology
KEY = W3K
URL = http://www.w3k.org
# ------------------------------------------------
KEY = / Legal / GNU General Public License
URL = http://www.fsf.org/copyleft/gpl.html
# ------------------------------------------------
PATH = /Music/Computer_Music/Signal_Processing/People
KEY = Julius O. Smith III | Julius O. Smith | Julius Smith
URL = http://www-ccrma.stanford.edu/~jos/
KEY = JOS
URL = http://www-ccrma.stanford.edu/~jos/
# ------------------------------------------------
PATH = /Music/Computer_Music/Centers/CCRMA
KEY = CCRMA Courses
URL = http://www-ccrma.stanford.edu/CCRMA/Overview/courses.html
KEY = CCRMA Research
URL = http://www-ccrma.stanford.edu/CCRMA/Overview/research.html
KEY = CCRMA Overview
URL = http://www-ccrma.stanford.edu/CCRMA/Overview/Overview.html
KEY = CCRMA
URL = http://www-ccrma.stanford.edu/
# ------------------------------------------------
PATH = /Music/Computer_Music/Sound_Synthesis
SYNM = /Engineering/Signal_Processing/Sound_Synthesis
KEY = Lagrange Interpolation
URL = http://www-ccrma.stanford.edu/~jos/Lagrange_Interpolation.html
LEVEL = 12
KEY = Lagrange Interpolation
URL = http://www.acoustics.hut.fi/~vpv/publications/vesa_phd.html
KEY = Digital Waveguide Synthesis
URL = http://www-ccrma.stanford.edu/~jos/wg.html
KEY = Commuted Synthesis
URL = http://www-ccrma.stanford.edu/~jos/book2000/CommutedSynth.html
KEY = Virtual Analog Synthesis
URL = http://www-ccrma.stanford.edu/~jos/VirtualAnalog/VirtualAnalog.html
KEY = Physical Modeling Synthesis
URL = http://www-ccrma.stanford.edu/~jos/pmupd/PMSynthesis.html
# ------------------------------------------------
PATH = Music/Computer_Music/Signal_Processing/Sound_Synthesis/Vintage_Methods
KEY = Additive Synthesis
URL = http://www-ccrma.stanford.edu/~jos/SMS_PVC/AdditiveSynth.html
KEY = Sampling Synthesis
URL = http://www-ccrma.stanford.edu/.about.jos/samplingsynth.html
KEY = Cross-Synthesis
URL = http://www-ccrma.stanford.edu/.about.jos/crosssynth.html
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - - - - -
PATH=
Music/Computer_Music/Signal_Processing/Sound_Synthesis/
Acoustic_Instruments
KEY = Bowed String Synthesis
URL = http://www-ccrma.stanford.edu/.about.jos/book2000/
Bowed_Strings.html
KEY = Brass Synthesis
URL = http://www-ccrma.stanford.edu/.about.jos/pmupd/Brasses.html
Several features of the ASCII dictionary format may be noted:
The GROUPS directive lists the names of all groups which share ownership of the submitted links. In this example, only one group, CM_DSP, is specified. Group specification is optional.
The PATH directive sets the default context for subsequent entries.
Anything after `#` is interpreted as a "comment" and ignored.
An entry can override the default path by including its own "absolute path" specification, as illustrated by the entry for the "GNU General Public License".
Path components are separated by `/` as is conventional in UNIX file systems. Spaces before and after a `/` are removed by the interpreter, and spaces within KEYs are converted to `_`. (Any number of adjacent "whitespace characters" are converted to a single `_`.)
The SYNM directive declares a synonym for the current default context. In this example, /Music/Computer_Music/Sound_Synthesis is declared to be synonymous with /Engineering/Signal_Processing/Sound_Synthesis.
KEY synonyms may be declared in a single entry by separating them with vertical bars `|`.
KEY synonyms may also be created by specifying the same URL in two different entries (as in the JOS entry).
Order is important: The phrases "CCRMA Overview" and "CCRMA Research" will be transformed into links before the word "CCRMA", as a result of the ordering shown.
The only example of "competing definitions" in this dictionary is the case of "Lagrange Interpolation".
The first entry for "Lagrange Interpolation" is accompanied by an education level range specification using the LEVEL directive. It is set to 12 indicating that a high-school senior (at least one on the "math track") should be able to fully understand the main thrust of it. Alternatively, a minimum and maximum educational level could have been specified using the MIN_LEVEL and MAX_LEVEL directives. The arbitrarily set maximum value of 100 means "no maximum". Level ranges are more appropriate for "home pages" and the like which link to a variety of documents at a variety of educational levels.
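The features noted above may be illustrated by a small reader for the dictionary format. This is a sketch under stated assumptions rather than the server's actual interpreter: the function names are hypothetical, multiple group names are assumed to be separated by vertical bars, and each directive is assumed to occupy a single line (no wrapped lines).

```python
import re


def squeeze(text):
    """Convert any run of whitespace inside a name to a single underbar."""
    return re.sub(r"\s+", "_", text.strip())


def normalize_path(value):
    """Strip spaces around '/' and squeeze whitespace in each component."""
    return "/".join(squeeze(part) for part in value.split("/"))


def parse_dictionary(text):
    """Return (entries, context_synonyms) for dictionary-file text."""
    entries, synonyms = [], {}
    groups, path, current = [], "", None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()       # anything after '#' is a comment
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name == "GROUPS":
            groups = [g.strip() for g in value.split("|")]   # separator assumed
        elif name == "PATH":
            path = normalize_path(value)          # default context for later entries
        elif name == "SYNM":
            synonyms[path] = normalize_path(value)
        elif name == "KEY":
            entry_path = path
            if value.startswith("/"):             # absolute path overrides the default
                head, value = value.rsplit("/", 1)
                entry_path = normalize_path(head)
            keys = [squeeze(k) for k in value.split("|")]    # KEY synonyms
            current = {"path": entry_path, "keys": keys,
                       "groups": list(groups), "props": {}}
            entries.append(current)               # file order is preserved
        elif name == "URL" and current is not None:
            current["url"] = value
        elif current is not None:
            current["props"][name] = value        # e.g. LEVEL, MIN_LEVEL, MAX_LEVEL
    return entries, synonyms
```

Entries are kept in a list rather than a dictionary keyed by phrase, so that both the ordering and the competing definitions of a repeated KEY survive parsing.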
Only trusted users can submit links and contexts wholesale in this manner to the link database server. However, any known user can submit such a set of links to a restricted or private directory. Otherwise, known users are allowed to submit one link at a time using the "Add a Resource" submission form described earlier.
If there are any pre-existing links in the same context directory with the same name and URL as a newly submitted link, the pre-existing link is retained unless the new submission is by the same owner. (Link properties could be updated or added in this manner, for example. Ratings and reviews are not affected since they may not be submitted in a dictionary file.) Rejected submissions are listed in a message from the server delivered in a dynamic web page, as is typical. Similar action is taken for other kinds of messages to the user as needed.
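The retention rule for pre-existing links may be sketched as follows. The record layout (dictionaries with "key", "url", "owner", and "props" fields) and the function name are assumptions made for illustration only.

```python
def merge_submission(existing, submitted):
    """Apply the same-name-and-URL rule within one context directory.

    Returns (links, rejected): the resulting link list and the submissions
    that were refused because another owner already holds that link.
    """
    by_identity = {(l["key"], l["url"]): dict(l) for l in existing}
    rejected = []
    for link in submitted:
        identity = (link["key"], link["url"])
        old = by_identity.get(identity)
        if old is None:
            by_identity[identity] = dict(link)       # brand-new link
        elif old["owner"] == link["owner"]:
            # Same owner: properties may be updated or added.
            old["props"] = {**old["props"], **link["props"]}
        else:
            rejected.append(link)                    # pre-existing link is retained
    return list(by_identity.values()), rejected
```

The rejected list corresponds to the submissions reported back to the user in the server's message.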
If the specified context directory does not exist, it is created, and the email address of the creating user is logged as its owner. The server automatically installs an encoding of the owner's email address in each link entry by means of an additional link property. Other properties, such as initial ratings, date-of-submission, etc., are installed by the server. Only the owner or group-member or server webmaster may modify an existing link or directory.
Similar submission protocols can perform editing operations which would otherwise be laborious over the browser-based user interface described above, such as deleting a database subdirectory and all its contents (provided, of course, that everything to be deleted is owned by the person or group making the request). For example, the directives
DELETE_LINK /Physics/Quantum_Mechanics/Planck's_Constant
DELETE_PATH /Physics/Quantum_Mechanics/Schroedinger's_Wave_Equation/
can be used in place of online interactive editing of the server link database. In general, there is preferably a script-style equivalent for all operations performable interactively via a graphical user interface such as Web browsers provide. In addition to performing the operations more quickly and conveniently, script-style alternative interfaces are very important for the visually impaired. Scripting also provides a means of conveniently resubmitting all links contributed by the user, thereby making it convenient for users to maintain "back-ups" of their submissions in a form that can be easily restored on the link-installation server. Browsing and Search features can be used to obtain a dictionary-file display of all links owned by the user.
There may be a limit placed on the number of database links and subdirectories that can be submitted by any one user (email address) or group. This guards against accidents and malicious "hacking," and facilitates editorial tracking of contributed content. A certain amount of automatic checking for inappropriate content is possible, based on searching link targets for inappropriate words. Users can apply for "trusted" status by sending email to the server webmaster or other authorized agent. Trusted users may be given a higher contribution limit and perhaps also a higher weighting in link ratings. A group of users can be formed in which each member is trusted within that group.
Use of Dictionary File Format to Specify Context Lists and Dictionary Lists
When specifying a list of context dictionaries for link installation, it is convenient to be able to use dictionary file format. When used in this way, all PATH directives in the file are extracted to form a list of contexts. If any links are specified for a particular context PATH, then only those links will be eligible for installation. Additional directives are provided which correspond to the options available for context dictionary specification, such as include parents, include subcontexts, and allow multiple inheritance. For convenience, these aggregate-dictionary-building directives are ignored when submitting a dictionary file as a means of submitting links.
Using the previous example dictionary file now to specify an aggregate dictionary for link installation gives results equivalent to the following dictionary file:
GROUPS = CM_DSP # Only operative if selecting based on group
PATH = /Education/Technology
PATH = /Music/Computer_Music/Signal_Processing/People
PATH = /Music/Computer_Music/Centers/CCRMA
PATH = /Music/Computer_Music/Sound_Synthesis
SYNM = /Engineering/Signal_Processing/Sound_Synthesis
PATH = Music/Computer_Music/Signal_Processing/Sound_Synthesis/Vintage_Methods
PATH = Music/Computer_Music/Signal_Processing/Sound_Synthesis/Acoustic_Instruments
Adding some typical directives and eliminating one redundant specification leads to
GROUPS = CM_DSP # Only operative if selecting based on group
PATH = /Music/Computer_Music
MULTIPLE_INHERITANCE_DEPTH = 2
OWNERS_ONLY
MIN_LEVEL = 12
MAX_LEVEL = 100
SUITABILITY = PG-13 # Movie and V-chip names understood
SOURCE = ALL
TYPE = Refereed
PATH = /Education/Technology
Several features of this aggregate-dictionary specification may be noted:
MULTIPLE_INHERITANCE_DEPTH=1 means that the context dictionaries of linked parents are appended to the aggregate dictionary for context synonyms occurring 1 level below the current context or less. This is just sufficient to pick up the "engineering parents" of context Sound_Synthesis without also including linked parents of lower levels.
The OWNERS_ONLY directive restricts the aggregate dictionary to links owned by members of group CM_DSP.
The minimum and maximum educational level restrict link installation to links rated at 12th grade or higher.
Source "ALL" means any source. Other choices include EDUCATIONAL (.edu), COMMERCIAL (.com), and so on. As usual, multiple sources can be separated by a vertical bar `|`.
The TYPE is resource type. "Refereed" is a symbol for all refereed source types (journal article, book, etc.). If no type was specified by the contributor, it is UNKNOWN.
Order is important: The listed contexts will be appended in the order given, with the first one listed being considered the "current context".
Dictionary combining directives as shown in this example are "sticky", meaning that they apply also to subsequently listed context paths unless they are explicitly reset, or set to "NIL" indicating no value (to obtain the system default behavior).
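The "sticky" behavior of the combining directives may be sketched as follows. The interpretation is an assumption: a directive is taken to apply to the most recently listed PATH and to every PATH after it until reset or set to NIL. Flag directives without a value (such as OWNERS_ONLY) are stored as True, and SYNM lines are skipped because they are not combining directives.

```python
def parse_context_list(text):
    """Return [(context_path, settings)] in the order the contexts are listed."""
    contexts, sticky = [], {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()        # drop comments
        if not line:
            continue
        name, sep, value = (part.strip() for part in line.partition("="))
        if name == "PATH":
            contexts.append((value, dict(sticky)))  # inherit current sticky settings
            continue
        if name == "SYNM":
            continue
        targets = [sticky] + ([contexts[-1][1]] if contexts else [])
        for settings in targets:
            if value == "NIL":
                settings.pop(name, None)            # back to the system default
            else:
                settings[name] = value if sep else True
    return contexts
```

The first context returned is the one treated as the "current context", matching the ordering rule stated above.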
Security Considerations
The IP address is stored as well as the verified email address for security reasons. A user with "root privileges" on a personal machine can generate any number of return email addresses, while the number of IP addresses available to an individual is usually very limited. For example, if unusually many email addresses are found to belong to the same IP address, a warning can be automatically emailed to the webmaster, who can look into the matter further, such as by inspecting all contributions from that IP address. If an IP address turns out to belong to a malicious "hacker", it is straightforward using standard UNIX tools to eliminate all database entries and directories associated with that IP address, barring it from further contributions, and so on. When the IP address is dynamic, as is often the case when a commercial Internet Service Provider (ISP) hosts the user's account, it is less likely that many different email addresses will belong to the same person, and the ISP can be contacted for assistance. Note that it is very easy to arbitrarily set the "From:" field in any email message; therefore, the "Received" fields in received email may be analyzed by the server to get closer to the true originating location. In Netscape Navigator, for example, select "View/Headers/All" to see such fields in received email.
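The check for unusually many email addresses at one IP address may be sketched as follows. The function name and the threshold of three addresses are assumptions chosen for illustration; the text above does not fix a number.

```python
from collections import defaultdict


def suspicious_ips(registrations, max_emails=3):
    """Map each IP address with more than `max_emails` distinct emails to them.

    `registrations` is an iterable of (email, ip_address) pairs taken from
    the stored user records.
    """
    emails_by_ip = defaultdict(set)
    for email, ip in registrations:
        emails_by_ip[ip].add(email.lower())
    return {ip: sorted(emails)
            for ip, emails in emails_by_ip.items()
            if len(emails) > max_emails}
```

Each returned entry would be a candidate for the automatic warning to the webmaster.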
Link Database Implementation
Each link database may be implemented on the server as a plain ASCII file in a directory structure that corresponds to the hierarchical organization of the link databases.
The complete hierarchy can also be implemented in a single file which contains path information for each link entry. The initial prototype of the present invention used a single link database file based on the links-2.0 software scripts from Gossamer Threads, Inc. (http://www.gossamer-threads.com/scripts/links/). In this implementation, the context path information is included in what is called a "link category". In adapting the links-2.0 scripts, categories having no sub-categories are considered to be "key phrases", and actual links within a key-phrase (bottom-level category) are treated as "competing definitions".
For a variety of reasons, use of a single links database file is not considered the best mode of carrying out the present invention. Instead, a hierarchical file system implementation is preferred in which the directory path corresponds to the context, and the database file in a context directory contains only links for that context (along with perhaps a limited number of subcontexts).
Alternatively, an evolutionary path can be followed starting out with a single database file, followed by splitting into separate database files for top-level contexts, followed by further splits as the files grow too large, etc. (The links-2.0 system advises a limit of 10,000 links for its one-file link database system managed by Perl CGI scripts.) On each split, the first path component stored in the link database may be removed since it becomes implied by the directory in which the database file resides.
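One split step of this evolutionary path may be sketched as follows. The input layout (pairs of context path and link record) and the function name are assumptions for illustration.

```python
from collections import defaultdict


def split_by_top_context(links):
    """Group (context_path, link) pairs by their first path component.

    The first component is removed from each stored path, since it becomes
    implied by the directory that will hold the resulting database file.
    """
    files = defaultdict(list)
    for context, link in links:
        top, _, rest = context.strip("/").partition("/")
        files[top].append(("/" + rest if rest else "", link))
    return dict(files)
```

Applying the same function again to one of the resulting groups performs the next split one level further down.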
A database directory may contain both files and directories. Subdirectories are interpreted as subtopics, and the hypertext links for the current directory (when it is a key-phrase directory) may reside within a single ASCII file named "links.txt", for example, preferably located in a context directory containing the key phrase. The links.txt file contains a list of hypertext links for the current context in a plain ASCII format described below.
There may be a temporary "system file" for each active user which lists current selections and other state information pertaining to that user. Multiple selection configurations may be stored on client computers by means of the "cookie" mechanism supported by the major Web browsers. The name of a user's configuration file may include the user's email address, if known, and otherwise an arbitrarily assigned session ID for "unknown" users. All active sessions preferably time out after a period of inactivity, as is commonly implemented by websites featuring session management.
There may be a system file ratings.txt, parallel to links.txt in each directory, containing all contributed ratings for the links in links.txt. Information stored in ratings.txt for each link includes the email address of each contributor, and the contributed rating. When a new rating is contributed, an entry is appended to ratings.txt. If there is already a rating from that email address, it is replaced with the new one. A new average rating is computed, and the updated average rating and contributor count are entered into links.txt as properties for the affected link.
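The rating update may be sketched as follows, with the per-link contents of ratings.txt modeled as an in-memory mapping from contributor email address to rating. The function name and data layout are assumptions; file reading and writing are omitted.

```python
def contribute_rating(ratings, email, rating):
    """Record one rating and return the new (average, contributor_count).

    `ratings` maps contributor email to that contributor's rating for a
    single link; a repeat contribution replaces the earlier one.
    """
    ratings[email] = rating
    count = len(ratings)
    return sum(ratings.values()) / count, count
```

The returned pair corresponds to the average rating and contributor count that are written back into links.txt as properties of the affected link.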
Another system file, reviews.txt, also parallel to links.txt, resides in each directory and contains all contributed "link reviews". Information stored for each link includes the email address of each contributor, and the contributed review. When a new review is contributed, it is appended to reviews.txt, replacing any previous review from that email address.
Link Database Details
Links may be stored on the server in the following simple ASCII text-file format:
ID | KEY | URL | PropertyName:Value | PropertyName:Value | . . .
ID | KEY | URL | PropertyName:Value | PropertyName:Value | . . .
. . .
This format uses explicit property names which are convenient when specifying sparse subsets of all possible properties (and also more clear for describing the invention). An alternative is the use of a fixed-format record in which the property names are implied by their field position within the record.
The ID is a unique integer assigned to the database record. The ID therefore uniquely identifies the record and can be used to identify it in various contexts, such as in the URL for indirect links.
For example, a link to a Web page about the "W3K" website could appear in the link database (in one long line which is broken for clarity below) as
23 | W3K | http://www.w3k.org |
Date:2-Sep-99 |
Context : /Education/Technology/W3K |
Level:All | Rating:5 | RatingCount:7 |
Hits: 20 | Installs: 4 |
Owner : Julius Smith |
Group : CM_DSP |
OwnerEmail : jos@w3k.org
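A record in this format may be read with a few lines of code. The sketch below assumes one record per line with fields separated by vertical bars; the function name and the returned dictionary layout are hypothetical.

```python
def parse_link_record(line):
    """Split one 'ID | KEY | URL | Name:Value | ...' record into its parts."""
    fields = [field.strip() for field in line.split("|")]
    link_id, key, url = fields[:3]
    props = {}
    for field in fields[3:]:
        if not field:
            continue                              # tolerate a trailing separator
        name, _, value = field.partition(":")     # split at the first colon only
        props[name.strip()] = value.strip()
    return {"id": int(link_id), "key": key, "url": url, "props": props}
```

Splitting each property at the first colon only keeps values that themselves contain colons intact.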
In addition to link databases, there is preferably a user database holding information such as a list of IP addresses authorized for that email address, whether the user wants to receive the W3K newsletter, the list of groups to which the user belongs (being a "trusted user" means belonging to the "trusted" group), and information logging any inappropriate use of the service such as submitting offensive links. (See the system for dealing with "trolls" at http://www.slashdot.org for an example system.)
Example Link Properties
Example PropertyNames and their meanings are as follows:
TABLE 3
Property      Meaning
Level         Educational level of the link, if not a range (1-100, All)
MinLevel      Lower bound of educational level range, if applicable
MaxLevel      Upper bound of educational level range, if applicable
FullTitle     Contents of URL's HTML <TITLE> tag in quoted string
Description   Description of link by submitting user
Date          Date link was submitted by user
Type          Type of information (Encyclopedia, Tutorial, Book, Course, . . .)
Language      English, French, German, Spanish, . . .
Suitability   Similar to rating system used in the "V chip" for television
Context       Context path (when handling many contexts per database file)
Synonyms      List of equivalent phrases separated by `|`. Order is important.
Hits          Number of times link accessed by browsing
Installs      Number of documents link has been installed in
Rating        Quality rating as a number from 1 to 10
RateCount     Number of users contributing ratings
isNew         1 if Date is sufficiently recent
isPopular     1 if Hits is large relative to other links
OwnerEmail    Email address of link contributor
ReceiveMail   1 if link contributor wants our newsletter
Groups        List of owning groups separated by `|`
User1         Property defined by user
User2         Property defined by user
. . .         . . .
The properties can be used to limit the range of links installed by a link installation. For example, a certain educational level range can be specified, or links only of a certain type may be specified. Restriction to links contributed by the owner or owning group is also easily specified.
KEYs will match occurrences of any case by default. When a link is installed in a user's document, the user's original case is preserved in the anchor text. KEYs may be entered in singular form since the string matching algorithm will ignore a trailing `s`. A KEY is either a simple word or a phrase consisting of words separated by underbars, e.g., Funk_&_Wagnalls_Knowledge_Center. A word may not contain certain "meta-characters" such as "|" or "#" which have system meanings, and all such meta-characters are stripped out by a regular expression (in Perl) on input. Similarly, context names must be "legal" UNIX file names after whitespace has been converted to underbars `_`, since the preferred embodiment uses a UNIX directory tree corresponding to at least part of the context hierarchy. Restriction to legal filenames is easily relaxed by encoding the directory names in hexadecimal, as an example, or using the special character encodings of HTML. The string matching algorithm used in link installation "folds" the input case to "lower" and replaces underbars and hyphens with spaces in string comparisons. As a result, KEYs in text submitted for link installation can have any case and can include underbars, hyphens, or spaces separating words in the keyword phrases, yielding the same matching results in all such cases. In the above example, the link name is functionally equivalent to "funk wagnalls knowledge center" for purposes of string matching. To include special characters where necessary, names may be quoted, as in
`Funk & Wagnalls Knowledge Center`
In the case of quoted names, string-matching is exact. Other details regarding string matching for link installation may be seen in the example of Appendix A.
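The matching rules just described may be sketched as follows. This is an illustration only and not the algorithm of Appendix A: the function names are hypothetical, the trailing `s` is dropped from the end of the whole phrase on both sides of the comparison, and any of the quote characters ', ", or ` is assumed to mark a quoted name.

```python
import re


def normalize(phrase):
    """Fold case, turn underbars and hyphens into spaces, drop a trailing 's'."""
    text = re.sub(r"[_\-\s]+", " ", phrase.lower()).strip()
    if len(text) > 1 and text.endswith("s"):
        text = text[:-1]
    return text


def key_matches(key, candidate):
    """Compare a database KEY with a phrase found in submitted text."""
    if len(key) >= 2 and key[0] == key[-1] and key[0] in "'\"`":
        return key[1:-1] == candidate             # quoted names match exactly
    return normalize(key) == normalize(candidate)
```

Normalizing both sides the same way is what makes underbars, hyphens, and spaces interchangeable in submitted text.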
Single-computer Implementation
The present invention can be adapted equally well to single-computer operation, requiring no network connection. In this case, the user can install a link database application from a CD-ROM, for example, in the fashion typical of many software products for personal computers. All functions formerly described as being provided by a Web browser and the remote link-installation and database server can be provided by the installed application. A Web version, if available, can serve to provide a supplementary collection of links.
There are several advantages to this mode of operation:
Since all data and software are local, response time can be greatly improved relative to use over the Internet.
A link database application may take advantage of native graphical user interface (GUI) facilities on the personal computer, which are typically more advanced than the platform-independent HTML and Web-browser facilities.
Since link database extensions may occur on the local hard disk instead of on a remote website, security requirements are alleviated, and user privacy is enhanced, especially for "private" database directories.
The link databases are not constantly changing, particularly the ratings, thereby automatically giving repeatable results on repeated link installations.
The link databases can be customized by manually setting alternative link orderings, and eliminating unwanted alternative links.
The following implementation differences apply to the single-computer embodiment:
Instead of one master link database directory, there may be two parallel link database directory trees having a common directory structure. The first may be "read only" so that it can be distributed and used on a CD-ROM, for example, while the second is "writable" and contains any user-developed databases, as well as the temporary "system files" generated during use of the system. The writable directory tree will normally reside on a local hard disk.
In operation, the writable directory is searched first so that it takes precedence over the read-only directory.
Logically, the links.txt files in the writable and read-only directory images are treated as one file, with the read-only version being appended to the writable version.
Links on the CD-ROM may be "deleted" by adding a corresponding entry for them in the writable directory tree consisting of exactly the same keyword or phrase, the same URL, and the single property "DELETED". Read-only directories cannot be deleted or renamed, but they can be excluded from link searches in the normal way (which applies also to the corresponding directory in the writable tree, if any, since they are logically the same directory).
Link database updates may be obtained over the Internet and installed locally to keep the single-computer software up to date. To facilitate this process, it is convenient to maintain on the server listings of database directories and contents for each software release. During an update, the server can traverse the link database directory, compare against the listing applicable to the user's current release, and generate an incremental update to bring the user up to the latest state. The incremental update is installed in the writable database directory on the user's local computer, automatically shadowing any older corresponding information on the CD-ROM. Updates may be obtained at any time to obtain the latest links. Information can be stored locally on the user's machine to enable each update to be incremental relative to the previous update as opposed to the latest official release.
URLs submitted in the "Text or URL" textfield of the link-installation submission form may also include "file:" type URLs.
It may occur that the user has locally extended the link database in a way that conflicts with the server's extensions since the time of the user's release or last update. The directory path, keyword or phrase, and URL all have to be identical to create a link conflict, and so actual conflicts can only occur in link properties. Link rankings can of course change at any time, and this is normal. However, since locally installed ranking information may be a rating override by the user (rather than the result of a previous upgrade), it is not necessarily correct to overwrite the locally installed rating properties. Similarly, other properties may have been added by the user to fine tune link installation results. During installation of the incremental update, the user may be given a choice of whether or not to accept conflicting information from the incremental update on a link-by-link or property-by-property basis. The default action may of course be to avoid overwriting any user-developed information, and the default upgrade can proceed in this mode. In the default mode, all conflicting links can be installed in a third parallel directory tree for later inspection by the user. Another means for avoiding conflicts is to rename any pre-existing directories containing user modifications (by adding a private suffix to its directory name, say) before carrying out an update.
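The logical merging of the writable and read-only links.txt files for one directory may be sketched as follows. The record layout and function name are assumptions, as is the rule that a writable entry with the same keyword and URL hides its read-only counterpart (taken here as one reading of "takes precedence").

```python
def effective_links(writable, read_only):
    """Return the links visible in one directory of the two parallel trees.

    Writable entries come first; a writable entry carrying the single
    property DELETED hides the matching read-only (CD-ROM) link.
    """
    deleted = {(l["key"], l["url"]) for l in writable if "DELETED" in l["props"]}
    merged, seen = [], set()
    for link in writable + read_only:             # read-only appended to writable
        identity = (link["key"], link["url"])
        if identity in deleted or identity in seen:
            continue
        seen.add(identity)
        merged.append(link)
    return merged
```

Nothing on the read-only medium is modified; deletions and overrides exist only as entries in the writable tree.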
Educational Levels
The educational level of a definition is a number indicating how advanced the material is. Authors generally wish to minimize the educational level as much as possible consistent with the intended audience, the material being presented, and the desired length of the document.
Every definition (link) is assigned an educational level. A normalized educational level may be provided manually by the link contributor as a number between 0 and 100, with the number being loosely interpreted as "years of education likely required", for someone specializing in the subject. When no manual assignment is made by the link contributor, a level is automatically computed which interpolates the manually assigned levels that do exist.
Automatic Assignment of Educational Levels
The automatically assigned level of a definition is computed by first computing an integer "raw level" for the definition based purely on an analysis of definition interdependencies, followed by the computation and assignment of a "normalized level" which maps each raw level to the pre |