Structured document (e.g., HTML, SGML, ODA, CDA)
Network distribution and management of interactive video and multi-media containers
6573907
Abstract
Interactive interfaces to video information provide a displayed view of a quasi-object called a root image. The root image consists of a plurality of basic frames selected from the video information, arranged such that their respective x and y directions are aligned with the x and y directions in the root image and the z direction in the root image corresponds to time, so that the basic frames are spaced apart in the z direction of the root image in accordance with their time separation. The displayed view of the root image changes in accordance with a designated viewing position, as if the root image were a three-dimensional object. The user can manipulate the displayed image by designating different viewing positions, by selecting portions of the video information for playback, and by applying special effects, such as cutting open the quasi-object for a better view. A toolkit permits interface designers to design such interfaces, notably so as to control the types of interaction that will be possible between the interface and an end user. Implementations of the interfaces, including editors and viewers, are also disclosed.
Claims
What is claimed is:
1. A method of delivering video over a network, comprising:
receiving video data representing a video sequence;
generating a hyper-media container containing data associated with the video data;
storing the video data;
storing the hyper-media container;
providing the video data and the hyper-media container available over the network to a remote user.
2. The method as set forth in claim 1, wherein generating the hyper-media container comprises providing annotations to the video data.
3. The method as set forth in claim 1, wherein generating the hyper-media container includes providing segmentation data associated with the video data.
4. The method as set forth in claim 1, further comprising controlling access to the video data and the hyper-media container.
5. The method as set forth in claim 4, wherein controlling access comprises controlling access to annotations.
6. The method as set forth in claim 4, wherein controlling access comprises controlling access to annotation packs.
7. The method as set forth in claim 4, wherein controlling access comprises controlling access to versions of annotations.
8. The method as set forth in claim 1, wherein the providing the video data and the hyper-media container available to the remote user comprises publishing the video data and the hyper-media container.
9. The method as set forth in claim 1, wherein the providing the video data and the hyper-media container available to the remote user includes distributing at least the hyper-media container.
10. The method as set forth in claim 9, wherein the distributing comprises providing the hyper-media container available on-demand.
11. The method as set forth in claim 9, wherein the distributing comprises streaming the video data to the remote user over the network.
12. The method as set forth in claim 9, wherein the distributing comprises immerse streaming the video data to the remote user over the network.
13. The method as set forth in claim 9, wherein the distributing comprises broadcasting the hyper-media container over the network to the remote user.
14. The method as set forth in claim 1, further comprising indexing the video data.
15. The method as set forth in claim 1, further comprising receiving modifications to one of the hyper-media container and the video data from the remote user and modifying the corresponding one of the hyper-media container and video data.
16. The method as set forth in claim 1, further comprising allowing the remote user to collaborate with other remote users on at least one of the video data and the hyper-media container.
17. The method as set forth in claim 16, further comprising maintaining version control of modifications to at least one of the video data and the hyper-media container.
18. The method as set forth in claim 1, wherein generating the hyper-media container comprises including an identification of a location for the video data associated with the hyper-media container.
19. The method as set forth in claim 1, wherein generating the hyper-media container comprises including an identifier for the video data associated with the hyper-media container.
20. The method as set forth in claim 1, wherein the data in the hyper-media container that is associated with the video data comprises an identifier for a data object.
21. The method as set forth in claim 1, wherein the receiving comprises receiving the video data over the network.
22. The method as set forth in claim 1, wherein the receiving comprises receiving the video data from a second remote user.
23. The method as set forth in claim 1, further comprising enabling the remote user to send the hyper-media container directly to a second remote user.
24. A method of delivering video over a network, comprising:
receiving video data representing a video sequence;
generating a hyper-media container containing data associated with the video data;
storing the video data;
storing the hyper-media container;
providing the video data and the hyper-media container available over the network to a remote user;
wherein generating the hyper-media container includes analyzing the video data and associating results of the analyzing with the hyper-media container.
25. The method as set forth in claim 24, wherein the analyzing comprises selecting an object from within the video.
26. The method as set forth in claim 24, wherein the analyzing includes extracting an object from within the video.
27. The method as set forth in claim 24, wherein the analyzing includes ranking frames of the video.
28. The method as set forth in claim 24, wherein the analyzing includes analyzing camera motion.
29. The method as set forth in claim 24, wherein the analyzing includes generating zooming effects.
30. The method as set forth in claim 24, wherein the analyzing includes generating scripts effects.
31. The method as set forth in claim 24, wherein the analyzing includes generating special effects.
32. A method of delivering video over a network, comprising:
receiving video data representing a video sequence;
generating a hyper-media container containing data associated with the video data;
storing the video data;
storing the hyper-media container;
providing the video data and the hyper-media container available over the network to a remote user;
the method further comprising:
receiving modifications to one of the hyper-media container and the video data from the remote user and modifying the corresponding one of the hyper-media container and video data; and
publishing versions of the modifications from the remote user to other remote users.
Description
The present invention relates to network distribution and management of video and, more particularly, to distribution and management of interactive video and multi-media containers. The network distribution and management includes, but is not limited to, managing media files, creating and authoring media containers, publishing and indexing media containers, searching and browsing media containers, and distributing media containers.
BACKGROUND AND SUMMARY
Video information is being produced at an ever-increasing rate and video sequences, especially short sequences, are increasingly being used, for example, in websites and on CD-ROM, and being created, for example, by domestic use of camcorders. There is a growing need for tools enabling the indexing and handling of, and interaction with, video data. It is particularly necessary for interfaces to be provided which enable a user to access video information selectively and to interact with that information, especially in a non-sequential way.
Conventionally, video information consists of a sequence of frames recorded at a fixed time interval. In the case of classic television signals, for example, the video information consists of 25 or 30 frames per second. Each frame is meaningful since it corresponds to an image which can be viewed. A frame may be made up of a number of interlaced fields, but this is not obligatory, as can be seen from more recently proposed video formats, such as those intended for high definition television. Frames describe the temporal decomposition of the video image information. Each frame contains image information structured in terms of lines and pixels, which represent the spatial decomposition of the video.
In the present document, the terms "video information" or "video sequences" refer to data representing a visual image recorded over a given time period, without reference to the length of that time period or the structure of the recorded information. Thus, the term "video sequence" will be used to refer to any series of video frames, regardless of whether this series corresponds to a single camera shot (recorded between two cuts) or to a plurality of shots or scenes.
Traditionally, if a user desired to know what was the content of a particular video sequence he was obliged to watch as each frame, or a sub-sample of the frames, of the sequence was displayed successively in time. (For purposes of this document, the terms "he," "him," or "his" are used for convenience in place of she/he, her/him and hers/his, and are intended to be gender-neutral.) This approach is still widespread, and in applications where video data is accessed using a personal computer, the interface to the video often consists of a displayed window in which the video sequence is contained and a set of displayed controls similar to those found on a video tape recorder (allowing fast-forward, rewind, etc.).
Developments in the fields of video indexing and video editing have provided other forms of interface to video information.
In the field of video indexing, it is necessary to code information contained in a video sequence in order to enable subsequent retrieval of the sequence from a database by reference to keywords or concepts. The coded content may, for example, identify the types of objects present in the video sequence, their properties/motion, the type of camera movements involved in the video sequence (pan, tracking shot, zoom, etc.), and other properties. A "summary" of the coded document may be prepared, consisting of certain representative frames taken from the sequence, together with text information or icons indicating how the sequence has been coded. The interface for interacting with the video database typically includes a computer input device enabling the user to specify objects or properties of interest and, in response to the query, the computer determines which video sequences in the database correspond to the input search terms and displays the appropriate "summaries". The user then indicates whether or not a particular video sequence should be reproduced. Examples of products using this approach are described in the article "Advanced Imaging Product Survey: Photo, Document and Video" from the journal "Advanced Imaging", October 1994, which document is incorporated herein by this reference.
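The keyword lookup described above can be sketched as follows, assuming each sequence in the database has already been coded as a set of index terms (object types, camera movements, other properties). The sequence ids and terms are hypothetical, and an inverted index is simply one common way to answer such queries, not the method of any product cited here.

```python
from collections import defaultdict

# Hypothetical coded content: sequence id -> index terms assigned during coding.
CODED_SEQUENCES = {
    "seq-01": {"car", "pan", "street"},
    "seq-02": {"person", "zoom", "office"},
    "seq-03": {"car", "tracking", "highway"},
}

def build_index(coded):
    """Build an inverted index mapping each term to the sequences coded with it."""
    index = defaultdict(set)
    for seq_id, terms in coded.items():
        for term in terms:
            index[term].add(seq_id)
    return index

def query(index, *terms):
    """Return the sorted ids of sequences coded with every requested term."""
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)

index = build_index(CODED_SEQUENCES)
print(query(index, "car"))         # ['seq-01', 'seq-03']
print(query(index, "car", "pan"))  # ['seq-01']
```

The returned ids would then be used to fetch and display the corresponding "summaries".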
In some video indexing schemes, the video sequence is divided up into shorter series of frames based upon the scene changes or the semantic content of the video information. A hierarchical structure may be defined. Index "summaries" may be produced for the different series of frames corresponding to nodes in the hierarchical structure. In such a case, at the time when a search is made, the "summary" corresponding to a complete video sequence may be retrieved for display to the user who is then allowed to request display of "summaries" relating to sub-sections of the video sequence which are lower down in the hierarchical structure. If the user so wishes, a selected sequence or sub-section is reproduced on the display monitor. Such a scheme is described in EP-A-0 555 028 which is incorporated herein by this reference.
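A hierarchical index of this kind can be pictured as a tree whose nodes carry a frame range and a "summary". The sketch below is illustrative only: the node fields, the frame numbers and the idea of navigating by a list of child positions are assumptions, not the structure defined in EP-A-0 555 028.

```python
from dataclasses import dataclass, field

@dataclass
class SegmentNode:
    """One node of a hypothetical hierarchical index."""
    start: int       # first frame (inclusive)
    end: int         # last frame (exclusive)
    summary: list    # representative frame numbers shown as the "summary"
    children: list = field(default_factory=list)

def drill_down(node, path):
    """Follow a list of child positions from the root and return the node reached."""
    for i in path:
        node = node.children[i]
    return node

root = SegmentNode(0, 900, [0, 450], [
    SegmentNode(0, 300, [0, 150]),
    SegmentNode(300, 900, [450], [
        SegmentNode(300, 600, [450]),
        SegmentNode(600, 900, [750]),
    ]),
])

leaf = drill_down(root, [1, 1])
print(leaf.start, leaf.end, leaf.summary)  # 600 900 [750]
```

Each call to `drill_down` corresponds to the user requesting the summary of a sub-section one level lower in the hierarchy.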
A disadvantage of such traditional indexing/searching interfaces to video sequences is that the dynamic quality of the video information is lost.
Another approach, derived from the field of video editing, consists of the "digital storyboard". The video sequence is segmented into scenes and one or more representative frames from each scene are selected and displayed, usually accompanied by text information, side-by-side with representative frames from other segments. The user now has both a visual overview of all the scenes and a direct visual access to individual scenes. Each representative frame of the storyboard can be considered to be an icon. Selection of the icon via a pointing device (typically a mouse-controlled cursor) causes the associated video sequence or sub-sequence to be reproduced. Typical layouts for the storyboards are two-dimensional arrays or long one-dimensional strips. In the first case, the user scans the icons from the left to the right, line by line, whereas in the second case the user needs to move the strip across the screen.
Digital storyboards are typically created by a video editor who views the video sequence, segments the data into individual scenes and places each scene, with a descriptive comment, onto the storyboard. As is well known from the technical literature, many steps of this process can be automated. For example, different techniques for automatic detection of scene changes are discussed in the following documents, each of which is incorporated herein by reference:
"A Real-time neural approach to scene cut detection" by Ardizzone et al., IS&T/SPIE Storage & Retrieval for Image and Video Databases IV, San Jose, Calif.;
"Digital Video Segmentation" by Hampapur et al., ACM Multimedia '94 Proceedings, ACM Press;
"Extraction of News Articles based on Scene Cut Detection using DCT Clustering" by Ariki et al., International Conference on Image Processing, September 1996, Lausanne, Switzerland;
"Automatic partitioning of full-motion video" by HongJiang Zhang et al., Multimedia Systems (Springer-Verlag, 1993), Vol. 1, pp. 10-28; and
EP-A-0 590 759.
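As a rough illustration of automatic scene change detection, the sketch below uses histogram differencing, a common baseline technique; it is not the method of any of the documents cited above. Frames are assumed to be rows of 0-255 gray values, and the bin count and threshold are arbitrary illustrative choices.

```python
def gray_histogram(frame, bins=16):
    """Normalized intensity histogram of a frame given as rows of 0-255 gray values."""
    counts = [0] * bins
    n = 0
    for row in frame:
        for v in row:
            counts[v * bins // 256] += 1
            n += 1
    return [c / n for c in counts]

def detect_cuts(frames, threshold=0.5, bins=16):
    """Return indices i where a cut is declared between frame i-1 and frame i.

    The L1 distance between consecutive normalized histograms lies in [0, 2];
    a jump above `threshold` is treated as a scene change.
    """
    cuts = []
    prev = gray_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i], bins)
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            cuts.append(i)
        prev = cur
    return cuts

dark = [[20] * 4 for _ in range(4)]
bright = [[220] * 4 for _ in range(4)]
print(detect_cuts([dark, dark, bright, bright, dark]))  # [2, 4]
```

Real footage needs more care (gradual transitions, flashes, camera motion), which is what the cited techniques address.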
Various methods for automatically detecting and tracking persons and objects in video sequences are considered in the following documents, each of which is incorporated herein by reference:
"Modeling, Analysis and Visualization of Nonrigid Object Motion" by T. S. Huang, Proc. of International Conf. on Pattern Recognition, Vol. 1, pp. 361-364, Atlantic City, N.J., Jun. 1990; and
"Segmentation of People in Motion" by Shio et al., Proc. IEEE, Vol. 79, pp. 325-332, 1991.
Techniques for automatically detecting different types of camera shot are described in the following documents:
"Global zoom/pan estimation and compensation for video compression" by Tse et al., Proc. ICASSP, Vol. 4, pp. 2725-2728, May 1991; and
"Differential estimation of the global motion parameters zoom and pan" by M. Hoetter, Signal Processing, Vol. 16, pp. 249-265, 1989.
In the case of digital storyboards too, the dynamic quality of the video sequence is often lost or obscured. Some impression of the movement inherent in the video sequence can be preserved by selecting several frames to represent each scene, preferably frames which demonstrate the movement occurring in that scene. However, storyboard-type interfaces to video information remain awkward to use, because multiple actions on the user's part are necessary in order to view and access data.
Attempts have been made to create a single visual image which both represents the content of the individual views making up a video sequence and preserves the context, that is, the time-varying nature of the video image information.
One such approach creates a "trace" consisting of a single frame having superimposed images taken from different frames of the video sequence, these images being offset one from the other due to motion occurring between the different frames from which the images were taken. Thus, for example, in the case of a video sequence representing a sprinter running, the corresponding "trace" will include multiple (probably overlapping) images of the sprinter, spaced in the direction in which the sprinter is running. Another approach of this kind generates a composite image, called a "salient still", representative of the video sequence (see "Salient Video Stills: Content and Context Preserved" by Teodosio et al., Proc. ACM Multimedia 93, California, Aug. 1-6, 1993, pp. 39-47, which article is incorporated herein by this reference in its entirety).
Still another approach of this general type consists in the creation of a "video icon", as described in the papers "Developing Power Tools for Video Indexing and Retrieval" by Zhang et al., SPIE, Vol. 2185, pp. 140-149, and "Video Representation tools using a unified object and perspective based approach" by the present inventors, IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases, San Jose, Calif., February 1995, which are incorporated herein by reference.
In a "video icon", as illustrated in FIG. 1A, the scene is represented by a number of frames selected from the sequence and which are displayed as if they were stacked up one behind the other in the z-direction and are viewed in perspective. In other words, each individual frame is represented by a plane and the planes lie one behind the other with a slight offset. Typically the first frame of the stack is displayed in its entirety whereas underlying frames are partially occluded by the frames in front. The envelope of the stack of frames has a parallelepiped shape. The use of a number of frames, even if they are partially occluded, gives the user a more complete view of the scene and, thus, a better visual understanding. Furthermore, with some such icons, the user can directly access any frame represented in the icon.
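The stacked layout of a video icon can be approximated in two dimensions as below. This is a simplified oblique stack rather than a true perspective rendering; the offsets `dx`, `dy` and the frame size are hypothetical values. Returning the frames back-to-front lets a painter's algorithm draw them so that the front frame is complete and the frames behind it are partially occluded.

```python
def video_icon_layout(frame_ids, width, height, dx=12, dy=8):
    """Place frames as an oblique stack with the first frame in front.

    Frame k is shifted by (k*dx, -k*dy) screen pixels (y grows downward),
    so later frames appear further back. The list is returned back-to-front,
    the order in which a painter's algorithm would draw them.
    """
    placed = [(fid, k * dx, -k * dy, width, height)
              for k, fid in enumerate(frame_ids)]
    return list(reversed(placed))

for fid, x, y, w, h in video_icon_layout([0, 25, 50], 160, 120):
    print(fid, x, y)
# 50 24 -16
# 25 12 -8
# 0 0 0
```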
Two special types of video icon have been proposed: "object based" video icons and video icons containing a representation of camera movement. In an "object based" video icon, as illustrated in FIG. 1B, objects of interest are isolated in the individual frames and, for at least some of the stacked frames, the only image information included in the video icon is the image information corresponding to the selected object. In such a video icon, at least some of the individual frames are represented as if they were transparent except in the regions containing the selected object. Video icons containing an indication of camera movement may have, as illustrated in the example of FIG. 1C, a serpentine-shaped envelope corresponding to the case of side-to-side motion of the camera.
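Rendering a frame "as if it were transparent except in the regions containing the selected object" amounts to setting an alpha channel from an object mask. The sketch assumes the mask has already been produced by some detector or tracker (for instance one of the methods cited earlier) and represents images as nested lists; both choices are illustrative assumptions.

```python
def isolate_object(frame, mask):
    """Return an RGBA frame that is opaque only where mask is True.

    frame: rows of (r, g, b) tuples; mask: rows of booleans of the same shape,
    assumed to come from an object detector or tracker.
    """
    return [
        [(r, g, b, 255 if keep else 0) for (r, g, b), keep in zip(row, mrow)]
        for row, mrow in zip(frame, mask)
    ]

frame = [[(10, 20, 30), (40, 50, 60)],
         [(70, 80, 90), (100, 110, 120)]]
mask = [[False, True],
        [False, False]]
out = isolate_object(frame, mask)
print(out[0])  # [(10, 20, 30, 0), (40, 50, 60, 255)]
```

Compositing such frames in the stacked layout leaves only the selected object visible in the back frames.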
The video icons discussed above present the user with information concerning the content of the whole of a video sequence and serve as a selection tool allowing the user to access frames of the video sequence out of the usual order. In other words, these icons allow non-sequential access to the video sequence. Nevertheless, the ways in which the user can interact with the video sequence information are strictly limited. The user can select frames for playback in a non-sequential way but he has little or no means of obtaining a deeper level of information concerning the video sequence as a whole, short of watching a playback of the whole sequence.
The present invention provides a novel type of interface to video information which allows the user to access information concerning a video sequence in a highly versatile manner. In particular, interactive video interfaces of the present invention enable a user to obtain deeper levels of information concerning an associated video sequence at positions in the sequence which are designated by the user as being of interest.
The present invention provides an interface to information concerning an associated video sequence, one such interface comprising:
information defining a three-dimensional root image, the root image consisting of a plurality of basic frames selected from said video sequence, and/or a plurality of portions of video frames corresponding to selected objects represented in the video sequence, x and y directions in the root image corresponding to x and y directions in the video frames and the z direction in the root image corresponding to the time axis whereby the basic frames are spaced apart from one another in the z direction of the root image by distances corresponding to the time separation between the respective video frames;
means for displaying views of the root image;
means for designating a viewing position relative to said root image; and
means for calculating image data representing said three-dimensional root image viewed from the designated viewing position, and for outputting said calculated image data to the displaying means.
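The elements listed above can be made concrete with a small sketch: basic frames are placed along z in proportion to their time separation, and points of the root image are projected for a designated viewing position. The frame rate, the depth scale `depth_per_second`, the pinhole camera model and the yaw-only rotation are all simplifying assumptions made for illustration; the names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class BasicFrame:
    index: int   # frame number in the video sequence
    z: float     # depth in the root image, proportional to elapsed time

def place_basic_frames(frame_indices, fps=25.0, depth_per_second=100.0):
    """Space basic frames along z in proportion to their time separation."""
    t0 = frame_indices[0] / fps
    return [BasicFrame(i, (i / fps - t0) * depth_per_second) for i in frame_indices]

def project(point, eye, yaw=0.0, focal=500.0):
    """Pinhole projection of a 3-D root-image point for a viewer at `eye`.

    The scene is first rotated about the vertical (y) axis by `yaw` radians,
    then projected onto a screen looking along +z. Returns screen (u, v), or
    None when the point lies behind the viewer.
    """
    x, y, z = (p - e for p, e in zip(point, eye))
    c, s = math.cos(yaw), math.sin(yaw)
    x, z = c * x - s * z, s * x + c * z
    if z <= 0:
        return None
    return (focal * x / z, focal * y / z)

frames = place_basic_frames([0, 25, 75])
print([f.z for f in frames])                         # [0.0, 100.0, 300.0]
# Top-right corner of a 160x120 frame centred on the z axis, seen from z = -400:
print(project((80, 60, frames[1].z), (0, 0, -400)))  # (80.0, 60.0)
```

Projecting the four corners of each basic frame and drawing the frames back-to-front yields a perspective view; changing `eye` or `yaw` corresponds to designating a new viewing position.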
According to the present invention, customized user interfaces may be created for video sequences. These interfaces comprise a displayable "root" image which directly represents the content and context of the image information in the video sequence and can be manipulated, either automatically or by the user, in order to display further image information, by designation of a viewing position with respect to the root image, the representation of the displayed image being changed in response to changes in the designated viewing position. In a preferred embodiment of the present invention, the representation of the displayed image changes dependent upon the designated viewing position as if the root image were a three-dimensional object. In such preferred embodiments, as the designated viewing position changes, the data necessary to form the displayed representation of the root image is calculated so as to provide the correct perspective view given the viewing angle, the distance separating the viewing position from the displayed quasi-object and whether the viewing position is above or below the displayed quasi-object.
In a reduced form, the present invention can provide non-interactive interfaces to video sequences, in which the root image information is packaged with an associated script defining a routine for automatically displaying a sequence of different views of the root image and performing a set of manipulations on the displayed image, no user manipulation being permitted. However, the full benefits of the invention are best seen in interactive interfaces where the viewing position of the root image is designated by the user, as follows. When the user first accesses the interface he is presented with a displayed image which represents the root image seen from a particular viewpoint (which may be a predetermined reference viewpoint). As he designates different viewing angles, the displayed image represents the root image seen from different perspectives. When the user designates viewing positions at greater or lesser distances from the root image, the displayed image increases or reduces the size and, preferably, resolution of the displayed information, accessing image data from additional video frames, if need be.
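One way to realise "accessing image data from additional video frames" as the viewer moves closer is a distance-dependent sampling stride. The halving rule, `base_stride` and `reference_distance` below are hypothetical parameters chosen for illustration, not values given in this document.

```python
def frames_for_distance(first, last, distance, base_stride=32,
                        reference_distance=1000.0, min_stride=1):
    """Choose which frames to show for a given viewing distance.

    Hypothetical rule: the sampling stride halves each time the viewer halves
    its distance from the root image, down to every frame (stride 1).
    """
    stride = base_stride
    d = reference_distance
    while stride > min_stride and distance <= d / 2:
        stride //= 2
        d /= 2
    return list(range(first, last + 1, stride))

print(frames_for_distance(0, 128, 1000))  # [0, 32, 64, 96, 128]
print(len(frames_for_distance(0, 128, 10)))  # 129
```

Frames absent from the root image data would be fetched only when the stride first requires them.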
The customized, interactive interfaces provided by the present invention involve displayed images, representing the respective associated video sequences, which, in some ways, could be considered to be a navigable environment or a manipulable object. This environment or object is a quasi-three-dimensional entity. The x and y dimensions of the environment/object correspond to true spatial dimensions (corresponding to the x and y directions in the associated video frames) whereas the z dimension of the environment/object corresponds to the time axis. These interfaces could be considered to constitute a development of the "video icons" discussed above, now rendered interactive and manipulable by the user.
With the interfaces provided by the present invention, the user can select spatial and temporal information from a video sequence for access by designating a viewing position with respect to a video icon representing the video sequence. Arbitrarily chosen oblique "viewing directions" are possible whereby the user simultaneously accesses image information corresponding to portions of a number of different frames in the video sequence. As the user's viewing position relative to the video icon changes, the amount of a given frame which is visible to him, and the number and selection of frames which he can see, changes correspondingly.
As mentioned above, the interactive video interfaces of the present invention make use of a "root" image comprising a plurality of basic frames arranged to form a quasi-three-dimensional object. It is preferred that the relative placement positions of the basic frames be arranged so as to indicate visually some underlying motion in the video sequence. Thus, for example, if the video sequence corresponds to a travelling shot moving down a hallway and turning a corner, the envelope of the set of basic frames preferably does not have a parallelepiped shape but, instead, forms a "pipe" of rectangular section that bends in a way corresponding to the camera travel during filming of the video sequence.
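A bending "pipe" can be obtained by accumulating camera displacement into the placement position of each basic frame. In this sketch the per-step `(dx, dy)` estimates are assumed to come from a global pan estimator, and the basic frames are assumed to be equally spaced in time so that a constant `z_step` applies; both are illustrative assumptions.

```python
def envelope_offsets(pans, z_step=10.0):
    """Accumulate per-step camera displacement into frame placement positions.

    pans[i] is an assumed (dx, dy) estimate of camera motion between basic
    frame i and i+1. Returns the (x, y, z) position of each frame centre, so
    that the stack of frames bends with the camera path.
    """
    x = y = 0.0
    positions = [(x, y, 0.0)]
    for i, (dx, dy) in enumerate(pans, start=1):
        x += dx
        y += dy
        positions.append((x, y, i * z_step))
    return positions

# The camera travels straight ahead, then turns right: the "pipe" bends.
print(envelope_offsets([(0, 0), (0, 0), (15, 0), (15, 0)]))
```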
In preferred embodiments of video interfaces according to the present invention, the basic video frames making up the root image are chosen as a function of the amount of motion or change in the sequence. For example, in the case of a video sequence corresponding to a travelling shot, in which the background information changes, it is preferable that successive basic frames should include background information overlapping by, say, 50%.
In certain embodiments of the present invention, the root image corresponds to an "object-based video icon." In other words, certain of the basic frames included in the root image are not included therein in full; only those portions corresponding to selected objects are included. Alternatively, or additionally, certain basic frames may be included in full in the root image but may include "hot objects," that is, representations of objects selectable by the user. In response to selection of such "hot objects" by the user, the corresponding basic frames (and, if necessary, additional frames) are then displayed as if they had become transparent at all portions thereof except the portion(s) where the selected object or objects are displayed. The presence of such selectable objects in the root image allows the user to selectively isolate objects of interest in the video sequence and obtain at a glance a visual impression of the appearance and movement of the objects during the video sequence.
The interfaces of the present invention allow the user to select an arbitrary portion of the video sequence for playback. The user designates a portion of the video sequence which is of interest, by designating a corresponding portion of the displayed image forming part of the interface to the video sequence. This portion of the video sequence is then played back. The interface may include a displayed set of controls similar to those provided on a VCR in order to permit the user to select different modes for this playback, such as fast-forward, rewind, etc.
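Because z in the root image corresponds to time, a designated depth interval maps directly to a frame range for playback. The sketch assumes a linear mapping z = (t - t0) * depth_per_second with a constant frame rate; the parameter names are hypothetical.

```python
def z_range_to_frames(z_start, z_end, fps=25.0, depth_per_second=100.0,
                      first_frame=0):
    """Convert a designated depth interval on the root image into a frame range.

    Assumes z = (t - t0) * depth_per_second, where t0 is the time of
    `first_frame`. Returns (start_frame, end_frame), both inclusive.
    """
    def to_frame(z):
        return first_frame + round(z / depth_per_second * fps)

    lo, hi = sorted((z_start, z_end))
    return to_frame(lo), to_frame(hi)

print(z_range_to_frames(300, 100))  # (25, 75)
```

The resulting range would be handed to whatever player the interface uses, together with the selected playback mode.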
In preferred embodiments of interfaces according to the invention, the displayed image forming part of the interface remains visible whilst the designated portion of the sequence is being played back. This can be achieved in any number of ways, as for example, by providing a second display device upon which the playback takes place, or by designating a "playback window" on the display screen, this playback window being offset with respect to the screen area used by the interface, or by any other suitable means.
The preferred embodiments of interfaces according to the invention also permit the user to designate an object of interest and to select a playback mode in which only image information concerning that selected object is included in the playback. Furthermore, the user can select a single frame from the video sequence for display separately from the interactive displayed image generated by the interface.
In preferred embodiments, the interfaces of the present invention allow the user to generate a displayed image corresponding to a distortion of the root image. More especially, the displayed image can correspond to the root image subjected to an "accordion effect", where the root image is "cracked open", for example, by bending around a bend line so as to "fan out" video frames in the vicinity of the opening point, or is modified by linearly spreading apart video frames at a point of interest. The accordion effect can also be applied repetitively or otherwise in a nested fashion according to the present invention.
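The "linear spreading" form of the accordion effect can be sketched as a remapping of frame depths around a point of interest. The spread factor and window width are illustrative assumptions, and the bending variant is not shown. Frames outside the window are shifted rigidly, so the mapping is continuous and preserves frame order.

```python
def accordion(z_positions, z_focus, spread=3.0, width=50.0):
    """Spread frames apart around z_focus (linear-spreading accordion variant).

    Within +/- width of the focus, distances from the focus are scaled by
    `spread`; frames outside that window are shifted rigidly so that the
    mapping stays continuous and frame order is preserved.
    """
    shift = width * (spread - 1)
    out = []
    for z in z_positions:
        d = z - z_focus
        if abs(d) <= width:
            out.append(z_focus + d * spread)
        else:
            out.append(z + shift if d > 0 else z - shift)
    return out

print(accordion([0, 100, 150, 200, 300], 150))  # [-100.0, 0.0, 150.0, 300.0, 400.0]
```

Applying `accordion` again to its own output, at the same or another focus, gives the repeated or nested use described above.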
The present invention can provide user interfaces to "multi-threaded" video sequences, that is, video sequences consisting of numerous interrelated shorter segments such as are found, for example, in a video game where the user's choices change the scene which is displayed. Interfaces to such multi-threaded video sequences can include frames of the different video segments in the root image, such that the root image has a branching structure. Alternatively, some or all of the different threads may not be visible in the root image but may become visible as a result of user manipulation. For example, if the user expresses an interest in a particular region of the video sequence by designating a portion of a displayed root image (using a pointing device such as a mouse, by touching a touch screen, etc.), and multiple different threads of the sequence start from the designated area, then image portions for these different threads may be added to the displayed image.
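A branching root image needs a data structure that can answer "which threads start in the designated region?". The sketch below models a multi-threaded sequence as a tree of segments; the field names and frame numbers are hypothetical, and a tree (no cycles, no re-joining threads) is assumed for simplicity.

```python
from dataclasses import dataclass, field

@dataclass
class Thread:
    """One segment of a hypothetical multi-threaded video sequence."""
    name: str
    start: int                                    # first frame of the segment
    end: int                                      # last frame (inclusive)
    branches: list = field(default_factory=list)  # segments that can follow it

def threads_starting_in(root, lo, hi):
    """Names of all segments reachable from `root` whose start frame is in [lo, hi]."""
    found, stack = [], [root]
    while stack:
        seg = stack.pop()
        if lo <= seg.start <= hi:
            found.append(seg.name)
        stack.extend(seg.branches)
    return sorted(found)

intro = Thread("intro", 0, 99, [
    Thread("left", 100, 199),
    Thread("right", 100, 249, [Thread("right-end", 250, 300)]),
])
print(threads_starting_in(intro, 90, 110))  # ['left', 'right']
```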
In preferred embodiments of interfaces according to the present invention, the root image for the video sequence concerned is associated with information defining how the corresponding displayed image will change in response to given types of user manipulation. Thus, for example, this associated information may define how many, or which additional frames are displayed when the user moves the viewing position closer up to the root image. Similarly, the associated information may identify which objects in the scene are "hot objects" and what image information will be displayed in relation to these hot objects when activated by the user.
Furthermore, different possibilities exist for delivering the components of the interface to the end user. In an application where video sequences are transmitted to a user over a telecommunications path, such as via the Internet, the user who is interested in a particular video sequence may first download only certain components of the associated interface. First of all he downloads information for generating a displayed view of the root image, together with an associated application program (if he does not already have an appropriate "interface player" loaded in his computer). The downloaded (or already-resident) application program includes basic routines for changing the perspective of the displayed image in response to changes in the viewing position designated by the user. The application program is also adapted to consult any "associated information" (as mentioned above) which forms part of the interface and conditions the way in which the displayed image changes in response to certain predetermined user manipulations (such as "zoom-in" and "activate object"). If the interface does not contain any such "associated information" then the application program makes use of pre-set default parameters.
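The fallback from "associated information" to pre-set defaults can be sketched as a settings object merged over player defaults. The fields shown (`allow_zoom`, `extra_frames_on_zoom`, `hot_objects`) are hypothetical examples of what such information might contain, and representing the associated information as a dict read from the interface package is an assumption.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class InteractionSettings:
    """Hypothetical 'associated information' conditioning user manipulations."""
    allow_zoom: bool = True
    extra_frames_on_zoom: int = 8   # frames to add per zoom-in step
    hot_objects: tuple = ()         # ids of objects the user may activate

# Pre-set defaults built into the player, used when the interface supplies none.
PLAYER_DEFAULTS = InteractionSettings()

def effective_settings(associated=None):
    """Merge the interface's associated information (a dict) over the defaults."""
    if not associated:
        return PLAYER_DEFAULTS
    return replace(PLAYER_DEFAULTS, **associated)

ad = effective_settings({"allow_zoom": False, "hot_objects": ("logo",)})
print(ad.allow_zoom, ad.extra_frames_on_zoom, ad.hot_objects)  # False 8 ('logo',)
```

Keeping the defaults in the player means that an interface package needs to carry only the values in which it differs.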
The root image corresponds to a particular set of basic video frames and information designating relative placement positions thereof. The root image information downloaded to the user may include just the data necessary to create a reference view of the root image or it may include the image data for the set of basic frames (in order to enable the changes in user viewing angle to be catered for without the need to download additional information). In a case where the user performs a manipulation which requires display of video information which is not present in the root image (e.g. he "zooms in" such that data from additional frames is required), this extra information can either be pre-packaged and supplied with the root image information or the extra information can be downloaded from the host website as and when it is needed.
Similar possibilities exist in the case of interfaces provided on CD-ROM. In general, the root image and other associated information will be provided on the CD-ROM in addition to the full video sequence. However, it is to be understood that, for reasons of space saving, catalogues of video sequences could be made consisting solely of interfaces, without the corresponding full video sequences.
In addition to providing the interfaces themselves, the present invention also provides apparatus for creation of interfaces according to the present invention. This may be dedicated hardware or, more preferably, a computer system programmed in accordance with specially designed computer programs.
Several of the steps involved in the creation of a customized interface according to the present invention can be automated. For example, the basic frames included in the "root image" of the interface can be selected automatically according to one of a number of different algorithms, such as choosing one frame every n frames, or choosing one frame each time the camera movement has displaced the background by m%. Similarly, the relative placement positions of the basic frames in the root image can be set automatically taking into account the time separation between those frames and, if desired, other factors such as camera motion. Similarly, the presence of objects or people in the video sequence can be detected automatically according to one of the known algorithms (such as those discussed in the references cited above), and an "object oriented" root image can be created automatically. Thus, in some embodiments, the interface creation apparatus of the present invention has the capability of automatically processing video sequence information in order to produce a root image. These embodiments include means for associating with the root image a standard set of routines for changing the representation of the displayed image in response to user manipulations.
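The two frame-selection rules named above can be sketched as follows. The function names are hypothetical, and the second rule assumes that per-frame horizontal background shifts have already been estimated by some motion-estimation step not shown here; it also simply accumulates the absolute pan, which is a simplification.

```python
from typing import List, Sequence


def every_nth_frame(num_frames: int, n: int) -> List[int]:
    """Select one basic frame every n frames (frame numbers start at 0)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return list(range(0, num_frames, n))


def by_background_displacement(shifts: Sequence[float], frame_width: float,
                               m_percent: float) -> List[int]:
    """Select a new basic frame each time the accumulated camera pan has
    displaced the background by at least m% of the frame width.

    shifts[i] is the (assumed, pre-computed) background displacement in pixels
    between frame i-1 and frame i; shifts[0] is ignored.
    """
    if not shifts:
        return []
    selected = [0]  # always keep the first frame
    accumulated = 0.0
    threshold = frame_width * m_percent / 100.0
    for i in range(1, len(shifts)):
        accumulated += abs(shifts[i])
        if accumulated >= threshold:
            selected.append(i)
            accumulated = 0.0  # start measuring displacement from this basic frame
    return selected
```

With a steady pan of 10 pixels per frame, a 100-pixel-wide frame and m = 20, a basic frame is kept every second frame.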
However, it is often preferable to design the characteristics of interactive interfaces according to the invention actively, so that the ways in which the end user can interact with the video information are limited or channeled in preferred directions. This is particularly true of video sequences which are advertisements or are used in educational software and the like.
Thus, the present invention provides a toolkit for use in creation of customized interfaces. In preferred embodiments, the toolkit enables a designer to tailor the configuration and content of the root image, as well as to specify which objects in the video sequence are "hot objects" and to control the way in which the displayed interface image will change in response to manipulation by an end user. Thus, among other things, the toolkit enables the interface designer to determine which frames of the video sequence should be used as basic frames in the root image, and how many additional frames are added to the displayed image when the user designates a viewing position close to the root image.
According to another aspect, the invention relates to network distribution and management of interactive video and multi-media containers. A need exists for methods and systems for transmitting video and other multi-media files across a network, such as the Internet. U.S. Pat. No. 5,956,716 to Kenner et al. provides an example of a system and method for the delivery of video data over a computer network. In Kenner, a user uses a multimedia terminal to send a request for video clips from a database. A local storage and retrieval module receives and processes video clip requests, and a primary index manager causes the distribution of video clips among a plurality of extended storage and retrieval modules. The extended storage and retrieval modules store a plurality of databases including those that contain video clips. A data sequencing interface directs the extended storage and retrieval module to download the requested video clips. The video clips are then downloaded to the multimedia terminal via the local storage and retrieval module.
Systems and methods according to the invention provide for the network distribution and management of interactive video and multi-media containers. They can distribute not only video and other multi-media files but also multi-media containers. Consequently, users are able to access information concerning the multi-media files in a highly versatile manner. Systems and methods according to the invention also enable the transmission of information both to and from the users, and thus provide for collaboration between users. For instance, work performed by one user in indexing or in providing annotations is not restricted to just that user but can be shared with others having access to the multi-media file. Other advantages and benefits of the invention are provided in the following description and will be apparent to those skilled in the art.
BRIEF DESCRIPTION OF THE DRAWINGS
Further features and advantages of the present invention will become apparent from the following description of preferred embodiments thereof, given by way of example, and illustrated by the accompanying drawings, in which:
FIG. 1 illustrates various types of video icon, wherein FIG. 1A shows an ordinary video icon, FIG. 1B shows an object-based video icon and FIG. 1C shows a video icon including a representation of camera motion;
FIG. 2 is a block diagram indicating the components of an interactive interface according to a first embodiment of the present invention;
FIG. 3 is a diagram illustrating the content of the interface data file (FDI) used in the first embodiment of the invention;
FIG. 4 is a diagram illustrating a reference view of a root image and three viewing positions designated by a user;
FIG. 5 illustrates the displayed image in the case of the root image viewed from the different viewing positions of FIG. 4, wherein FIG. 5A represents the displayed image from viewing position A, wherein FIG. 5B represents the displayed image from viewing position B, and wherein FIG. 5C represents the displayed image from viewing position C;
FIG. 6 illustrates displayed images based on more complex root images according to the present invention, in which FIG. 6A is derived from a root image visually representing motion and FIG. 6B is derived from a root image visually representing a zoom effect;
FIG. 7 illustrates the effect of user selection of an object represented in the displayed image, in a second embodiment of interface according to the present invention;
FIG. 8 illustrates a user manipulation of a root image to produce an "accordion effect";
FIG. 9 illustrates a displayed image corresponding to a view of a branching root image associated with a multi-threaded scenario;
FIG. 10 is a flow diagram indicating steps in a preferred process of designing an interface according to the present invention;
FIG. 11 is a schematic representation of a preferred embodiment of an interface editor unit according to the present invention;
FIG. 12 is a schematic representation of a preferred embodiment of an interface viewer according to the present invention;
FIG. 13 is a block diagram of a network according to a preferred embodiment of the invention;
FIG. 14 is a more detailed diagram showing interaction between runtime components and both the database components and client applications;
FIG. 15 is a more detailed diagram showing interaction between video components and both OBVI components and database components;
FIG. 16 is a more detailed diagram showing interaction between OBVI components and both runtime components and database components;
FIG. 17 is a diagram of an object annotation comprised of two objects;
FIG. 18 is an example of a screen shot prompting a user for a site to publish an OVI;
FIG. 19A is a block diagram illustrating interaction between a Site Manager and services;
FIG. 19B is a block diagram illustrating a preferred schema for providing access control to a site;
FIG. 19C is a block diagram illustrating network protocols used during site management;
FIG. 20 is a block diagram depicting hierarchy of modules within the network architecture;
FIG. 21 is an example of a screen shot for providing Video Analysis and Measuring Tool (VAMT) Services;
FIG. 22 is an example of a screen shot of an interface with a VAMT Manager;
FIG. 23 is an example of a screen shot for adding a new job with the VAMT Manager;
FIG. 24 is an example of a screen shot for setting a job priority with the VAMT Manager;
FIG. 25 is an example of a screen shot illustrating a first step in creating an OIS database;
FIG. 26 is an example of a screen shot illustrating a second step in creating an OIS database;
FIG. 27A is an example of a screen shot illustrating a third step in creating an OIS database;
FIG. 27B is an example of a screen shot illustrating a fourth step in creating an OIS database;
FIG. 28 is an example of a screen shot illustrating a fifth step in creating an OIS database;
FIG. 29 is an example of a screen shot illustrating a sixth step in creating an OIS database;
FIG. 30 is an example of a screen shot illustrating a seventh step in creating an OIS database;
FIG. 31 is an example of a screen shot illustrating an eighth step in creating an OIS database;
FIG. 32 is an example of a screen shot illustrating a ninth step in creating an OIS database;
FIG. 33 is an example of a screen shot illustrating a tenth step in creating an OIS database;
FIG. 34 is an example of a map showing locations of units and services involved in a specific site;
FIG. 35 is an example of a screen shot of an interface to an Obvious Management Console, and more particularly showing an administration session on a video realm;
FIG. 36 is an example of a screen shot of the interface in FIG. 35 showing a contextual menu;
FIG. 37 is an example of a screen shot of the interface in FIG. 35 when "Edit Media" is selected;
FIG. 38 is an example of a screen shot of the interface in FIG. 35 showing an administration session on the security realm;
FIG. 39 is a diagram of interfaces between an Obvious Indexing Engine and filters;
FIG. 40 is an example of a screen shot illustrating a first step in which a user launches the publishing and indexing of OVI files;
FIG. 41 is an example of a screen shot illustrating a second step in which a user launches the publishing and indexing of OVI files;
FIG. 42 is an example of a screen shot illustrating a third step in which a user launches the publishing and indexing of OVI files;
FIG. 43 is an example of a screen shot illustrating a first choice a user has in the screen shot of FIG. 42;
FIG. 44 is an example of a screen shot illustrating a fourth step in which a user launches the publishing and indexing of OVI files;
FIG. 45 is an example of a screen shot illustrating a fifth step in which a user launches the publishing and indexing of OVI files;
FIG. 46 is an example of a screen shot illustrating a sixth step in which a user launches the publishing and indexing of OVI files;
FIG. 47 is an example of a screen shot illustrating a basic search screen;
FIG. 48 is an example of a screen shot illustrating a simple search;
FIG. 49 is an example of a screen shot illustrating an advanced search;
FIG. 50 is an example of a screen shot illustrating a main window for an Obvious Multicaster;
FIG. 51 is an example of a screen shot for adding a new channel; and
FIG. 52 is an example of a screen shot for configuring a channel.
DETAILED DESCRIPTION
I. Interactive Interface
The components of an interactive interface according to a first preferred embodiment of the present invention will now be described with reference to FIG. 2. In this example, an interactive interface of the invention is associated with video sequences recorded on a CD-ROM.
As shown in FIG. 2, a CD-ROM reader 1 is connected to a computer system including a central processor portion 2, a display screen 3, and a user-operable input device which, in this case, includes a keyboard 4 and a mouse 5. When the user wishes to consult video sequences recorded on a CD-ROM 7, he places the CD-ROM 7 in the CD-ROM reader and activates CD-ROM accessing software provided in the central processor portion 2 or an associated memory or unit.
According to the first embodiment of the invention, the CD-ROM has recorded thereon not only the video sequence image information 8 (in any convenient format), but also a respective interface data file (FDI_i) 10 for each video sequence, together with a video interface application program 11. The content of a typical data file is illustrated in FIG. 3. Respective scripts 12 are optionally associated with the interface data files. When data on the CD-ROM is to be read, the video interface application program 11 is operated by the central processor portion 2 of the computer system and the interface data file applicable to the video sequence selected by the user is processed in order to cause an interactive video icon (see, for example, FIGS. 4 and 5) to be displayed on the display screen 3. The user can then manipulate the displayed icon, by making use of the mouse or keyboard input devices, in order to explore the selected video sequence.
The types of manipulations of the interactive video icon which are available to the user will now be described with reference to FIGS. 4 to 9.
FIG. 4 illustrates a simple interactive video icon according to the present invention. In particular, this video icon is represented on the display screen as a set of superposed images arranged within an envelope having the shape of a regular parallelepiped. Each of the superposed images corresponds to a video frame selected from the video sequence, but these frames are offset from one another. It may be considered that the displayed image corresponds to a cuboid viewed from a particular viewing position (above and to the right, in this example). This cuboid is a theoretical construct consisting of the set of selected video frames disposed such that their respective x and y axes correspond to the x and y axes of the cuboid and the z axis of the cuboid corresponds to the time axis. Thus, in the theoretical construct cuboid, the selected frames are spaced apart in the z direction in accordance with their respective time separations in the video sequence.
When the user seeks to explore the video sequence via the interactive video icon displayed on the display screen, one of the basic operations he can perform is to designate a position on the screen as a viewing position relative to the displayed image (e.g. by "clicking" with the computer mouse). In FIG. 4, three such designated viewing positions are indicated by the letters A, B and C. In response to this operation by the user, the displayed image is changed to the form shown in FIG. 5: FIGS. 5A, 5B and 5C correspond to "viewing positions" A, B and C, respectively, of FIG. 4. The image displayed to the user changes so as to provide a perspective view of the theoretical cuboid as seen from an angle corresponding to the viewing position designated by the user.
The above-mentioned cuboid is a special case of a "root image" according to the present invention. This "root image" is derived from the video sequence and conveys information concerning both the image content of the selected sub-set of frames (called below, "basic frames") and the relative "position" of that image information in time as well as space. It is to be appreciated that the "root image" is defined by information in the interface data file. The definition specifies which video frames are "basic frames" (for example, by storing the relevant frame numbers), as well as specifying the placement positions of the basic frames relative to one another within the root image.
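A root image definition of this kind amounts to a list of basic-frame numbers plus a placement for each. For the cuboid of FIG. 4 only the z coordinate varies, in proportion to time separation. The sketch below assumes a constant frame rate and a hypothetical scale factor `depth_per_second`; the names are illustrative and do not reflect the actual layout of the interface data file.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Placement:
    frame_no: int  # frame number of a basic frame in the video sequence
    z: float       # position along the root image's time (z) axis


def cuboid_root_image(basic_frames: List[int], fps: float,
                      depth_per_second: float = 1.0) -> List[Placement]:
    """Place basic frames along z in proportion to their time separation.

    Each frame's x and y axes coincide with those of the cuboid, so only a z
    coordinate is stored; the first basic frame sits at z = 0.
    """
    if not basic_frames:
        return []
    t0 = basic_frames[0] / fps
    return [Placement(f, (f / fps - t0) * depth_per_second) for f in basic_frames]
```

At 25 frames per second, basic frames 0, 25 and 100 land at z = 0, 1 and 4, so the gap between the second and third frames is three times the gap between the first two, mirroring their time separations.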
The central processor portion 2 of the computer system calculates the image data required to generate the displayed image from the root image definition contained in the appropriate interface data file, the image data of the basic frames (and, where required, additional frames) and the viewing position designated by the user, using standard ray-tracing techniques. The data required to generate the displayed image is loaded into the video buffer and displayed on the display screen.
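The text specifies ray tracing. The sketch below instead uses a much simpler pinhole projection, only to show how a designated viewing position maps root-image coordinates (x and y from the frame, z from time) to screen coordinates. The function name and the convention that the viewer looks along the +z axis are assumptions.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def project_point(point: Vec3, eye: Vec3, focal: float = 1.0) -> Tuple[float, float]:
    """Project a root-image point onto the screen for a viewer at `eye`.

    Simplified pinhole model (not ray tracing): the viewer looks along +z, so
    moving `eye` sideways or up changes the parallax between frames placed at
    different depths, which makes the stacked frames read as a solid.
    """
    dx, dy, dz = (p - e for p, e in zip(point, eye))
    if dz <= 0:
        raise ValueError("point lies at or behind the viewing position")
    return (focal * dx / dz, focal * dy / dz)
```

Projecting the four corners of every placed frame and filling the resulting quadrilaterals back to front would give a crude wire-frame version of the views in FIG. 5.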
According to the present invention it is preferred that, when the user designates a viewing position close up to the interactive video icon, the image information in the area of interest should be enriched. This is achieved by including, in the displayed image, image data relating to additional video frames besides the basic video frames. Such a case is illustrated in FIG. 5B, where the basic frames BF5 and BF6 are displayed together with additional frames AF1 and AF2. As the user-designated viewing position approaches closer and closer to the displayed image the video interface application program causes closely spaced additional frames to be added to the displayed image. Ultimately, successive video frames of the video sequence may be included in the displayed image. As is clear from FIG. 5B, image information corresponding to parts of the root image distant from the area of interest may be omitted from the displayed "close-up" image.
Preferably, the interface data file includes data specifying how the choice should be made of additional frames to be added as the user "moves close up" to the displayed image. More preferably, this data defines rules governing the choice of how many, and which, additional frames should be used to enrich the displayed image as the designated viewing position changes. These rules can, for example, define a mathematical relationship between the number of displayed frames and the distance separating the designated viewing position and the displayed quasi-object. In preferred embodiments of the invention, the number of frames which are added to the display as the viewing position approaches the displayed quasi-object depends upon the amount of motion or change in the video sequence at that location.
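The specification leaves the rule open. One hypothetical relationship that also depends on motion is n = min(max_extra, floor(k * motion / distance)): the number of additional frames grows with the amount of motion and shrinks with distance. The constants `k` and `max_extra`, the normalised `motion` measure and the even spacing of the chosen frames are all assumptions of this sketch.

```python
import math
from typing import List


def additional_frame_count(distance: float, motion: float,
                           k: float = 10.0, max_extra: int = 24) -> int:
    """How many additional frames to show between two neighbouring basic frames.

    `distance` separates the designated viewing position from that part of the
    quasi-object; `motion` is an assumed activity measure in [0, 1] for that
    stretch of video. Closer viewpoints and busier footage give more frames.
    """
    if distance <= 0:
        return max_extra
    return min(max_extra, math.floor(k * motion / distance))


def additional_frames(left: int, right: int, count: int) -> List[int]:
    """Pick up to `count` evenly spaced frame numbers strictly between two basic frames."""
    count = min(count, right - left - 1)  # cannot add more frames than exist in the gap
    if count <= 0:
        return []
    step = (right - left) / (count + 1)
    return [left + round(step * (i + 1)) for i in range(count)]
```

A static stretch of video (motion 0) therefore receives no additional frames however close the viewer comes, while a busy stretch is enriched quickly.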
The example illustrated in FIG. 4 is a simplification in which the displayed image corresponds to a root image having a simple, cuboid shape. However, according to the present invention, the root image may have a variety of different forms.
For example, the relative placement positions of the basic frames may be selected such that the envelope of the root image has a shape which reflects motion in the corresponding video sequence, either camera motion (during tracking shots and the like) or motion of objects represented in the sequence; see the corresponding interactive icon shown in FIG. 6A. Similarly, the dimensions of the basic frames in the root image may be scaled so as to visually represent a zoom effect occurring in the video sequence; see the corresponding interactive icon shown in FIG. 6B.
It will be seen that the interactive icon represented in FIG. 6B includes certain frames for which only a portion of the image information has been displayed. This corresponds to a case where an object of special interest has been selected. Such object selection can be made in various ways. If desired, the root image may be designed such that, instead of including basic frames in full, only those portions of frames which represent a particular object are included. This involves a choice being made, at the time of design of the root image portion of the interface, concerning which objects are interesting. The designer can alternatively or additionally decide that the root image will include basic frames in full but that certain objects represented in the video sequence are to be "selectable" or "extractable" at user request. This feature will now be discussed with reference to FIG. 7.
FIG. 7A illustrates an initial view presented to a user when he consults the interface for a particular selected video sequence. In this sequence two people walk towards each other and their paths cross. The designer of the interface has decided that the two people are objects that may be of interest to the end user. Accordingly, he has included, in the interface data file, information designating these objects as "extractable". This designation information may correspond to x, y co-ordinate range information identifying the position of the object in each video frame (or a subset of frames).
If the user expresses an interest in either of the two objects, for example, by designating a screen position corresponding to one of the objects (e.g. by "clicking" on the left-hand person using the right-hand mouse button), then the interface application program controls the displayed image such that extraneous portions of the displayed frames disappear from the display, leaving only a representation of the two people and their motion, as shown in FIG. 7B. Thus, the objects of interest are "extracted" from their surroundings. The "missing" or transparent portions of the displayed frames can be restored to the displayed image at the user's demand (e.g. by a further "click" of the mouse button).
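The x, y co-ordinate range information described above is enough to resolve a click to an extractable object and to toggle its extraction on and off. This is a minimal sketch assuming rectangular ranges per frame; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in frame pixels


@dataclass
class ExtractableObject:
    name: str
    # Frame number -> x, y co-ordinate range of the object in that frame.
    boxes: Dict[int, Box] = field(default_factory=dict)

    def contains(self, frame_no: int, x: int, y: int) -> bool:
        box = self.boxes.get(frame_no)
        if box is None:
            return False  # object not annotated in this frame
        x0, y0, x1, y1 = box
        return x0 <= x <= x1 and y0 <= y <= y1


def object_at(objects: Iterable[ExtractableObject], frame_no: int,
              x: int, y: int) -> Optional[ExtractableObject]:
    """Return the extractable object under a click on a displayed frame, if any."""
    for obj in objects:
        if obj.contains(frame_no, x, y):
            return obj
    return None


def toggle_extraction(extracted: Set[str], obj: ExtractableObject) -> Set[str]:
    """A first click extracts the object; a further click restores its surroundings."""
    return extracted ^ {obj.name}
```

When rendering, pixels of each frame outside the boxes of the currently extracted objects would be drawn as transparent.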
It is to be understood that, according to the present invention, interfaces may be designed such that particular "extractable" objects may be extracted simultaneously with some or all of the other extractable objects, or they may be extracted individually. Sophisticated interfaces according to the present invention can incorporate object-extraction routines permitting the user to arbitrarily select objects visible in the displayed view of the root image, for extraction. Thus, for example, the user may use a pointing device to create a frame around an object visible in a displayed view of the root image and the application program then provides analysis routines permitting identification of the designated object in the other basic frames of the root image (and, if required, in additional frames) so as to cause display of that selected object as if it were located on transparent frames.
It may be desirable to allow the user to obtain a close-up view of a particular portion of the interactive video icon in a manner which does not correspond to a strict perspective view of the region concerned. Preferred embodiments of the interface according to the invention thus provide a so-called "accordion" effect, as illustrated in FIG. 8. When the user manipulates the icon by an "accordion" effect at a particular point, the basic frames in the vicinity of the region of interest are spread so as to provide the user with a better view. Further, preferably, the function of displaying additional frames so as to increase detail is inhibited during the "accordion" effect.
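One way to realise the spreading is a piecewise-linear remapping of the basic frames' z positions. The sketch below is an assumption, not the method of the specification: frames within `radius` of the point of interest are pushed apart by a factor `spread`, and frames further out are shifted so that the mapping stays continuous and keeps the frames in order.

```python
from typing import List


def accordion(zs: List[float], focus: float, radius: float, spread: float) -> List[float]:
    """Remap basic-frame z positions to produce an "accordion" effect.

    Frames within `radius` of `focus` have their distance from the focus
    multiplied by `spread` (> 1); frames further out are shifted by the same
    amount the edge of that zone moved, so frame order is preserved and no
    frames overlap. Suppressing additional-frame enrichment while the effect
    is active is left to the caller.
    """
    extra = radius * (spread - 1)  # how far each edge of the zone moves
    out = []
    for z in zs:
        d = z - focus
        if abs(d) <= radius:
            out.append(focus + d * spread)
        elif d > 0:
            out.append(z + extra)
        else:
            out.append(z - extra)
    return out
```

With frames at z = 0 to 4, focus 2, radius 1 and spread 3, the frames next to the focus move from one unit away to three units away, while the outer frames simply slide outward by two units.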
Interfaces according to the present invention can also be provided for "multi-threaded" video sequences, such as are traditionally found in video-based computer games and educational software, which involve parallel video sub-sequences that are accessed as alternatives depending upon the user's choices. In such a case, the interface designer may choose to include frames from different parallel video sub-sequences in the interface's root image in order to give the user an idea of the different plot strands available to him in the video sequence. FIG. 9 illustrates an interactive video icon derived from a simple example of such a root image.
Alternatively, or additionally, the designer may create secondary root images for the respective sub-sequences, these secondary root images being used to generate the displayed image only when the user designates a viewing position close to the video frame where the sub-sequence begins. In the case of interfaces to such computer games or educational software, this is a logical choice since it is at the point where the video sub-sequence branches from the main sequence that user choices during playing of the game, or using of the educational software, change the experienced scenario.
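Secondary root images can be attached to the main root image at the frames where their sub-sequences branch off, and consulted only when the viewing position is close to such a frame. In this sketch "close" is approximated by a hypothetical frame-distance threshold, and the field and function names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RootImage:
    basic_frames: List[int]
    # Frame where a sub-sequence branches from this sequence -> its secondary root image.
    branches: Dict[int, "RootImage"] = field(default_factory=dict)


def root_images_for_view(root: RootImage, focus_frame: int, threshold: int) -> List[RootImage]:
    """Root images used to build the displayed view.

    The main root image is always used; a secondary root image is added only
    when the viewing position is close to (within `threshold` frames of) the
    frame where its sub-sequence begins.
    """
    images = [root]
    for branch_frame, secondary in sorted(root.branches.items()):
        if abs(focus_frame - branch_frame) <= threshold:
            images.append(secondary)
    return images
```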
Another manipulation which it is preferable to include in interfaces according to the invention is the traditional set of displayed VCR controls, which permit the user to play back the video sequence with which the displayed video icon is associated. Furthermore, the user can select portions or frames within the sequence for playback by, for example, "clicking" with the mouse button on the frames of interest as displayed in the interactive video icon. The video playback can take place on a separate display screen or in a window defined on the display screen displaying the video icon.
As mentioned above, a particular video sequence may be associated with an interface data file and a script. The script is a routine defined by the interface designer which leads the user through the use of the interface. The script can, for example, consist of a routine to cause an automatic demonstration of the different manipulations possible of the displayed quasi-object. The user can alter the running of the script in the usual way, for example by pausing it, slowing it down, etc.
The script may, if desired, include additional text, sound or graphic information which can be reproduced in association with the displayed view of the root image, either automatically or in response to operations performed by the end user. Script functionality according to the present invention allows creation and editing of viewing scenarios that may subsequently be played, in part or in whole, automatically or interactively with user inputs. For example, in a completely automatic mode, the user can cause the scenario to begin to play by itself and take him through the scenario and any associated information, the application simply reading the scenario and changing the view accordingly. In other situations the script may call for interaction by the user, such as to initiate a transaction. In this case the user may be asked to specify information, e.g. whether he wants to purchase the video or any other items associated with what has been viewed. In yet other situations the editor may leave visible tags which, when activated by the user, cause some information to be presented on the display device, e.g. associated text, graphics, video, or sound files which are played through the speakers of the display device. In certain cases these tags are attached to objects selected and extracted from the video sequence, such as so-called "hot objects" according to the present invention.
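A viewing scenario of this kind can be modelled as an ordered list of steps, some played automatically (viewpoint changes, presentation of associated text or sound) and some requiring user input. The sketch is hypothetical throughout: the step kinds, field names and the `answer` callback are assumptions, and rendering is reduced to log lines.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScriptStep:
    kind: str        # "view", "show" or "ask" (hypothetical step kinds)
    payload: object  # viewing position, media reference, or prompt text


def run_script(steps: List[ScriptStep], answer: Callable[[str], str]) -> List[str]:
    """Play a scenario: "view" and "show" steps run automatically, while an
    "ask" step waits for the user's reply (supplied here by `answer`)."""
    log = []
    for step in steps:
        if step.kind == "view":
            log.append(f"move viewpoint to {step.payload}")
        elif step.kind == "show":
            log.append(f"present {step.payload}")
        elif step.kind == "ask":
            log.append(f"user answered {answer(str(step.payload))!r}")
        else:
            raise ValueError(f"unknown step kind: {step.kind}")
    return log
```

Pausing or slowing the script, as described earlier, would amount to controlling when the loop advances to the next step.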
FIG. 10 is a flow diagram illustrating typical stages in the design of an interface according to the present invention, in the case where a designer is involved. It is to be understood that interfaces according to the present invention can also be generated entirely automatically. It will be noted that the designer's choices notably affect the content of the interface data file. It is also to be understood that not all of the steps illustrated in FIG. 10 are necessarily required; for example, steps concerning creation of secondary root images can be omitted in the case of a video sequence which is not multi-threaded. Similarly, it may be desirable to include in the interface design process certain supplementary steps which are not shown in FIG. 10. For example, it is often desirable to include in the interface data file (as indicated in the example of FIG. 3) information regarding the camera motion, cuts, etc. present in the video sequence. During use of the interface, this information can permit, for example, additional video frames to be added to the displayed image and positioned so as to provide a visual representation of the camera motion. During the interface design process, the information on the characteristics of the video sequence can be determined automatically (using known cut-detection techniques and the like) and/or specified by the interface designer. It may also be desirable to include in the interface data file information which allows the sequence, or scripting for it, to be indexed and retrieved. Preferably, the interface or sequence is accessed using such information applied according to a traditional method, such as a standard database query language, or through a browser via a channel or network; the interface data may be downloaded in its entirety or fetched on an as-needed basis.
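The text refers only to "known cut-detection techniques". As one simple, commonly used example (swapped in here for illustration, not necessarily what the authors use), a cut can be declared wherever the mean absolute pixel difference between consecutive frames exceeds a threshold; the detected frame numbers could then be recorded in the interface data file. Frames are represented as flat lists of grey levels purely to keep the sketch self-contained.

```python
from typing import List, Sequence


def detect_cuts(frames: Sequence[Sequence[int]], threshold: float) -> List[int]:
    """Return indices i such that a cut is declared between frames i-1 and i.

    Each frame is a non-empty flat sequence of grey-level pixel values; a cut
    is declared when the mean absolute pixel difference between consecutive
    frames exceeds `threshold`.
    """
    cuts = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            cuts.append(i)
    return cuts
```

Pixel differencing is sensitive to fast camera motion and flashes, which is one reason histogram-based comparisons are often preferred in practice.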
The present invention provides toolkits for use by designers wishing to create an interactive video interface according to the present invention. These toolkits are preferably implemented as a computer program for running on a general purpose computer. The toolkits present the designer with displayed menus and instructions to lead him through a process including steps such as the typical sequence illustrated in FIG. 10.
The designer first of all indicates for which video sequence he desires to create an interface, for example by typing in the name of a stored file containing the video sequence information. Preferably, the toolkit accesses this video sequence information for display in a window on the screen for consultation by the designer during the interface design process. In such preferred embodiments of the toolkit, the designer may make his selection of basic frames/objects for the root image, extractable objects and the like by stepping slowly through the video sequence and, for example, using a mouse to place a cursor on frames or portions of frames which are of interest. The toolkit logs the frame number (and x, y locations of regions in a frame, where appropriate) of the frames/frame portions indicated by the designer and associates this positional information with the appropriate parameter being defined. Preferably, at the end of the interface design process the designer is presented with a displayed view of the root image for manipulation so that he may determine whether any changes to the interface data file are required.
Different versions of the application program can be associated with the interface data file (and script, if present) depending upon the interface functions which are to be supported. Thus, if no script is associated with the interface data file, the application program does not require routines handling the running of scripts. Similarly, if the interface data file does not permit an accordion effect to be performed by the end user then the application program does not need to include routines required for calculating display information for such effects. If the interface designer believes that the end user is likely already to have an application program suitable for running interfaces according to the present invention then he may choose not to package an application program with the interface data file or else to associate with the interface data file merely information which identifies a suitable version of application program for running this particular interface.
The present invention has been described above in connection with video sequences stored on CD-ROM. It is to be understood that the present invention can be realized in numerous other applications. The content of the interface data file and the elements of the interface which are present at the same location as the end user can vary depending upon the application.
For example, in an application where a video sequence is provided at a web-site, the user may first download via his telecommunications connection just the interface data file applicable to the sequence. If the user does not already have software suitable for handling manipulation of the interactive video icon then he will also download the corresponding application program. As the user manipulates the interactive video icon, any extra image information that he may require which has not already been downloaded can be downloaded in a dynamic fashion as required.
This process can be audited according to the present invention if desired. The user's interaction with the interface can be audited, and he can interact with the transaction/audit functionality, for example to supply any information required by a script, which may then be recorded and stored. Depending upon the application, the transaction/audit information can be stored and made available to (optional) externally located auditing and transaction processing facilities/applications. In a typical situation, the auditing information is transmitted at the end of a session, whereas transactions are performed on-line, i.e. the transaction information is submitted during the session. Real-time transmission can also be used according to the present invention, however.
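The typical split described above (audit information sent at the end of a session, transaction information submitted during it) can be sketched as a small recorder. The class name and the two sending callbacks are assumptions standing in for whatever transport the auditing and transaction facilities actually use.

```python
from typing import Any, Callable, List


class SessionRecorder:
    """Hypothetical recorder separating audit and transaction information.

    Interaction events are buffered and sent as one batch when the session
    ends; transactions (e.g. a purchase request supplied in answer to a
    script) are submitted immediately, during the session.
    """

    def __init__(self, send_transaction: Callable[[Any], None],
                 send_audit_batch: Callable[[List[Any]], None]):
        self._send_transaction = send_transaction
        self._send_audit_batch = send_audit_batch
        self._audit: List[Any] = []

    def record_interaction(self, event: Any) -> None:
        self._audit.append(event)  # held until the end of the session

    def submit_transaction(self, data: Any) -> None:
        self._send_transaction(data)  # on-line, not deferred

    def end_session(self) -> None:
        if self._audit:
            self._send_audit_batch(list(self._audit))  # send a copy, then reset
            self._audit.clear()
```

Real-time auditing would simply call `send_audit_batch` from `record_interaction` instead of buffering.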
Another example is the case of a catalogue on CD-ROM including only interfaces rather than the associated video sequences, in order to save space. In such a case, rather than including a pointer to the image information of the basic frames of the root image, the interface data file includes the image information itself. Some additional image information may also be provided.
The following disclosure relates to a preferred implementation according to the present invention, with reference to FIGS. 11 and 12.
A. Interface Editor Unit
Editors, readers and viewers according to the present invention can be implemented in hardware, hardware/software hybrid, or as software on a dedicated platform, a workstation, a personal computer, or any other hardware. Different units implemented in software run on a CPU or graphics boards or other conventional hardware in a conventional manner, and the various storage devices can be general purpose computer storage devices such as magnetic disks, CD-ROMs, DVD, etc.
With reference to FIG. 11, the editor connects to a database manager (101) and selects a video document and any other documents to be included in the interface by using a data chooser unit (102). The database manager may be implemented in various ways, e.g., as a simple file structure or even as a complete multimedia database. The data storage (100) contains the video data and any other information/documents required and can be implemented in various modes; e.g., in a simple stand-alone mode of operation it could be a CD-ROM, or in a networked application it could be implemented as a bank of video servers. Typically the user, operating through the user interaction unit (120), is first presented with a list of available videos or uses a standard database query language to choose the desired video, and then chooses any other documents required.
The creation of an interface using the editor is discussed below in three phases: (1) Analysis, (2) Visual layout and (3) Effects creation.
1. Analysis.
The video document chosen by the editor is first processed by the activity measure unit (103). The activity measure unit is responsible for computing various parameters related to the motion and changes in the video. This unit typically will implement one of a number of known techniques for measuring changes, e.g., by calculating the statistics of the differences between frames, by tracking objects in motion, or by estimating camera motions by separating foreground and background portions of the image. In other implementations this unit may use motion vector information stored in an MPEG-encoded sequence to detect important frames of activity in the video document. The activity measures template store is optional but would contain templates which can be used to calculate the frame ranking measure and could be specified by the user through the user interaction unit.
These parameters are then used to calculate a frame ranking measure, which ranks the different frames as to whether they should be included in the interface. The frame ranking measure is derived heuristically from these measures (e.g., by normalizing the values and taking an average of the parameters) and can be tailored for different kinds of sequences (traveling shots, single objects in motion, etc.) or applications. The editor may choose a pre-defined set of parameters from the activity measures template store (108) to detect or highlight a specific kind of activity (rapid motion, abrupt changes, accelerations, etc.).
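The analysis and ranking steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the activity measure unit: it assumes grayscale frames given as nested lists of pixel values, uses the mean absolute difference between consecutive frames as the only built-in activity measure, and takes min-max normalization followed by a weighted average as the heuristic ranking suggested in the text. The function names and the `weights` dictionary, which stands in for a template from the activity measures template store, are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Frame = Sequence[Sequence[int]]  # assumed: grayscale frame as rows of pixel values


def frame_difference_activity(frames: List[Frame]) -> List[float]:
    """Mean absolute pixel difference between each frame and its predecessor (0.0 for the first)."""
    if not frames:
        return []
    measures = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        diffs = [abs(a - b) for row_p, row_c in zip(prev, cur) for a, b in zip(row_p, row_c)]
        measures.append(sum(diffs) / len(diffs) if diffs else 0.0)
    return measures


def normalize(values: Sequence[float]) -> List[float]:
    """Min-max normalization to [0, 1]; a constant parameter contributes 0 everywhere."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def frame_ranking(params: Dict[str, Sequence[float]],
                  weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Weighted average of normalized activity parameters (equal weights by default)."""
    names = list(params)
    weights = weights or {name: 1.0 for name in names}
    total_weight = sum(weights.get(name, 0.0) for name in names)
    if total_weight <= 0:
        raise ValueError("at least one parameter needs a positive weight")
    normalized = {name: normalize(params[name]) for name in names}
    n_frames = len(normalized[names[0]])
    return [
        sum(weights.get(name, 0.0) * normalized[name][i] for name in names) / total_weight
        for i in range(n_frames)
    ]
```

Selecting a different `weights` dictionary corresponds to the editor choosing a different pre-defined parameter set, e.g. weighting only frame differences to highlight abrupt changes.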
The frame ranking measures can be employed by the user, acting through the user interaction unit on the frame selection unit (104), to select the frames to be included within the interface. For example, if 10 frames are to be included in the interface, then in default mode the 10 frames corresponding to the 10 largest frame ranking measures are selected for inclusion in the interface. The user can then interactively de-select some of these frames and add other frames.
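The default selection and the subsequent interactive editing can be sketched as below. The function names are hypothetical; the sketch simply takes the N highest ranking measures, returns the chosen frames in temporal order (the order in which they are laid out along the z axis), and then applies the user's de-selections and additions.

```python
from typing import Iterable, List, Sequence


def default_selection(ranking: Sequence[float], n: int) -> List[int]:
    """Indices of the n highest-ranked frames, returned in temporal order."""
    best_first = sorted(range(len(ranking)), key=lambda i: ranking[i], reverse=True)
    return sorted(best_first[:n])


def edit_selection(selected: Iterable[int],
                   deselect: Iterable[int] = (),
                   add: Iterable[int] = ()) -> List[int]:
    """Apply the user's interactive changes to the default selection."""
    chosen = set(selected) - set(deselect)
    chosen.update(add)
    return sorted(chosen)
```

Because Python's sort is stable, frames with equal ranking measures are taken in temporal order.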
The camera motion analysis unit (105) is an optional unit which typically will implement one of a number of known techniques for measuring camera motion parameters. This information can be used to determine what shape to give to the outer envelope of the interface as shown in FIG. 1C; a default shape, stored in the interface template store (116) can be chosen. This information may be optionally stored in the FDI file.
The object selection unit (106A) is responsible for selecting or detecting individual objects in the video document. There are various modes possible: in a completely manual mode, the editor may visually select and outline an object of interest in a given frame through the user interaction unit (120); in a semi-manual mode, the editor simply points at an object and chooses from the object templates store (107) the features and associated algorithms to use for extracting and tracking the chosen object; in another mode, the editor may choose one of a set of pre-defined object templates, and known pattern matching techniques are used to detect whether any objects of interest are present. The user may even assign a name/identifier to the object and add the object to the object templates store (107). In this latter case, searches for multiple occurrences of the same object can be initiated by the user. The information regarding the properties of the object may optionally be stored in the FDI file.
The object extraction and tracking unit (106B) is now responsible for extracting the object of interest from the frame and then tracking it by using known tracking algorithms. The algorithms used are either chosen by the user or by default. It is understood that the object selecting, detection, extraction, and tracking process may be highly interactive and that the user may be called upon or choose to intervene in the process a number of times. The information about the presence and location of objects may be optionally stored in the FDI file.
In certain applications the FDI file can be made available to an external program, for example when the interface editor is associated with an indexing program, the task of which is to attach indexes (identifiers) to the video documents, to portions thereof, or to objects located within the video document.
2. Visual Layout.
The user acting through the user interaction unit (120) on the interface creation unit (109) determines the visual layout of the interface.
He can shape the outer envelope of the interface in any way that he desires; two examples are provided in FIGS. 6 and 9. In particular, multiple sequences can be combined so as to implement branching effects representing alternatives to the user. Default shapes are stored in the interface template store (116). The user can also choose to vary the spacing of the frames seen on the interface, that is, the distance between frames of the interface as perceived on the display unit. The user can also insert selections of the extracted and tracked objects from unit (106B), as illustrated in FIG. 7B. In this case, the corresponding frames are rendered transparent except at the locations of the objects.
The different pieces of information generated by the units described above are gathered together by the interface creation unit (109) into an FDI file containing a description of the interface in terms of its layout (i.e. shape and structure), the image frame numbers and their positions, and, if available, the extracted features, the ranking of the frames and the camera motion information. This information is transmitted to the interface effects creation unit (117).
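The format of the FDI file is not specified here. Purely as a hypothetical illustration, the information gathered by the interface creation unit could be held in a record like the following and serialized as JSON; every field name below is invented for this sketch, and the optional fields stay empty when the corresponding analysis was not performed.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FramePlacement:
    frame_number: int   # frame number in the source video
    z: float            # position along the time (z) axis of the root image


@dataclass
class FDIDescription:
    envelope_shape: str                                # layout: shape of the outer envelope
    frames: List[FramePlacement]                       # image frame numbers and their positions
    ranking: Optional[List[float]] = None              # frame ranking measures, if available
    camera_motion: Optional[List[Dict[str, float]]] = None
    features: Dict[str, Any] = field(default_factory=dict)  # extracted features, if any

    def to_json(self) -> str:
        """Serialize the description (nested dataclasses included) to JSON."""
        return json.dumps(asdict(self), sort_keys=True)
```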
3. Effects Creation.
The editor can also specify three classes of interface features which serve to convey additional information to the user and which allow the user to interact with the interface. The editor performs this specification through the interface effects creation unit (117).
The zooming effects creation unit (110) is used by the editor to specify which frames will be made visible, and also which will be rendered invisible, to the user when he moves up closer to the interface (FIG. 5B) so as to view it from a new viewing position. The choice of frames to add depends upon factors such as the distance of the viewing point from the interface, the degree of motion, the degree of scene change, the number of frames that can be made visible and, optionally, the frame ranking measures calculated by the activity measure unit (103). The editor can choose to use one or more of the default zooming effect templates contained in the zooming effect templates store (113) and assign these in a differential manner to different parts of the interface; alternatively, the editor can choose to modify these templates and apply them differentially to the interface.
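One possible zooming rule, sketched under explicit assumptions: the number of visible frames is taken to be inversely proportional to the viewing distance, and the frame ranking measures decide which frames fill that budget. The rule, the function name and its parameters are hypothetical; a zooming effect template could equally use scene-change or motion measures.

```python
import math
from typing import List, Sequence


def visible_frames(ranking: Sequence[float], distance: float, base_count: int,
                   reference_distance: float = 1.0) -> List[int]:
    """Hypothetical zoom rule: the frame budget grows as the viewer moves closer.

    At `reference_distance` the interface shows `base_count` frames; halving the
    distance doubles the budget. The highest-ranked frames fill the budget.
    """
    if distance <= 0:
        raise ValueError("viewing distance must be positive")
    budget = math.floor(base_count * reference_distance / distance)
    budget = max(1, min(len(ranking), budget))
    best_first = sorted(range(len(ranking)), key=lambda i: ranking[i], reverse=True)
    return sorted(best_first[:budget])
```

Comparing the result for two successive viewing distances gives the frames to be added (or dropped) when the user zooms.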
The special effects creation unit (111) is used by the editor to create special visual effects on the interface. One such example is the accordion effect illustrated in FIG. 8, where parts of the interface are compressed and other parts are expanded. Another example is illustrated in FIGS. 7A and 7B, where the editor has designated an extractable object, which is then shown in its extracted form; in other words, the background is removed. The editor creates the special effects by calling up templates from the special effects templates store (114) and instantiating them by defining the positions where the special effect is to take place and by setting the appropriate parameters.
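The accordion effect amounts to rescaling the spacing of frames along the z axis region by region. A minimal sketch, assuming the effect is parameterized as hypothetical (start time, end time, scale) triples: a scale below 1 compresses a region, a scale above 1 expands it, and every gap outside the listed regions keeps its time-proportional spacing.

```python
from typing import List, Sequence, Tuple

Region = Tuple[float, float, float]  # hypothetical accordion parameters: (t_start, t_end, scale)


def accordion_positions(times: Sequence[float], regions: Sequence[Region]) -> List[float]:
    """z position of each frame; a gap starting inside a region is multiplied by its scale."""
    if not times:
        return []
    z = [0.0]
    for t0, t1 in zip(times, times[1:]):
        scale = 1.0
        for start, end, factor in regions:
            if start <= t0 < end:
                scale = factor
                break
        z.append(z[-1] + (t1 - t0) * scale)
    return z
```

For frames at times 0, 1, 2, 3, 4 and one region (1, 3, 0.25), the middle part of the interface is squeezed to a quarter of its length while the ends are unchanged.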
The script effects creation unit (113) allows the editor of the interface to build an interface viewing scenario that may subsequently be played, in part or in whole, automatically or interactively with user inputs. For example, in a completely automatic mode, when the user calls up the interface it begins to play by itself and takes the user through the interface and any associated information by simply reading the scenario and changing the view of the interface. In other situations the script may call for the user to interact with the interface, e.g. to initiate a transaction. In this case the user may be asked to specify information, e.g. whether he wants to purchase the video or any other items associated with the interface. In yet other situations the editor may leave visible tags which, when activated by the user, will cause some information to be displayed on the display device, e.g. associated text, graphics, video, or sound files which are played through the speakers of the display device. In certain cases these tags are attached to objects selected and extracted from the video sequence by units (106A) and (106B) and become so-called "hot objects." The editor creates the scripts by calling up templates from the script effects templates store (115) and instantiating them by defining the tag and the locations of the information to be called up.
The interface effects creation unit (117) creates 4 files, which are passed to the interface database manager (118), which stores these files either remotely or locally as the case may be: (1) The FDI file, completed by the special effect and script tags, text and graphics which have been added to the interface and which are directly visible to the user. (2) The zoom effect details, scripts and special effects. (3) The application programs (optional) to view the interface, i.e. to allow the user to view the interface from different perspectives, traverse the interface, run the script and perform the special effects, or coded information which indicates which application program residing on the user's machine can be used to perform these operations. (4) The video sequence and any other associated information (data) required for reading the interface.
These files are shown stored in storage unit (119), but depending upon the embodiment they may be physically located in the same storage device or in separate storage devices (as shown), either locally (as shown) or remotely.
During the editing process, the user/editor can view the interface under construction, according to the current set of parameters, templates and designer preferences, on the interface viewer unit (121) (presented in FIG. 12 and described below), thus allowing the editor to interactively change its appearance and features.
B. Interface Viewer Unit
Having chosen an interface through a traditional method, for example by using a database query language or by using a browser such as are used for viewing data on the Web, the interface viewer unit is then employed to read and interact with the interface.
In a typical application, the storage units (201) are remotely located and accessed through the interface database manager (202) by way of a communication channel or network; depending upon the size and characteristics of the channel and the application, the interface data may be loaded in its entirety or fetched on an as-needed basis.
The data are then stored in a local memory unit (203), which may be a cache memory, a disk store or any other writable storage element. The local memory unit (203) stores the 4 files created by the editor (see above) and, in addition, a transaction/audit file. In certain cases the application programs are already resident in the interface viewer unit and so do not need to be transmitted.
The CPU unit (204) fetches the application program, deduces which actions need to be performed, and then fetches the relevant interface information contained in the local memory unit (203). Typically the CPU unit fetches the required application program for the user interaction unit (205), the navigation unit (206) and the transaction/audit unit (207); the interface information is then read from the local memory unit (203) and passed to the interface renderer unit (208), which calculates how the interface is to appear or be rendered for viewing on the display device (209).
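Fetching on an as-needed basis can be sketched as a small client-side cache. This assumes a hypothetical transport callback (for example an HTTP request to the interface database manager) that returns the image data for one frame; frames already held locally are never requested again, and the class name is invented for illustration.

```python
from typing import Callable, Dict, Iterable


class FrameCache:
    """Hypothetical local memory for interface images, filled on an as-needed basis."""

    def __init__(self, fetch: Callable[[int], bytes]):
        self._fetch = fetch                  # assumed transport callback
        self._frames: Dict[int, bytes] = {}
        self.downloads = 0                   # number of remote fetches actually made

    def get(self, frame_number: int) -> bytes:
        """Return a frame, fetching it remotely only on first use."""
        if frame_number not in self._frames:
            self._frames[frame_number] = self._fetch(frame_number)
            self.downloads += 1
        return self._frames[frame_number]

    def ensure(self, frame_numbers: Iterable[int]) -> None:
        """Make sure every frame needed for the current view is available locally."""
        for n in frame_numbers:
            self.get(n)
```

Loading the interface data in its entirety corresponds to calling `ensure` once with every frame number; after zooming in from frames 0, 10, 20 to frames 0, 5, 10, 15, 20, only frames 5 and 15 would be downloaded.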
The user interacts with the interface via the user interaction unit (205) and the navigation unit (206), and all his actions are audited by the transaction/audit unit (207). In addition, the user can interact with the transaction/audit unit (207), for example to supply any information required by the script, which is then recorded and stored in the transaction/audit portion of the local memory unit (203). Depending upon the application, this transaction/audit file or a portion thereof is transmitted by the interface database manager to the appropriate storage unit (201). This information is then available to optional, externally located auditing and transaction processing facilities/applications. In a typical situation, the auditing information is transmitted at the end of the session, whereas transactions may be processed on-line, i.e. the transaction information is submitted during the session.
Through the navigation unit (206) the user can choose the point of view from which to view the interface (or a portion of the interface). The interface renderer unit (208) then calculates how the interface is to appear or be rendered for viewing on the display device (209).
If the user chooses to zoom in or zoom out, then the zoom effects unit (210) fetches the required application program, reads the zoom effect parameters stored in the local memory unit (203), determines the frames to be dropped or added and supplies this information (including the additional frames if needed) to the interface renderer unit (208), which then calculates how the interface is to appear or be rendered for viewing on the display device (209).
If the user chooses to view part of the underlying video, then the video play effects unit (211) fetches the required application program, reads the required video data from the local memory unit (203) and plays the video on a second display device (209), or in a new window if only one display device is available.
If the user chooses to interact with a hot pre-extracted object (created by the special effects creation unit), then the special effects unit (212) fetches the required application program and reads the locations of the object, and the corresponding frames are modified so as to be transparent wherever the object does not occur; the new frames are passed to the interface renderer unit (208), which then calculates how the interface is to appear or be rendered for viewing on the display device (209). In cases where the extracted object is to be played as a video, the frames are passed to the video play effects unit (211), which then plays the video on a second display device (209), or in a new window if only one display device is available. Similarly, if the user chooses to view an accordion effect, then the special effects unit fetches the accordion effect parameters stored in the local memory unit (203), determines the frames to be dropped or added, calculates the relative position of all the frames and supplies this information (including the additional frames if needed) to the interface renderer unit (208), which then calculates how the interface is to appear or be rendered for viewing on the display device (209).
If the user designates a tag created by the script, then the script effects unit (214) fetches the required application program and reads the corresponding portion of the script and the related information required to carry out the portion of the script associated with the designated tag. If the interface is to be played in automatic mode, then the script effects unit (214) fetches the entire script and all the related information required to carry out the script. When needed, the zoom effects unit (210), the video play effects unit (211) and the special effects unit (212) may be called into play. If the script calls for user input, such as that required for carrying out a transaction, then a new window may be opened on the display device (or on a second display device) where the information is supplied and transmitted to the transaction/audit unit (207). In semi-automatic mode, control of the viewing of the interface is passed between the script effects unit (214) and the navigation unit (206) as instructed by the user through the user interaction unit (205).
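The split between buffered auditing and on-line transaction submission can be sketched as follows. The two callbacks stand in for the channels of the interface database manager and, like the class and method names, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


@dataclass
class TransactionAuditUnit:
    submit_transaction: Callable[[Record], None]   # assumed on-line channel
    upload_audit: Callable[[List[Record]], None]   # assumed end-of-session channel
    audit_log: List[Record] = field(default_factory=list)

    def record(self, action: str, details: Optional[Record] = None) -> None:
        """Audit a user action; it is only buffered locally during the session."""
        self.audit_log.append({"action": action, "details": dict(details or {})})

    def submit(self, transaction: Record) -> None:
        """Transactions are audited and also submitted immediately (on-line)."""
        self.record("transaction", transaction)
        self.submit_transaction(transaction)

    def end_session(self) -> None:
        """Transmit the buffered audit trail once, at the end of the session."""
        self.upload_audit(list(self.audit_log))
        self.audit_log.clear()
```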
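Rendering a frame transparent wherever the extracted object does not occur can be sketched with an alpha channel. This assumes RGB frames given as nested lists and a boolean object mask of the same size, as might be produced by the object extraction and tracking unit; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def extract_object(frame: Sequence[Sequence[RGB]],
                   mask: Sequence[Sequence[bool]]) -> List[List[RGBA]]:
    """Keep pixels inside the object mask opaque; make every other pixel fully transparent."""
    out: List[List[RGBA]] = []
    for row, mask_row in zip(frame, mask):
        out.append([(r, g, b, 255) if inside else (r, g, b, 0)
                    for (r, g, b), inside in zip(row, mask_row)])
    return out
```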
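The two ways of playing a script, all steps in automatic mode or only the steps attached to a designated tag, can be sketched as below. The step representation, the action names and the handler dictionary (standing in for the zoom, video play and special effects units and the transaction window) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ScriptStep:
    action: str                  # e.g. "zoom", "play_video", "request_input" (hypothetical names)
    argument: str = ""
    tag: Optional[str] = None    # steps bound to a visible tag


class ScriptRunner:
    def __init__(self, steps: List[ScriptStep], handlers: Dict[str, Callable[[str], None]]):
        self.steps = steps
        self.handlers = handlers  # dispatch table to the units that carry out each action

    def run_automatic(self) -> None:
        """Automatic mode: play the entire script in order."""
        for step in self.steps:
            self.handlers[step.action](step.argument)

    def run_tag(self, tag: str) -> None:
        """Tag designated by the user: play only the portion of the script bound to it."""
        for step in self.steps:
            if step.tag == tag:
                self.handlers[step.action](step.argument)
```

Semi-automatic mode could then alternate between calls to these methods and ordinary navigation, as instructed by the user.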
Although the above-discussed preferred embodiments of the present invention present certain combinations of features, it is to be understood that the present invention is not limited to the details of these particular examples. Firstly, since image processing is performed on image data in digital form, it is to be understood that in the case where the video sequence consists of data in analogue form, an analogue-to-digital converter or the like will be used in order to provide image data in a form suitable for processing. It is to be understood that the present invention can be used to create interfaces to video sequences where the video data is in compressed form, encrypted, etc. Secondly, references above to user input or user selection processes cover the use of any input device whatsoever operable by the user including, but not limited to, a keyboard, a mouse (or other pointing device), a touch screen or panel, glove input devices, detectors of eye movements, voice actuated devices, etc. Thirdly, references above to "displays" cover the use of numerous different devices such as, but not limited to, conventional monitor screens, liquid crystal displays, etc.
Furthermore, for ease of comprehension the above discussion describes interfaces according to the present invention in which the respective root images each have a single characteristic feature, such as, giving a visual representation of motion, or giving a visual representation of zoom, or having a multi-threaded structure, etc. It is to be understood that a single root image can combine several of these features, as desired. Similarly, special effects such as object extraction, the accordion effect, etc. have been described separately. Again, it is to be understood that interfaces according to the invention can be designed to permit any desired combination of special effects.
II. Global Topology
1 Glossary and Concepts
ODA
Obvious Database Annotation
An OBVI can handle several types of annotations. One of them involves external databases. An ODA is an annotation formatted as a database template, i.e. a list of fields. Each field has a type definition (integer, string, date, etc.). The contents of the fields are stored in the OBVI file and can be exported or imported to/from a remote database via ODBC. This export/import functionality is handled by third-party plugins. Obvious Technology provides a sample ODBC export/import plugin.
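An ODA record can be checked against its template before it is stored or exported. A minimal sketch, assuming the template is a mapping from field name to one of the hypothetical type names "integer", "string" and "date"; the ODBC export itself, handled by plugins, is not shown.

```python
from datetime import date
from typing import Any, Dict

# Hypothetical mapping from ODA field type names to Python types.
FIELD_TYPES = {"integer": int, "string": str, "date": date}


def validate_oda(template: Dict[str, str], record: Dict[str, Any]) -> Dict[str, Any]:
    """Check a record against an ODA template (field name -> type name) before storing it."""
    missing = sorted(set(template) - set(record))
    unknown = sorted(set(record) - set(template))
    if missing or unknown:
        raise ValueError(f"missing fields {missing}, unknown fields {unknown}")
    for name, type_name in template.items():
        expected = FIELD_TYPES[type_name]
        value = record[name]
        # bool is a subclass of int, so it is rejected explicitly for integer fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise TypeError(f"field {name!r} should be {type_name}, got {type(value).__name__}")
    return dict(record)
```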
OSF
Obvious Streaming Format
The OVI file format is suitable for desktop editing and most distribution schemes. However, for stream-based distribution schemes, an alternative format is available. It is called Obvious Streaming Format. OBVIs stored in this format can be efficiently streamed on various network types and under different protocols.
Obvious Site
An Obvious Site (or site) is a logical group of services that manage a set of OBVIs with the same security policy. Several Obvious Sites can share the same physical resources. For example, Obvious Technology will be able to host several sites on the same set of machines. These sites will have different identifiers and will not interfere.
Portal
Synonym for Obvious Site
OSM
Obvious Site Manager. Proposed new name for the Obvious Media Router.
The Obvious Site Manager is the main server in a site. It constitutes the entry point of a site and manages all other servers.
OAS
Obvious Administration Server.
The Obvious Administration Server is located between administration tools and the OIS database. Administration tools never talk directly to the OIS database. They send requests to the OAS.
OAM
Obvious Asset Manager.
The Obvious Asset Manager is a server that acts as a repository for archiving media files. It also hosts the VAMT, the video analysis engine.
OSD
Obvious Site Directory.
The Obvious Site Directory is a global directory service that handles the list of all sites. It is hosted by Obvious Technology. An OVI file does not contain the address of the site to which it belongs; it contains a Site Identifier. The mapping between Site Identifiers and IP addresses is done using the services of the OSD.
OMS
Obvious Media Server.
The Obvious Media Server distributes images, annotations, structure and OBVIs.
Identifiers
Various objects defined in the Obvious Network Architecture have unique identifiers.
These objects can be video documents, media files, streams, sites, OBVIs, groups, users, machines, services, categories, etc. Identifiers are unique in a given site. That means that 2 objects from 2 different sites may have the same identifier. However, since each Site also has a unique identifier, we can combine the Site Identifier with the object identifier to make a global unique identifier (GUID) for each object. The specification of this GUID is given below in section XX entitled XML Format for Object Annotations.
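The combination of a Site Identifier with a site-local object identifier can be sketched as a pair. The actual GUID specification is given in the section referred to above; the "site:object" textual form used here is a hypothetical stand-in.

```python
from typing import NamedTuple


class GUID(NamedTuple):
    site_id: int     # unique across all sites
    object_id: int   # unique only within its site

    def __str__(self) -> str:
        # Hypothetical textual form; the real GUID format is specified elsewhere.
        return f"{self.site_id}:{self.object_id}"

    @classmethod
    def parse(cls, text: str) -> "GUID":
        site, obj = text.split(":")
        return cls(int(site), int(obj))
```

Two objects with the same identifier in different sites, such as `GUID(1, 42)` and `GUID(2, 42)`, remain distinct.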
Vdoc and Media
A Video Document (Vdoc) is a format-independent concept of a video. Any physical copy of a Vdoc, in whole or in part, is called a Media, regardless of the copy's format. For example, from a Vdoc representing a TV movie, you can create 3 media:
a BetaSP copy of the whole program
an MPEG stream corresponding to the encoding of the first 45 minutes of the program
an AVI file corresponding to the last 10 minutes of the program
2 Components
The Obvious Network Architecture is a distributed system composed of several software components. These components can be grouped into 5 categories:
1. Runtime Components
The Runtime Components constitute the core elements that are involved during the visualisation of OBVIs. When an OBVI is opened from a client application, several Runtime Components are used for streaming the video, retrieving the images of the blocks, accessing the annotations and the structure, etc.
2. Video Components
The Video Components are involved in the process of video acquisition, storage and registration of media pieces. Video Components play an important role in the Obvious Network Architecture. OBVIs are bound to a video and, except for OVI files created from scratch, this video must be properly registered in the system before it can be used.
3. OBVI Components
The OBVI Components concern the authoring, indexing and publishing of OBVIs. OBVIs are created from a registered or a non-registered video file. Then, they are published on the system as database objects. Their content is indexed, and end-users can search for and retrieve OBVIs in various forms.
4. Distribution Components
The Distribution Components are involved in the distribution of OBVIs from a repository location to the end-user. Some of these components simply use traditional distribution channels (Email, Web, FTP). Others try to embed OBVIs into video streams (ASF, RM, QT).
5. Database Components
The Database Components offer indexing facilities for the other components. These components use database technologies to provide a reliable and efficient way to store, index, query and transact on various objects in the system. In particular, Database Components handle videos, OBVIs, streams, users and machines as database objects.
FIG. 13 gives an overview of the whole network architecture.
2.1 Runtime Components
The Runtime Components are detailed in FIG. 14. This diagram shows that Runtime Components interact directly with client applications and with the Database Components. Runtime Components include several types of servers: video server, web server, OSM, OMS, OSD.
2.1.1 Obvious Site Directory
The Obvious Site Directory is a directory service for Obvious Sites. By using the services of an Obvious Site Directory, an end-user can get the list of all available Obvious Sites in the world with their characteristics, security policies, description of their content, etc. All Obvious-compliant sites, i.e. sites that use our technology or are hosted on our portal, will have an entry in this directory.
An OBVI contains a Site Identifier, a number that uniquely references a site. The mapping between the Site Identifier and the actual address of the site involves a mechanism called Site Resolution. This mechanism is described below in Section V, entitled The Obvious Site Directory.
If the administrator of a site does not want the site to be referenced in this directory, then the site is said to be autonomous. In that case, Site Resolution is done locally, at each client application, since only one site/IP pair is necessary.
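Site Resolution, as outlined above, can be sketched as a lookup that consults a local table first (the single site/IP pair of an autonomous site) and otherwise queries the Obvious Site Directory, caching the answer. The `query_osd` callback is a hypothetical stand-in for the OSD protocol, which is not described here.

```python
from typing import Callable, Dict, Optional


class SiteResolver:
    """Map a Site Identifier to an address: local entries first, then the OSD."""

    def __init__(self, query_osd: Callable[[int], Optional[str]],
                 local: Optional[Dict[int, str]] = None):
        self._query_osd = query_osd        # assumed remote lookup against the OSD
        self._local = dict(local or {})    # e.g. the one site/IP pair of an autonomous site
        self._cache: Dict[int, str] = {}

    def resolve(self, site_id: int) -> str:
        if site_id in self._local:
            return self._local[site_id]
        if site_id not in self._cache:
            address = self._query_osd(site_id)
            if address is None:
                raise LookupError(f"unknown site {site_id}")
            self._cache[site_id] = address
        return self._cache[site_id]
```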
2.1.2 Obvious Site Manager
This component is responsible for the management of an Obvious Site. It handles access control and service replication. Client applications must first connect to an Obvious Site Manager before accessing resources in a given Obvious Site. The Obvious Site Manager is the entry point for a given portal.
2.1.3 Obvious Media Server
The Obvious Media Server is involved during the visualisation process of an OBVI. Its primary goal is to distribute still images corresponding to OBVI blocks (because video servers cannot distribute still images efficiently). The OMS allows remote applications to retrieve individual still images that can be used for 2D/3D storyboards, thumbnails of video documents, etc. Aside from the distribution of still images, the OMS also accomplishes important tasks such as the distribution of annotations and structure, the generation of OVI and XML files, etc.
2.1.4 Video Server
Several video servers can be used in the Runtime Components. The architecture does not rely on a specific streaming technology. Any third-party video server can be incorporated in the Obvious Network Architecture, if client applications (such as the Obvious Media Viewer) can support it. For example, the OMM/OME/OMV suite of applications supports RealServer G2, NetShow and NetShow Theater streams.
RealServer G2 and NetShow should be used on low-bandwidth networks. NetShow Theater is the preferred choice for delivering broadcast-quality MPEG video streams across high-bandwidth networks and intranets.
2.1.5 Web Server
In the Obvious Network Architecture, several web servers can be used for accessing annotations. These servers distribute any content type that can be displayed in the IE-based annotation viewer of the OMM/OME/OMV tools. When an OVI file is published/indexed, its embedded annotations are converted into HTML and automatically published on these Web servers.
2.1.6 External Database
In the case of database annotations (ODA), several external databases can be used. Access to these databases is handled by a specific plugin, via ODBC, ADO or any other database access mechanism.
2.2 Video Components
The Video Components are detailed in FIG. 15. These components interact with the Database Components and the OBVI Components.
2.2.1 Video Cataloging Tools
Video cataloging tools perform the following tasks:
Video acquisition
Closed-Captions acquisition
Video registering in the OIS database
Video analysis and video archiving (on the Obvious Asset Manager)
In the current implementation, these tasks are accomplished from a unified interface: the Obvious Management Console. The Obvious Management Console, described in detail below in Section XI, entitled Obvious Management Console, is an application that allows the administration of the whole system. In particular, it allows the user to run these video cataloging tools.
Video analysis and video archiving are handled by the Obvious Asset Manager.
2.2.2 Obvious Asset Manager
The Obvious Asset Manager has 2 roles: it acts as a repository for video files, for archiving purposes, and it also hosts the VAMT, for video analysis purposes.
Concerning video archiving, the Obvious Asset Manager is basically an FTP server that allows remote client applications to upload their video files. This upload is never done manually. It is automatically handled from the Obvious Management Console when the user registers a new video file. Video registering (composed of Video Document registering and Media registering) plays an important role in the Obvious Network Architecture.
Concerning video analysis, 3 modules are used: the VAMT Engine, the VAMT Service and the VAMT Manager. The VAMT Service is a server that runs several simultaneous analysis jobs. It internally uses the Obvious VAMT Engine, which handles the core analysis. Each analysis job involves the analysis of a specific video file from a timecode in to a timecode out. The Obvious VAMT Manager is the graphical user interface that allows remote administration of the Obvious VAMT Service. The Obvious VAMT Manager allows the user to define, launch, stop and edit analysis jobs. These modules are shown in FIG. 15 and described in more detail in Section IX, entitled the Obvious Asset Server.
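An analysis job bounded by a timecode in and a timecode out can be sketched as below. The sketch assumes non-drop-frame HH:MM:SS:FF timecodes at a fixed frame rate (25 fps by default) and treats the timecode out as exclusive; the class and function names are hypothetical and do not describe the VAMT job format.

```python
from dataclasses import dataclass


def timecode_to_frames(tc: str, fps: int = 25) -> int:
    """Convert a non-drop-frame HH:MM:SS:FF timecode to a frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    if not (0 <= mm < 60 and 0 <= ss < 60 and 0 <= ff < fps):
        raise ValueError(f"invalid timecode {tc!r}")
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


@dataclass
class AnalysisJob:
    video_file: str
    tc_in: str
    tc_out: str
    fps: int = 25

    def frame_range(self) -> range:
        """Frames to analyse, from timecode in (inclusive) to timecode out (exclusive)."""
        start = timecode_to_frames(self.tc_in, self.fps)
        end = timecode_to_frames(self.tc_out, self.fps)
        if end <= start:
            raise ValueError("timecode out must follow timecode in")
        return range(start, end)
```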
2.3 OBVI Components
FIG. 16 describes the OBVI Components. These components interact with the Runtime Components and the Database Components.
2.3.1 Authoring and Editing Applications
There is only one authoring tool in the current state of development: the Obvious Media Manager. The Obvious Media Editor is an editing tool.
2.3.2 Viewer Applications
The Obvious Media Viewer is a viewer application implemented in Visual Basic and Visual C++. The Obvious Java Viewer is a cross-platform, lightweight and web-enabled OBVI viewer.
2.3.3 Obvious Publisher
The Obvious Publisher is the end-user graphical interface that allows users to publish and index OVI files. Publishing involves the transformation of the OVI file into an OBVI object that can be stored in the OIS database. Indexing concerns the OBVI metadata and annotations. In the current version, the indexing process can use the Microsoft Index Server or the Oracle 8 database (with the ConText Cartridge). In both cases a specific filter is needed to parse the OVI files, extract meaningful information and populate the index. The Obvious Publisher does not itself perform the publishing and indexing job. It is a GUI that collects all the information necessary for publishing. The OVI file and the collected information are then uploaded to another machine, on which the Obvious Publishing Engine resides.
2.3.4 Obvious Publishing Engine
The Obvious Publishing Engine is the core module that handles the publishing and indexing of an OVI file. It acts as a daemon that automatically picks up OVI files, parses them, publishes their annotations, and creates the corresponding entries in the database.
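The daemon behaviour described above can be approximated by a polling loop over the scanned directories. This is a minimal sketch, assuming that OVI files are recognised by a ".ovi" extension and that a file is processed once per engine lifetime; the publish callback is a hypothetical stand-in for the parsing, annotation publishing and database steps.

```python
import time
from pathlib import Path


def scan_for_new_ovi(directories, already_seen):
    """One polling pass: return the OVI files not seen in earlier passes."""
    new_files = []
    for directory in directories:
        for path in sorted(Path(directory).glob("*.ovi")):
            key = str(path.resolve())
            if key not in already_seen:
                already_seen.add(key)
                new_files.append(path)
    return new_files


def run_engine(directories, publish, interval=5.0, max_passes=None):
    """Poll the scanned directories and hand each new OVI file to `publish`.

    `publish` is a hypothetical callback; `max_passes=None` polls forever,
    as a daemon would.
    """
    seen = set()
    passes = 0
    while max_passes is None or passes < max_passes:
        for path in scan_for_new_ovi(directories, seen):
            publish(path)
        passes += 1
        if max_passes is None or passes < max_passes:
            time.sleep(interval)
```

Polling keeps the sketch portable. A Windows NT service could instead rely on directory change notifications, but that is not shown here.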
2.3.5 Obvious Publishing Manager
The Obvious Publishing Manager is an application that administrators can use to remotely control and manage the Obvious Publishing Engine. From that application, the user can see the status of the publishing/indexing process, see the number of jobs, configure the scanned directories, etc.
2.4 Distribution Components
2.4.1 Classical Distribution
OVI files can be distributed by Web servers, FTP servers or email. An OVI can be bound to its media content in several ways. For video, the whole media file can be embedded in the OVI, or the OVI can be linked to a registered Vdoc/Media. Annotations can be either embedded or distributed on demand at runtime.
2.4.2 On Demand Distribution
The Obvious Media Server is able to distribute OBVIs on demand (as OVI, XML or OSF files) or multicast (as OSF files). The distributed OVI, XML and OSF files can be either generated on the fly or pre-calculated.
2.4.3 Streamed Distribution
The Obvious Streaming Architecture defines the mechanisms that allow:
the conversion of OBVI files into OSF streams (Obvious Multicast Builder)
the multicast of OSF streams on IP channels (Obvious Multicaster)
the reception of OSF streams at the client side (Obvious Multicast Listener)
These elements are detailed in section XVI, entitled The Whole Picture.
2.4.4 Immerse Streamed Distribution
MPEG2 Embedding
Several hardware/software technologies allow the encapsulation of bit-streams, byte-streams, and IP data into MPEG-2 packets. These MPEG-2 packets can then be injected into a DVB-compliant transport stream that can be carried over satellite, cable, or terrestrial digital transmission systems. By combining audio/video and OBVI data streams, these technologies enable a multitude of point-to-multipoint OBVI delivery applications for traditional broadcasters looking for new business models.
QT Embedding
QuickTime movies can carry user-defined tracks of information. A new track can be created for carrying OBVI data in a traditional QuickTime movie. The track identifier, as defined by the QuickTime specification, is "OBVI". The embedding of an OBVI in a QuickTime movie is an alpha-stage feature. It is available in the OMM and is implemented in a separate DLL called LibQT.dll. It currently uses QuickTime 3. Porting to QuickTime 4 is expected in the next major release of the OMM.
A new QuickTime player has also been implemented. It acts as a classic QuickTime player. However, if the input movie file contains the OBVI track, a dialog box asks the user whether he wants to launch the OMM. In that case, the OVI file is extracted from the QuickTime file and opened with the OMM.
ASF Embedding
Microsoft has recently defined a new ASF format. NetShow 2.0 used ASF version 1. ASF version 2, available in NetShow 3.0, is more powerful and flexible: it allows the embedding of user-defined packets of data. Experiments have shown that embedding OBVI data into an ASF stream should be easy. However, it is preferable to wait until the OSF specification is finalized, because it should be possible to embed OSF packets directly in an ASF stream.
RealMedia Embedding
The RealServer G2 SDK exposes a way for embedding user-defined packets of data in a RealMedia stream. Here again, there will be a strong relation with OSF. OSF packets can be directly embedded in RealMedia streams.
2.5 Database Component
The Database Component is also called the Obvious Indexing System (OIS). It acts as a repository for registering and indexing the various entities in the system: video documents, media, streams, OBVIs, users, groups, etc. The Database Component interacts with all other components.
3 Implementation Issues
3.1 Technologies Involved
The Obvious Network Architecture is designed for Windows technologies. Microsoft Internet Information Server is used extensively in servers that need to provide services to remote client applications over HTTP. Other servers. A three-tier model is used: the OIS constitutes the back-end store (the database), and the business logic, as defined in the three-tier model, is implemented as a set of ISAPI scripts and NT services. Client applications interact with the system using HTTP. XML is used heavily at different levels as a standard format for client/server and server/server data exchange.
Some parts of the system use direct TCP/IP communications, essentially for internal server/server data exchange.
All server components are implemented in Visual C++/MFC. This includes NT services (implemented with ATL) and ISAPI extensions. MFC is used for all graphical parts.
3.2 Porting
Server components can easily be ported to other platforms, especially Unix. ISAPI scripts can be converted directly into CGI scripts. NT services can be rewritten as Unix daemons. Windows-specific code has been isolated in a separate DLL wherever possible.
Concerning the database, standard built-in types have been used in the definition of the tables. Data access is handled by ADO, which could easily be replaced by direct ODBC calls if necessary.
III. THE OBVI
1 The OBVI
An OBVI is not a file. An OBVI is a hyper media container. This general definition does not imply any particular storage format. However, in the current implementation of the Obvious Architecture, an OBVI is stored and managed as a database object. This is the primary storage format.
From this primary, database-centric storage format, an OBVI can then be exported to any additional storage format (an XML file, an OVI file or an OSF file), called secondary storage formats. In particular, the OVI file format is one of the storage formats that have been defined for storing OBVI objects outside the database, on a regular file system. The OVI file format is a way to store an OBVI in a binary file. This file can be read by client applications for visualising and interacting with the OBVI.
Historically, the first OBVI storage format to be developed was the OVI file format. This is the format read natively by the OMM/OME/OMV suite of client applications. However, this file-based storage format is only one of the possible secondary storage formats to which an OBVI can be exported.
The Obvious Network Architecture, described in detail in this document, focuses on the specification of the primary storage format: the database format. This document also shows how OVI files can be converted into OBVIs (the publishing/indexing process) and how OBVIs can be extracted from the database in OVI or XML form.
2 Annotations
In the current version of the OVI file format, the following annotation types can be found:
1. HTML
2. Wordpad
3. Text
4. Closed-Captions
5. SpeakerID
6. Database template
7. Audio
8. Object
An annotation can be either embedded in the OVI file or located on another server. In the second case, the OVI stores the URL of the remote annotation. Although the internal mechanism is already implemented in the OBVIKernel, HTML annotations are the only kind of annotation that fully uses this feature: the user can create either embedded or linked HTML annotations.
However, as will be explained in later chapters, the Obvious Network Architecture relies heavily on this mechanism to provide a global framework for publishing, indexing and distributing OBVIs. Future versions of the OMM will implement this dual mechanism for all other annotation types.
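The dual embedded/linked mechanism can be pictured as an annotation record that holds either its content or a URL, never both. The sketch below is a hypothetical model, not the OBVIKernel's actual data structure: the Annotation class and its fields are assumptions, and the fetch callback stands for whatever retrieves a remote annotation (for instance urllib.request.urlopen(url).read() in Python).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Annotation:
    """Hypothetical model: an annotation is either embedded or linked by URL."""
    title: str
    kind: str                           # e.g. "HTML", "Audio"
    embedded_data: Optional[bytes] = None
    url: Optional[str] = None

    def __post_init__(self):
        # Exactly one of the two storage modes must be used.
        if (self.embedded_data is None) == (self.url is None):
            raise ValueError("annotation must be either embedded or linked")

    def load(self, fetch: Callable[[str], bytes]) -> bytes:
        """Return the annotation content, fetching it only when it is linked."""
        if self.embedded_data is not None:
            return self.embedded_data
        return fetch(self.url)
```

Passing the fetch function in keeps the model independent of the transport, so the same record works for a LAN path, an HTTP server or a test stub.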
2.1 HTML Annotation
In an OVI file, HTML annotations are internally represented by a description file. This is a text file describing the material that constitutes the HTML annotation. In the case of a remote HTML annotation, the description file contains the full URL of the document. In the case of an embedded HTML annotation, the description file gives the names of all temporary files that constitute the annotation. These can be HTML files, Java applets, images or any other file that can be part of an HTML page.
To ensure a proper migration into the Obvious Network Technology framework, this description file will be encoded in XML. This change is expected for the next major release of the OMM.
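As an illustration of an XML-encoded description file, the sketch below writes and reads one using Python's standard xml.etree.ElementTree module. The element names HTMLAnnotationDescription, URL and File are hypothetical, since the XML layout of the description file is not specified here. A remote annotation gets a single URL element; an embedded annotation gets one File element per temporary file.

```python
import xml.etree.ElementTree as ET


def html_description_xml(url=None, files=()):
    """Build a description file (hypothetical tags): one <URL> for a remote
    annotation, or one <File> per temporary file for an embedded one."""
    root = ET.Element("HTMLAnnotationDescription")
    if url is not None:
        ET.SubElement(root, "URL").text = url
    else:
        for name in files:
            ET.SubElement(root, "File").text = name
    return ET.tostring(root, encoding="unicode")


def parse_html_description(xml_text):
    """Return (url, files); url is None for an embedded annotation."""
    root = ET.fromstring(xml_text)
    return root.findtext("URL"), [el.text for el in root.findall("File")]
```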
Another expected change concerns the editing of HTML annotations. An in-place HTML editor should allow the user to modify the annotation HTML directly, in WYSIWYG mode, without launching any external editor. Microsoft's DHTML editor is the preferred one.
In the current version of the OMM, HTML annotations are viewed with Internet Explorer. This means that virtually any document that can be viewed on the Web (Word document, PowerPoint presentation, VRML file, etc.) can be handled.
2.2 Wordpad Annotation
A Wordpad annotation is represented by an RTF file. In the OMM, a standard "Rich Edit" component is used for visualising and editing these annotations.
2.3 Text Annotation
2.4 Closed Caption Annotation
In the current version of the OMM, closed-caption annotations are created by converting a Virage VDF file into an OBVI. The user can also create closed-caption annotations directly in the OMM. However, this functionality should be removed, because closed-captions are supposed to be the result of an automated process (for example, during video acquisition, a hardware module grabs the closed-captions from the analog video signal). And, since closed-captions are a transcription of the speech track, allowing the user to create them would be very confusing.
2.5 SpeakerID Annotation
A SpeakerID annotation is a small line of text describing the person who is speaking. These annotations are currently created by converting a Virage VDF file into an OBVI. They can also be created manually by the user.
2.6 Database Template Annotation
Database template annotations (formerly Obvious Database Annotations, ODA) are internally represented by a description file. This text file gives the name and type definition of each field of the template.
To ensure a proper migration into the Obvious Network Technology framework, this description file will be encoded in XML. This change is expected for the next major release of the OMM.
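A template that declares a name and a type per field makes it possible to validate and convert user input before storing it. The sketch below assumes a hypothetical one-field-per-line "name: type" syntax and a hypothetical set of type names (string, integer, float, date). The real description-file syntax is not given here.

```python
from datetime import date

# Hypothetical mapping from template type names to converters.
CONVERTERS = {
    "string": str,
    "integer": int,
    "float": float,
    "date": date.fromisoformat,  # expects YYYY-MM-DD
}


def parse_template(description: str):
    """Parse 'name: type' lines into a list of (name, type) pairs."""
    fields = []
    for line in description.splitlines():
        line = line.strip()
        if not line:
            continue
        name, type_name = (part.strip() for part in line.split(":", 1))
        if type_name not in CONVERTERS:
            raise ValueError(f"unknown field type {type_name!r}")
        fields.append((name, type_name))
    return fields


def fill_template(fields, raw_values: dict):
    """Convert raw string input into typed values according to the template."""
    return {name: CONVERTERS[type_name](raw_values[name]) for name, type_name in fields}
```

Typed values map naturally onto the built-in database types used by the primary storage format, which is one reason to declare types in the template rather than store everything as text.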
2.7 Audio Annotation
An audio annotation consists of a title, a description and an audio file. The audio file can be either embedded in the OVI file or located on another server. In the latter case, its URL is stored in the OVI file.
2.8 Object Annotation
An object annotation describes one or several object paths in the video. It has a title, a description, a URL, a video range and a list of object paths, one for each object of the annotation. Each object path is defined by a title, a description, a URL and a list of bounding boxes. The list of bounding boxes gives the successive positions of the object in the video sequence. A bounding box is not required for every frame in the range. Each bounding box is represented by the corresponding frame number, its position (x, y, width and height), a description and a URL. Text descriptions are plain ASCII.
FIG. 17 depicts an object annotation composed of two objects, A and B. Each object has a path, described by a list of bounding boxes, one for each frame in which the object is annotated. The range of the object annotation is 35-45. However, only frames 35, 36, 43 and 45 carry bounding boxes for the objects. This makes it possible to represent sparse and non-contiguous sets of object positions. Metadata (a title, a description and a URL) can be attached at different levels: at the object annotation level, at the level of each object path or at the level of each bounding box. An object annotation is internally represented by an XML file. The following is a sample XML file corresponding to the two objects depicted in FIG. 17. We suppose here that object A represents a bird and that object B represents a cat.
<?xml version="1.0"?>
<ObjectAnnotation RangeMin="35" RangeMax="45">
  <Title>Bird and cat</Title>
  <Description>This annotation contains a bird and a cat</Description>
  <URL></URL>
  <ObjectPath>
    <Title>Bird</Title>
    <Description>Motion tracking of the bird</Description>
    <URL>http://www.company1.com/page1.html</URL>
    <Path>
      <Box x="140" y="126" width="76" height="65" frame="35">
        <Description>The bird is sleeping</Description>
        <URL>http://www.company1.com/birds.html</URL>
      </Box>
      <Box x="140" y="126" width="76" height="65" frame="36">
      </Box>
      <Box x="140" y="126" width="76" height="65" frame="43">
        <Description>The bird is dead</Description>
      </Box>
      <Box x="140" y="126" width="76" height="65" frame="45">
      </Box>
    </Path>
  </ObjectPath>
  <ObjectPath>
    <Title>Cat</Title>
    <Description>Motion tracking of the cat</Description>
    <URL>http://www.company1.com/page2.html</URL>
    <Path>
      <Box x="100" y="229" width="127" height="134" frame="35">
        <Description>The cat is hungry</Description>
      </Box>
      <Box x="145" y="190" width="133" height="130" frame="36">
      </Box>
      <Box x="149" y="198" width="120" height="139" frame="43">
        <Description>Cat eating the bird</Description>
        <URL>http://www.company1.com/cats.html</URL>
      </Box>
      <Box x="142" y="220" width="128" height="155" frame="45">
      </Box>
    </Path>
  </ObjectPath>
</ObjectAnnotation>
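The sketch below reads an object annotation of the form shown above with Python's standard xml.etree.ElementTree. Like any conforming XML parser, it requires quoted attribute values. It then answers the question a viewer must ask for each displayed frame: which objects have a bounding box here? Because paths are sparse, a frame without an explicit box simply yields no entry. Interpolating between boxes would be a separate design decision that is not covered here. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_object_annotation(xml_text):
    """Return ({object title: {frame: (x, y, width, height)}}, (range_min, range_max))."""
    root = ET.fromstring(xml_text)
    paths = {}
    for obj in root.findall("ObjectPath"):
        boxes = {}
        for box in obj.find("Path").findall("Box"):
            frame = int(box.get("frame"))
            boxes[frame] = tuple(int(box.get(k)) for k in ("x", "y", "width", "height"))
        paths[obj.findtext("Title")] = boxes
    return paths, (int(root.get("RangeMin")), int(root.get("RangeMax")))


def boxes_at(paths, frame):
    """Objects that have an explicit bounding box at the given frame."""
    return {title: boxes[frame] for title, boxes in paths.items() if frame in boxes}


SAMPLE = """<?xml version="1.0"?>
<ObjectAnnotation RangeMin="35" RangeMax="45">
  <Title>Bird and cat</Title>
  <ObjectPath>
    <Title>Bird</Title>
    <Path>
      <Box x="140" y="126" width="76" height="65" frame="35"/>
      <Box x="140" y="126" width="76" height="65" frame="43"/>
    </Path>
  </ObjectPath>
  <ObjectPath>
    <Title>Cat</Title>
    <Path>
      <Box x="100" y="229" width="127" height="134" frame="35"/>
      <Box x="145" y="190" width="133" height="130" frame="36"/>
    </Path>
  </ObjectPath>
</ObjectAnnotation>"""
```

For SAMPLE, frame 36 returns only the cat's box, and frame 40 returns an empty mapping.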
Object annotations can be created manually or they can be the result of an automated process. In the first case, the user manually draws a bounding box around objects of interest. In the second case, the motion tracking algorithms developed by the Phoenix team will directly provide the set of bounding boxes.
Concerning the visualisation of Object Annotations, a new module is under development. It will draw the bounding boxes corresponding to a given set of object paths. This module will be implemented as an ActiveX Control, to ensure easy integration with the OMM.
3 Publishing and Indexing an OBVI
Publishing an OBVI means converting the OBVI from the OVI file format to the database-centric storage format. Thus, the publishing operation must be considered as a storage conversion: from a secondary (file-based) storage format to the primary (database-centric) storage format. Once an OBVI is published and indexed, it can be re-extracted from the database as an OVI or an XML file. The extracted OVI file can then be modified and re-published later, creating a new OBVI version.
4 Storage Scenarios
Before discussing the details of the various storage formats, let's present some possible scenarios that show how different storage formats can be used.
The primary storage format is essentially used in the following scenario:
SCENARIO 1:
From a particular client application, the user browses the OIS for existing OBVIs. He makes a boolean search that gives a list of possible matches. These matches correspond to OBVIs that are stored in the OIS, i.e. with the primary storage format. When the user selects an OBVI for viewing (read-only mode), the client application interacts dynamically with the OIS (via various other runtime components) to get all the information needed for visualizing the OBVI (metadata, structure, images, annotations, etc.).
The OVI file format is essentially used in the following scenarios:
SCENARIO 1:
From a particular client application, the user browses the OIS for existing OBVIs. Then he selects an OBVI and requests write mode, meaning that he would like to be able to modify the OBVI. In that case, an OVI file is generated on the fly by the OIS (with the help of other runtime components) and is sent to the client application. The OBVI that the user manipulates is therefore stored as a file and can be saved locally on the client machine. This OVI file can be sent by email, modified and then re-published on the OIS. The Obvious Network Architecture specifies a version management scheme that allows the system to handle and track multiple versions of the same OBVI.
SCENARIO 2:
From a particular client application, the user creates an OBVI from scratch. He first selects a media file by either browsing the local file system or browsing the OIS for a pre-registered media. Then an OBVI is created by using the VAMT (local or remote) or an EDL. The corresponding OBVI is then stored as an OVI file that can be modified at any time. Finally, this OBVI is published in the OIS. This means that a module takes care of converting the OBVI from the OVI file format to the database-centric storage format.
The XML file format is used in the following scenario:
SCENARIO 1:
From a particular client application, the user browses the OIS for existing OBVIs. He makes a boolean search that gives a list of possible matches. These matches correspond to OBVIs that are stored in the OIS, i.e. with the primary storage format. When the user selects an OBVI for viewing (read-only mode), the client application downloads the corresponding XML file and interprets its content.
The OSF file format is used in the following scenario:
SCENARIO 1:
From a particular client application, the user selects an OSF channel. On that specific channel, OBVIs are transmitted as OSF packets over IP multicast. The client application automatically stores the received OSF files on the user's hard drive. The OSF files are converted into OVI files that can be opened with the OMM/OME/OMV suite of client applications.
SCENARIO 2:
The user is viewing a video stream (a RealMedia or NetShow stream, for example). He has a special player that automatically decodes any embedded OSF packet while he is viewing the video stream.
In conclusion, an OBVI is stored in the primary storage format (i.e. in the database) when it is published. It then corresponds to an identified object that is securely managed by the database. This object has a unique identifier and is attached to a version number. Its content is indexed, and client applications can run search queries for browsing and retrieving a particular OBVI. On the other hand, an OBVI is stored as an OVI file (i.e. a secondary storage format) when editing is necessary (because the user created a new OBVI or because the user wanted to save an OBVI on his local machine for further editing).
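The scenarios above amount to a small decision rule for delivering a published OBVI. The sketch below encodes one reading of them. The request names ("read", "write", "broadcast") and the rule that only the Java viewer receives XML (based on the statement that the Obvious Java Viewer consumes the XML form) are assumptions made for illustration.

```python
from enum import Enum


class StorageFormat(Enum):
    DATABASE = "primary: OIS database"
    OVI = "secondary: OVI file"
    XML = "secondary: XML file"
    OSF = "secondary: OSF stream"


def format_for(request: str, client: str = "OMM") -> StorageFormat:
    """Pick the storage format used to deliver a published OBVI (hypothetical policy)."""
    if request == "write":
        # Editing needs a local file that can be saved, mailed and re-published.
        return StorageFormat.OVI
    if request == "read":
        # The Java viewer renders XML; other clients query the OIS at runtime.
        return StorageFormat.XML if client == "JavaViewer" else StorageFormat.DATABASE
    if request == "broadcast":
        # The chunked OSF layout suits multicast delivery.
        return StorageFormat.OSF
    raise ValueError(f"unknown request type {request!r}")
```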
The ability to export an OBVI from the database to an OVI or XML file gives a clear separation between the OBVI (as an object, independently of its storage format) and the file format that can be used for storing the OBVI outside the database.
5 Storage Format
5.1 Primary Storage Format
As previously described, the primary storage format is based upon a relational database system. The OBVI is handled as an object in the database. Concerning implementation issues, two possibilities are of interest:
1) The OBVI can be stored as a user-defined datatype. This means that a new datatype must be created in the database, which must therefore be object-oriented. Procedures and policies must also be defined to handle this new datatype. This object is stored in database tables and is indexed.
2) The OBVI can be stored by using a set of built-in datatypes. The OBVI is decomposed as a set of elements (pieces of metadata, individual images, timecodes, annotation links, etc.) and all these elements are stored in database tables, using built-in datatypes (integers, floating points, character strings, date values, etc.). This solution is portable because it doesn't imply the creation of a new datatype. The current version of the Obvious Network Architecture is based on this solution.
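Option 2 can be illustrated with a few relational tables that hold only built-in types. The sketch uses SQLite through Python's standard sqlite3 module purely as a stand-in for the OIS database (which uses ADO with Oracle or Microsoft products). The table and column names, and the dict layout of the OBVI, are hypothetical.

```python
import sqlite3

# Hypothetical schema: the OBVI is decomposed into rows of built-in types only.
SCHEMA = """
CREATE TABLE obvi (obvi_id INTEGER, version_id INTEGER, title TEXT, vdoc_id INTEGER,
                   PRIMARY KEY (obvi_id, version_id));
CREATE TABLE block (obvi_id INTEGER, version_id INTEGER, block_no INTEGER,
                    frame_in INTEGER, frame_out INTEGER);
CREATE TABLE annotation (obvi_id INTEGER, version_id INTEGER, block_no INTEGER,
                         kind TEXT, url TEXT);
"""


def store_obvi(conn, obvi):
    """Insert one OBVI version, given as a plain dict, into the tables above."""
    key = (obvi["obvi_id"], obvi["version_id"])
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO obvi VALUES (?, ?, ?, ?)",
                     key + (obvi["title"], obvi["vdoc_id"]))
        conn.executemany("INSERT INTO block VALUES (?, ?, ?, ?, ?)",
                         [key + (no, fin, fout)
                          for no, (fin, fout) in enumerate(obvi["blocks"])])
        conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)",
                         [key + tuple(a) for a in obvi["annotations"]])
```

Because every column is an integer or a string, the same schema can be created on any SQL database, which is the portability argument made for this option.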
5.2 Secondary Storage Formats
5.2.1 The OVI File Format
The OVI file format is used by the OMM/OME/OMV suite of client applications. These applications use a low-level library (the OBVIKernel, which is an ActiveX control) for accessing and interpreting this binary file format.
An OVI can reference a video in two ways: the video file can be local (local hard-drive or on the LAN) or remote. In the second case, the OBVI is said to be bound to a registered media, meaning that a connection to the server components is necessary.
The header of the OVI file contains a number of fields. However, the following fields are of special interest here because they concern the interaction between client applications and server components.
Field Name      Type     Description
SiteID          Number   Site Identifier
ObviID          Number   OBVI Identifier
VersionID       Number   Version Identifier
VdocID          Number   Video Document Identifier
MediaID         Number   Media Identifier
MediaLocation   String   Location of the media file
The following tables give sample values of these fields for different kinds of OVI files.
Sample 1: The OVI is bound to a local media file
This is the case of OBVIs created from scratch by using the local VAMT of the OMM. The corresponding OVI file contains the file path of the video file. Here is a sample set of values for the header fields:
SiteID =0
ObviID =0
VersionID =0
VdocID =0
MediaID =0
MediaLocation ="c:\test\media\movie.avi"
All the fields are empty (a null value is used to specify an empty or irrelevant field) except the MediaLocation field, which contains the full path of the media file.
Sample 2: The OVI is bound to a remote video file but doesn't correspond to an OBVI in the database
SiteID =0
ObviID =0
VersionID =0
VdocID =5894
MediaID =488
MediaLocation =""
In that case, only VdocID and MediaID are relevant.
Sample 3: The OVI is bound to a remote video file and corresponds to an OBVI in the database
This is the case of OBVIs extracted from the database in write mode. The corresponding OVI file contains a Video Document Identifier and a Media Identifier that uniquely determine a registered media file in the OIS database. Here is a sample set of values for the header fields:
SiteID =178
ObviID =66
VersionID =965
VdocID =5894
MediaID =488
MediaLocation =""
In that case, all the fields are relevant except the MediaLocation field.
OVI files of this kind contain a valid OBVI Identifier and a valid Version Identifier, which make it possible to track the original OBVI and its parent version. If the user modifies this OVI file and re-publishes it, then
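The three header samples can be distinguished mechanically from the header fields alone. The sketch below mirrors the samples, using 0 for a null identifier as they do. The OviHeader class and the classify function are hypothetical helpers, not part of the OBVIKernel.

```python
from dataclasses import dataclass


@dataclass
class OviHeader:
    """The header fields listed above; 0 stands for a null identifier."""
    site_id: int = 0
    obvi_id: int = 0
    version_id: int = 0
    vdoc_id: int = 0
    media_id: int = 0
    media_location: str = ""


def classify(h: OviHeader) -> str:
    """Map a header onto one of the three sample cases."""
    if h.vdoc_id == 0 and h.media_id == 0:
        if not h.media_location:
            raise ValueError("OVI references no media")
        return "local media"                      # Sample 1
    if h.obvi_id == 0 and h.version_id == 0:
        return "registered media, unpublished"    # Sample 2
    return "registered media, published OBVI"     # Sample 3
```

Only the last case carries the identifiers needed to create a new version of an existing OBVI when the file is re-published.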
5.2.2 The XML File Format
The tools that store an OBVI as an XML file are currently under development. However, the DTD is already defined. Basically, the XML file will provide enough information to client applications for starting a runtime session. The runtime session starts by connecting to various server components. Queries are sent to these runtime components to retrieve all the material needed for visualising or editing the OBVI.
In the current implementation, the XML file corresponding to an OBVI is used by the Obvious Java Viewer. It can be generated and distributed by the Obvious Media Server.
5.2.3 The OSF File Format
OSF stands for Obvious Streaming Format. An OSF file and an OVI file contain the same information. However, the internal structure of the OSF file, composed of chunks, allows efficient streamed transmission of OBVIs. As an example of a possible use of the OSF file, a set of beta tools has been developed. They build an OSF file from an OVI file and multicast the OSF file on an IP channel. At the client level, a multicast receiver downloads the OSF streams and stores them on the client hard drive. These tools are described in more detail in Section XVI.
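To show why a chunked layout suits multicast, the sketch below splits a payload into self-describing chunks and reassembles them even when they arrive out of order or duplicated. The 12-byte header (stream id, sequence number, chunk count, payload length, packed big-endian with Python's struct module) is a hypothetical layout. It is not the actual OSF chunk format, which is not specified here.

```python
import struct

# Hypothetical chunk header: stream id (uint32), sequence number (uint16),
# total chunk count (uint16), payload length (uint32), big-endian, 12 bytes.
HEADER = struct.Struct(">IHHI")


def to_chunks(stream_id, data, chunk_size=1024):
    """Split data into chunks; the uint16 fields limit a stream to 65535 chunks."""
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    total = len(pieces)
    return [HEADER.pack(stream_id, seq, total, len(p)) + p for seq, p in enumerate(pieces)]


def reassemble(chunks):
    """Rebuild complete payloads per stream; incomplete streams are omitted."""
    received, totals = {}, {}
    for chunk in chunks:
        stream_id, seq, total, length = HEADER.unpack_from(chunk)
        received.setdefault(stream_id, {})[seq] = chunk[HEADER.size:HEADER.size + length]
        totals[stream_id] = total
    complete = {}
    for stream_id, parts in received.items():
        if len(parts) == totals[stream_id]:
            complete[stream_id] = b"".join(parts[i] for i in range(totals[stream_id]))
    return complete
```

Since every chunk carries its own position and the total count, a receiver that joins a channel mid-transmission can simply keep listening until a repeated broadcast fills in the missing chunks.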
6 Versioning
OBVI version management is handled in the OIS database. Several versions of the same OBVI can be stored. Each version has a unique identifier and is characterized by an author and a creation date. Different versions of the same OBVI can share the same blocks and annotations, which makes it possible to track changes from version to version. The version is automatically incremented during the publishing operation.
Versioning is handled by using 2 identifiers: an OBVI identifier (OBVIID) and a version identifier (VERSIONID). Different versions of the same OBVI have the same OBVIID but different values of VERSIONID.
The following is a sample life cycle of an OBVI. This cycle shows how versioning works and how OVI files are used for extracting an OBVI from the database. It consists of five phases:
In phase 1, an OBVI is created from scratch: a local video file is used for the VAMT analysis. The resulting OVI file is annotated and shared among various users. The header of the OVI file has a null OBVIID and a null VERSIONID, meaning that this OBVI is not related to the database: it is a working OBVI, created locally for authoring purposes. This OBVI can be sent by email, provided that the receiver has a way to access the corresponding video (the sender can either send the video separately, embed the video in the OVI file or give LAN access to the original video file).
In phase 2, the OVI is published. It is sent to a specific server that handles the publishing and indexing process. The publishing and indexing process is basically a matter of creating a new OBVI entry in the database with a new version identifier. OBVI annotations are converted into HTML and published on a Web server. The OBVI now has a valid OBVIID and VERSIONID (let's say OBVIID=123 and VERSIONID=1).
In phase 3, another user browses the database and selects the previously published OBVI in read-only mode. An XML version is generated on the fly and is rendered on the client browser by using the Obvious Java Viewer.
In phase 4, another user browses the database and selects the same OBVI in write mode. In that case, an OVI is generated on the fly and is sent to the client machine. The OMM/OME is automatically launched and allows the user to modify the OVI. For that purpose, the generated OVI contains the whole set of annotations (as a set of embedded HTML pages). The header of this OVI has an OBVIID equal to 123 and a VERSIONID equal to 1.
In phase 5, the user modifies the OBVI by merging some blocks and by modifying some annotations. Then he re-publishes the OBVI (OBVIID=123, VERSIONID=1). A new version is then created in the database (OBVIID=123, VERSIONID=2). The new OBVI version has the same blocks and annotations except for those that have been modified by the user. Since most of the blocks and annotations are common to both versions, the new OBVI version is stored very efficiently and no duplication of common material is necessary.
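The storage saving in phase 5 comes from versions referencing shared material instead of copying it. The sketch below illustrates that idea with content hashing (SHA-256 from Python's hashlib). This technique was chosen for brevity, and the text does not say how the OIS detects shared blocks and annotations. VersionStore and its publish method are hypothetical. Each distinct item is stored once, each (OBVIID, VERSIONID) pair records a list of references, and the version number is incremented on every publish of an existing OBVI.

```python
import hashlib


class VersionStore:
    """Hypothetical store in which OBVI versions share unchanged material."""

    def __init__(self):
        self.objects = {}     # content hash -> block/annotation bytes, stored once
        self.versions = {}    # (obvi_id, version_id) -> list of content hashes
        self._next_obvi_id = 1

    def publish(self, items, obvi_id=None):
        """Store a new version; a None obvi_id means a first publication (new OBVI)."""
        if obvi_id is None:
            obvi_id = self._next_obvi_id
            self._next_obvi_id += 1
        version_id = 1 + max((v for (o, v) in self.versions if o == obvi_id), default=0)
        refs = []
        for item in items:
            digest = hashlib.sha256(item).hexdigest()
            self.objects.setdefault(digest, item)  # shared material is not duplicated
            refs.append(digest)
        self.versions[(obvi_id, version_id)] = refs
        return obvi_id, version_id
```

Re-publishing two unchanged blocks plus one edited annotation adds a single new object to the store, while both versions remain fully reconstructible from their reference lists.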
IV. OBVI SDK
1 Concepts
The OBVI SDK allows third-party applications to access OBVI objects from the OVI file format, the XML file format or the OIS database. The OBVI SDK is intended to provide the same level of functionality as the OBVI Kernel.
In the current implementation, the OBVI SDK is available as a DLL that exposes basic functions for opening, reading, editing and saving OVI files. As