Advanced tools for speech synchronized animation
5613056
Abstract
A random access animation user interface environment, referred to as interFACE, enables a user to create and control animated lip-synchronized images or objects utilizing a personal computer for use in the user's programs and products. A real-time random-access interface driver (RAVE) together with a descriptive authoring language (RAVEL) is used to provide synthesized actors ("synactors"). The synactors may represent real or imaginary persons or animated characters, objects or scenes. The synactors may be created and programmed to perform actions including speech which are not sequentially pre-stored records of previously enacted events. Furthermore, animation and sound synchronization may be produced automatically and in real-time. Sounds and visual images of a real or imaginary person or animated character associated with those sounds are input to a system and may be decomposed into constituent parts to produce fragmentary images and sounds. A set of characteristics is utilized to define a digital model of the motions and sounds of a particular synactor. A general purpose system is provided for random access and display of synactor images on a frame-by-frame basis, which is organized and synchronized with sound. Both synthetic speech and digitized recording may provide the speech for synactors.
Claims
We claim:
Description
BACKGROUND OF THE INVENTION
TABLE I
______________________________________
interFACE MENUS
______________________________________
1 100 Apple
2 101 File
3 110 Edit
4 102 Go
5 106 Actors
6 107 Sounds
7 104 Dressing Room
8 108 Speech Sync
9 114 Stage
10 122 Clip Art
______________________________________
With particular reference to FIG. 4, a system block diagram illustrating the various operational modes of the interFACE system is shown. The interFACE system comprises four basic screens or modes: the Dressing Room 59, the Speech Sync Lab 61, the Stage 63 and the Help mode 65. Each mode is represented by a screen button on a navigation window 57. When a user initiates the interFACE system, a startup screen 55 is displayed by the host system. The startup screen (not shown) comprises one card and informs a user that he or she has initiated and is running the interFACE system. The startup screen also provides the user with bibliographic information and instructions to begin use of the interFACE system. After a short pause to load the program, the RAVE and RAVER drivers are loaded and called to perform predesignated system checks. The RAVE driver is a portion of the interFACE system that handles much of the programmatic functions and processes for synactor handling. The RAVER driver contains a number of programmatic functions related primarily to synactor editing. It is only used in the authoring system. The segregation of these functions reduces the memory requirements of the use system 40, which includes only the RAVE driver. After the initial system checks have been completed, the interFACE system displays the navigation window or palette 57, unless other preferences were previously set up by the user. The navigation palette 57 displays four buttons 571, 573, 575, 577 for accessing the interFACE system modes. Access may also be accomplished from various modes within the system utilizing menu commands. Normally, the navigation palette 57 is positioned on the current screen display immediately below the menu bar 579 to conserve screen space and avoid blocking of other interFACE windows. The user, however, may position the navigation panel 57 anywhere on the screen display. 
The navigation palette 57 is provided with "handles" (not shown) to allow it to be repositioned on the display. To move the navigation palette 57, the cursor is moved to either handle with a mouse (or other means). With the mouse button depressed, the navigation palette is then moved (dragged) horizontally or vertically on the display to the desired position. The four navigation buttons allow the user to go to the Dressing Room 59, to the Stage 63, to the Speech Sync Lab 61 or to the interFACE context sensitive Help mode 65. With the exception of the help button 577, the navigation buttons take the user to different screens within the interFACE system. The help button 577 initiates the context sensitive Help mode 65, which provides assistance when a user positions a "?" cursor over a desired item displayed on the screen currently in use.
Referring now to FIG. 6, the Dressing Room 59 is used to create new synactors or to edit existing synactors. A user may go to the Dressing Room from any mode within the interFACE system. Generally, the Dressing Room can be selected from the navigation palette 57 when in any mode within the interFACE system. The Dressing Room may also be selected from the Go menu (as shown in FIG. 9c). The Dressing Room can also be selected as the opening startup screen for interFACE by selecting Dressing Room on the interFACE Preferences window (as shown in FIG. 9f). When the Dressing Room 59 is initiated, two windows 67, 71 automatically appear on the display. The dressing room window 67 provides the working space for creation and editing of synactors. The control panel window 71 allows the user to navigate between the various images of the current synactor and displays information related to the image 711 currently displayed in the control panel window 71. The title bar 671 at the top of the window 67 displays the name of the selected or current synactor 85. 
If a new synactor is being created or edited, the title 673 of the window will be "Untitled". The dressing room window 67 can be moved to any location on the screen 59. In addition, the size of the dressing room window 67 may be changed as desired. The synactor image 85 is displayed on the easel 83 in the dressing room window 67.
Referring now to FIG. 7a, the Stage screen 63 provides a display for examining and testing the lip-synchronization of newly constructed synactors with a voice provided by a speech synthesizer. As with the Dressing Room screen 59, a user can get to the Stage screen 63 by three methods: selecting the Stage as the startup screen; selecting it from the navigation palette 57; or using the Stage command from the Go menu. Similarly, a user may go to the Stage screen 63 from any mode within the interFACE system. When the Stage screen is the startup screen, a synactor must be selected before proceeding. The Stage screen 63 enables a user to test a synactor's sound animation. Sample text is entered via the keyboard or other suitable means in the text field 871 provided in the stage window 87. Depressing the Return key or Enter key on the keyboard, using a mouse to click on the talk button in the stage window, or selecting the Speak Text window command under the Stage Menu causes the synactor to speak the sample text in the text field 871, using the speech synthesizer which was last selected with the sound window to provide the synthesized audio. Synthesized speech, as used in the Stage screen, is automatically synchronized by the RAVE driver as described in more detail hereinbelow. For digitized sounds, the process of ensuring that the synactor face image has the correct lip position at the time the sound is produced is referred to as speech synchronization. Digitized recordings of the user, famous people, or any other sound can be used with synactors.
Referring now to FIG. 
8a, the Speech Synchronization Lab (Speech Sync Lab) 61 is responsible for the creation of commands which automatically synchronize synactor animation and digitized sounds. The Speech Sync Lab 61 provides automatic approximations of the animation required for a particular sound utterance. The animation generated by this approximation will not always be perfect, but serves as a starting point for editing or fine tuning the synchronization. Various tools are then used to aid in fine tuning the speech synchronization process. The basic synchronization methodology of the Speech Sync Lab 61 (referred to as "phonetic proportionality") and additional enhanced speech synchronization tools will be described in greater detail hereinbelow.
The interFACE online Help mode 65 provides useful information related to the various functions and features of the interFACE system. The interFACE Help mode can be accessed in two ways: (1) selecting the Help command from the File menu (as shown in FIG. 9a) or (2) selecting "Context Sensitive Help" by use of the "?" button 577 on the navigation palette 57. Selection of Help on the File menu (shown in FIG. 9a) provides the help dialogue window shown in FIG. 9j. When a topic is selected from the dialogue window, the help system 65 retrieves and displays the selected information. Context sensitive help is an advanced extension of the interFACE Help mode that allows a user to quickly and intuitively find help information related to a window, menu command or other feature. When the context sensitive help mode is entered, the cursor changes into a "?" and the context sensitive help button 577 on the navigation palette becomes highlighted. The context sensitive help mode is used by positioning the "?" cursor over or near the screen or window item for which related information is desired. To exit the Help mode, press command-? again or click on the navigation palette's context sensitive help button 577. 
The cursor will return to its previous setting and the context sensitive help button 577 will no longer be highlighted.
Various elements of the interFACE system can be altered from their default settings using the Preferences window (shown in FIG. 9f). The Preferences window is opened by choosing the Preferences command from the File menu. The Preferences window can be used to choose a startup screen (i.e., the screen which appears immediately after the initial interFACE screen), to designate which windows are automatically opened and to set where those windows open. The Preferences window is described in greater detail hereinbelow.
With continuing reference to FIGS. 4 and 9a-9k, each of the three basic modes 59, 61 and 63 has its own unique menus and windows. A number of pull down command menus common to all of the basic modes provide operational control of the interFACE system. The menus common to all modes are the File Menu, Edit Menu, Go Menu, Synactor Menu and Sounds Menu shown in FIGS. 9a-9e. The common menus are selected from the command menu displayed on the command menu line 66 at the top of the display as shown in FIGS. 6a, 7a and 8a. While the common menus may be accessed from any of the interFACE modes, some of the menu commands provided by a given menu may be available only in certain applicable interFACE modes.
File Menu
Referring now to FIG. 9a, the menu commands under the File Menu allow the user to view and modify the preference settings, access online help, import and export scrapbook files, print images, and exit (quit) from the interFACE system. The Preferences command displays the Preferences window (FIG. 9f), which provides system settings for the interFACE system. The Preferences window allows a user to view and modify the system settings. The Dressing Room screen 59 always opens to create a new synactor as specified by preferences or settings (that is, settings made to the parameters specified in the Preferences window). 
To view or modify these default values, click on the file menu Synactor Info button. The Synactor Information window, FIG. 9i, will be displayed showing the synactor size default width and height, and the synactor type. The default width and height can be modified by typing a new value in the width and height fields. The screen representation 901 displayed on the right-hand side of the synactor information window illustrates the approximate size of the synactor as indicated in the width and height fields. The screen representation is initially set as a function of the host system display dimensions. The default width and height can also be altered by simply clicking and dragging the handle 903 on the synactor representation. As the size of the synactor is changed, the width and height fields 905 update automatically. The help menu command opens the Help mode window (shown in FIG. 9j), which contains a list of all available topics. Double-click on any topic to retrieve a brief description of the selected topic. When the context sensitive help menu command is selected, the cursor changes to the context sensitive help cursor as described above. The context sensitive help button 577 on the navigation panel 57 will also become highlighted. When in the context sensitive help mode a menu command or window button can be selected or clicked on to open the Help mode window related to that command or window. The Help mode window will appear, automatically opened to that item's help information. The Import from Scrapbook command opens a window prompting a user to select a scrapbook file for importing into the current synactor. The scrapbook file selected must be one that contains framed synactors created using the Export to Scrapbook command. This option is only available while in the Dressing Room 59. The Export to Scrapbook command displays the Export Images window (shown in FIG. 
9g) allowing a user to place several or all of the current synactor images into a scrapbook file. The scrapbook file may then be used to easily move synactor images to other applications. When the export images window is displayed, the first and last image numbers 907, 909 are placed in the From and To Fields, respectively. The values are adjusted to include only the images that are to be exported. A user can also select whether or not each synactor image is to be surrounded by a black border referred to as the frame. Frames are required in order to import images automatically back into a synactor. The Export to Scrapbook command is only available while in the Dressing Room 59. The Print Images command displays the Print Images window (as shown in FIG. 9h). When the window is displayed, the current synactor image number is placed in both the From and To Fields 911, 913, respectively. The values are adjusted to include only the synactor images to be printed. A synactor image can be referenced either by its label or its number. A user can also specify whether or not a frame is to be printed around each synactor image. With the print and easel option selected, the easel will be placed around each synactor image in the printout (not shown). With the print and easel option deselected, only the synactor image will be printed. The Print Images command is only available while in the Dressing Room 59. The File menu Quit command closes the interFACE application and exits to the host system.
Edit Menu
Referring now to FIG. 9b, the Edit Menu provides commands for adding, deleting, converting, resizing, and cropping images of the current synactor 85 while in the Dressing Room 59. The Cut command allows a user to remove selected graphic or text information. The removed information is placed in the Clipboard. The Copy command allows a user to duplicate selected graphic or text information. The copied information is placed in the Clipboard. 
The Paste command allows a user to place the current graphic or text information from the Clipboard into a graphic or text field. The Clear Easel command erases the artwork currently on the easel 83 in the Dressing Room 59. This menu command is active only while in the Dressing Room. The Paste to Easel command transfers or copies the current graphic information from the Clipboard into the easel 83. This menu command is active only while in the Dressing Room. The Revert Image command restores the most recent image for the current image 85 being edited in the Dressing Room easel 83. The restored image will replace the current image in the easel 83. This menu command is active only while in the Dressing Room 59. The Add New Image command creates a new image and places it as the last image for the current synactor. After a new synactor image is added, the easel 83 clears the current image and is set to display the added image. The control window 71 is also updated accordingly. A user can continue adding images to a synactor (that is, to the synactor file) until a predetermined maximum has been reached. In the preferred embodiment, the maximum limit is set at 120. In general, the number of allowable images is limited only by available memory. For a particular implementation, if the memory limit or the maximum limit is reached, the Add New Image menu command is disabled until synactor images are removed or additional memory is made available. The Add New Image command is available only while in the Dressing Room 59. The Delete Last Image command removes the last image from the current synactor. Synactor images can be deleted until a predetermined minimum number of images remains. In the preferred embodiment, the minimum number of images for Standard or Extended synactors is 16 and for Coarticulated synactors is 32. When the minimum number of images remains, the Delete Last Image menu command is disabled until a new image is added. 
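The enabling and disabling of the Add New Image and Delete Last Image commands described above can be sketched as follows. This is only an illustrative model, not the interFACE implementation: the class name `SynactorImages`, its methods, and the placeholder image strings are hypothetical, the memory limit mentioned in the text is not modeled, and starting a synactor at its minimum image count is an assumption. Only the limits (120, 16 and 32) come from the preferred embodiment.

```python
from dataclasses import dataclass, field
from typing import List

# Limits quoted from the preferred embodiment described in the text.
MAX_IMAGES = 120
MIN_IMAGES = {"Standard": 16, "Extended": 16, "Coarticulated": 32}


@dataclass
class SynactorImages:
    """Hypothetical container for the image list of one synactor."""
    synactor_type: str
    images: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: a synactor starts out holding its minimum image set.
        minimum = MIN_IMAGES[self.synactor_type]
        while len(self.images) < minimum:
            self.images.append(f"image {len(self.images) + 1}")

    def can_add(self) -> bool:
        # Add New Image is disabled once the maximum is reached.
        return len(self.images) < MAX_IMAGES

    def can_delete(self) -> bool:
        # Delete Last Image is disabled once only the minimum remains.
        return len(self.images) > MIN_IMAGES[self.synactor_type]

    def add_new_image(self) -> int:
        """Append a new last image and return its 1-based number."""
        if not self.can_add():
            raise RuntimeError("Add New Image is disabled: maximum reached")
        self.images.append(f"image {len(self.images) + 1}")
        return len(self.images)

    def delete_last_image(self) -> int:
        """Remove the last image and return the remaining count."""
        if not self.can_delete():
            raise RuntimeError("Delete Last Image is disabled: minimum reached")
        self.images.pop()
        return len(self.images)
```

In this sketch a menu would query `can_add()` and `can_delete()` to decide whether to gray out each command, rather than waiting for an error from the operation itself.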
This menu command is available only while in the Dressing Room 59. The Copy Rest to All command will copy the rest image of the current synactor to all of the current synactor's images. This command is available only in the Dressing Room 59. The Synactor Info command displays a window (as shown in FIG. 9i) used to change the current synactor setup of the Dressing Room 59. The synactor info window is similar to the Synactor Info menu command on the interFACE Preferences window (FIG. 9f), except that the displayed synactor information pertains to the current new synactor in the Dressing Room. Only new synactors created when the Dressing Room is initially opened or created by closing the current synactor can be modified utilizing the synactor info window. The attributes of a synactor having more than one image cannot be modified using the synactor info command. Once the first image is added to a synactor (by displaying a synactor image other than the rest image), the synactor info window command becomes disabled. The menu command is active only while in the Dressing Room 59. The synactor info window is also used in conjunction with the Preferences window (FIG. 9f) to set the attributes of the opening synactor. The Convert Synactor command opens a window which allows a user to convert the current synactor to a different type of synactor. For example, a Standard synactor can be changed to an Extended synactor using this command. This menu command is active only while in the Dressing Room 59. The Resize Synactor command opens a window which allows a user to change the size of the current synactor. Resizing a synactor will stretch or shrink all of the associated synactor images to the new specifications. This menu command is active only while in the Dressing Room 59. The Crop to Guides command allows a user to change the size of the current synactor. 
The Crop to Guides command differs from the Resize Synactor command in that it allows a user to add or subtract area from the current synactor. All existing images of the synactor will be automatically cropped to fit the new size. To crop a current synactor 85, first select the Show Guides menu command from the Dressing Room menu 69. Utilizing the margin controls of the dressing room window 67, position the guides 68 to new locations indicating the desired size and select the Crop to Guides command. As the interFACE system crops the current synactor, it will cycle through each image to display what portion of each image is being cropped. This menu command is only active while in the Dressing Room and while the guides 68 are visible.
Go Menu
Referring to FIG. 9c, the Go menu provides commands which allow the user to navigate among the interFACE basic modes: the Dressing Room 59, the Stage 63 and the Speech Sync Lab 61. The navigation window 57 may also be displayed or hidden from a Go menu command.
Synactor Menu
Referring now to FIG. 9d, the Synactor menu provides various commands for opening, closing, saving, and manipulating synactors in files. Special commands for navigating are also provided. The Synactor menu is always available. The Open Synactor menu command displays a window (not shown) for selecting a synactor to be opened. In the Dressing Room 59, the rest image of the selected synactor will be displayed and the Dressing Room 59 will be set up accordingly. In the Speech Sync Lab 61 or the Stage 63, the selected synactor will be displayed. The Close Synactor command removes (deletes) the current synactor. Closing the current synactor while in the Dressing Room 59 will reset the Dressing Room for a new synactor. The Close Synactor menu command is not active while in the Stage 63 or the Speech Sync Lab 61 because a synactor must always be displayed while in these modes. The Save Synactor command saves the current synactor to its file. 
The Save Synactor As command displays a window for storing the current synactor. In order to save a synactor, a name for the synactor to be saved must be entered. The Save Synactor As command may also be used to save the current synactor under a new or different name. The Prev Image menu command displays the image immediately before the synactor's current image. The previous image is copied to the easel 83 and the control window 71 is properly updated. The Prev Image command is available only while in the Dressing Room 59. The Next Image menu command displays the image immediately after the synactor's current image. The next image is copied to the easel 83 and the control window 71 is properly updated. The Next Image command is available only while in the Dressing Room 59. The Go to Image command brings up the go to image window (not shown), which is used to display a specified synactor image immediately. The control window 71 is automatically updated. This menu command is available only when in the Dressing Room 59. The Copy Synactor command displays a window (not shown) which enables a user to copy a synactor from a source file to a destination file. The Copy Synactor command is always available. The Delete Synactor command displays a window (not shown) which enables a user to remove a synactor from a file. The Delete Synactor command is always available.
Sounds Menu
Referring now to FIG. 9e, the Sounds Menu provides commands for manipulating sounds within the interFACE basic modes. Sounds can also be copied and deleted from files. The Sounds menu is always available. The Open Sound menu command allows a user to select a sound to work with while in the Speech Sync Lab 61 (FIG. 8a). If there is a RECITE command currently in the recite window 811, the name of the selected sound will automatically be placed into that RECITE command. The Open Sound command is available only in the Speech Sync Lab 61. The Close Sound command removes the current sound from memory. 
If a RECITE command is currently in the recite window 811, the name of the sound will automatically be replaced with "??????" in the RECITE command. This indicates that there is no sound available to use with the RECITE command. In order to continue, a new sound will have to be opened. When a sound is opened, the Close Sound menu command changes to Close Sound:soundNAME where "soundNAME" is the name of the open sound. This indicates which sound is currently open. This menu command is available only in the Speech Sync Lab 61 and only if a sound has already been opened. The Play Sound command will play the current sound file without any synchronized animation. This menu command is available only in the Speech Sync Lab 61 and only if a sound has been opened. The Record Sound command displays an audio digitizer window 813 for recording sounds with the audio digitizer. The appearance of the window and the controls available to a user of the window will vary as a function of which audio digitizer is being used. The audio digitizer window 813 allows a user to control the sound recording process. It includes simple commands and options for recording sound. After recording a sound, the recorded sound may immediately be used within the interFACE basic modes. The Record Sound command is available only in the Speech Sync Lab 61. The Copy Sound command allows a user to copy a sound from a source file to a destination file. The Delete Sound command allows a user to remove a sound from a file.
Dressing Room
Referring now particularly to FIG. 6, when the Dressing Room mode is entered or opened, the Dressing Room screen 59 is displayed along with the dressing room window 67 and dressing room control window 71. The Dressing Room menu 69 is a pull down menu which provides various commands for controlling Dressing Room features and access to various windows and tools for working with a synactor. 
The Dressing Room menu 69 is available only when in the Dressing Room 59 and is always displayed on the Dressing Room screen as shown. The Browse Tool command 691 changes the cursor to the browse tool icon 811. If the paint tools palette 81 is displayed, the icon 811 will be highlighted to indicate that the browse tool is currently selected. The browse tool allows a user to manipulate the Dressing Room easel 83 and guides 68. The Paint Tools command 693 displays the Paint Tools Palette 81. If the paint tools palette 81 is already displayed, the paint tools command 693 will close it. The paint tools palette 81 provides various interFACE paint tools for working with synactors. A check mark adjacent the paint tools menu command 693 indicates that the paint tools palette 81 is currently open and displayed. The Brush menu command 695 displays the Brush Palette 73. If the brush palette 73 is already visible, it will be closed. The brush palette 73 is used to select a brush shape for use with the Brush Tool 815. A check mark by the brush menu command indicates that the brush palette is currently open. The Color menu command 697 displays the Color Palette 75. If the color palette is already visible, it will be closed. The color palette 75 is used to select any of the available 256 colors to be used with the paint tools. The color menu command 697 and color palette 75 are only available when in 256 color mode. A check mark by the color menu command indicates that the color palette is currently open. The Line menu command 699 displays the Line Palette 79. If the line palette is already visible, it will be closed. The line palette 79 is used to select a line thickness for use with various paint tools. A check mark by the line menu command indicates that the line palette is currently open. The Pattern command 692 displays the Pattern Palette 77. If the pattern palette 77 is already visible, it will be closed. 
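The palette commands above share one behavior: selecting a command shows its palette, selecting it again closes the palette, and a check mark by the command indicates an open palette. A minimal sketch of that toggle follows. The class `DressingRoomPalettes` and its method names are hypothetical and chosen only for illustration; the text does not describe how interFACE tracks palette state.

```python
class DressingRoomPalettes:
    """Hypothetical model of the palette toggles on the Dressing Room menu."""

    NAMES = ("Paint Tools", "Brush", "Color", "Line", "Pattern")

    def __init__(self, color_mode_256: bool = True) -> None:
        self.color_mode_256 = color_mode_256
        # Assumption: every palette starts closed.
        self.open = {name: False for name in self.NAMES}

    def is_enabled(self, name: str) -> bool:
        # Per the text, the Color command and palette are only
        # available when in 256 color mode.
        return name != "Color" or self.color_mode_256

    def select(self, name: str) -> bool:
        """Choose a menu command: show the palette, or close it if shown.

        Returns True if the palette is open afterwards.
        """
        if not self.is_enabled(name):
            return False
        self.open[name] = not self.open[name]
        return self.open[name]

    def menu_line(self, name: str) -> str:
        # A check mark beside the command marks an open palette.
        return ("✓ " if self.open[name] else "  ") + name
```

For example, calling `select("Brush")` twice on a new instance opens and then closes the brush palette, and `menu_line("Brush")` shows the check mark only in between.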
The pattern palette is used to select a pattern for use with various paint tools. A check mark by the pattern menu command indicates that the pattern palette is currently open. The Show Guides command 696 displays the horizontal and vertical pair of guides 68 in the Dressing Room window 67. The guides 68 are used as reference points for viewing differences and similarities between synactor images. The guides 68 are also used for cropping the images of a synactor. A guide 68 may be repositioned by clicking and dragging on it with the browse tool 811. Once a user clicks on a guide 68, the cursor changes to indicate that a guide 68 is being moved. Guides are positioned over the easel 83. The guides 68 remain at the last assigned location as a user navigates among the current synactor images. The show guides menu command 696 will change to Hide Guides if the guides 68 are visible. The hide guides menu command removes the horizontal and vertical pair of guides 68 from the Dressing Room window 67. As shown in FIG. 6, the guides 68 are illustrative only and, in the preferred embodiment, are not displayed when the show guides menu command 696 is displayed. The Fat Bits command 698 provides a window (not shown) which displays a magnified view of a portion of the synactor's image 85 currently displayed on the easel 83, which the user can then modify a pixel at a time. The Set Background Easel command 6911 places the current synactor image 85 into the background easel (not shown). The Background Easel is displayed immediately behind the easel 83 and is used for comparing and contrasting synactor images. Any synactor image can be placed in the background easel. The synactor image displayed in the background easel allows a user to see and adjust changes and contrasts with the synactor image displayed in the easel 83. A striped bar (not shown) appears at the top of the Dressing Room window 67 indicating the presence of the background easel. 
When this command 6911 is selected, the background easel is automatically displayed and the Hide Background Easel menu command 6913 is enabled. The hide background easel command 6913 removes the background easel from the dressing room window 67. This menu command 6913 is not available until the background easel has been set with the set background easel command 6911. The Clip Art command 6915 displays the Clip Art menu (not shown) on the Dressing Room screen. The Clip Art menu is a pull down menu which is displayed adjacent the dressing room menu 69. The Clip Art menu allows a user to choose and navigate among the various clip art windows provided in the interFACE Clip Art library. For example, the Clip Art menu default command is the "mouths" window (not shown), which provides a user with a selection of different mouth forms and configurations to use with a synactor. The Clip Art menu is available only in the Dressing Room 59. The Dressing Room Window 67 displays the Easel 83 in which art copied from a Clip Art window, for example, is placed and edited to create a synactor 85. The size and position on the screen of the dressing room window 67 may be altered at any time. The easel 83 is a drawing window or container located within the dressing room window 67. Paint tools are utilized to create artwork, that is, to create or edit a synactor 85, within the easel. Imported art may also be placed within the easel. The border of the easel 83 is outlined and shadowed to indicate its dimensions and location within the dressing room window 67. The easel may be moved about within the dressing room window. Clicking on the easel with the browse tool 811 causes the cursor to change its shape, thus indicating that the easel may be moved. The size of the easel 83 can be changed with the browse tool by clicking within the lower right-hand corner of the easel. 
When the cursor is within the lower right-hand corner of the easel, clicking with the mouse changes the cursor shape to indicate that easel size may be modified. The size of the easel can only be changed when working on a new synactor. The dressing room window 67 also includes two pairs of horizontal and vertical guides 68 which are used for image positioning and for cropping a synactor image. The guides 68 are controlled as described herein above by the Show/Hide Guide menu commands 696 under the dressing room menu 69. Once visible, the guides 68 can be moved by clicking and dragging on them with the Browse tool. When the cursor is positioned over a guide, clicking with the mouse changes the cursor shape indicating that guide 68 can be repositioned. The Dressing Room Control Window 71 allows a user to navigate among the images of the current synactor. Information concerning the current synactor (the synactor displayed on the easel 83) is displayed within the control window 71. The characteristics of a new synactor can also be changed from the control window. The control window 71 displays five fields which provide image and synactor information. The Image Label Field 715 displays the label for the current image; for example, the displayed lip position 711 is the "REST" lip position (which is also the default lip position). The Image Number Field 713 displays the number of the current image; the Width/Height Field 719 indicates the width and height of the current synactor; the Speaking/Total Image Number Field 719 indicates the number of speaking images and total number of images that are in the current synactor; and the Key Image Field 711 displays the associated lip position for each speaking image. For expression images, the title "Expression Image" appears in the key image field 711. The Control Window Image Scroll Bar 721 provides four ways to view the images of the current synactor. 
The left and right directional arrows allow a user to view the previous and next image. Clicking and holding the mouse down on either directional arrow will enable a user to cycle through the images in either direction. The image scroll bar 721 also has a thumb box which can be dragged either left or right. Clicking on the gray area adjacent the thumb box advances the image proportionately. If the current synactor is new and has not yet been modified, clicking on the lower portion of the control window 71 calls up the synactor information window (FIG. 9j) in a manner similar to selecting the synactor info menu command from the Edit Menu (FIG. 9b). The Paint Tools Window or Palette 81 provides various tools for editing artwork within the dressing room window 67. The paint tools palette can be closed or moved to any location on the Dressing Room screen. To select a paint tool, position the cursor over the desired tool with the mouse and click on its icon. The icon will become highlighted indicating that it is the current tool. The cursor will also be displayed as the icon of the current tool when positioned over the dressing room window 67. The Browse tool 811 allows a user to manipulate the easel 83 and guides 68, as described above. The Pointer tool 813 is used to select and move graphics within the Dressing Room 59. The Brush Window or Palette 73 enables a user to change the shape of the brush used with the Brush tool 815. The brush palette 73 can be closed or moved to any location on the Dressing Room screen. The Color Window or Palette 75 displays the 256 colors available when in color mode (as selected from the dressing room menu 69). The fill and pen colors can be selected from the color palette. A color is selected by positioning the color cursor 753 over the desired color and clicking with the mouse. An arrow marker 751 indicates whether a selected color is a fill or pen color. To change the color type, click on the desired color type graphic. 
As colors are selected, the type graphics change to the selected color. The color palette 75 also provides four special Color Memory buttons 75 which save color selections. When a color is selected, the top color memory button changes to the new selection. Each subsequent color memory button, in order, assumes the color to which the previous color memory button was set. Selecting the Color Memory Command 694 from the dressing room menu 69 will disable the color memory buttons' ability to save new colors. In this manner, frequently used colors can be easily accessed. The Pattern Window or Palette 77 displays 64 patterns for use with the dressing room paint tools. Fill and pen patterns can be selected with the pattern palette.
The Stage
Referring now to FIGS. 7a and 7b, the interFACE Stage mode 63 provides windows for testing synactor animation and speech synchronization using a voice provided by a speech synthesizer. A speech synthesizer window is also provided for adjusting the settings for the available speech synthesizer. The navigation palette 57 is also displayed on the Stage screen to provide access to the other interFACE screens from the stage. As described hereinabove, the Stage mode can be initiated from any screen in the system or as the startup screen when the interFACE program is initiated. If the Stage screen 63 is initiated from startup, an Open Synactor Window (not shown) will be displayed immediately after the interFACE startup screen to allow a user to select or open a synactor for the stage. If the Stage screen is selected from the Speech Sync Screen 61, the synactor currently selected will also be used in the stage screen 63. If the Stage is selected from the Dressing Room 59, the synactor currently being edited will be used in the stage screen 63. Even if the Dressing Room synactor 85 is only partially created, the Stage screen can be selected and that synactor will be transferred. 
This feature allows a user to test animation with speech as images are created. The transferred or selected synactor is displayed on the stage screen 63 in the synactor window 89. The Stage Menu 63 (FIG. 7b) is a pull down menu which is displayed on the stage screen and allows a user to select a Speech Synthesizer to provide a voice and to initiate the Express Window 86 and Text Window 87. The Stage Menu is available only while in the Stage 63. The Speech Synthesizer Window 88 is displayed on the Stage screen 63 when a specific speech synthesizer is selected from the stage menu 631. The speech synthesizer window 88 provides various controls adapted to the selected speech synthesizer which enable a user to change the pitch and speed rates and other selected characteristics of the speech synthesizer. The MacinTalk Window menu command 633 displays the MacinTalk speech synthesizer control panel in the speech synthesizer window 88. The MacinTalk control panel provides a pitch slide bar 881 and a speed slide bar 883 to change the pitch and speed, respectively, of the MacinTalk voice. The BrightTalk Window menu command 635 displays the BrightTalk speech synthesizer control panel (not shown) in the speech synthesizer window 88. The BrightTalk control panel is utilized to control the speech characteristics of the BrightTalk voice. The Speak Text menu command 639 initiates and displays the Stage Window 87 on the Stage screen 63. The stage window 87 allows a user to test a synactor's speech animation by making the current synactor 89 pronounce the text string entered in the Text Field 871 of the stage window 87. To test the current synactor 89 speech animation, the test text string for the synactor is entered in the stage window Text Field 87 via the keyboard. Then pressing or clicking on the talk button 873 causes the synactor 89 to pronounce the test text string in the stage window text field 871. 
The text will be pronounced utilizing either the MacinTalk or the BrightTalk speech synthesizer, depending on which was last selected from the stage menu 631. The speech animation test mode initiated by clicking on the talk button 873 causes the RAVER and RAVE drivers to automatically segment the test text string and generate the appropriate RECITE command to display the synactor animation synchronized with the speech. The test text is segmented in accordance with the speech segmentation scheme utilized by the selected speech synthesizer. Differences between the selected speech synthesizer segmentation scheme and the RAVEL-defined segmentation scheme are automatically reconciled utilizing a voice reconciliation phoneme table, described in greater detail hereinbelow. The Express Window menu command 637 initiates and displays the Express Window 86 on the Stage screen 63. The stage window 86 is used to test the animation, without speech, of the displayed synactor 89. An EXPRESS command 861 is utilized to build a sequence of synactor images to produce animation. EXPRESS commands 861 can be constructed via the keyboard within the Express Field 863. An EXPRESS command comprises a sequential list or string of specified images, each image followed by the time that the image should be displayed. Timing values are entered in 60ths of a second. Pressing or clicking on the Test button 865 executes an EXPRESS command entered in the Express Field 863. When executed, the EXPRESS command causes the RAVE driver to execute and display the indicated animation with the current synactor 89. Once an EXPRESS command 861 has been executed, the last synactor image in the string 861 will be displayed in the synactor window 89.
The Speech Sync Lab
Referring now to FIGS. 8a and 8b, the interFACE Speech Sync lab 61 provides a variety of tools and methods for synchronizing animation with digitized sound or audio previously recorded on compact disc. 
As described hereinabove, the Speech Sync Lab can be initiated from any screen in the interFACE system or as the startup screen when the system is initiated. Similar to the Stage mode, a synactor must always be assigned when in the Speech Sync Lab. If the Speech Sync Lab 61 is initiated from startup, the Open Synactor Window is displayed initially to allow a user to open a synactor. If the Speech Sync Lab 61 is entered from one of the other modes, then the current synactor for the other mode becomes the current synactor for the Speech Sync Lab 61. Initiation of the Speech Sync Lab displays the Speech Sync Lab screen 61 as shown in FIG. 8a. The Speech Sync Lab screen 61 automatically displays the current synactor 815 when the Speech Sync Lab is entered. When the Speech Sync Lab is opened, the synactor 815 is displayed at top-center of the screen, but may be repositioned to any desired location with a mouse. The Text Window 817 and Recite Window 811 also open automatically when the Speech Sync Lab is entered. As in the other interFACE modes, the navigation palette 57 is also displayed. The primary purpose of the Speech Sync Lab 61 is to provide a capability for the user to create and edit, that is, to "fine tune", a RECITE command. The RECITE command 827 comprises the command or instruction to the RAVE driver to produce sound synchronized animation with a currently selected synactor and sound. The RECITE command includes the sound name 831 followed by a sequence of phonetic/time pairs 829. The phonetic/time pair comprises a phoneme label or name followed by a time value (in 1/60's of a second) which is the amount of time that a facial expression associated with the named phoneme is to be displayed. The phoneme name identifies both the sound segment and its associated facial expression for animation of the synactor. The sound name 831 identifies the sound resource to be used to pronounce the desired audio. 
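The structure of a RECITE command as described above (a sound name followed by phoneme/time pairs, with times in 60ths of a second) can be illustrated with a short Python sketch. The whitespace-separated token layout, the phoneme labels and the names `ReciteCommand` and `parse_recite` are assumptions made for illustration only; the text does not specify the exact syntax accepted by the RAVE driver.

```python
from dataclasses import dataclass

TICKS_PER_SECOND = 60  # the text gives time values in 1/60ths of a second


@dataclass
class ReciteCommand:
    sound_name: str
    pairs: list  # list of (phoneme_label, ticks) tuples

    def duration_seconds(self) -> float:
        """Total display time of all phoneme images, in seconds."""
        return sum(ticks for _, ticks in self.pairs) / TICKS_PER_SECOND


def parse_recite(command: str) -> ReciteCommand:
    """Parse a hypothetical 'sound PH t PH t ...' string (assumed layout)."""
    tokens = command.split()
    if not tokens:
        raise ValueError("empty RECITE command")
    sound_name, rest = tokens[0], tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError("phoneme labels and time values must come in pairs")
    pairs = [(rest[i], int(rest[i + 1])) for i in range(0, len(rest), 2)]
    return ReciteCommand(sound_name, pairs)


# Hypothetical example: sound resource "hello" with four phoneme/time pairs.
cmd = parse_recite("hello HH 4 EH 6 L 5 OW 15")
print(cmd.sound_name, cmd.pairs, cmd.duration_seconds())
```

Keeping the pairs as an ordered list mirrors the sequential nature of the command: each image is held for its own time value before the next one is shown.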
The Speech Sync Lab screen 61 displays three windows provided for the purpose of creating and editing the RECITE command. The text window 817 displays the text which is to be spoken by the synactor. The phonetic window 819 displays a phonetic translation or conversion of the text in the Text Window 817. The recite window 811 displays the RECITE command derived from the text and its phonetic translation. The RECITE command 827 can be edited directly in the recite window 811 or indirectly in the phonetic window 819. The Speech Sync Menu 611 (FIG. 8b) is a pull down menu which is displayed on the Speech Sync Lab screen. The speech sync menu provides various menu commands for use while in the Speech Sync Lab 61. These commands allow a user to open and control the various speech synchronization windows and several special editing tools. The speech sync menu 611 is available only while in the Speech Sync Lab 61. The Text Window menu command 613 displays the text window 817 on the Speech Sync Lab screen 61. If the text window 817 is already open, then the text window command 613 closes the text window. The Text Window 817 provides a text field 833 to enter text via the keyboard as the first step of the speech synchronization process. The text convert button 821 is provided to transform the entered text information into either or both a phonetic string or a RECITE command. If the phonetic window 819 is open, the text convert button 821 will transform the text information into phonetics and display it in the phonetic window 819. The text convert button 821 also transforms the text information into the associated RECITE command and displays it in the recite window 811. Utilizing a mouse or other suitable means, a word or group of words can be selected from the text displayed in the text field 833. Depressing the keyboard Return key will then highlight the phoneme/time pairs in the RECITE command 827 corresponding to the selected text. 
If the Phonetics Window is open, the phonetic information corresponding to the selected text will also be highlighted. The Phonetic Window menu command 615 opens and displays the Phonetic Window 819 on the Speech Sync Lab screen 61. The phonetic window 819 is used to display and modify text from the text window 817 in phonetic form. The phonetic translation is displayed in the phonetic field and can be modified by adding, deleting or changing one or more entries in the phonetic string 835. As in the text window 817, a word or group of words can be selected from the phonetic field 835. Depressing the keyboard return key highlights the phoneme/time pairs in the RECITE command corresponding to the selected portions of the phonetic string. The phonetic convert button 823 transforms the displayed phonetic string to a RECITE command to be displayed in the recite window 811. Although this is not a necessary step, it does provide for more control in the speech synchronization process. The Recite Window 811 displays RECITE command 827 in the recite field. A RECITE command is created by clicking on the convert button in either the text or phonetic windows. In the recite window, the RECITE command can be modified and tested. Clicking on the recite window Test button 825 will play the entire RECITE command enabling the user to see and hear the sound and animation synchronization. Alternatively, selecting the Test menu command 617 runs the animation and sound synchronization specified in the RECITE command. The test menu command 617 is available only in the Speech Sync Lab and when a sound is open. Selecting a Phonetic/Timing Value Pair or group of Phonetic/Timing Value Pairs from the RECITE command and clicking on the recite test button 825 tests only the selected portion of the RECITE command. The RECITE command is edited or modified by changing either the phonetic name or the value of the time or both in the phonetic/time pair. 
Generally, the phonetic name will be correct and only the time value will require adjustment to correct for the duration of the phonetic image while displayed. Also, if a phoneme image is appearing too early or too late, the time values for phoneme/time pairs preceding the particular phonetic image will have to be adjusted. When one or more time values have been changed, the RECITE command must be resynchronized to adjust for the changed time values. Typically, only that portion of the RECITE command following the changed time value is resynchronized. To resynchronize the RECITE command, the cursor is inserted in the phonetic/time pair sequence immediately after the changed time value and the keyboard return key is depressed. The RECITE command from the point of insertion of the cursor to the end of the sequence (left to right) will be resynchronized. If only a portion of the RECITE command is to be resynchronized, a stop character, ".", is inserted in the RECITE command to designate the right-most end of the resynchronization process. The Insert Stop menu command 625 inserts the stop character at the point in the RECITE command at which the cursor has been inserted. The RECITE command requires a sound name 831. The sound name identifies the sound resource which is to be used to provide the audio for the synactor. If a sound has not been opened, "????" appears in the RECITE command in place of a sound name. A sound resource, that is, a prerecorded, digitized sound stored in a file in RAM or ROM, is opened from the Sounds Menu (FIG. 9e) described hereinabove. If desired, rather than selecting a sound from existing sound files, a new sound can be recorded utilizing a sound digitizer. The Record Sound command from the Sounds Menu opens and displays the Digitizer Window 813 on the Speech Sync Lab screen 61. The digitizer window provides the various controls for the audio digitizer to allow a user to record a new sound for use with interFACE. 
(MacinTalk, a sound digitizer available from Farallon Computing, Inc., is suitable for this purpose.) The Sampling Rate box 833 provides buttons to select from four sampling rates or to select a 4:1 compression ratio for sound recording. Sound recordings may be sampled at 22, 11, 7, and 5 KHz. Port Selector buttons are provided to designate which port the digitizer is currently connected to. The Message Field displays vital information and messages used in the recording process. The amount of time for the recording is specified (entered) in the Record Time Field 835. As a recording proceeds, the Record Time Bar 837 fills to indicate progress of the recording. The Mouse Interrupt box 839 provides the capability to interrupt or stop a recording with the mouse at any time. If mouse interrupt is selected, moving or clicking on the mouse stops the recording process. If the Mouse Interrupt is not selected, the recording will record for the entire amount of time specified in the record time field 835. Selecting the Auto Trim box 841 automatically clips any silence from the beginning and end of the sound after it is recorded. With Auto Trim, the resulting sound may be shorter than the time specified in the Record Time Field because of the silence removed. If Auto Trim is not selected, the entire sound including the silence is saved. To record a sound, the name of the sound is entered in the Sound Name Field 837. To begin the recording process, click on the Record button. The message "Click to Begin" will appear in the Message Field. Then clicking on the mouse button commences the recording. The recording is terminated when the recording time ends or when a mouse movement or click interrupts the recording process if the mouse interrupt box 839 is selected. After the sound is recorded, the total length of the recorded sound is displayed in the Message Field. To play back the recorded sound, click on the Play button. 
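The Auto Trim behavior described above (clipping silence from the beginning and end of a recording while leaving the interior untouched) can be illustrated with a minimal Python sketch. The text does not say how the digitizer detects silence, so the sample format (8-bit unsigned values with silence at 128), the threshold and the name `auto_trim` are all assumptions chosen for illustration.

```python
def auto_trim(samples, threshold=2, center=128):
    """Drop leading and trailing samples that lie within `threshold`
    of the assumed silent level `center`; interior samples are kept."""

    def is_loud(sample):
        return abs(sample - center) > threshold

    start = 0
    while start < len(samples) and not is_loud(samples[start]):
        start += 1

    end = len(samples)
    while end > start and not is_loud(samples[end - 1]):
        end -= 1

    return samples[start:end]


# Hypothetical recording: quiet lead-in, a short burst, quiet tail.
recording = [128, 128, 129, 140, 128, 90, 127, 128]
trimmed = auto_trim(recording)
print(trimmed)  # [140, 128, 90]
```

As in the text, the trimmed sound can be shorter than the requested record time, and a silent sample between two loud ones is preserved because only the two ends are scanned.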
The recorded sound becomes the current sound for use with the RECITE command and is automatically saved to RAM or ROM under the sound name. Referring now to FIG. 10, a functional block diagram of the internal data structures comprising a synactor model is illustrated. The various data structures are compiled from the RAVEL source program and are stored in RAM 20 to provide the RAVE driver with sufficient data to implement its functions. A dynamically allocated synactor model is the basis for these data structures and contains one or more synactor model block records 129, one model block record for each synactor model which has been defined. The synactor model block record 129 defined for each synactor is included in that synactor's file stored in memory 39. Table VIII below lists the various fields included in the synactor model block record 129. Table III(a) provides a short definition for each field listed in Table VIII.
TABLE VIII
Model Block Record
Owner
Rules
Phonemes
Phonemes Count
Syncopations
Syncopations Count
Lip Positions
Lip Positions Count
Betweens
Betweens Count
Width
Height
Model Flags
Position Pointers
Face Top Left
Unused Pointer
Bias Base
Unused Pointer #2
In Memory Flag
Depth
Actor Type
Coarticulations
Coarticulations Count
Coart Types
CLUT Pointer
TABLE III(a)
Model Block Description
Owner--what type of voice is used by this model
Rules--location of the rules table
Phonemes--location of the phonemes
Phonemes Count--how many phonemes used by the Model
Syncopations--location of the block of solo phoneme sound
Syncopations Count--how many syncopations present
Lip Positions--location of lip position block
Lip Positions Count--how many lip positions
Betweens--location of betweens face table
Betweens Count--how many betweens present
Width--width in pixels of the model
Height--height of the model in pixels
Model Flags--Model status flags
Position Pointers--locations of the faces
Face Top Left--upper left corner of face
Unused Pointer--no longer used
Bias Base--pointer to beginning of model block
Unused Pointer #2--no longer used
In Memory Flag--indicates whether or not the model is loaded into memory
Depth--Number of bits per pixel
Actor Type--indicates how many speaking/total faces present
Coarticulations--location of coarticulations block
Coarticulations Count--number of coarticulations
Coart Types--location of coart type information
CLUT Pointer--pointer to table of color values used by this model
With continuing reference to FIG. 10, the synactor internal data structure comprises a voice table 121, a dynamic synactor record (DAR) 125 and the above described synactor model block record 129. The voice table 121 is a variable size table comprising a list of "handles" to DAR records. The voice table automatically increases or decreases in size to provide enough space for the number of synactor models currently in use. Each DAR handle describes the location of a DAR pointer 123, which holds the location of a corresponding DAR. The DAR stores information concerning the current state of a synactor model while the synactor model is stored in memory. The DAR is created when a synactor model is first read (saved) into memory and is deallocated when the synactor model is no longer being used. One of the data fields in the DAR is a model handle which holds the location of a model pointer. The model pointer points to an associated synactor model block record 129. Table IX below is a list of the data fields contained in a DAR. Table III(b) provides a short description of each DAR field. 
TABLE IX
Actor Name
Model Handle
Resource File
Face Window
Resting Face
Window Status
Layer
Locked
Purgeable
Frozen
Tied
To Which
Face Top Left
Speed
Pitch
Volume
Stored Depth
Current Width
Current Height
Current Depth
Visible
TABLE III(b)
Dynamic Actor Record Description
Actor Name--Name of the actor that Model Handle points to
Model Handle--Location of the Model Block Record
Resource File--ID of file which contains Model Block Record
Face Window--Pointer to the Model's Window
Resting Face--Which face to show when at rest
Window Status--Code which determines if model is associated with a window on the screen
Layer--Which layer of windows the Model is associated with
Locked--True if the Model is locked in memory
Purgeable--True if the Model can be removed from memory
Frozen--True if the Model cannot be dragged on screen
Tied--True if the Model moves when its window is moved
To Which--Pointer to window which Model is associated with
Face Top Left--Upper-left corner of face
Speed--Current speaking speed
Pitch--Current speaking pitch
Volume--Current speaking volume
Stored Depth--Original depth of this model
Current Width--Current width of this model
Current Height--Current height of this model
Current Depth--Current depth of this model
Visible--True if the Model is visible on the screen
The RAVE driver accesses individual synactor files via the voice table 121. Similarly, the RAVER driver obtains the location of the current synactor model DAR via the RAVE driver whenever a user is utilizing the RAVER driver to modify the current synactor model. The RAVE and RAVER drivers comprise two parts, each having different functionality. The first driver/functionality is editing of synactors and editing of the sound synchronization. The second driver/functionality is to bring life to a synactor. 
The RAVER driver is concerned almost exclusively with creating and editing synactor models and speech synchronization while the RAVE driver is responsive to the RECITE command to pronounce the sound and to display the animated synactor in synchrony with the audio. Table IV and Table V provide a list of RAVE commands and a summary of the RAVE commands, respectively. Table VI(a) provides a brief description of the RAVER commands and Table VI(b) provides a summary of the RAVER commands.
TABLE IV
__________________________________________________________________________
RAVE Command Table
COMMAND  PARAMETER  RETURN  A B C D E F
__________________________________________________________________________
ACTOR  name, location  boolean  ✓
ACTOR_INFO  5 items  ✓ ✓
CLEAR_ACTORS  boolean
EXPRESS  string  boolean  ✓ ✓ ✓
FREEZE  default  boolean  ✓ ✓
GET_SNAP_VALUE  number
HIDE  name  boolean  ✓ ✓ ✓
HIDE_COPYRIGHT  boolean  ✓
INTERMISSION  boolean  ✓ ✓
INTERRUPTIBLE  boolean  ✓
LIST  resource type  list
LOCK  default  boolean  ✓ ✓
MAKE_SNAP_WINDOW  boolean
MOVE  location  boolean  ✓ ✓ ✓
NEW_REST  label  boolean  ✓ ✓
PHONETIC  phonetic text  boolean  ✓ ✓ ✓
PITCH  number  boolean  ✓ ✓
RECITE  string  boolean  ✓ ✓
RECITE_PARTIAL  string  boolean  ✓ ✓
RETIRE  name  boolean  ✓ ✓ ✓
SCREEN_RELATIVE  default  boolean  ✓ ✓
SHOW  name  boolean  ✓ ✓ ✓
SPEED  number  boolean  ✓ ✓
STATUS  6 items
text string  boolean  ✓ ✓ ✓
TIE  default  boolean  ✓ ✓
TOP_LAYER  default  boolean  ✓ ✓
UNFREEZE  default  boolean  ✓ ✓
UNINTERRUPTIBLE  boolean  ✓
UNLOCK  boolean  ✓ ✓
UNTIE  default  boolean  ✓ ✓
UPDATE  name  boolean  ✓ ✓ ✓
USE  name  boolean  ✓ ✓
VISABLE  boolean  ✓ ✓
WINDOW_LAYER  default  boolean  ✓ ✓
WINDOWLESS  boolean
WINDOW_RELATIVE  default  boolean  ✓ ✓
__________________________________________________________________________
A Requires actor.
B Works on current actor.
C Works on named actor.
D Sets default.
E Global setting.
F Animates.
TABLE VI(a)
RAVER Commands Summary
AUTOTRIM--trim leading and trailing silence from a recorded sound
BLOCKPTR--give driver access to the caller's data and routines
CLOSEACTORFILE--close the currently open actor file
CLOSESOUNDFILE--close the currently open sound file
CONVERT--convert a text string to phonetic equivalents
COPY--copy an image
COPYACTOR--copy an actor resource from one file to another
COPYEMPTY--erase the current face, and the easel graphic
COPYSOUND--copy a sound resource from one file to another
CUT--cut an image
DELETEACTOR--remove an actor resource from a file
DELETESOUND--remove a sound resource from a file
DIGIMAKE--convert a phonetic string into a recite string
DIGIMAKE2--convert a phonetic string into a recite string
EDITINFO--provide info about current actor: size, #faces, #speaking faces
EXCHANGE--get access to RAVE driver actor info
EXPORT--copy faces from an actor into a scrapbook file
FNORD--install a patch routine for debugging
GETEXCHANGE--get access to RAVE driver actor info for debugging purposes
GET_REF--tell the caller which file to get the actor resource from
GRAPHIC_EMPTY--return true if the specified graphic is empty
IMPORT--read in images from a scrapbook file
OPENACTORFILE--open a file and use actors from that file
OPENSOUNDFILE--open a file and use sounds from that file
PASTE--paste an image from the clipboard into the actor
PATCH--install a patch routine
RECORDEDSOUND--close previous sound file, and tell driver to use new one
REST2ALL--copy the rest image to all the images
SAVE--save the current actor to disk
SAVEAS--choose a file and a name, and save the actor to them
SYNCSOUND--prep sound for speech sync
TABLE VI(b)
__________________________________________________________________________
RAVER Commands Summary
COMMAND NAME    PARAMETERS                              RETURN VALUE
__________________________________________________________________________
AUTOTRIM        #                                       T/F
BLOCKPTR        #                                       T/F
CLOSEACTORFILE  none                                    T/F
CLOSESOUNDFILE  none                                    T/F
CONVERT         text string                             phonetic string
COPY            new/next image #                        T/F
COPYACTOR       none                                    T/F
COPYEMPTY       none                                    T/F
COPYSOUND       none                                    T/F
CUT             none                                    T/F
DELETEACTOR     name                                    T/F
DELETESOUND     name                                    T/F
DIGIMAKE        snd name, image/timing value string     image/timing value string
DIGIMAKE2       snd name, #, image/timing value string  image/timing value string
EDITINFO        none                                    spk#,surFace,tot,l,t,w,h,d,s
EXCHANGE        none/#                                  T/F
EXPORT          none                                    T/F
FNORD           on/off/#                                T/F
GETEXCHANGE     none                                    T/F
GET_REF         actor/sound                             #
GRAPHIC_EMPTY   none                                    T/F
IMPORT          none                                    T/F
OPENACTORFILE   none                                    T/F
OPENSOUNDFILE   none                                    T/F
PASTE           image #                                 T/F
PATCH           text string                             T/F
RECORDEDSOUND   name                                    T/F
REST2ALL        none                                    T/F
SAVE            name                                    T/F
SAVEAS          name                                    T/F
SYNCSOUND       none                                    T/F
__________________________________________________________________________
The preferred embodiment of the present invention provides several modes for generating speech synchronized animation. The basic mode for speech synchronization is referred to as phonetic proportionality. The phonetic proportionality method is the primary mode for speech synchronization utilized in the Speech Sync Lab 61 and is described briefly below. A more detailed, complete description is provided in the above referenced U.S. patent and copending U.S. patent applications incorporated herein. A user enters desired text which is to be pronounced by a synactor. A corresponding prerecorded, digitized sound resource is retrieved from RAM or ROM and its length in time ticks (60ths of a second) is measured and stored. Then, utilizing a phonetic string that corresponds to the entered text (and the recorded sound), the corresponding sound utterances from the sound resource are converted to a list of phocodes. The phocodes thus derived are then mapped to a look up table of relative timing values which provides a time value for each associated synactor face or position image. The timing value table can be coded in the interFACE program for general use or generated from the RAVEL file using extensions to the RAVEL language to provide a unique voice for a synactor model. Thus synactors can be provided with voices having varying accents, drawls and other speech mannerisms or languages. During the speech synchronization process, this table is utilized to look up timing values for each phocode. Each line or entry in the table represents a phocode and its associated relative timing value. The first line is all null characters and is used as a place holder so that the indexing of phocodes will be useful numbers. The first character is the first letter corresponding to the phocode, the second character is the second letter corresponding to the phocode, if there is one, or an end of string character. 
The third character in each line is an end of string character for the two-letter phocodes or a space filler. The fourth character is the relative timing value associated with that phocode. The last line is again all null to mark the end of the table. Once the two parallel lists, phocodes and relative timings, and the length of the associated sound have been established, the actual process of synchronizing the speech to the animation is initiated by refining the timing list. The first step is to calculate the sum of all the values in the timing list. The sum is then compared to the sound length. The timing value given to each phocode is then adjusted proportionately with the compared sums and rounded to whole numbers. If the total of the timings is less than the sound length, then the timings are incremented until the total of the timings match the total sound length. If the total of the timings is greater than the total sound length, then the timings are similarly decremented until they match. This is done to deal with cumulative rounding errors because the timings must be integer values so the RAVE real time phase can operate. The result is a list of phocodes and associated timing values which represent the synchronization of the synactor facial images to the corresponding sound. To create a RECITE command, the phocodes are used again to look up the corresponding phonetic codes to form phonetic code/time value pairs. The RECITE command will coordinate the actual sound/motion combination. The user can edit the command on the screen to fine tune it, test it, and edit it more until it looks satisfactory. (Editing is particularly useful for unusually-timed speech segments, for example, with one word pronounced more slowly or differently, or with silences or throat clearings not reflected in the text and/or not amenable to speech recognition.) 
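The proportional adjustment of the timing list described above can be illustrated with a short Python sketch. The text says only that the timings are scaled, rounded, and then incremented or decremented until their total matches the sound length; the order in which individual timings are corrected (here, one tick at a time starting from the first entry) and the name `proportional_timings` are assumptions made for illustration.

```python
def proportional_timings(relative, sound_ticks):
    """Scale relative timing values so that they sum to sound_ticks
    (the sound length in 60ths of a second), using whole numbers only."""
    total = sum(relative)
    if total <= 0 or sound_ticks < 0:
        raise ValueError("need positive relative timings and a non-negative length")

    # Step 1: proportional scaling, rounded half up to whole ticks.
    timings = [int(r * sound_ticks / total + 0.5) for r in relative]

    # Step 2: absorb the cumulative rounding error one tick at a time
    # (assumed policy: walk the list from the start, never going below zero).
    i = 0
    while sum(timings) != sound_ticks:
        step = 1 if sum(timings) < sound_ticks else -1
        if timings[i] + step >= 0:
            timings[i] += step
        i = (i + 1) % len(timings)
    return timings


# Hypothetical relative timings for four phocodes and a 30-tick (0.5 s) sound.
timings = proportional_timings([2, 3, 3, 4], 30)
print(timings)  # [4, 8, 8, 10]
```

In this example the rounded values initially total 31 ticks, so a single tick is removed to match the 30-tick sound. Pairing the resulting integers with the corresponding phonetic codes yields the phonetic code/time value pairs of a RECITE command.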
To help a user fine tune the RECITE command, the interFACE system provides methods, described hereinabove, to isolate, test, and/or programmatically resynchronize individual portions of the sound and animation. In the Stage mode 63, the voice utilized for a synactor in the sound-animation synchronization process is provided by an independent voice synthesizer rather than by the prerecorded, digitized RAVEL sound resources. In the preferred embodiment, the interFACE system supports both the MacinTalk and the BrightTalk speech synthesizers. Since a given voice synthesizer generates a unique set of phonemes particular to that voice synthesizer, the set of phonemes utilized by the interFACE system for a synactor model may be different, and therefore one or more phonemes generated by the voice synthesizer corresponding to a text string may not be recognized by the interFACE system. The RAVE driver is synchronized with any voice synthesizer by creating two tables for the particular voice synthesizer in use. The first table, referred to as the voice reconciliation phoneme table, represents all phonemes that the particular voice, that is, the voice synthesizer, is capable of generating. The second table, referred to as the generic phoneme table, is similar to the voice reconciliation phoneme table except that it substitutes equivalent phonemes from the synactor model resource in place of unrecognized phonemes generated by the voice synthesizer. The entries in the generic phoneme table are in the same order as the corresponding entries in the voice reconciliation table to provide a one-to-one mapping between the two tables. These two tables are stored as resources and adapted for each unique voice synthesizer supported by the interFACE system. 
When generating speech synchronized animation using a voice synthesizer, the phoneme table for the synactor model, the voice reconciliation table, and the generic phoneme table are used to create a runtime reconciled phocode table that enables the interFACE system to work with any voice synthesizer without the need to modify the existing synactor phoneme table. The runtime reconciled phocode table is allocated the same size as the voice reconciliation phoneme and generic phoneme tables so that there will be a one-to-one correspondence between all three tables. Each phoneme in the generic phoneme table is looked up in the synactor model phoneme table to get its equivalent phocode. The phocode position is then stored in the runtime reconciled phocode table at the same relative position as its phoneme equivalent in the generic phoneme table. Thus, when using a voice synthesizer, instead of looking up phonemes in the synactor model phoneme table to get equivalent phocodes, phonemes will be looked up in the voice reconciliation phoneme table to get their positions in the runtime reconciled phocode table which, in turn, provides the phocode position in the synactor model phoneme table used for the interFACE process. FIG. 17 illustrates a voice reconciliation phoneme table and a generic phoneme table. The interFACE system allows a user to create three different classes of synactors. These classes are referred to as synactor types and are differentiated by the number of assigned speaking images. The three types of synactors are referred to as Standard, Extended and Coarticulated. Each synactor type provides an increased number of speaking images over the previous type. The simplest synactor type, standard, provides reasonable animation ability with minimal user effort and minimal system memory usage. A standard synactor provides eight images devoted to speaking and at least eight expression images. The total minimum number of images for a standard synactor is 16. 
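The three-table reconciliation between a voice synthesizer's phonemes and a synactor model's phocodes, described above, can be sketched as follows. This is only an illustration under stated assumptions: the function names and the sample phoneme symbols are hypothetical, tables are modeled as Python lists, and a phocode is taken to be the position of a phoneme in the synactor model phoneme table.

```python
def build_reconciled_table(voice_table, generic_table, synactor_table):
    """Build the runtime reconciled phocode table.

    voice_table    -- every phoneme the voice synthesizer can generate
    generic_table  -- same order, with unrecognized phonemes replaced by
                      equivalents from the synactor model
    synactor_table -- the synactor model phoneme table (index = phocode)
    """
    if len(voice_table) != len(generic_table):
        raise ValueError("voice and generic tables must map one-to-one")
    position = {phoneme: i for i, phoneme in enumerate(synactor_table)}
    # Same size and order as the other two tables.
    return [position[phoneme] for phoneme in generic_table]


def phocode_for(phoneme, voice_table, reconciled_table):
    """Look a synthesizer phoneme up in the voice table, then read its phocode."""
    return reconciled_table[voice_table.index(phoneme)]


# Hypothetical data: index 0 of the synactor table is a placeholder entry.
synactor_table = ["", "AA", "IY", "M", "S"]
voice_table = ["AA", "AO", "IY", "M", "Z"]    # what the synthesizer emits
generic_table = ["AA", "AA", "IY", "M", "S"]  # "AO" -> "AA", "Z" -> "S"

reconciled = build_reconciled_table(voice_table, generic_table, synactor_table)
print(reconciled)                                  # [1, 1, 2, 3, 4]
print(phocode_for("Z", voice_table, reconciled))   # 4
```

Because the reconciled table is built once at runtime, the synactor model phoneme table itself never has to be modified for a new synthesizer; only the two per-synthesizer tables change.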
The extended synactor type is a middle track between the standard and coarticulated types. Seven additional images devoted to speaking are added to the images provided by the standard synactor type to create the extended type. The extended type, therefore, provides nearly twice as many speaking images. The additional images enable synactors to provide more realistic speech animation than standard types. Extended synactors also require at least one expression image to provide the minimum of 16 synactor images. The most complicated synactor that can be created in the interFACE system is the coarticulated synactor type. Coarticulated synactors provide 32 speaking images. For speaking, coarticulated synactors provide the best possible animation because each speaking image varies as a function of the phoneme context of the sound utterance as well as its primary associated phoneme. To create more natural animation, the RAVE driver includes facilities to handle variations in facial positioning that occur as a result of coarticulation. Coarticulatory patterns of speech exist when two or more speech sounds overlap such that their articulatory gestures occur simultaneously. To some extent, this affects practically all natural speech sounds. A major effect is that of the lip, jaw, and tongue positions of various vowels on the articulators' positions for consonant production. A dramatic example of this is to compare the lip configuration in forming the consonant "b" in different vowel environments, for example, "eebee" compared with "ooboo". There are two major types of coarticulation, both of which operate at the same time. Inertial coarticulation is the result of the articulatory apparatus (i.e., lips and face) being a mechanical system, i.e., "mechanical slop". The articulator positions for a previous sound are retained and affect the articulator positions of the next sound. 
Anticipatory coarticulation is the result of neural control and preplanning for increased efficiency and speed of articulator movement, i.e., natural articulators are controlled in parallel. The articulator positions for the target sound are affected by the anticipated positions for the next sound. Rather than adding new images to provide more complex animation (as was done with the extended type), the coarticulated synactor type provides variations on existing consonant face positions. The two variations are known as retracted and protruded images and are generated by folding or merging existing facial images to reduce the overall number of images required to be stored in the system. The coarticulated synactor type does not require any default expression images. Any number of expression images may be added while in the Dressing Room 59. Referring to FIG. 11, exemplary speaking images are illustrated. A speaking or key image for each synactor speaking position is provided along with sample words and suggestions for use of each key image. A complete listing of available speaking images for the preferred embodiment is provided in the "User's Guide" previously incorporated by reference. Table X, below, provides a synactor overview which summarizes which speaking images are available for use with each of the above-described synactor types. For example, the REST image (#1) is used with all three synactor types: standard (S), extended (E) and coarticulated (C). By contrast, the OH image (#9) is available only with the extended (E) and the coarticulated (C) synactor types.
TABLE X
______________________________________
#   Image  S  E  C
______________________________________
1   Rest   X  X  X
2   F      X  X  X
3   M      X  X  X
4   R      X  X  X
5   W      X  X  X
6   IH     X  X  X
7   AH     X  X  X
8   E      X  X  X
9   OH        X  X
10  S         X  X
11  D         X  X
12  T         X  X
13  C         X  X
14  L         X  X
15  K         X  X
16  F<           X
17  M<           X
18  R<           X
19  S<           X
20  D<           X
21  T<           X
22  C<           X
23  L<           X
24  K<           X
25  F>           X
26  M>           X
27  S>           X
28  D>           X
29  T>           X
30  C>           X
31  L>           X
32  K>           X
______________________________________
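The cumulative structure of Table X, in which each synactor type extends the speaking images of the previous one, can be expressed as a small lookup. This is an illustrative sketch only; the names `speaking_images` and the list constants are hypothetical, and the `<` and `>` suffixes are simply copied from the table as the markers of the two coarticulated variants.

```python
# Speaking images 1-8: shared by all three synactor types.
STANDARD = ["Rest", "F", "M", "R", "W", "IH", "AH", "E"]
# Speaking images 9-15: added by the extended type.
EXTENDED_EXTRA = ["OH", "S", "D", "T", "C", "L", "K"]
# Speaking images 16-32: consonant variants added by the coarticulated type.
COARTICULATED_EXTRA = (
    [c + "<" for c in ["F", "M", "R", "S", "D", "T", "C", "L", "K"]]
    + [c + ">" for c in ["F", "M", "S", "D", "T", "C", "L", "K"]]
)


def speaking_images(synactor_type):
    """Return the ordered speaking images for 'S', 'E' or 'C' (Table X columns)."""
    if synactor_type == "S":
        return list(STANDARD)
    if synactor_type == "E":
        return STANDARD + EXTENDED_EXTRA
    if synactor_type == "C":
        return STANDARD + EXTENDED_EXTRA + COARTICULATED_EXTRA
    raise ValueError("unknown synactor type: %r" % synactor_type)


for kind in "SEC":
    print(kind, len(speaking_images(kind)))
```

The counts printed (8, 15 and 32) agree with the numbers of speaking images stated in the text for the standard, extended and coarticulated types.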
Table VII provides a listing of short phrases which facilitates easy extraction of the proper speaking images from a sound (or video) recording.
TABLE VII
______________________________________
The Word List
______________________________________
##STR1##
##STR2##
##STR3##
##STR4##
##STR5##
##STR6##
##STR7##
##STR8##
##STR9##
##STR10##
______________________________________
The words and phrases provide instances of all lip positions required in the standard, extended and coarticulated synactor types. The corresponding speaking image number for the desired lip position is listed above the appropriate speech or word segment. RAVEL language source code providing synactor model definitions for each synactor type described above is provided in FIGS. 14-16.
Graphical Speech Segmentation
Referring now to FIG. 12, the Graphical User Interface for Speech Synchronization (GUI speech sync mode) provides a window 110 in which the computer speech synchronization process is displayed. The GUI speech sync mode is initiated by a user from the Speech Sync Lab 61 via the speech sync menu (as shown in FIG. 8b). The GUI speech sync mode provides an alternate method for creating and editing the RECITE command. Utilizing the GUI speech sync window 110, the user can visually edit the RECITE command to provide highly accurate speech synchronization. The GUI Speech Sync window displays a digital representation 1101 of the current sound. The digital sound wave representation 1101 is generated by the system microprocessor utilizing well-known analytical techniques. Beneath the digital sound display 1101 a Phoneme Field 1103 is provided which illustrates the spatial relationship between phonemes (which represent facial images) and the components of the displayed digital sound wave 1101. A text field 1105 immediately below the phoneme field 1103 displays the word or words the displayed digital wave 1101 and phonemes represent for easy identification. The GUI speech sync window 110 also includes a standard scroll bar 1107 for working with sounds that are too large to be represented entirely within the window. Beneath the scroll bar are two fields 1109, 1111 for entering the text string representation of the sound and for displaying the RECITE command, respectively. 
The RECITE command can be edited or modified either by changing the timing values in the command or by using the cursor to adjust the position of the phonemes displayed in the phoneme field 1103 with respect to the associated digital acoustic wave representation 1101. A change in either field will cause the other field to update appropriately. The user may also request bar indicators to be drawn from the location of the phonemes on the acoustic wave representation. Using a mouse or other suitable input means, a user can select any component of the above described fields. Any such selection will cause corresponding selections to appear in the remaining four fields. For example, when a word or character string is selected from the entered text in the text field 1109, the RAVER driver automatically displays the corresponding digital representation and computes and displays the phonemes and the RECITE command. The RECITE command timing values can then be edited to provide more accurate synchronization. Clicking on the play button 1121 with a selection active will play/animate that selection. Two modes of speech synchronization are provided by the GUI Speech Sync window: fast and slow automatic speech synchronization. The user selects the desired speech synchronization mode utilizing the "turtle" or "hare" icon, 1115 or 1117, respectively, or other suitably designated icon, from the speech sync control window 1113. Fast synchronization is quick, but less accurate. Slow synchronization takes longer but yields better results. The GUI speech sync window 110 provides the capability for a user to work with RAVE sound resources stored in RAM or ROM or externally available sounds recorded on industry standard compact disc (CD) or other storage media. Sound samples provided by an external source must be provided in acceptable digital format in order for the processor to generate the digital audio signal 1101 for display by the GUI screen 110. 
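The two-way link between the RECITE timing values and the phoneme positions in the phoneme field can be pictured as a conversion between durations and cumulative boundary positions. The sketch below is an assumption about how such a link could work, not a description of the RAVER driver: the function names are hypothetical, and positions are measured in ticks from the start of the sound.

```python
def boundaries_from_timings(timings):
    """Return the end position, in ticks, of each phoneme on the sound wave."""
    positions, elapsed = [], 0
    for duration in timings:
        elapsed += duration
        positions.append(elapsed)
    return positions


def move_boundary(timings, index, new_position):
    """Drag the boundary between phoneme `index` and `index + 1`.

    Only the two adjacent timing values change, so the total length of the
    sound is preserved.
    """
    if not 0 <= index < len(timings) - 1:
        raise IndexError("no boundary after that phoneme")
    bounds = boundaries_from_timings(timings)
    start = bounds[index - 1] if index > 0 else 0
    end = bounds[index + 1]
    if not start <= new_position <= end:
        raise ValueError("boundary must stay between its neighbours")
    updated = list(timings)
    updated[index] = new_position - start
    updated[index + 1] = end - new_position
    return updated


timings = [6, 12, 5, 14]
print(boundaries_from_timings(timings))  # [6, 18, 23, 37]
print(move_boundary(timings, 1, 20))     # [6, 14, 3, 14]
```

Editing a timing value directly corresponds to the reverse step: recomputing the boundary positions from the changed list and redrawing the phoneme field.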
Prerecorded or real time sound samples may be input via the audio input and digitizer circuits 9 and 8, respectively. Alternatively, digitized sound may be directly input via an audio digitizer 6, such as a MacRecorder, under control of the Speech Sync Lab 61. A record button 1119 enables the user to record and synchronize sounds directly within the GUI Speech Sync window 110. Multiple GUI Speech Sync windows may be opened when working with more than one sound or types of sound.
Dendrogramic Segmentation
Referring now to FIGS. 13a-13e, digitized speech may also be synchronized with phonetic images utilizing techniques referred to as "speech segmentation" from the art of signal processing. One technique which has provided good results with respect to the task of phonetic recognition utilizes a dendrogramic representation to map segments of an acoustical signal to a set of phonological units, phonemes, for example. Utilizing dendrogramic methodology, a continuous speech signal is segmented into a discrete set of acoustic events. The segmentation process is then followed by a labelling process, that is, assigning phonetic labels, phonemes, to each of the segments. Such dendrogramic representations are well-described in the literature, for example, see Glass, J. R. and Zue, V. W., "1988 International Conference on Acoustics, Speech, and Signal Processing" (ICASSP 88), Vol. 1, pages 429-432. Glass et al. and others represented the speech signal by Discrete Fourier Transforms. Utilizing Auto-Regressive parameterizations (LPC) as a basis, good results have been obtained building dendrograms to perform both manual and automatic speech segmentation. Speech segmentation provided by such dendrogramic methods is readily adaptable to speech/animation synchronization systems. In the interFACE system, the dendrogramic methodology described above has been implemented to provide an alternate mode of speech synchronization. 
The Dendrogramic mode of speech synchronization is initiated by a user from the Speech Sync Lab 61 via the "Dendrogram" command 631 provided by the speech sync menu 611 (as shown in FIG. 8b). When the Dendrogram mode is selected, the desired speech utterance is automatically segmented via the dendrogram process by the microprocessor utilizing well-known algorithms. Once the various phonemes are identified, the RAVER driver computes the corresponding RECITE command with the time values for each phonetic code derived from the relative position in the dendrogram of the associated phoneme with respect to the overall time for the sound utterance represented by the horizontal axis of the dendrogram. Essentially, for each sound utterance to be segmented, a digital sound sample is first sampled at 11 kHz. Next, an 8-parameter PARCOR representation is calculated every 1/120 second using a triangular window 1/60 second wide. The PARCOR parameters are then converted to log-area-ratios and a dendrogram is built using a standard Euclidean norm to associate the resulting representational vectors. Referring now to FIG. 13a, a screen display of a dendrogram derived from a recording of the Japanese sound utterance "nagasaki" is illustrated. The dendrogram contains nodes (represented as variously sized, shaded rectangles) corresponding to various subsections of the sound utterance, including the desired phonetic images. By selecting relatively simple choice-making commands provided by a Dendrogram Menu (not shown) a user can manually trace a path across the dendrogram corresponding to the phonetic stream present in the sound utterance. The user can listen to any node by selecting it and initiating a "Play Node" menu command. The user can force the path across the diagram to flow through the selected node via a "Use Node" menu command. The phoneme stream can also be modified to delete poorly articulated phonemes or to include additional phonemes with an "Edit Labels" menu command. 
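The dendrogram construction step described above, in which per-frame vectors are associated using a Euclidean norm, can be sketched with a simple bottom-up merge of adjacent segments. This is a simplified illustration and not the algorithm used by the interFACE system or by Glass and Zue: the greedy closest-neighbour merge, the use of segment mean vectors, and all function names are assumptions. The PARCOR and log-area-ratio front end is omitted, so the input is taken to be one feature vector per 1/120 second frame.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_dendrogram(frames):
    """Merge adjacent segments bottom-up, closest pair first.

    frames -- list of feature vectors, one per analysis frame
    Returns nodes as (start_frame, end_frame, height) with end exclusive;
    the leaves come first, followed by merged nodes in merge order.
    """
    segments = [(i, i + 1, list(v)) for i, v in enumerate(frames)]
    nodes = [(start, end, 0.0) for start, end, _ in segments]
    while len(segments) > 1:
        dists = [euclidean(segments[i][2], segments[i + 1][2])
                 for i in range(len(segments) - 1)]
        i = min(range(len(dists)), key=dists.__getitem__)
        (s1, e1, m1), (s2, e2, m2) = segments[i], segments[i + 1]
        n1, n2 = e1 - s1, e2 - s2
        # The merged segment is represented by its length-weighted mean vector.
        mean = [(a * n1 + b * n2) / (n1 + n2) for a, b in zip(m1, m2)]
        segments[i:i + 2] = [(s1, e2, mean)]
        nodes.append((s1, e2, dists[i]))
    return nodes


def node_ticks(node):
    """Duration of a node in ticks (1/60 s), given frames every 1/120 s."""
    start, end, _ = node
    return (end - start) / 2


# Hypothetical one-dimensional features: two acoustically distinct regions.
nodes = build_dendrogram([[0.0], [0.1], [5.0], [5.2]])
print([n[:2] for n in nodes])
```

Each node spans a contiguous run of frames, so a path of labeled nodes across the dendrogram gives the horizontal extent of each phoneme, from which timing values in ticks follow directly.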
The user can then convert the labeled path into the RECITE image/timing pair string that can then be tested and fine-tuned for use with the sound by using "Make Recite" and "Edit Recite" menu commands. Referring now to FIG. 13b, a dendrogram diagram part way through the manual labeling process is shown. The user has already identified the nodes representing the phonemes "QX", "N", "AA", "G" and "AA". By using the "Play Node" command the user has determined that the selected node (the node with the thick black border) 113 is the terminal "IY" phoneme in "nagasaki". The user now initiates a Use Node command which instructs the RAVER driver to change the path so that it flows through the selected node. This forces a number of other nodes to be automatically added to the diagram with the results as shown in FIG. 11a. As a result of this choice the rest of the diagram is now correctly labelled. Referring now to FIG. 13c, a screen display of the REC
