APPLICATION PROGRAM INTERFACE (API)

Playing audio files at high priority

6354748

Abstract

An application programming interface (API), implemented on a general-purpose host processor, receives function calls from an application to control the play of an audio file. The API translates the function calls into host-to-board messages and transmits the host-to-board messages to an audio task, implemented on a digital signal processor. The audio task plays the audio file based on the host-to-board messages received from the API at a higher priority than one or more other audio files. The API and audio task allow the application to select a Microsoft.RTM.-standard Wave file for play at a higher priority than other Wave files. By using the API, the application has control of the number of times the file is played back and when playing stops. The application gets notification of playback complete from the audio task via the API.


Claims

What is claimed is:

1. A computer-based subsystem for processing audio signals, comprising:

(a) an application programming interface (API); and

(b) an audio task, wherein:

the API is implemented on a general-purpose host processor of a computer system;

the audio task is implemented on a digital signal processor of the computer system;

the API receives one or more function calls from an application to control the play of an audio file;

the application is implemented on the general-purpose host processor;

the audio file is stored in a storage device of the computer system;

the API translates the function calls into one or more host-to-board messages;

the API transmits the host-to-board messages to the audio task; and

the audio task plays the audio file based on the host-to-board messages received from the API at a higher priority than one or more other audio files preempting playback of the other audio files.

2. The subsystem of claim 1, wherein the audio file is a Microsoft.RTM.-standard Wave file.

3. The subsystem of claim 1, wherein the API is a Microsoft.RTM. Windows.TM. operating system dynamic link library.

4. The subsystem of claim 1, wherein:

the host-to-board messages comprise an open-file message;

the open-file message indicates a path name for the audio file;

the audio task opens the audio file in response to the open-file message received from the API; and

the audio task transmits a callback message to the API in response to the open-file message.

5. The subsystem of claim 4, wherein:

if the open-file message is successful, then the callback message indicates that the audio file is valid; and

if the open-file message is unsuccessful, then the callback message indicates one of:

(1) the audio file cannot be opened;

(2) the audio file comprises unexpected file data;

(3) the audio file comprises data of an invalid audio type;

(4) the audio file is of an invalid file type;

(5) the audio file comprises a bad format type;

(6) the audio file comprises a bad header type;

(7) an on-board resource error occurred;

(8) the audio file comprises data at an unsupported rate;

(9) the audio file comprises data at an unsupported sample size; and

(10) a high-priority audio channel is already in use.

6. The subsystem of claim 1, wherein:

the host-to-board messages comprise a play-file message;

the play-file message comprises:

(1) a first parameter indicating a path name for the audio file;

(2) a second parameter indicating how many times to play the audio file;

(3) a third parameter;

(4) a fourth parameter indicating a volume level at which to play the audio file; and

(5) a fifth parameter indicating an output device for playing the audio file; and

the audio task plays the audio file in response to the play-file message received from the API.

7. The subsystem of claim 6, wherein:

the third parameter is ignored;

a value of 1 for the fifth parameter indicates that the output device is an on-board speaker;

a value of 2 for the fifth parameter indicates that the output device is a line-out device; and

a value of 4 for the fifth parameter indicates that the output device is a headphone device.

8. The subsystem of claim 1, wherein:

the host-to-board messages comprise a stop-file message; and

the audio task stops play of the audio file in response to the stop-file message received from the API.

9. The subsystem of claim 1, wherein the audio task transmits a play-done message to the API at completion of playing of the audio file.

10. The subsystem of claim 1, wherein:

the audio file is a Microsoft.RTM.-standard Wave file;

the API is a Microsoft.RTM. Windows.TM. operating system dynamic link library;

the host-to-board messages comprise an open-file message;

the open-file message indicates a path name for the audio file;

the audio task opens the audio file in response to the open-file message received from the API;

the audio task transmits a callback message to the API in response to the open-file message;

the host-to-board messages comprise a play-file message;

the play-file message comprises:

(1) a first parameter indicating the path name for the audio file;

(2) a second parameter indicating how many times to play the audio file;

(3) a third parameter;

(4) a fourth parameter indicating a volume level at which to play the audio file; and

(5) a fifth parameter indicating an output device for playing the audio file;

the audio task plays the audio file in response to the play-file message received from the API;

the host-to-board messages comprise a stop-file message;

the audio task stops play of the audio file in response to the stop-file message received from the API; and

the audio task transmits a play-done message to the API at completion of playing of the audio file.

11. A computer-implemented process for processing audio signals, comprising the steps of:

(a) receiving, by an application programming interface (API), one or more function calls from an application to control the play of an audio file, wherein:

the API is implemented on a general-purpose host processor of a computer system;

the application is implemented on the general-purpose host processor; and

the audio file is stored in a storage device of the computer system;

(b) translating, by the API, the function calls into one or more host-to-board messages;

(c) transmitting, by the API, the host-to-board messages to an audio task, wherein the audio task is implemented on a digital signal processor of the computer system; and

(d) playing, by the audio task, the audio file based on the host-to-board messages received from the API at a higher priority than one or more other audio files preempting playback of the other audio files.

12. The process of claim 11, wherein the audio file is a Microsoft.RTM.-standard Wave file.

13. The process of claim 11, wherein the API is a Microsoft.RTM. Windows.TM. operating system dynamic link library.

14. The process of claim 11, wherein:

step (c) comprises the step of transmitting, by the API, an open-file message of the host-to-board messages to the audio task, wherein the open-file message indicates a path name for the audio file; and

step (d) comprises the steps of:

(1) opening, by the audio task, the audio file in response to the open-file message received from the API; and

(2) transmitting, by the audio task, a callback message to the API in response to the open-file message.

15. The process of claim 14, wherein:

if the open-file message is successful, then the callback message indicates that the audio file is valid; and

if the open-file message is unsuccessful, then the callback message indicates one of:

(1) the audio file cannot be opened;

(2) the audio file comprises unexpected file data;

(3) the audio file comprises data of an invalid audio type;

(4) the audio file is of an invalid file type;

(5) the audio file comprises a bad format type;

(6) the audio file comprises a bad header type;

(7) an on-board resource error occurred;

(8) the audio file comprises data at an unsupported rate;

(9) the audio file comprises data at an unsupported sample size; and

(10) a high-priority audio channel is already in use.

16. The process of claim 11, wherein:

step (c) comprises the step of transmitting, by the API, a play-file message of the host-to-board messages to the audio task, wherein the play-file message comprises:

(1) a first parameter indicating a path name for the audio file;

(2) a second parameter indicating how many times to play the audio file;

(3) a third parameter;

(4) a fourth parameter indicating a volume level at which to play the audio file; and

(5) a fifth parameter indicating an output device for playing the audio file; and

step (d) comprises the step of playing, by the audio task, the audio file in response to the play-file message received from the API.

17. The process of claim 16, wherein:

the third parameter is ignored;

a value of 1 for the fifth parameter indicates that the output device is an on-board speaker;

a value of 2 for the fifth parameter indicates that the output device is a line-out device; and

a value of 4 for the fifth parameter indicates that the output device is a headphone device.

18. The process of claim 11, wherein:

step (c) comprises the step of transmitting, by the API, a stop-file message of the host-to-board messages to the audio task; and

step (d) comprises the step of stopping, by the audio task, play of the audio file in response to the stop-file message received from the API.

19. The process of claim 11, wherein step (d) comprises the step of transmitting, by the audio task, a play-done message to the API at completion of playing of the audio file.

20. The process of claim 11, wherein:

the audio file is a Microsoft.RTM.-standard Wave file;

the API is a Microsoft.RTM. Windows.TM. operating system dynamic link library;

step (c) comprises the step of transmitting, by the API, an open-file message of the host-to-board messages to the audio task, wherein the open-file message indicates a path name for the audio file;

step (d) comprises the steps of:

(1) opening, by the audio task, the audio file in response to the open-file message received from the API; and

(2) transmitting, by the audio task, a callback message to the API in response to the open-file message;

step (c) comprises the step of transmitting, by the API, a play-file message of the host-to-board messages to the audio task, wherein the play-file message comprises:

(1) a first parameter indicating a path name for the audio file;

(2) a second parameter indicating how many times to play the audio file;

(3) a third parameter;

(4) a fourth parameter indicating a volume level at which to play the audio file; and

(5) a fifth parameter indicating an output device for playing the audio file;

step (d) comprises the step of playing, by the audio task, the audio file in response to the play-file message received from the API;

step (c) comprises the step of transmitting, by the API, a stop-file message of the host-to-board messages to the audio task;

step (d) comprises the step of stopping, by the audio task, play of the audio file in response to the stop-file message received from the API; and

step (d) comprises the step of transmitting, by the audio task, a play-done message to the API at completion of playing of the audio file.


Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to audio/video conferencing, and, in particular, to systems for real-time audio, video, and data conferencing in windowed environments on personal computer systems.

2. Description of the Related Art

It is desirable to provide real-time audio, video, and data conferencing between personal computer (PC) systems operating in windowed environments such as those provided by versions of Microsoft.RTM. Windows.TM. operating system. There are difficulties, however, with providing real-time conferencing in non-real-time windowed environments.

It is accordingly an object of this invention to overcome the disadvantages and drawbacks of the known art and to provide real-time audio, video, and data conferencing between PC systems operating in non-real-time windowed environments.

It is a particular object of the present invention to provide real-time audio, video, and data conferencing between PC systems operating under a Microsoft.RTM. Windows.TM. operating system.

Further objects and advantages of this invention will become apparent from the detailed description of a preferred embodiment which follows.

SUMMARY OF THE INVENTION

The present invention comprises a computer-implemented process and a computer-based subsystem for processing audio signals. According to a preferred embodiment, the subsystem comprises an application programming interface (API) and an audio task. The API is implemented on a general-purpose host processor of a computer system and the audio task is implemented on a digital signal processor of the computer system. The API receives one or more function calls from an application to control the play of an audio file, where the application is implemented on the general-purpose host processor and the audio file is stored in a storage device of the computer system. The API translates the function calls into one or more host-to-board messages and transmits the host-to-board messages to the audio task. The audio task plays the audio file based on the host-to-board messages received from the API at a higher priority than one or more other audio files.
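For illustration only, the translation of an application function call into a play-file host-to-board message may be sketched as follows. The five parameters and the output-device values 1, 2, and 4 follow the play-file message recited in the claims. The names PlayFileMessage, OutputDevice, and makePlayFileMessage, the field widths, and the zero written to the ignored third parameter are assumptions made for this sketch and are not taken from the specification.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Output-device values as recited in the claims (1, 2, and 4).
enum class OutputDevice : std::uint32_t {
    OnBoardSpeaker = 1,
    LineOut = 2,
    Headphone = 4,
};

// Hypothetical layout of a play-file host-to-board message.
struct PlayFileMessage {
    std::string path;         // first parameter: path name of the audio file
    std::uint32_t playCount;  // second parameter: how many times to play the file
    std::uint32_t reserved;   // third parameter: ignored by the audio task
    std::uint32_t volume;     // fourth parameter: volume level for playback
    OutputDevice device;      // fifth parameter: output device for playback
};

// Hypothetical API-side translation of a function call into a message.
// Transmission to the audio task on the DSP is outside this sketch.
PlayFileMessage makePlayFileMessage(const std::string& path,
                                    std::uint32_t playCount,
                                    std::uint32_t volume,
                                    OutputDevice device) {
    assert(!path.empty());  // a play-file message must name a file
    return PlayFileMessage{path, playCount, 0u, volume, device};
}
```

Under these assumptions, an application that asks for a file to be played twice on the line-out device would cause the API to build a message carrying a play count of 2 and a device value of 2 before transmitting it to the audio task.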

BRIEF DESCRIPTION OF THE DRAWINGS

Other objects, features, and advantages of the present invention will become more fully apparent from the following detailed description of the preferred embodiment, the appended claims, and the accompanying drawings in which:

FIG. 1 is a block diagram representing real-time point-to-point audio, video, and data conferencing between two PC systems, according to a preferred embodiment of the present invention;

FIG. 2 is a block diagram of the hardware configuration of the conferencing system of each PC system of FIG. 1;

FIG. 3 is a block diagram of the hardware configuration of the video board of the conferencing system of FIG. 2;

FIG. 4 is a block diagram of the hardware configuration of the audio/comm board of the conferencing system of FIG. 2;

FIG. 5 is a block diagram of the software configuration of the conferencing system of each PC system of FIG. 1;

FIG. 6 is a block diagram of a preferred embodiment of the hardware configuration of the audio/comm board of FIG. 4;

FIG. 7 is a block diagram of the conferencing interface layer between the conferencing applications of FIG. 5, on one side, and the comm, video, and audio managers of FIG. 5, on the other side;

FIG. 8 is a representation of the conferencing call finite state machine (FSM) for a conferencing session between a local conferencing system (i.e., caller) and a remote conferencing system (i.e., callee);

FIG. 9 is a representation of the conferencing stream FSM for each conferencing system participating in a conferencing session;

FIG. 10 is a representation of the video FSM for the local video stream and the remote video stream of a conferencing system during a conferencing session;

FIG. 11 is a block diagram of the software components of the video manager of the conferencing system of FIG. 5;

FIG. 12 is a representation of a sequence of N walking key frames;

FIG. 13 is a representation of the audio FSM for the local audio stream and the remote audio stream of a conferencing system during a conferencing session;

FIG. 14 is a block diagram of the architecture of the audio subsystem of the conferencing system of FIG. 5;

FIG. 15 is a block diagram of the interface between the audio task of FIG. 5 and the audio hardware of audio/comm board of FIG. 2;

FIG. 16 is a block diagram of the interface between the audio task and the comm task of FIG. 5;

FIG. 17 is a block diagram of the comm subsystem of the conferencing system of FIG. 5;

FIG. 18 is a block diagram of the comm subsystem architecture for two conferencing systems of FIG. 5 participating in a conferencing session;

FIG. 19 is a representation of the comm subsystem application FSM for a conferencing session between a local site and a remote site;

FIG. 20 is a representation of the comm subsystem connection FSM for a conferencing session between a local site and a remote site;

FIG. 21 is a representation of the comm subsystem control channel handshake FSM for a conferencing session between a local site and a remote site;

FIG. 22 is a representation of the comm subsystem channel establishment FSM for a conferencing session between a local site and a remote site;

FIG. 23 is a representation of the comm subsystem processing for a typical conferencing session between a caller and a callee;

FIG. 24 is a representation of the structure of a video packet as sent to or received from the comm subsystem of the conferencing system of FIG. 5;

FIG. 25 is a representation of the compressed video bitstream for the conferencing system of FIG. 5;

FIG. 26 is a representation of a compressed audio packet for the conferencing system of FIG. 5;

FIG. 27 is a representation of the reliable transport comm packet structure;

FIG. 28 is a representation of the unreliable transport comm packet structure;

FIG. 29 shows diagrams indicating typical connection setup and teardown sequences;

FIGS. 30 and 31 are diagrams of the architecture of the audio/comm board;

FIG. 32 is a diagram of the audio/comm board environment;

FIG. 33 is a flow diagram of the on-demand application invocation processing of the conferencing system of FIG. 5;

FIG. 34 is a flow diagram of an example of the processing implemented within the conferencing system of FIG. 5 to manage two conferencing applications in a single conferencing session with a remote conferencing system;

FIG. 35 represents the flow of bits between two remote high-resolution counters used to maintain clock values over a conferencing network;

FIG. 36 is a flow diagram of the processing of the conferencing system of FIG. 1 to control the flow of signals over reliable channels;

FIG. 37 is a flow diagram of the preemptive priority-based transmission processing implemented by the communications subsystem of the conferencing system of FIG. 1;

FIG. 38 is a state diagram for the complete rate negotiation processing, according to a preferred embodiment of the present invention;

FIG. 39 is a state diagram for the rate negotiation processing for a called node during a 64 KBPS upgrade;

FIG. 40 is a state diagram for the rate negotiation processing for a calling node during a 64 KBPS upgrade; and

FIG. 41 is a state diagram for the rate negotiation processing in loopback mode during a 64 KBPS upgrade.

DESCRIPTION OF THE PREFERRED EMBODIMENT(S)

Point-To-Point Conferencing Network

Referring now to FIG. 1, there is shown a block diagram representing real-time point-to-point audio, video, and data conferencing between two PC systems, according to a preferred embodiment of the present invention. Each PC system has a conferencing system 100, a camera 102, a microphone 104, a monitor 106, and a speaker 108. The conferencing systems communicate via an integrated services digital network (ISDN) 110. Each conferencing system 100 receives, digitizes, and compresses the analog video signals generated by camera 102 and the analog audio signals generated by microphone 104. The compressed digital video and audio signals are transmitted to the other conferencing system via ISDN 110, where they are decompressed and converted for play on monitor 106 and speaker 108, respectively. In addition, each conferencing system 100 may generate and transmit data signals to the other conferencing system 100 for play on monitor 106. In a preferred embodiment, the video and data signals are displayed in different windows on monitor 106. Each conferencing system 100 may also display the locally generated video signals in a separate window.

Camera 102 may be any suitable camera for generating NTSC or PAL analog video signals. Microphone 104 may be any suitable microphone for generating analog audio signals. Monitor 106 may be any suitable monitor for displaying video and graphics images and is preferably a VGA monitor. Speaker 108 may be any suitable device for playing analog audio signals and is preferably a headset.

Conferencing System Hardware Configuration

Referring now to FIG. 2, there is shown a block diagram of the hardware configuration of each conferencing system 100 of FIG. 1, according to a preferred embodiment of the present invention. Each conferencing system 100 comprises host processor 202, video board 204, audio/comm board 206, and ISA bus 208.

Referring now to FIG. 3, there is shown a block diagram of the hardware configuration of video board 204 of FIG. 2, according to a preferred embodiment of the present invention. Video board 204 comprises industry standard architecture (ISA) bus interface 310, video bus 312, pixel processor 302, video random access memory (VRAM) device 304, video capture module 306, and video analog-to-digital (A/D) converter 308.

Referring now to FIG. 4, there is shown a block diagram of the hardware configuration of audio/comm board 206 of FIG. 2, according to a preferred embodiment of the present invention. Audio/comm board 206 comprises ISDN interface 402, memory 404, digital signal processor (DSP) 406, ISA bus interface 408, and audio input/output (I/O) hardware 410.

Conferencing System Software Configuration

Referring now to FIG. 5, there is shown a block diagram of the software configuration of each conferencing system 100 of FIG. 1, according to a preferred embodiment of the present invention. Video microcode 530 resides and runs on pixel processor 302 of video board 204 of FIG. 3. Comm task 540 and audio task 538 reside and run on DSP 406 of audio/comm board 206 of FIG. 4. All of the other software modules depicted in FIG. 5 reside and run on host processor 202 of FIG. 2.

Video, Audio, and Data Processing

Referring now to FIGS. 3, 4, and 5, audio/video conferencing application 502 running on host processor 202 provides the top-level local control of audio and video conferencing between a local conferencing system (i.e., local site or endpoint) and a remote conferencing system (i.e., remote site or endpoint). Audio/video conferencing application 502 controls local audio and video processing and establishes links with the remote site for transmitting and receiving audio and video over the ISDN. Similarly, data conferencing application 504, also running on host processor 202, provides the top-level local control of data conferencing between the local and remote sites. Conferencing applications 502 and 504 communicate with the audio, video, and comm subsystems using conference manager 544, conferencing application programming interface (API) 506, video API 508, comm API 510, and audio API 512. The functions of conferencing applications 502 and 504 and the APIs they use are described in further detail later in this specification.

During conferencing, audio I/O hardware 410 of audio/comm board 206 digitizes analog audio signals received from microphone 104 and stores the resulting uncompressed digital audio to memory 404 via ISA bus interface 408. Audio task 538, running on DSP 406, controls the compression of the uncompressed audio and stores the resulting compressed audio back to memory 404. Comm task 540, also running on DSP 406, then formats the compressed audio for ISDN transmission and transmits the compressed ISDN-formatted audio to ISDN interface 402 for transmission to the remote site over ISDN 110.

ISDN interface 402 also receives from ISDN 110 compressed ISDN-formatted audio generated by the remote site and stores the compressed ISDN-formatted audio to memory 404. Comm task 540 then reconstructs the compressed audio format and stores the compressed audio back to memory 404. Audio task 538 controls the decompression of the compressed audio and stores the resulting decompressed audio back to memory 404. ISA bus interface 408 then transmits the decompressed audio to audio I/O hardware 410, which digital-to-analog (D/A) converts the decompressed audio and transmits the resulting analog audio signals to speaker 108 for play.

Thus, audio capture/compression and decompression/playback are preferably performed entirely within audio/comm board 206 without going through the host processor. As a result, audio is preferably continuously played during a conferencing session regardless of what other applications are running on host processor 202.

Concurrent with the audio processing, video A/D converter 308 of video board 204 digitizes analog video signals received from camera 102 and transmits the resulting digitized video to video capture module 306. Video capture module 306 decodes the digitized video into YUV color components and delivers uncompressed digital video bitmaps to VRAM 304 via video bus 312. Video microcode 530, running on pixel processor 302, compresses the uncompressed video bitmaps and stores the resulting compressed video back to VRAM 304. ISA bus interface 310 then transmits via ISA bus 208 the compressed video to video/host interface 526 running on host processor 202.

Video/host interface 526 passes the compressed video to video manager 516 via video capture driver 522. Video manager 516 calls audio manager 520 using audio API 512 for synchronization information. Video manager 516 then time-stamps the video for synchronization with the audio. Video manager 516 passes the time-stamped compressed video to communications (comm) manager 518 using comm application programming interface (API) 510. Comm manager 518 passes the compressed video through digital signal processing (DSP) interface 528 to ISA bus interface 408 of audio/comm board 206, which stores the compressed video to memory 404. Comm task 540 then formats the compressed video for ISDN transmission and transmits the ISDN-formatted compressed video to ISDN interface 402 for transmission to the remote site over ISDN 110.
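A minimal sketch of this time-stamping step follows, assuming that the synchronization information obtained through the audio API is a current audio time in milliseconds. AudioClock, VideoFrame, and stampFrame are hypothetical names introduced for illustration; the form of the synchronization information is not defined at this point in the specification.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the synchronization query answered by the
// audio manager. Here it simply holds a millisecond counter.
class AudioClock {
public:
    explicit AudioClock(std::uint32_t startMs) : nowMs_(startMs) {}
    std::uint32_t currentTimeMs() const { return nowMs_; }
    void advance(std::uint32_t deltaMs) { nowMs_ += deltaMs; }

private:
    std::uint32_t nowMs_;
};

// A compressed video frame together with the audio time it was stamped with.
struct VideoFrame {
    std::vector<std::uint8_t> compressed;
    std::uint32_t timestampMs = 0;
};

// Stamp a compressed frame with the current audio time so that the
// receiving side can align video playback with the audio stream.
VideoFrame stampFrame(std::vector<std::uint8_t> compressed, const AudioClock& clock) {
    VideoFrame frame;
    frame.compressed = std::move(compressed);
    frame.timestampMs = clock.currentTimeMs();
    return frame;
}
```

Using the audio time as the reference means that each video frame carries the audio position at which it was captured, which is one plausible way to let the remote site schedule its display against audio playback.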

ISDN interface 402 also receives from ISDN 110 ISDN-formatted compressed video generated by the remote site system and stores the ISDN-formatted compressed video to memory 404. Comm task 540 reconstructs the compressed video format and stores the resulting compressed video back to memory 404. ISA bus interface 408 then transmits the compressed video to comm manager 518 via ISA bus 208 and DSP interface 528. Comm manager 518 passes the compressed video to video manager 516 using video API 508. Video manager 516 passes the compressed video to video decode driver 548 for decompression processing. Video decode driver 548 passes the decompressed video to video playback driver 550, which formats the decompressed video for transmission to the graphics device interface (GDI) (not shown) of the Microsoft.RTM. Windows.TM. operating system for eventual display in a video window on monitor 106.

For data conferencing, concurrent with audio and video conferencing, data conferencing application 504 generates and passes data to comm manager 518 using conferencing API 506 and comm API 510. Comm manager 518 passes the data through board DSP interface 532 to ISA bus interface 408, which stores the data to memory 404. Comm task 540 formats the data for ISDN transmission and stores the ISDN-formatted data back to memory 404. ISDN interface 402 then transmits the ISDN-formatted data to the remote site over ISDN 110.

ISDN interface 402 also receives from ISDN 110 ISDN-formatted data generated by the remote site and stores the ISDN-formatted data to memory 404. Comm task 540 reconstructs the data format and stores the resulting data back to memory 404. ISA bus interface 408 then transmits the data to comm manager 518, via ISA bus 208 and DSP interface 528. Comm manager 518 passes the data to data conferencing application 504 using comm API 510 and conferencing API 506. Data conferencing application 504 processes the data and transmits the processed data to Microsoft.RTM. Windows.TM. GDI (not shown) for display in a data window on monitor 106.

Preferred Hardware Configuration for Conferencing System

Referring now to FIG. 6, there is shown a block diagram of a preferred embodiment of the hardware configuration of audio/comm board 206 of FIG. 4. Referring now to FIGS. 30 and 31, there are shown diagrams of the architecture of the audio/comm board. Referring now to FIG. 32, there is shown a diagram of the audio/comm board environment. The description for this section is the same as the description for the section of the same name in U.S. patent application Ser. No. 08/157,694.

Software Architecture for Conferencing System

The software architecture of conferencing system 100 shown in FIGS. 2 and 5 has three layers of abstraction. A computer supported collaboration (CSC) infrastructure layer comprises the hardware (i.e., video board 204 and audio/comm board 206) and host/board driver software (i.e., video/host interface 526 and DSP interface 528) to support video, audio, and comm, as well as the encode method for video (running on video board 204) and encode/decode methods for audio (running on audio/comm board 206). The capabilities of the CSC infrastructure are provided to the upper layer as a device driver interface (DDI).

A CSC system software layer provides services for instantiating and controlling the video and audio streams, synchronizing the two streams, and establishing and gracefully ending a call and associated communication channels. This functionality is provided in an application programming interface (API). This API comprises the extended audio and video interfaces and the communications APIs (i.e., conference manager 544, conferencing API 506, video API 508, video manager 516, video capture driver 522, video decode driver 548, video playback driver 550, comm API 510, comm manager 518, Wave API 514, Wave driver 524, PWave API 552, audio API 512, and audio manager 520).

A CSC applications layer brings CSC to the desktop. The CSC applications may include video annotation to video mail, video answering machine, audio/video/data conferencing (i.e., audio/video conferencing application 502 and data conferencing application 504), and group decision support systems.

Audio/video conferencing application 502 and data conferencing application 504 rely on conference manager 544 and conferencing API 506, which in turn rely upon video API 508, comm API 510, and audio API 512 to interface with video manager 516, comm manager 518, and audio manager 520, respectively. Comm API 510 and comm manager 518 provide a transport-independent interface (TII) that provides communications services to conferencing applications 502 and 504. The communications software of conferencing system 100 may be designed to support different transport mechanisms, such as ISDN, SW56, and LAN (e.g., SPX/IPX, TCP/IP, or NetBIOS). The TII isolates the conferencing applications from the underlying transport layer (i.e., transport-medium-specific DSP interface 528). The TII hides the network/connectivity specific operations. In conferencing system 100, the TII hides the ISDN layer. The DSP interface 528 is hidden in the datalink module (DLM). The TII provides services to the conferencing applications for opening communication channels (within the same session) and dynamically managing the bandwidth. The bandwidth is managed through the transmission priority scheme.
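The isolation provided by the TII may be illustrated by the following sketch, in which the session logic depends only on an abstract transport and not on any particular medium. The class names Transport, RecordingTransport, and Session, the channel identifiers, and the integer priority values are all hypothetical and are not the interface of comm API 510; RecordingTransport merely counts bytes and stands in for a medium-specific implementation such as one for ISDN.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Abstract transport: the only thing the session logic knows about the medium.
class Transport {
public:
    virtual ~Transport() = default;
    virtual std::string name() const = 0;
    virtual void send(std::uint16_t channel,
                      const std::vector<std::uint8_t>& payload) = 0;
};

// Stand-in transport for illustration: it records how many bytes were
// sent on each channel instead of driving real hardware.
class RecordingTransport : public Transport {
public:
    std::string name() const override { return "recording"; }
    void send(std::uint16_t channel,
              const std::vector<std::uint8_t>& payload) override {
        bytesPerChannel[channel] += payload.size();
    }
    std::map<std::uint16_t, std::size_t> bytesPerChannel;
};

// One conferencing session: opens channels with a transmission priority
// and forwards payloads to whichever transport it was given.
class Session {
public:
    explicit Session(Transport& transport) : transport_(transport) {}

    std::uint16_t openChannel(int priority) {
        const std::uint16_t id = nextId_++;
        priorities_[id] = priority;
        return id;
    }

    int priorityOf(std::uint16_t channel) const { return priorities_.at(channel); }

    void send(std::uint16_t channel, const std::vector<std::uint8_t>& payload) {
        if (priorities_.count(channel) == 0) {
            throw std::out_of_range("channel not open");
        }
        transport_.send(channel, payload);
    }

private:
    Transport& transport_;
    std::uint16_t nextId_ = 1;
    std::map<std::uint16_t, int> priorities_;
};
```

Because Session refers to the transport only through the abstract class, a different medium could be substituted without changing the code that opens channels and sends data, which is the property the TII is described as providing. The sketch stores a priority per channel but does not implement the transmission priority scheme itself.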

In a preferred embodiment in which conferencing system 100 performs software video decoding, video capture driver 522 is implemented on top of video/host interface 526 (the video driver). In an alternative preferred embodiment in which conferencing system 100 performs hardware video decoding, a video display driver is also implemented on top of video/host interface 526.

The software architecture of conferencing system 100 comprises three major subsystems: video, audio, and communication. The audio and video subsystems are decoupled and treated as "data types" (similar to text or graphics) with conventional operations like open, save, edit, and display. The video and audio services are available to the applications through video-management and audio-management extended interfaces, respectively.

In a preferred embodiment, conferencing system 100 is implemented mostly in the C++ computer language using the Microsoft.RTM. Foundation Classes (MFC) with portions implemented in the C7.0 computer language.

Audio/Video Conferencing Application

Audio/video conferencing application 502 implements the conferencing user interface. Conferencing application 502 is implemented as a Microsoft.RTM. Windows.TM. 3.1 application. One child window will display the local video image and a second child window will display the remote video image. Audio/video conferencing application 502 provides the following services to conferencing system 100:

Manages main message loop.

Performs initialization and registers classes.

Handles menus.

Processes toolbar messages.

Handles preferences.

Handles speed dial setup and selections.

Connects and hangs up.

Handles handset window.

Handles remote video.

Handles remote video window.

Handles local video.

Handles local video window.

Handles call notification dialog box.

Plays sounds.

Handles address book lookup for caller identification purposes.

Audio/video conferencing application 502 is made up of six main modules:

(1) ABIF.LIB: An address book class wrapper library.

(2) SUBDLG.LIB: A sub-dialog library.

(3) CMDLL.DLL: A conference manager DLL (shown in FIG. 5 as conference manager 544).

(4) CMIF.LIB: A library with high-level interfaces to CMDLL.DLL and other sub-systems.

(5) PSVIDEO.EXE: An audio/video application proper.

(6) PSNOTIFY.EXE: An incoming call notification application (a listener).

CMIF.LIB

CMIF.LIB (CMIF) was created to encapsulate several subsystem APIs: conference manager API 544, conferencing API (VCI) 506, audio API 512, and video API 508. It is a set of (mostly) static classes (data and functions). It consists of three classes: CCm, CImageSize, and CImageState. CImageSize encapsulates the various sizes that video images can have. CImageState encapsulates the different attributes and drivers that a video image can have. CCm handles everything not covered by CImageSize and CImageState. CCm includes a callback function to handle notification from CMDLL.

The callback function in CCm can be called in the context of another application (namely VTHREAD.EXE of VCI 506). This is the reason that CMIF is a separate library. By making it a separate library the compile flags are set to use smart callbacks in a way that is compatible with MakeProcInstance( ). MakeProcInstance( ) is used on the callback to force the data segment to point appropriately upon entering the callback function.

CCm

The CCm class simplifies calling routines in subsystem APIs. It has the following capabilities: loading and unloading of the subsystem DLLs, registering and unregistering with CMDLL, call support, channel pair support, stream support, CMDLL callback handler, and NO_VCI support. CCm assumes that there is one video call. This call can be in various different states: disconnected, connecting, connected, and disconnecting. Tests and assertions are performed throughout the code to verify that operations on a call can be carried out during the current call state.

Loading and Unloading

CCm::Load( ) loads the various subsystems. It does so by calling LoadLibrary( ) for each DLL. After successfully loading a subsystem's library into memory, CCm::Load( ) fills a table of function pointers for that subsystem by calling GetProcAddress( ). There are macros that allow CCm to call these functions in a manner consistent with implicit linking.

CCm::Unload( ) unloads the various subsystems. It does so by calling FreeLibrary( ) for each DLL.
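The table-of-function-pointers approach described above may be sketched as follows. This is a portable illustration only, not the CMIF code: the Resolver type stands in for GetProcAddress( ), and the export names "Open" and "Close", the SubsystemFn signature, and the FunctionTable and LoadTable names are hypothetical.

```cpp
#include <functional>
#include <string>

// Hypothetical signature shared by the subsystem entry points in this sketch.
using SubsystemFn = int (*)(int);

// Stand-in for GetProcAddress( ): maps an exported name to a function
// pointer, or returns nullptr when the export is missing.
using Resolver = std::function<SubsystemFn(const std::string&)>;

// One table per subsystem; callers invoke the entries as ordinary functions.
struct FunctionTable {
    SubsystemFn open = nullptr;
    SubsystemFn close = nullptr;
};

// Fills the table from the resolver. Returns false if any export cannot be
// resolved, so the caller can unload the library and report the failure.
bool LoadTable(const Resolver& resolve, FunctionTable& table) {
    table.open = resolve("Open");
    table.close = resolve("Close");
    return table.open != nullptr && table.close != nullptr;
}
```

Checking every pointer before reporting success keeps a partially loaded subsystem from being called through a null entry.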

Registering and Unregistering

CCm::Register( ) registers CCm::Callback( ) as a callback function with CMDLL 544. It also registers an application window with CMIF to post messages to. It should be called once upon application startup.

CCm::UnRegister( ) unregisters with CMDLL. It should be called once upon application shutdown.

CCm::Install( ) allows an application that is not currently installed in CMDLL's registry to install itself there. This should be called if CCm::Register( ) fails with a return code of CMRC_UNKNOWNAPP. Then, after successfully calling CCm::Install( ), call CCm::Register( ) again.

Call Support

CCm::MakeCall( ) allows the application to place a video call via CMDLL 544. CCm::MakeCall( ) takes an ISDN phone number (of the callee) and a name (of the caller). The name is passed to the callee for display in the caption of the remote video window. Upon successful return from CCm::MakeCall( ), the call will be in the connecting state.

CCm::HangupCall( ) hangs the current call up. If there is no call in progress, it returns an error code. Upon successful return from CCm::HangupCall( ), the call will be in the disconnecting state.

Channel Pair Support

A channel pair is a bi-directional VCI/TII communication channel. It can be used to send and receive data from a peer application during a video call. Channel pairs are established with a peer application (facilitated through registration) and are referred to by an id (0-31). CMDLL 544 provides this channel pair support to CMIF.

CCm::GetChannelPairInfo( ) can be called with a channel pair id. It will return information pertaining to the given channel pair or an error code if it failed.

CCm::GetChannelPair( ) requests a channel pair from CMDLL. The request is completed when CMDLL sends the CMN_CHANNELPAIR callback message. This message indicates whether or not the channel was successfully opened. Once a channel pair is opened it needs to have a channel handler registered for it. This is done by calling CCm::RegisterChanHandler( ) for each channel in the channel pair.

After successfully registering a channel handler for each channel in a channel pair, the channel pair can be used to send and receive data.

CCm::SendData( ) requests that data be sent on the outbound channel within a channel pair. The request is complete when VCI sends the CHAN_DATA_SENT callback message to the channel handler.

CCm::ReceiveData( ) posts a receive buffer to VCI/TII for the inbound channel to receive data into. When data has been successfully received into a buffer for the inbound channel, VCI sends the CHAN_RCV_COMPLETE callback message to the channel handler. This buffer must not be freed until the receive has completed.

When finished using a channel pair, call CCm::CloseChannelPair( ) to request that the resources associated with it be freed. The request is complete when CMDLL sends the CMN_CLOSECHANNELPAIR callback message. After receiving this message do not attempt to use the channels in that channel pair. PSVIDEO encapsulates registering a channel handler and sending and receiving data in a class called CChannelPair.

Stream Support

VCI stream groups contain both an audio data stream and a video data stream. They can be either local or remote. Local stream groups are created by calling CCm::CreateLocalStream( ) and remote stream groups are created by calling CCm::CreateRemoteStream( ). Both take as parameters a window in which to play the video, a size to play the video at, a state to play the video in, and flags specifying whether or not to play the video/audio. Local stream groups do not play audio locally, whereas remote stream groups do. There is preferably only one local stream group and one remote stream group.

To start sending the local stream group to the peer video conferencing application, call CCm::SendLocalStream( ). This should be called only once after the video call has been established and the local stream group has been created.

To start and stop sending during a video call, call CCm::SendLocalStreamMute( ). This has the same syntax as the former, but does not stop the audio time stamps from being sent to the peer application. These audio time stamps are preferably continually sent, after they have been started, for the life of a video call.

To start and stop playing either the local stream group or the remote stream group, call CCm::PlayLocalStream( ) or CCm::PlayRemoteStream( ), respectively. These also can be used to play a stream at a different size or in a different state than before (e.g., the local stream group can be played as an IRV video stream or a YUV video stream).

To change the window in which either the local stream group or the remote stream group is being played, call CCm::RedirectLocalVideo( ) or CCm::RedirectRemoteVideo( ), respectively.

To take a snapshot of either stream, call CCm::SnapStream( ). It will return a device independent bitmap of the given stream group's current video image. It does not matter what the size or state of the video stream is or whether it is the local or remote stream group.

To control the volume of the remote stream group (remember local streams do not play audio), call CCm::SetWaveVolume( ).

To switch the remote stream group in and out of open audio (half-duplex speaker phone), call CCm::SetOpenAudio( ).

To control either stream group's video attributes (color, contrast, brightness, and tint), call CCm::SetColor( ), CCm::SetContrast( ), CCm::SetBrightness( ), and CCm::SetTint( ). However, CCm::SetTint( ) is preferably not supported for the remote stream group.

To adjust the display quality (frequency) of the local stream group, call CCm::SetFRC( ). Also, local stream groups can toggle between mirror and camera views and normal and wide angle views.

To toggle these attributes, call CCm::SetMirrorView( ) and CCm::SetWideAngle( ), respectively.

CCm::Redraw( ) forces the given stream group to redraw itself.

To free the resources allocated for a stream group call CCm::DestroyStream( ).

CMDLL Callback

When registering with CMDLL 544, a callback function is specified. CMIF specifies a procedure instance for CCm::CallBack( ). This callback function may be called in the context of another application. This is why MakeProcInstance( ) is used to generate a procedure instance for the callback. However, smart callbacks (e.g., Microsoft.RTM. C8.0 compiler flag /GA) are not compatible with MakeProcInstance( ). Use the old style callbacks (e.g., Microsoft.RTM. C8.0 compiler flag /Gw, /GW or /GA/GEea) instead.

This callback function receives the following types of messages from CMDLL 544: call messages (CMN_CALL, CMN_HANGUP, CMN_REJECTED, CMN_PROGRESS), channel pair messages (CMN_CHANNELPAIR, CMN_CLOSECHANNELPAIR), CMDLL shutdown message (CMN_EXIT) and error message (CMN_ERROR). For each message sent to this callback, a message is posted to the application window that was given during registration.

When a call message is received, the call state is tested to validate whether the message is valid in the current call state or not. When a message is received that does not correspond to the current call state, a debug message is printed out and in some cases an appropriate action is taken to correct the situation. When a CMN_CALL, CMN_HANGUP, or CMN_REJECTED message is received, the call state is updated to reflect the state diagram shown above.
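The validation of call messages against the current call state may be sketched as follows. The four states come from the description of CCm above; the specific transitions are assumptions made for illustration and are not taken from the state diagram. The names CallState, CallMsg, and OnCallMessage are hypothetical.

```cpp
#include <cstdio>

enum class CallState { Disconnected, Connecting, Connected, Disconnecting };
enum class CallMsg { Call, Hangup, Rejected, Progress };  // CMN_* call messages

// Returns true and updates `state` when `msg` is valid in the current state.
// Otherwise prints a debug message and leaves `state` unchanged.
bool OnCallMessage(CallState& state, CallMsg msg) {
    switch (msg) {
    case CallMsg::Call:      // assumed: valid for a placed or an incoming call
        if (state == CallState::Connecting || state == CallState::Disconnected) {
            state = CallState::Connected;
            return true;
        }
        break;
    case CallMsg::Hangup:    // assumed: valid whenever a call exists
        if (state != CallState::Disconnected) {
            state = CallState::Disconnected;
            return true;
        }
        break;
    case CallMsg::Rejected:  // assumed: only a call being placed can be rejected
        if (state == CallState::Connecting) {
            state = CallState::Disconnected;
            return true;
        }
        break;
    case CallMsg::Progress:  // assumed: progress reports do not change state
        if (state == CallState::Connecting) {
            return true;
        }
        break;
    }
    std::fprintf(stderr, "call message %d ignored in state %d\n",
                 static_cast<int>(msg), static_cast<int>(state));
    return false;
}
```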

NO_VCI Support

The NO_VCI compiler flag is a user interface (UI) development tool to help UI developers debug and test their code without the subsystems running. Almost every function that calls a subsystem has the following code fragment:

    // code to handle either case
    #ifndef NO_VCI
    // code to handle the case when calling the subsystem
    .
    .
    #else
    // code to handle the case when not calling the subsystem
    .
    .
    #endif
    // code to handle either case

As the subsystems change, this flag allows the UI developers to postpone integration with these changes. Also, debugging is sometimes difficult when running with the subsystems enabled.

Miscellaneous

CCm::GetUtilization( ) queries the video subsystem for the cost (as a percent of system utilization) of a single video window with the given characteristics. If there is a failure in getting this cost, the cost returned is 100.

CCm::GetErrorText( ) gets the error text for a given error code from either CMDLL 544 or VCI 506.

CCm::IsLostCallError( ) returns TRUE if the given error code could represent a lost call.

CImageSize

CImageSize encapsulates the image size and display speed attributes of a video stream. It contains operators that allow for iteration through the various sizes. Lastly, it contains member constants that can be queried to see if various image attributes are supported for a given video device driver at the size represented by the object.

CImageState

CImageState encapsulates the image state of a video stream. This state is represented in terms of features that can be turned on or off and various video device drivers (e.g., IRV and YUV). Calling CImageState::AddFlag( ) will turn on or off a given feature in the current image state. To test to see if a feature is on or off, call CImageState::IsFlagSet( ). CImageState also allows the changing of the current video driver.

PSVIDEO.EXE

This section provides details on the objects and class hierarchies in PSVIDEO.EXE. The objects are described in a top down fashion. The class structure is defined with the following two long-term goals:

(1) The architecture is extensible to a multipoint environment.

(2) The architecture is easy to modify. For example, it is easy to rip out all video specific code in order to implement a phone-on-the-screen.

Frame, View, and Image

Three terms are used frequently in this section: frame, view, and image windows. The image window corresponds to the area of the display containing just the video image. The view window contains the image window plus a panel which has buttons and controls specific to the view. The frame is the top-most window and contains the local view.

Class Descriptions

This section describes most of the classes in the class hierarchy, as well as their relationships to other classes.

CCyApp

CCyApp is derived from the Microsoft.RTM. CWinApp class, which is the base class from which a Microsoft.RTM. Windows.TM. application object is derived in the MFC framework. The CCyApp object provides member functions for initializing the application and for running the application. This object is constructed when other C++ global objects are constructed and is already available when the Microsoft.RTM. Windows.TM. operating system calls the WinMain function, which is supplied by the MFC framework. As required by MFC, the InitInstance member function has been overridden to create the application's main window object.

Many classes in PSVIDEO declare a CSettings class. Each class is responsible for reading, updating, and saving its own settings (e.g., window screen location) or preferences (e.g., always on top). Each does this through the CSettings class. Each class that requires settings to be saved implements DoSettings(CSettings *). The DoSettings(CSettings *) function gets and puts the settings associated with that class. When settings are loaded, changed, or saved, parent classes are responsible for notifying their children about the setting event. The children then call their children until all the classes in the hierarchy have performed the necessary setting adjustments. Use the Visual C++ browser to look at examples of DoSettings implementations. A preference is a value that is stored in the PSVIDEO.INI file under the [Preferences] section. A setting is a value that is stored in the PSVIDEO.INI file. The difference is that a preference is set in the Preference dialog box, while settings are not. Settings are things like window size and position.

The CSettings class is derived both from ISettings and CCyDialog. ISettings is not derived from anything. It declares the virtual function DoSettings (which is implemented by classes derived from ISettings). CSettings also provides a dialog box for user preferences.
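The parent-to-child propagation of setting events may be sketched as follows. This is a simplified illustration: the Settings struct is a hypothetical stand-in for CSettings (an in-memory map takes the place of PSVIDEO.INI, and the dialog box is omitted), and the SettingsNode class and its members are assumptions.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for CSettings: an in-memory key/value store.
struct Settings {
    bool saving = false;                 // true: put values, false: get values
    std::map<std::string, int> values;
};

// Interface declaring the virtual DoSettings function.
struct ISettings {
    virtual ~ISettings() = default;
    virtual void DoSettings(Settings* s) = 0;
};

// A class that owns one setting and forwards the event to its children.
class SettingsNode : public ISettings {
public:
    SettingsNode(std::string key, int value)
        : key_(std::move(key)), value_(value) {}

    void AddChild(ISettings* child) { children_.push_back(child); }
    int value() const { return value_; }
    void set_value(int v) { value_ = v; }

    void DoSettings(Settings* s) override {
        if (s->saving) {
            s->values[key_] = value_;            // put this class's setting
        } else {
            auto it = s->values.find(key_);      // get it, if present
            if (it != s->values.end()) value_ = it->second;
        }
        for (ISettings* child : children_) {
            child->DoSettings(s);                // notify children in turn
        }
    }

private:
    std::string key_;
    int value_;
    std::vector<ISettings*> children_;
};
```

A single call on the root object therefore reaches every class in the hierarchy, which matches the recursive notification described above.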

CCyApp is also responsible for handling <F1> and <Shift-F1> help requests. Because PSVIDEO uses the Microsoft.RTM. Viewer for displaying help instead of the standard Microsoft.RTM. Windows.TM. help engine (WinHelp), the void CWinApp::WinHelp function is overridden so that the viewer is called instead of WinHelp.

When the application is invoked, the MFC framework calls CWinApp::InitInstance. When the application is closed, the MFC framework calls CWinApp::ExitInstance.

CCyFrameWnd

CCyFrameWnd is derived from CFrameWnd and is not directly instantiated. The CFrameWnd class provides the functionality of a Microsoft.RTM. Windows.TM. single document interface (SDI) overlapped or pop-up frame window, along with members for managing the window. The main reason for defining CCyFrameWnd is to provide a class with methods common to the three windows in the application (handset, local video, and remote video), including the non-client drawing and system menu handling.

Frame windows have access to the video controller. The main reason for this is that the CAppFrame class (derived from CCyFrameWnd) informs the video controller about certain events (e.g., call started, call ended) and queries the video controller for certain information (e.g., whether the video windows are ready to close).

CCyAppFrame

CCyAppFrame is derived from CCyFrameWnd. This class implements the PSVIDEO main window (which includes the handset) and owns all dialog boxes in the application (except the Preferences . . . dialog box in CSettings, which is available through its reference to CSettings). This class is responsible for handling menu selections and handset button clicks.

CMIF contains a callback routine for processing callbacks from CMDLL 544. CMIF passes CMDLL messages on to CAppFrame. CAppFrame implements the following message handlers for the messages passed on from CMIF:

CAppFrame::OnCmCall--A call has started. Notify CHandset so that the appropriate call progress is displayed and the call duration clock starts. Notify CVideoController so that it starts to play the remote video stream. The caller also requests a channel pair from CMDLL. This channel pair is used for sending and receiving control data such as mute on or mute off.

CAppFrame::OnCmHangup--A call has ended. Inform CHandset so that call progress is set correctly and the duration timer stops. Notify CVideoController so that it stops playing the remote video stream.

CAppFrame::OnCmRejected--The call that was placed was rejected by the other side. Inform CHandset so that call progress is set correctly.

CAppFrame::OnCmProgress--Call progress messages originating from VCI 506. Inform CHandset so that call progress is set correctly.

CAppFrame::OnCmChannelPair--A channel pair has been established. Register channel handlers for the input and output channels.

CVideoFrame

CVideoFrame encapsulates properties and methods for a video window. It is derived from CCyFrameWnd and is not directly instantiated. Two classes are based on CVideoFrame: CLocalFrame and CRemoteFrame. The CVideoFrame class' main responsibility is to manage and keep track of frame window size and handle commands specific to video frame windows.

CVideoController

The CVideoController object is derived from CCmdTarget and ISettings. Its main purpose is to manage video windows. In a preferred embodiment of the present invention, there are only two video windows. In alternative preferred embodiments, there may be several video windows and this class will then be extended to keep track of the list of available video windows.

The CVideoController class is also responsible for implementing size and display rate restrictions. When the user resizes a video frame window, CVideoController will determine appropriate size for the "other" window based on system characteristics (CPU and graphics adapter). CVideoController is responsible for sending size messages to the appropriate video window. CVideoController keeps track of state information related to audio and video, e.g., mute, high quality, and open audio.

Auto-Sizing of Video Windows

The audio/video conferencing application 502 has two video windows: one for the local camera video image and one for the remote camera video image. The user of application 502 expects to be allowed to change the size of these video windows. However, the computer system may not have enough processing power to accommodate such a change, due to the demands larger video windows place on the system. Specifically, increasing the size of one of the video windows may exceed an acceptable amount of the computer system's processing power. If the user is allowed to change the size of one of the video windows without any constraints, the video quality may degrade to an unacceptable level. Conferencing application 502 automatically resizes the windows to utilize some acceptable amount of the computer system's processing power. The sizes available to a video window are preferably constrained to an optimized set based on the video subsystem software.
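One possible form of the auto-sizing decision is sketched below. It assumes a cost function that reports the percentage of system utilization for a window of a given size (in the spirit of CCm::GetUtilization( ), although the function used here is hypothetical), a constrained list of sizes ordered from smallest to largest, and a utilization budget. The Size struct and the FitOtherWindow name are assumptions.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

struct Size { int width; int height; };

// Returns the index of the largest size in `allowed` (ordered smallest to
// largest) that the "other" window can use once the user has chosen `chosen`
// for the resized window, or -1 if no size fits within the budget.
int FitOtherWindow(const std::vector<Size>& allowed,
                   const Size& chosen,
                   const std::function<int(const Size&)>& costPercent,
                   int budgetPercent) {
    const int remaining = budgetPercent - costPercent(chosen);
    int best = -1;
    for (std::size_t i = 0; i < allowed.size(); ++i) {
        if (costPercent(allowed[i]) <= remaining) {
            best = static_cast<int>(i);  // later entries are larger
        }
    }
    return best;
}
```

When the function returns -1, an application could refuse the requested size or fall back to the smallest size; that policy is not specified here.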

Split and Combined Modes

Conferencing application 502 can be in one of two modes: split or combined. This section explains how the frame, view, image model is used to implement split and combine. In general, a method is provided to convert a single top-level window, W, with more than one child window, w1, . . . , wn, to multiple top-level windows, W1, . . . , Wn, each of which contains one of the child windows w1, . . . , wn. Typically, the window W is the same window as some window Wi, 1<=i<=n. A top-level window is a window that is the child of the desktop window. The method assumes that in the single window mode, there is a top-level window with several child windows. Each child may have its own children, but this is irrelevant to this discussion. In split windows mode, each child (or a subset of the children) is re-parented and gets its own parent (which is a top-level window). These additional top-level windows can be managed in at least two ways:

(1) Create them upon application start-up. If the application is in combined window mode, then hide the windows. If the application is in split windows mode, then show the windows.

(2) Dynamically create and destroy them as needed.

Switching between combined window mode and split windows mode may be implemented as follows:

SingleToMultiple:

Create top level windows if needed.

Re-parent children that need to be re-parented.

Show the additional top-level windows.

MultipleToSingle:

Hide the top level windows that are losing their child windows.

Show the children in the single window.

Destroy top-level windows if needed.

In PSVIDEO.EXE, the split/combine code can be found in the file videoctl.cpp in the method CVideoController::SetGrouping.
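The SingleToMultiple and MultipleToSingle steps may be sketched with a minimal window model, as follows. This is not the code of CVideoController::SetGrouping; the Window struct and both function signatures are hypothetical, and the sketch follows option (1), in which the additional top-level windows already exist and are only shown or hidden.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Minimal window model; a null parent stands for the desktop window.
struct Window {
    std::string name;
    Window* parent = nullptr;
    bool visible = false;
};

// Split: re-parent each child into its own top-level window and show it.
void SingleToMultiple(const std::vector<Window*>& children,
                      const std::vector<Window*>& topLevel) {
    for (std::size_t i = 0; i < children.size() && i < topLevel.size(); ++i) {
        children[i]->parent = topLevel[i];
        topLevel[i]->visible = true;
    }
}

// Combine: hide the top-level windows that lose their children, then show
// the children inside the single window.
void MultipleToSingle(Window& single,
                      const std::vector<Window*>& children,
                      const std::vector<Window*>& topLevel) {
    for (std::size_t i = 0; i < children.size() && i < topLevel.size(); ++i) {
        topLevel[i]->visible = false;
        children[i]->parent = &single;
        children[i]->visible = true;
    }
}
```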

Control Channel Management

This section describes the application control channel. The control channel enables conferencing application 502 to inform its peer of events (e.g., mute on and mute off) and to transfer information of arbitrary size. In general, the application control channel can be used to transmit any data. For example, conferencing application 502 has an "audio/video mute" feature, which allows a video conferencing participant to click on a button so that he/she is no longer visible to the other conference participant. The peer application must be informed about this operation so that it can discontinue displaying the video image and instead display some indication that the other side turned mute on.

By establishing a control channel and a control structure, peer applications are able to be notified about events that require some action. The control channel assumes reliable transfer of control packets. Conferencing system 100 does not confirm the receipt of control channel messages. This simplifies the implementation. All control channel messages are of the following form:

    typedef struct tagChanMsgData {
        WORD      wMsg;         // message identifier.
        DWORD     dwMsgParam;   // message specific parameter/value.
        DWORD     dwSequence;   // sequence number for a series of
                                // control packets.
        WORD      wBufferSize;  // size of the buffer following the
                                // structure.
    } ChanMsgData, FAR* LPCHANMSGDATA;


In order to send additional data along with a control channel structure, the data is arranged following the structure. The amount of data is specified, in bytes, by the wBufferSize field. Four messages have been defined: mute, high quality snapshot, application launch, and application launch response.
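The arrangement of a control channel structure followed by its data may be sketched as a byte-level packing routine. The sketch assumes a byte-packed, little-endian wire layout (a 12-byte header), which is not stated in the text; the ChanMsg struct and the Pack and Unpack functions are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct ChanMsg {
    std::uint16_t wMsg = 0;
    std::uint32_t dwMsgParam = 0;
    std::uint32_t dwSequence = 0;
    std::vector<std::uint8_t> payload;   // data following the structure
};

constexpr std::size_t kHeaderSize = 12;  // 2 + 4 + 4 + 2, assuming no padding

static void PutLE(std::vector<std::uint8_t>& out, std::uint32_t v, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

static std::uint32_t GetLE(const std::uint8_t* p, int bytes) {
    std::uint32_t v = 0;
    for (int i = 0; i < bytes; ++i)
        v |= static_cast<std::uint32_t>(p[i]) << (8 * i);
    return v;
}

// Serializes header then payload. Returns an empty vector if the payload
// does not fit in the 16-bit wBufferSize field.
std::vector<std::uint8_t> Pack(const ChanMsg& m) {
    std::vector<std::uint8_t> out;
    if (m.payload.size() > 0xFFFF) return out;
    PutLE(out, m.wMsg, 2);
    PutLE(out, m.dwMsgParam, 4);
    PutLE(out, m.dwSequence, 4);
    PutLE(out, static_cast<std::uint32_t>(m.payload.size()), 2);
    out.insert(out.end(), m.payload.begin(), m.payload.end());
    return out;
}

// Parses a packed message. Returns false if the input is truncated.
bool Unpack(const std::vector<std::uint8_t>& in, ChanMsg& m) {
    if (in.size() < kHeaderSize) return false;
    m.wMsg = static_cast<std::uint16_t>(GetLE(&in[0], 2));
    m.dwMsgParam = GetLE(&in[2], 4);
    m.dwSequence = GetLE(&in[6], 4);
    const std::size_t n = GetLE(&in[10], 2);
    if (in.size() - kHeaderSize < n) return false;
    auto first = in.begin() + static_cast<std::ptrdiff_t>(kHeaderSize);
    m.payload.assign(first, first + static_cast<std::ptrdiff_t>(n));
    return true;
}
```

Writing each field explicitly avoids depending on compiler structure padding, which would otherwise have to match on both peers.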

Mute Message

Only wMsg and dwMsgParam are used for the MUTE message.

wMsg: value=1.

dwMsgParam:

LOWORD of dwMsgParam specifies the audio state.

0=audio mute off

1=audio mute on

2=no change

HIWORD of dwMsgParam specifies the video state.

0=video mute off

1=video mute on

2=no change

For example, to mute video without changing the state of audio dwMsgParam would be MAKELONG(2, 1). CVideoController sets the mute state when the user clicks the mute button and informs the peer about its new state using the control channel.
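The encoding of dwMsgParam for the MUTE message may be sketched as follows. MakeLong, LoWord, and HiWord are portable equivalents of the Windows MAKELONG, LOWORD, and HIWORD macros written for this illustration; the MuteState enumeration and MakeMuteParam are hypothetical names.

```cpp
#include <cstdint>

// Values defined for the audio and video halves of dwMsgParam.
enum MuteState : std::uint16_t { kMuteOff = 0, kMuteOn = 1, kNoChange = 2 };

// Portable equivalents of MAKELONG, LOWORD, and HIWORD.
constexpr std::uint32_t MakeLong(std::uint16_t low, std::uint16_t high) {
    return static_cast<std::uint32_t>(low) |
           (static_cast<std::uint32_t>(high) << 16);
}
constexpr std::uint16_t LoWord(std::uint32_t v) {
    return static_cast<std::uint16_t>(v & 0xFFFF);
}
constexpr std::uint16_t HiWord(std::uint32_t v) {
    return static_cast<std::uint16_t>(v >> 16);
}

// Audio state goes in the low word, video state in the high word.
constexpr std::uint32_t MakeMuteParam(MuteState audio, MuteState video) {
    return MakeLong(audio, video);
}
```

With these definitions, MakeMuteParam(kNoChange, kMuteOn) corresponds to MAKELONG(2, 1) in the example above.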

High-Quality Snapshot Message

The SNAPSHOT message is sent to the peer application when a still image capture that disables the outgoing video stream is in progress. For example, to achieve a certain still image quality, outgoing video may have to be disabled. In these situations, the SNAPSHOT message is sent to the peer application. The message is resent when the flow of outgoing video resumes. Only wMsg and dwMsgParam are used for the SNAPSHOT message.

wMsg: value=2.

dwMsgParam: LOWORD of dwMsgParam specifies the snapshot state.

0=High-quality mode OFF

1=High-quality mode ON

Application Launch

The APPLAUNCH message is sent to a remote node to launch an application. The application path is specified in the buffer following the control channel structure. wBufferSize holds the size of the buffer. The other fields are not used.

wMsg: value=101.

Application Launch Response

The APPLAUNCHRESPONSE message is sent by the remote node that was asked to launch an application. The return code of the Microsoft.RTM. Windows.TM. WinExec function is passed back in dwMsgParam. The other fields are not used.

wMsg: value=102.

CChanPair

CVideoController has an instance of a CControlChannel class which is derived from the class CChanPair. CChanPair provides an abstraction of CMDLL and TII services.

Video View Class Relationships

The CVideoView class' main responsibilities are to manage image (video) windows (CImageWindow) and handle the control bar operations available in the video panels (CVideoViewPanel). There exist two classes, CLocalView and CRemoteView, that encapsulate specific functionality of the remote and local windows. The CLocalView and CRemoteView classes have a CLocalImage and CRemoteImage class, respectively. In addition, CLocalViewPanel and CRemoteViewPanel have been derived to contain functionality specific to the local and remote windows.

CVideoView knows about the CVideoController class. When the user selects a function such as size or snapshot in the panels, the CVideoView informs the CVideoController about the user's request. CVideoController is responsible for auto-sizing the video windows.

Handset Class Relationships

As described earlier, the CAppFrame class has an instance of a CHandset. The CHandset class is a window. The controls inside the handset window are specified in a dialog template. This is the reason why CHandset is derived from CDialog. A modeless and borderless dialog box is placed on top of a window, in this case, CAppFrame. CHandset is responsible for handling all button clicks and user interactions in the handset area.

In addition to using several MFC defined classes, CHandset also has a CLcd and a speed dial list, a CSDComboBox. The CLcd class consists of instances of CStatic for call progress information and call duration, and of CEdit for entering the number to dial. CSDComboBox is an owner-drawn listbox that displays the five most recently dialed numbers and also speed dial numbers. Finally, CHandset contains several buttons.

An interface is defined for interactions between CHandset and CSpeedDial. When a number has been dialed, the CHandset informs CSpeedDial about the dialed number. CSpeedDial is then responsible for updating the speed dial list (the CSDComboBox) and the PSVIDEO.INI file.

Dialog Boxes

Conferencing application 502 contains several dialog boxes. The focus of this section is to describe CCyDialog. A special dialog box class is derived from CDialog to avoid problems when dialog boxes are displayed when the application is in split mode and some or all of the topmost windows (handset, local video, remote video, snapshot) are always-on-top. Specifically:

(1) If a dialog is brought up when the application was in split mode (3 Windows) and one of the topmost windows is always-on-top, then portions of the dialog box would otherwise be obscured by the always-on-top window. For example, the Preferences dialog box would be on top of the handset window but underneath the remote window.

(2) When the user brings up a dialog box when the application is in split mode (3 Windows), the user would otherwise be able to click on the local window and then obscure the dialog box.

The problem with (1) is that the OK and Cancel buttons would often be hidden by the video windows, so the application would be difficult to use. The problem with (2) is that a user could bring up a modal dialog box but perform functions outside of the dialog box before closing it.

To make the application easier to use and more consistent with Microsoft.RTM. Windows.TM. operating system standards, CCyDialog is introduced and the problematic dialog boxes are derived from CCyDialog instead of CDialog. As a result, problems (1) and (2) are fixed. When a dialog box derived from CCyDialog is initialized, if the ProShare application has any visible topmost windows and the dialog box's owner is not topmost (e.g., the local window is topmost, but the owner of the Preferences is the handset window), then the dialog box owner window is made topmost to ensure the dialog is visible. To prevent the user from clicking in other application windows, all other topmost windows are disabled. These operations are reversed when the dialog box is destroyed.

When the user switches between conferencing application 502 and other applications, special care is taken due to the modifications made in CCyDialog::OnInitDialog. When conferencing application 502 is deactivated, CCyDialog::OnActivateApp turns temporary topmost off, and then back on when the application is re-activated. In addition, CCyDialog::OnActivateApp reenables the user interface while conferencing application 502 is deactivated so the user can click on any of the conferencing application's visible windows to get back to the dialog (and re-activate conferencing application 502).

Helper Classes

Dialog Helper

The dialog helper class, CDialogHelper, provides a method for obtaining dialog boxes with exact pixel dimensions in the Microsoft.RTM. Windows.TM. environment. The Microsoft.RTM. Windows.TM. operating system provides support for dialog boxes, both in the API and in the support tools (dialog editors), but the coordinates used for dialog boxes and their elements are not pixels but "dialog units," which are calculated according to the font and display driver in use. Dialog boxes are not created with exact pixel dimensions and coordinates using this support, yet this is often desired, for example, when the position of dialog elements relates to a fixed size live video display (160.times.120 pixels).

The CDialogHelper class simplifies the manipulation of dialog boxes. If a dialog box is designed with the assumption that 1 dialog unit=1 pixel, CDialogHelper can use a dialog template to dynamically resize and position the dialog elements correctly. The procedure follows:

Create the dialog box initially invisible.

Load the dialog template used to create the dialog box.

For each control referenced in the template,

Get a handle to the actual control, and

Use the coordinates from the template to reposition and resize the control.

Use the coordinates from the template to resize the entire dialog box.

Unload the dialog template.

Make the dialog box visible.

By using a dialog template that is a subset of the template used to create the dialog box, it is possible to resize and reposition only some of the controls, enabling a combination of pixel and dialog units to be employed.
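The repositioning step of the procedure above can be sketched as follows. This is a minimal sketch, not the CDialogHelper implementation: the TemplateItem and Control types and both function names are hypothetical stand-ins for the dialog template entries and the live controls, and the actual Windows calls for loading a template and moving a window are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a dialog template entry and a live control. */
typedef struct { int id; int x, y, cx, cy; } TemplateItem;
typedef struct { int id; int x, y, cx, cy; } Control;

/* Find the live control with the given ID; returns NULL if there is none. */
static Control *find_control(Control *controls, size_t n, int id)
{
    for (size_t i = 0; i < n; i++)
        if (controls[i].id == id)
            return &controls[i];
    return NULL;
}

/* Treat the template coordinates as pixels and apply them to each control
 * the template references. Controls that the template does not mention keep
 * their current (dialog-unit derived) geometry. Returns the number of
 * controls that were repositioned. */
static size_t apply_template_as_pixels(const TemplateItem *items, size_t n_items,
                                       Control *controls, size_t n_controls)
{
    size_t moved = 0;
    assert(items != NULL && controls != NULL);
    for (size_t i = 0; i < n_items; i++) {
        Control *c = find_control(controls, n_controls, items[i].id);
        if (c == NULL)
            continue;               /* template entry without a live control */
        c->x = items[i].x;
        c->y = items[i].y;
        c->cx = items[i].cx;
        c->cy = items[i].cy;
        moved++;
    }
    return moved;
}
```

Passing a template that lists only some of the controls corresponds to the subset case described above: the listed controls receive exact pixel geometry and the rest stay in dialog units.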

Fast Bitmap Buttons

An owner-draw button class, CFastBmpBtn, solves the following problems:

(1) Microsoft.RTM. Windows.TM. operating system provides basic button objects in several styles (particularly push buttons, check boxes, and radio buttons) with a default graphical appearance that is not easily customized. It is possible for an application to use "owner-draw" buttons, but then all distinction between button types is lost and the application must implement check box and radio button functionality itself. The Microsoft.RTM. Foundation Classes (MFC) provide a bitmapped button class that allows an application to provide up to four images to represent the various states of a button (up, depressed, with focus, and disabled), but this does not solve the basic problem, and each image must be stored in a separate bitmap, making maintenance difficult.

(2) When a user clicks on a Microsoft.RTM. Windows.TM. button with the mouse cursor, the button takes the input focus even if the user moves the cursor off the button before releasing the mouse button, thereby not generating a button press event. This makes it difficult for an application to keep the input focus in a desired location (e.g., an edit control).

The two problems above are solved in the following manner:

(1) CFastBmpBtn, a C++ class derived from the basic MFC window class, allows the developer to start with generic buttons, specifying the styles as desired, then add only three lines of code and one bitmap per button to obtain a flexible graphical appearance. The class dynamically subclasses the existing button object, taking over all standard button actions, in particular the drawing of the button. The bitmap contains 4, 8 or 12 images arranged vertically in a strict order, each representing a different button state (up, depressed, with focus, and disabled) for each of the possible check states unchecked, checked, and indeterminate check. The appropriate image is used to draw the button in response to system requests and in direct response to user interaction with the mouse. The CFastBmpBtn sends notification messages to the parent window in the same manner as standard buttons.
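One way to select the image for a given button state from such a vertical strip is sketched below. The ordering used here (the four button states in the order listed, repeated for unchecked, checked, and indeterminate) is an assumption; the text states only that the order is strict. The enum and function names are hypothetical and are not taken from CFastBmpBtn.

```c
#include <assert.h>

/* Assumed order of images within each group of four. */
enum ButtonImage { BTN_UP = 0, BTN_DEPRESSED, BTN_FOCUS, BTN_DISABLED, BTN_IMAGES };

/* Assumed order of the groups within the strip. */
enum CheckImage { CHK_UNCHECKED = 0, CHK_CHECKED, CHK_INDETERMINATE };

/* Return the vertical pixel offset of the image to draw, or -1 if the
 * bitmap holds no images for the requested check state (for example, a
 * 4-image bitmap has no "checked" group). */
static int image_y_offset(int image_count, int image_height,
                          enum ButtonImage state, enum CheckImage check)
{
    int index = (int)check * BTN_IMAGES + (int)state;
    assert(image_count == 4 || image_count == 8 || image_count == 12);
    assert(image_height > 0);
    if (index >= image_count)
        return -1;
    return index * image_height;
}
```

Keeping all images in one bitmap means a single resource per button; the offset computation is the only per-state logic needed when drawing.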

(2) The CFastBmpBtn::SetButtonFlags function allows an application to set the CFastBmpBtn::RestoreFocus flag for a particular button. When this flag is set, the button will remember which window previously had focus whenever it receives focus due to a mouse click, and will restore the focus to the window when the mouse button is released.

CFastBmpBtn is used to implement the graphical buttons in user interfaces for conferencing system 100. An example is the handset and control panel buttons in the conferencing application 502.

Data Conferencing Application

Data conferencing application 504 implements the data conferencing user interface. Data conferencing application 504 is implemented as a Microsoft.RTM. Windows.TM. 3.1 application. The data conferencing application uses a "shared notebook" metaphor. The shared notebook lets the user copy a file from the computer into the notebook and review it with a remote user during a call. When the user is sharing the notebook (this time is called a "meeting"), the users see the same information on their computers, can review it together, and can make notes directly in the notebook. A copy of the original file is placed in the notebook, so the original remains unchanged. The notes users make during the meeting are saved with the copy in a meeting file. The shared notebook looks like a notebook or stack of paper. Conference participants have access to the same pages. Either participant can create a new page and fill it with information or make notes on an existing page. A preferred embodiment of a data conferencing application is described in U.S. patent application Ser. No. 08/137,319 (filed Oct. 14, 1993) and in U.S. patent application Ser. No. 08/170,146 (filed Dec. 20, 1993).

Conference Manager

Referring again to FIG. 5, audio/video conferencing application 502 supports audio and video conferencing between remote locations, while data conferencing application 504 supports the sharing of data (e.g., documents) between the remote locations. In general, conferencing system 100 is capable of simultaneously supporting multiple applications that support different types of conferencing services (e.g., audio/video conferencing, data sharing, and background file transfer).

When a single telephone line is used as the transport medium, the conference applications may need to share that line. Conference manager 544 (also known as CMDLL) coordinates connection and data channel activities for the conference applications. It provides capabilities to centralize and coordinate dial, hang-up, data channel management activities, and application launching. It enables conference applications to establish and tear down connections. It also provides applications access to already established connections. A connection is established through the conference manager 544 instead of calling the communication software (i.e., comm manager 518 via comm API 510) directly. Data channels are also obtained through the conference manager 544.

Conference manager 544 and conferencing API 506 provide the following advantages to conferencing system 100:

It is application aware (i.e., if application A on conferencing system X attempts to establish a data channel with application A on conferencing system Y, the conference manager 544 will automatically launch application A on system Y if application A is not already running).

It simplifies the establishment of a full duplex channel by providing a single simplified call to establish such a channel.

It allows applications that would normally use a single dedicated connection to share a connection.

It provides an efficient mechanism to inform applications about events such as "connection established" and "connection torn down."

It adds a layer of control for channel management (e.g., when an application with open channels terminates, the open channels are guaranteed to become closed).

The main purpose of conference manager 544 is to provide a set of services that allows several conference applications to share a common connection. The model is that once a connection is established by some application, any conference application can latch on to the connection and establish a full-duplex communication channel with its peer application running on the remote machine. The full duplex channel is implemented as a channel pair, or in TII terms, one outgoing and one incoming channel.

The conference manager services are used in conferencing system 100 to coordinate connection and data channel activities for the audio/video and data conferencing applications. The conference manager software sits between the applications (clients) and the communication software. A connection is established through the conference manager 544 instead of calling the communication software directly. Data channels are also obtained through the conference manager 544. Conference manager 544 also implements an application registry which gives it some intelligence as to which conference applications are running.

This approach has several advantages:

(1) Conference manager 544 is application aware. This means that if application A on computer X attempts to establish a data channel with application A on computer Y, conference manager 544 will automatically launch A on system Y if it is not already running.

(2) It simplifies the establishment of a full duplex channel. It provides a single simplified call to establish such a channel.

(3) It allows applications that would normally use a single dedicated connection to share a connection.

(4) It provides an efficient mechanism to inform applications about events such as "connection established" and "connection torn down."

(5) It adds a layer of control for channel management. For example, when an application with open channels terminates, its channels are guaranteed to get closed.

Conference Manager Overview

Conference manager 544 consists of several modules. The most important ones are as follows:
    cmcall.c    Contains the icmMakeCall and icmHangupCall procedures that are
                called from the CMIF library.
    cmchan.c    Contains the implementation of channel related procedures
                specified in the conference manager API. These are
                cmGetChannelPair, cmCloseChannelPair, and cmGetChannelPairInfo.
    cmclntfy.c  Contains dialog box procedures for the dialogs that are
                displayed on incoming calls. There is one dialog for the case
                when the caller ID matches a record in the address book, and
                one dialog for the case when a match is not found.
    cmdll.c     Contains the LibMain and WEP procedures. Also contains various
                initialization procedures, including Lib_InitializeCf which
                loads VCI.DLL and makes the VCI call CF_Init to initialize the
                comm subsystem and Lib_TerminateCf which calls CF_UnInit and
                unloads VCI.DLL. This module also contains code for registering
                and unregistering with the address book services provided by
                ABDLL.DLL.
    cmmisc.c    Contains miscellaneous supporting functions used throughout the
                other modules in CMDLL.
    cmnotify.c  Contains callbacks required by VCI. The callback
                Lib_CfCallCallBack handles the CFM_* messages such as
                CFM_CALL_NTFY and CFM_CALL_HANGUP. The callback
                Lib_CfChanCallBack handles VCI CFM_CHAN_* channel messages such
                as CFM_CHAN_ACCEPT_NTFY and CFM_CHAN_REJECT_NTFY.
    cmreg.c     Contains the implementation of the conference manager API
                functions cmRegister, cmUnregister, and cmInstall.


Implementation Details

This section describes the implementation details of key areas of CMDLL 544.

Conference Application Installation

In order to make CMDLL 544 aware of conference applications, conference applications are preferably installed. Installed applications are listed in the [Conference Apps] section in the PSVIDEO.INI file. Applications are typically installed directly by an installation program. It is also possible for an application to install itself by calling cmInstall (if, for example, the PSVIDEO.INI file has been corrupted subsequent to installation).

Conference Application Registration

Before a conference application makes use of CMDLL services, it is loaded and registered with the DLL. An application registers with CMDLL 544 by calling cmRegister. This function is in the module cmreg.c. CMDLL 544 keeps track of registered applications, and for each registered application, a CONFAPP structure is filled in. CMDLL 544 has a dynamically allocated array of CONFAPP structures. This array is built based on the applications that are installed (i.e., specified in the [Conference Apps] section in the PSVIDEO.INI file). If an application attempts to register without being installed, cmRegister will fail.

After an application has registered with CMDLL 544, subsequent calls by said application do not require the application ID to be specified. CMDLL 544 keys off of the application's task handle and is able to map a task to an application ID. Registered applications are notified through a callback about certain events such as connection establishment and connection tear-down.
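The registration behavior described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the CMDLL code: the AppEntry and Registry types and the registry_* functions are hypothetical, task handles are modeled as nonzero unsigned integers, and a fixed-size array stands in for the dynamically allocated CONFAPP array.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_APPS 8

/* Hypothetical, reduced stand-in for a CONFAPP entry. */
typedef struct {
    unsigned app_id;   /* ID of an installed application        */
    unsigned task;     /* registering task handle, 0 if not registered */
} AppEntry;

typedef struct {
    AppEntry apps[MAX_APPS];
    size_t count;
} Registry;

/* Add an installed application; returns 1 on success, 0 if the table is full. */
static int registry_install(Registry *r, unsigned app_id)
{
    assert(r != NULL);
    if (r->count >= MAX_APPS)
        return 0;
    r->apps[r->count].app_id = app_id;
    r->apps[r->count].task = 0;
    r->count++;
    return 1;
}

/* Register a task for an installed application; fails (returns 0) if the
 * application was never installed, mirroring the cmRegister behavior. */
static int registry_register(Registry *r, unsigned app_id, unsigned task)
{
    assert(r != NULL && task != 0);
    for (size_t i = 0; i < r->count; i++) {
        if (r->apps[i].app_id == app_id) {
            r->apps[i].task = task;
            return 1;
        }
    }
    return 0;
}

/* Map a calling task back to its application ID, so that later calls do
 * not need to pass the ID explicitly. Returns 1 if the task is registered. */
static int registry_app_for_task(const Registry *r, unsigned task, unsigned *app_id_out)
{
    assert(r != NULL && app_id_out != NULL);
    for (size_t i = 0; i < r->count; i++) {
        if (task != 0 && r->apps[i].task == task) {
            *app_id_out = r->apps[i].app_id;
            return 1;
        }
    }
    return 0;
}
```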

VCI Call Handler Callback

CMDLL 544 is responsible for handling VCI 506 call callback messages. Most messages are generated by the comm subsystem as a result of calls to VCI 506. All calls in VCI 506 are asynchronous, hence the messages in this callback. This callback, Lib_CfCallCallBack, is located in the module cmnotify.c and a pointer to the function is provided to VCI.DLL in the call CF_Init. The Lib_CfCallCallBack callback is defined as follows:

BOOL CALLBACK Lib_CfCallCallBack( IN UINT uiMsg, IN WPARAM wParam, IN LPARAM lParam )

Channel Pair Establishment

CMDLL 544 provides a high-level service, cmGetChannelPair, that enables conference applications to establish easily a channel pair, i.e., one channel for inbound data and one channel for outbound data. The cmGetChannelPair uses VCI services (which in turn use TII services). Applications may establish up to 32 channel pairs. Each channel pair has an associated usage ID which is defined by the application. In this way, when an application establishes a channel pair, its peer (or peers in a multipoint environment) will know the purpose of the channel pair. For example, one channel pair could be established for file transfer and another channel pair for control data. Appropriate channel handlers (e.g., TII/VCI) can thus be specified for different channels.

As noted earlier, CMDLL 544 keeps track of each application with an array of CONFAPP structures. A CONFAPP structure contains an array of structures of the type CHANNELPAIR, which is defined as follows:
    typedef struct tagCHANNELPAIR
    {
        HCHAN             hChanIn;    // input (receive) channel
        HCHAN             hChanOut;   // output (send) channel
        WORD              wState;     // channel pair state (CPS_*)
        CMCHAN_INFO       cmChanInfo; // channel info struct
    } CHANNELPAIR;

In turn, each channel pair contains a CMCHAN_INFO structure, which is defined as follows:
    typedef struct tagCMCHAN_INFO
    {
        HCHAN             hChanIn;    // input (read) channel
        HCHAN             hChanOut;   // output (send) channel
        CHAN_INFO         chanInfo;   // channel information
        DWORD             dwTransId;  // transaction id
        BYTE              byUsageId;  // usage id
        BOOL              bOpener;    // TRUE if initiator of cmGetChannelPair, else FALSE
    } CMCHAN_INFO;


This structure, in turn, contains the CHAN_INFO structure defined by TII 510. When a channel pair is established, certain information is transferred between the peer CMDLLs. This information is transferred in the CHAN_INFO structure. Successful channel pair establishment happens as follows. First a connection is established. The application on Site A calls cmGetChannelPair. CMDLL then handles all the VCI details of establishing outbound and inbound data channels using the CF_OpenChannel and CF_AcceptChannel VCI calls. Once the two channels have been established at the CMDLL level, CMDLL calls the applications back with the channel handles.

Once the application receives the channel handles through the CMN_CHANNELPAIR message, the application registers a channel handler using the VCI call CF_RegisterChanHandler. The cmGetChannelPair procedure fills in the Id field of the CHAN_INFO structure and then calls CF_OpenChannel. The rest of the processing for setting up the channel pairs takes place in the channel manager callback Lib_CfChanCallBack. The Id field is important in that it identifies:

Which application is establishing a channel pair (which is important for CMDLL on the remote site so that it knows which application to notify).

The usage id for the channel pair (which is important for the remote application so that it knows what to do with the channel pair).

Whether the channel that is being opened is inbound or outbound.
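The three pieces of information carried by the Id field could be packed into a single value as sketched below. The bit layout shown (16 bits of application ID, 8 bits of usage ID, one direction bit in a 32-bit word) is purely an assumption for illustration; the text does not give the actual encoding. The ChanIdFields type and both functions are hypothetical.

```c
#include <stdint.h>

/* Hypothetical decoded view of the Id field. */
typedef struct {
    uint16_t app_id;    /* which application is establishing the pair */
    uint8_t  usage_id;  /* application-defined purpose of the pair    */
    int      outbound;  /* nonzero if this channel is the outbound one */
} ChanIdFields;

/* Pack the fields into one 32-bit value (assumed layout:
 * bits 31..16 app ID, bits 15..8 usage ID, bit 0 direction). */
static uint32_t chan_id_pack(ChanIdFields f)
{
    return ((uint32_t)f.app_id << 16)
         | ((uint32_t)f.usage_id << 8)
         | (f.outbound ? 1u : 0u);
}

/* Recover the fields on the remote site from the packed value. */
static ChanIdFields chan_id_unpack(uint32_t id)
{
    ChanIdFields f;
    f.app_id   = (uint16_t)(id >> 16);
    f.usage_id = (uint8_t)((id >> 8) & 0xFFu);
    f.outbound = (int)(id & 1u);
    return f;
}
```

Whatever the real layout is, the remote CMDLL needs only the application part to decide whom to notify, while the remote application needs only the usage part to decide what the channel pair is for.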

Critical Sections

One of the key elements of CMDLL 544 is that it notifies conference applications of several events that take place in the comm subsystem, e.g., an incoming call. CMDLL 544 is also responsible for calling the comm subsystem in response to user-initiated events, e.g., hang up a call. CMDLL 544 is also responsible for starting and stopping playback of audio feedback through the PWave interface. CMDLL 544 may be interrupted by the comm subsystem while it is in the process of handling events initiated by the user. For example, while CMDLL 544 is in icmHangupCall processing, it may be interrupted by a CFM_REJECT_NTFY notification message from the comm subsystem. The critical section code prevents re-entrancy problems. It prevents the application from having to deal with call rejection messages when in fact it is already in the process of hanging up. Three global variables are declared in cmmisc.c for the purpose of critical sections:

UINT G_nProgressCriticalSection=0;

UINT G_nHangupCriticalSection=0;

UINT G_nRejectCriticalSection=0;

These variables are manipulated and examined in cmcall.c in icmHangupCall and in the handling of the CFM_REJECT_NTFY and CFM_PROGRESS_NTFY messages in the VCI call callback routine Lib_CfCallCallBack in cmnotify.c.
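A counter-based guard of this kind might be used as sketched below. The text does not state exactly how the counters are tested, so the logic here is an assumption: a rejection notification that arrives while hang-up processing is under way is simply dropped. The functions hangup_call and on_reject_notify and the variable g_rejects_delivered are hypothetical; only the two counter names come from the text.

```c
typedef unsigned int UINT;

static UINT G_nHangupCriticalSection = 0;
static UINT G_nRejectCriticalSection = 0;

/* Counts rejections actually passed on; stands in for notifying the application. */
static int g_rejects_delivered = 0;

/* Handler for a CFM_REJECT_NTFY-style message (assumed behavior). */
static void on_reject_notify(void)
{
    if (G_nHangupCriticalSection > 0)
        return;                      /* hang-up already in progress: drop it */
    G_nRejectCriticalSection++;
    g_rejects_delivered++;
    G_nRejectCriticalSection--;
}

/* Hang-up processing; interrupted_by_reject simulates the comm subsystem
 * delivering a rejection while the hang-up is still being handled. */
static void hangup_call(int interrupted_by_reject)
{
    if (G_nRejectCriticalSection > 0)
        return;                      /* rejection is being handled: do nothing */
    G_nHangupCriticalSection++;
    if (interrupted_by_reject)
        on_reject_notify();
    G_nHangupCriticalSection--;
}
```

Counters rather than flags allow nested entry to be tracked, which matters when a callback can arrive in the middle of a user-initiated operation.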

Call Notification and Caller ID

CMDLL 544 is responsible for listening for incoming calls, notifying the user of incoming calls, and for rejecting or accepting the incoming call as specified by the user. On incoming calls, VCI 506 calls the Lib_CfCallCallBack with the CFM_CALL_NTFY message. As outlined in the VCI Call Handler Callback section, if caller ID is available (through the lParam of the callback message), then the callback function performs a series of address book queries to determine the name and additional information of the caller.

Once the series of address book queries have been completed, the procedure CallNotifyDlg_Do is called. It is responsible for calling one of two dialog box procedures: one if caller ID is unavailable from the comm subsystem or if the address book query failed, and a different dialog box if the address book query produced a match.

This procedure is also responsible for disabling all top-level windows (handset, remote, local, and snapshot). This is done to prevent the user from accessing other features when an incoming call is pending. Accessing other features when a call is pending causes re-entrancy problems.
    BOOL CallNotifyDlg_Do(
        IN HWND hWndOwner,     // owner of dialog windows (handset window)
        IN HAB_REC hAbRec,     // address book record, possibly NULL
        IN LPSTR lpszCallerID  // caller ID string from comm subsystem
    )


Audible Call Progress

CMDLL 544 is responsible for providing audible call progress. CMDLL 544 uses the PWave services for starting and stopping playback of wave files. PWave can play a sound both synchronously and asynchronously. In the synchronous case, the number of times to play a file is specified, and the StartWavePlayback procedure does not return until it is finished playing the file the specified number of times. In the asynchronous case, the StartWavePlayback procedure returns immediately. In this case, PWave allows CMDLL to stop wave file playback at any time. Audible call progress is provided in the following situations:

Incoming call:

The RINGIN.WAV wave file starts playing asynchronously in the WM_INITDIALOG case in the call notification dialog boxes. Playback ends when the user accepts the call, rejects the call, or the caller hangs up.

Incoming call in auto answer mode:

The AABEEP.WAV file is played once when a call comes in and the application is in auto-answer mode.

Outgoing call:

The RINGOUT.WAV file is played on the caller's machine to inform the caller that the callee's machine is ringing. The RINGOUT.WAV file starts playing asynchronously when the comm subsystem calls the callback Lib_CfCallCallBack with the CFM_PROGRESS_NTFY message (with LOWORD(lParam) equal to CF_CALL_RINGING). Playback stops if the caller hangs up, or the callee accepts or rejects the call.

Busy signal:

The BUSY.WAV file is played once on the caller's machine if the callee is already in a video conference.

Error signal:

The PROBLEM.WAV file is played once in response to the CFM_REJECT_NTFY message in the callback Lib_CfCallCallBack.

The RINGIN.WAV file is the default wave file for incoming call notification. The user may optionally select a different wave file. This is done with a video preferences dialog box. If a different wave file has been selected, the PSVIDEO.INI file will contain the following entries:
    [Preferences]
    AudioPreference=1         ; 0 means use default
    Wavepath=c:\myring.wav    ; if AudioPreference is 1, use this wave file for incoming calls


The selected wave file meets the following criteria: sampled at 8 kHz, 8 bits, mono, and it is no larger than 64 K.
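The criteria above amount to a simple check, sketched here. The WaveInfo structure and the wave_is_acceptable function are hypothetical, and the sketch assumes the format fields have already been read from the file header. It also assumes that "8 kHz" means exactly 8000 samples per second and that "64 K" means 65,536 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a wave file's format and size. */
typedef struct {
    uint32_t sample_rate_hz;
    uint16_t bits_per_sample;
    uint16_t channels;
    uint32_t file_size_bytes;
} WaveInfo;

/* Return 1 if the file meets the stated criteria for a custom
 * incoming-call sound: 8 kHz, 8 bits, mono, no larger than 64 K. */
static int wave_is_acceptable(const WaveInfo *w)
{
    return w != NULL
        && w->sample_rate_hz == 8000u
        && w->bits_per_sample == 8u
        && w->channels == 1u
        && w->file_size_bytes <= 64u * 1024u;
}
```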

On Demand Application Invocation

Conventional electronic conferencing applications require that both sites be running versions of the conferencing application prior to initiating the sharing of information (i.e., conferencing). As a result, users must confirm (e.g., via an independent telephone call) that the appropriate applications are running before sharing information.

In a conferencing network comprising preferred embodiments of conferencing system 100, only one site need be running a conferencing application before information sharing can be initiated. Moreover, if possible, the same application on the remote site is launched to complete the sharing. Conference manager 544 of FIG. 5 provides these capabilities. Conference manager 544 allows an application to install, register/unregister, make/hang-up calls, and establish/destroy communication channels. After successfully placing a call to a remote site, a conferencing application may try to establish a communication channel. In the process of establishing communication channels, the application is capable of being launched remotely if it is necessary. To accomplish this, all conferencing applications are assigned a unique application ID (i.e., APPID).

When an attempt to establish a communication channel is made, the application ID is used to identify the application for that channel. The conference manager 544 uses the APPID to determine (a) if the application is installed and (b) if the application is currently running. If the answer is yes to both of these questions, then the communication channel can be established immediately. If the answer is yes to (a) and no to (b), then the conference manager 544 is able to launch the desired application (via the Microsoft.RTM. WinExec function) and poll for registration. If the answer is no to both (a) and (b), then the communication channel will fail to be created.

Referring now to FIG. 33, there is shown a flow diagram of the on-demand application invocation processing of conferencing system 100 of FIG. 5, according to a preferred embodiment of the present invention. On-demand application invocation applies when a conferencing application (App #1) running in one conferencing system (Site A) attempts to establish a conference with another conferencing system (Site B), where App #1 is installed but not currently running in Site B. App #1 in Site A starts the process by causing a request for a comm channel to be sent to Site B to establish communication between Site A and Site B. The comm channel request identifies the application running in Site A (e.g., APPID for App #1).

As shown in FIG. 33, the conference manager 544 of Site B receives the comm channel request from Site A (step 3302). The conference manager 544 of Site B retrieves the application ID for App #1 from the comm channel request and determines whether App #1 is installed in Site B (step 3304). If App #1 is not installed in Site B (step 3306), then the requested conference cannot proceed and the conference manager 544 of Site B causes the comm channel request of Site A to be rejected (step 3308).

Otherwise, if App #1 is installed in Site B (step 3306), then the conference manager 544 of Site B determines whether App #1 is registered, indicating that App #1 is already running in Site B. If App #1 is registered (step 3310), then processing continues to step 3318 as described below. Otherwise, if App #1 is not registered (step 3310), then the conference manager 544 of Site B attempts to synchronously launch App #1 (by calling the WinExec function of the Microsoft.RTM. Windows.TM. operating system) and thereby inform App #1 of Site B that a call is in progress (step 3312).

After attempting to launch App #1, the conference manager 544 of Site B checks to see whether App #1 was successfully launched by determining whether App #1 is now registered. If App #1 is still not registered (step 3314), then something went wrong in launching App #1 in Site B and again the conference manager 544 of Site B causes the comm channel request of Site A to be rejected (step 3316). Otherwise, if App #1 is now registered (step 3314), then the conference manager 544 of Site B accepts the comm channel request from Site A (step 3318) and notifies App #1 of Site B that the comm channel is open (step 3320) allowing conferencing to proceed.

The pseudocode for the local site communication channel establishment (Site A) is as follows:

    request a communication channel
    get notified when it has been established (or failed)

The pseudocode for the remote site communication channel establishment (Site B) is as follows:

    get a communication channel request
    get the appid from the request
    if the application is not installed then
        reject the communication channel request and stop
    elseif the application is not registered then
        WinExec the application
        inform the application there is a call in progress
        if the application is still not registered then
            reject the communication channel request and stop
    accept the communication channel request
    notify application that comm channel is open
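The remote-site logic can also be expressed as a small C function. This is a sketch of the decision flow only: the RemoteApp structure, the launch_app helper, and handle_channel_request are hypothetical, and the launch is simulated by a flag instead of an actual call to WinExec.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CHAN_REJECTED = 0, CHAN_ACCEPTED = 1 } ChanResult;

/* Hypothetical model of one application as seen by the remote conference manager. */
typedef struct {
    int installed;        /* listed as an installed conference application   */
    int registered;       /* currently running and registered                */
    int launch_succeeds;  /* simulation knob: does launching lead to registration? */
    int launch_attempts;  /* how many times a launch was attempted           */
} RemoteApp;

/* Simulated launch; a real implementation would start the program and
 * then check whether it has registered. */
static void launch_app(RemoteApp *app)
{
    app->launch_attempts++;
    if (app->launch_succeeds)
        app->registered = 1;
}

/* Decide whether to accept or reject an incoming comm channel request. */
static ChanResult handle_channel_request(RemoteApp *app)
{
    assert(app != NULL);
    if (!app->installed)
        return CHAN_REJECTED;        /* cannot proceed without the application */
    if (!app->registered) {
        launch_app(app);             /* on-demand invocation */
        if (!app->registered)
            return CHAN_REJECTED;    /* launch did not result in registration */
    }
    return CHAN_ACCEPTED;            /* caller then notifies the application */
}
```

An application that is already registered is never relaunched, so the launch path is taken at most once per request.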

Through this on-demand invocation of applications, one conferencing system running a conferencing application can cause a remote conferencing system to invoke a corresponding application at the remote site. Those skilled in the art will understand that this capability alleviates the requirement for arranging for a conference by external means (e.g., via telephone) to coordinate the parallel independent launching of the corresponding conferencing applications in the remote sites.

Managing Multiple Applications

Comm API (i.e., transport independent interface (TII)) 510 of FIG. 5 establishes connections with remote conferencing systems for conferencing sessions. TII 510 also establishes one or more channels within each connection for use by conferencing applications (such as 502 and 504). These channels are used by the applications for transmitting or receiving different types of information with the remote conferencing systems. For example, audio/video conferencing application 502 uses four channels to transmit and receive audio and video signals to and from a remote conferencing system. Similarly, data conferencing application 504 uses two channels to transmit and receive data signals to and from a remote conferencing system.

The conference manager 544 of FIG. 5 provides the capability for two or more conferencing applications to share a single connection in a single conferencing session with a remote conferencing system. This capability allows two or more conferencing applications to participate in the same conferencing session simultaneously using a single connection between the local and remote conferencing systems.

Referring now to FIG. 34, there is shown a flow diagram of an example of the processing implemented within conferencing system 100 of FIG. 5 to manage two conferencing applications in a single conferencing session with a remote conferencing system, according to a preferred embodiment of the present invention. The processing of FIG. 34 begins with the audio/video conferencing application 502 by asking the conference manager 544 to establish a connection for conferencing with a remote conferencing system (step 3402). Application 502 makes this request by calling the cmMakeCall function of the conference manager 544.

The conference manager 544 passes the connection request to the conferencing API (VCI) 506 by calling the CF_MakeCall( ) function (step 3404). VCI 506 in turn passes the connection request to TII 510 by calling the MakeConnection function (step 3406). TII 510 causes the connection with the remote conferencing system to be established and also establishes four channels (i.e., transmit/receive audio/video) within that connection for the audio/video conferencing application 502 to use (step 3408). As part of this step, VCI 506 causes handles for the four channels to be passed back to application 502. TII 510 causes the connection and channels to be established by communicating with the peer TII 510 of the remote conferencing system.

Data conferencing application 504 then asks the conference manager 544 to establish channels within the established connection for transmitting and receiving data signals with the remote conferencing system (step 3410). Data conferencing application 504 knows that the connection has been established, because application 504 has already registered with the conference manager 544 and the conference manager 544 informs all registered applications of connections by sending the CMN_CALL message. Since data conferencing application 504 already knows that the connection has been established, application 504 makes the channel request by calling the cmGetChannelPair function of the conference manager 544.

The conference manager 544 then passes the channel request to the VCI 506 (by calling CF_OpenChannel) (step 3412). VCI 506 in turn passes the channel request to TII 510 (by calling OpenChannel) (step 3414). Conference manager 544 establishes the two requested channels for data conferencing application 504 within the already established connection with the remote conferencing system (step 3416). As part of this step, conference manager 544 causes handles for the two channels to be passed back to application 504.

The conferencing session is then able to proceed with both applications 502 and 504 using a single connection with the remote conferencing system for its different channels (step 3418).

Conferencing API

Conferencing API 506 of FIG. 5 (also known as video conferencing interface (VCI)) facilitates the easy implementation of conferencing applications 502 and 504. Conferencing API 506 of FIG. 5 provides a generic conferencing interface between conferencing applications 502 and 504 and the video, comm, and audio subsystems. Conferencing API 506 provides a high-level abstraction of the services that individual subsystems (i.e., video, audio, and comm) support. The major services include:

Making, accepting, and hanging-up calls.

Mediating conference requirements between peers.

Establishing and terminating multiple communication channels for individual subsystems.

Instantiating and controlling local video and audio.

Sending video and audio to a remote site through the network.

Receiving, displaying, and controlling the remote video and audio streams.

Conferencing applications 502 and 504 can access these services through the high-level conferencing API 506 without worrying about the complexities of low-level interfaces supported in the individual subsystems.

In addition, conferencing API 506 facilitates the integration of individual software components. It minimizes the interactions between conferencing applications 502 and 504 and the video, audio, and comm subsystems. This allows the individual software components to be developed and tested independent of each other. Conferencing API 506 serves as an integration point that glues different software components together. Conferencing API 506 facilitates the portability of audio/video conferencing application 502.

Conferencing API 506 is implemented as a Microsoft.RTM. Windows.TM. Dynamic Link Library (DLL). Conferencing API 506 translates the function calls from conferencing application 502 to the more complicated calls to the individual subsystems (i.e., video, audio, and comm). The subsystem call layers (i.e., video API 508, comm API 510, and audio API 512) are also implemented in DLLs. As a result, the programming of conferencing API 506 is simplified in that conferencing API 506 does not need to implement more complicated schemes, such as dynamic data exchange (DDE), to interface with other application threads that implement the services for individual subsystems. For example, the video subsystem will use window threads to transmit/receive streams of video to/from the network.

Conferencing API 506 is the central control point for supporting communication channel management (i.e., establishing, terminating channels) for video and audio subsystems. Audio/video conferencing application 502 is responsible for supporting communication channel management for the data conferencing streams.

Referring now to FIG. 7, there is shown a block diagram of conference manager 544 and conferencing API 506 between conferencing applications 502 and 504, on one side, and video API 508, comm API 510, and audio API 512, on the other side, according to a preferred embodiment of the present invention. Conferencing API 506 comprises conferencing finite state machine (FSM) 702, conferencing primitive validator 704, conferencing primitive dispatcher 708, conferencing callback 706, comm primitive 712, comm callback 710, video primitive 716, and audio primitive 720 of FIG. 7.

Conferencing primitive validator 704 validates the syntax (e.g., checks the conferencing call state, channel state, and the stream state with the conferencing finite state machine (FSM) 702 table and verifies the correctness of individual parameters) of each API call. If an error is detected, primitive validator 704 terminates the call and returns the error to the application immediately. Otherwise, primitive validator 704 calls conferencing primitive dispatcher 708, which determines which subsystem primitives to invoke next.

Conferencing primitive dispatcher 708 dispatches and executes the next conferencing API primitive to start or continue to carry out the service requested by the application. Primitive dispatcher 708 may be invoked either directly from primitive validator 704 (i.e., to start the first of a set of conferencing API primitives) or from conferencing callback 706 to continue the unfinished processing (for asynchronous API calls).

After collecting and analyzing the completion status from each subsystem, primitive dispatcher 708 either (1) returns the concluded message back to the conferencing application by returning a message or invoking the application-provided callback routine or (2) continues to invoke another primitive to continue the unfinished processing.
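
The validate-then-dispatch flow described above can be illustrated with a short Python sketch. It is not the DLL's implementation: the table `ALLOWED_CALL_STATES`, the functions `validate_call` and `dispatch`, and the status strings are hypothetical names chosen for illustration. Only the CF_* function names and the CCST_* call states are taken from this specification, and the table covers just the transitions the specification's call scenario describes.

```python
# Hypothetical sketch of primitive validator 704 and primitive dispatcher 708.
# The table entries follow the call scenario in this specification; the status
# strings ("OK", "ERR_...") are assumptions made for the example.

ALLOWED_CALL_STATES = {
    "CF_MakeCall": {"CCST_IDLE"},
    "CF_AcceptCall": {"CCST_CALLED"},
    "CF_RejectCall": {"CCST_CALLED"},
    "CF_HangupCall": {"CCST_CALLING", "CCST_CONNECTED"},
}


def dispatch(primitives):
    """Run each subsystem primitive in order and stop at the first failure."""
    for primitive in primitives:
        status = primitive()
        if status != "OK":
            return status
    return "OK"


def validate_call(api_call, call_state, primitives):
    """Check the call against the state table, then hand off to the dispatcher."""
    allowed = ALLOWED_CALL_STATES.get(api_call)
    if allowed is None:
        return "ERR_UNKNOWN_CALL"
    if call_state not in allowed:
        # The error goes back to the application without invoking any primitive.
        return "ERR_INVALID_STATE"
    return dispatch(primitives)
```

In this sketch a call such as `validate_call("CF_MakeCall", "CCST_CONNECTED", [])` is rejected before any subsystem primitive runs, which mirrors the immediate error return described for primitive validator 704.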

A set of primitives (i.e., comm primitives 712, video primitives 716, and audio primitives 720) is implemented for each API call. Some primitives are designed to be invoked from a callback routine to carry out the asynchronous services.

The subsystem callback routine (i.e., comm callback 710) returns the completion status of an asynchronous call to the comm subsystem to conferencing callback 706, which analyzes the status to determine the proper action to take next. Comm callback 710 is implemented as a separate thread of execution (vthread.exe) that receives the callback Microsoft.RTM. Windows.TM. messages from the comm manager and then calls the VCI DLL to handle these messages.

Conferencing callback 706 returns the completion status of an asynchronous call to the application. Conferencing callback 706 checks the current message/event type and analyzes the type against the current conferencing API state and the next primitive being scheduled to determine the actions to take (e.g., invoke another primitive or return the message to the application). If the processing is not yet complete, conferencing callback 706 selects another primitive to continue the rest of the processing. Otherwise, conferencing callback 706 returns the completion status to the application. Conferencing callback 706 is used only for comm-related conferencing API functions; all other conferencing API functions are synchronous.
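
The continuation behavior described above (invoke another primitive, or return the completion status to the application) can be sketched as follows. This is an illustrative Python model only: the class `AsyncOperation`, its method names, and the `"OK"` status value are hypothetical and do not appear in this specification.

```python
# Hypothetical model of how a callback could drive an asynchronous API call
# through a sequence of primitives. All names here are illustrative assumptions.

class AsyncOperation:
    """Tracks the primitives that remain for one asynchronous API call."""

    def __init__(self, primitives, notify_app):
        self._pending = list(primitives)
        self._notify_app = notify_app

    def start(self):
        """Invoke the first primitive, or finish at once if there is none."""
        if self._pending:
            self._invoke_next()
        else:
            self._notify_app("OK")

    def _invoke_next(self):
        primitive = self._pending.pop(0)
        primitive()

    def on_callback(self, status):
        """Called when the subsystem reports completion of the last primitive."""
        if status != "OK" or not self._pending:
            # Processing is complete (or failed): report to the application.
            self._notify_app(status)
        else:
            # Processing is not yet complete: continue with the next primitive.
            self._invoke_next()
```

Each subsystem completion drives exactly one step, so the application is notified only once, after the final primitive completes or the first one fails.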

The major services supported by conferencing API 506 are categorized as follows:

Initialization and Call Services (establish/terminate a conference call).

Stream Services (capture, play, record, link, control the multimedia audio and video streams, and access and manipulate data from the streams).

Channel Services (establish/terminate channels on the call, and send/receive data on the channels).

Interfacing with the Comm Subsystem

Conferencing API 506 supports the following comm services with the comm subsystem:

Comm initialization--initialize a session in the comm subsystem on which the call will be made.

Call establishment--place a call to start a conference.

Channel establishment--establish two comm channels for video conferencing control information, two comm channels for audio (incoming/outgoing), and four comm channels for video (incoming data and control and outgoing data and control).

Call termination--hang up a call and close all active channels.

Comm Initialization/Uninitialization

Initialization of a session in the comm subsystem on which a call may be made by the user of conferencing system A of FIG. 1 and the user of conferencing system B of FIG. 1 is implemented as follows:

Conferencing APIs A and B call BeginSession to initialize their comm subsystems.

Conferencing APIs A and B enter a PeekMessage loop waiting for a SESS_BEGIN callback from the comm subsystem.
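
The wait for the SESS_BEGIN callback can be pictured as a polling loop over a message queue. The Python sketch below is only an analogy for a PeekMessage-style loop and does not call any Windows API: the function `wait_for_callback`, the `handle_other` parameter, the `max_polls` bound, and the placeholder message names other than SESS_BEGIN are assumptions made for illustration.

```python
from collections import deque


def wait_for_callback(message_queue, expected, handle_other, max_polls=1000):
    """Poll the queue until `expected` arrives.

    Messages that arrive first are passed to `handle_other` so they are not
    lost. Returns True if `expected` was seen within `max_polls` iterations.
    """
    for _ in range(max_polls):
        if not message_queue:
            continue  # a real message loop would yield to the system here
        message = message_queue.popleft()
        if message == expected:
            return True
        handle_other(message)
    return False


# Usage: two unrelated placeholder messages surround the awaited callback.
queue = deque(["OTHER_MSG_1", "SESS_BEGIN", "OTHER_MSG_2"])
handled = []
session_started = wait_for_callback(queue, "SESS_BEGIN", handled.append)
```

Bounding the loop with `max_polls` is a choice made for this example so that the sketch always terminates; the specification does not state how long the conferencing API waits.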

Uninitialization of a session in the comm subsystem is implemented as follows:

Conferencing APIs A and B call EndSession to uninitialize their comm subsystems.

Conferencing APIs A and B receive a SESS_CLOSED callback from the comm subsystem.

Conferencing APIs A and B then notify the conferencing applications with a CFM_UNINIT_NTFY callback.

Call Establishment

Establishment of a call between the user of conferencing system A of FIG. 1 and the user of conferencing system B of FIG. 1 is implemented as follows:

Conferencing API A calls MakeConnection to dial conferencing API B's number.

Conferencing API B receives a CONN_REQUESTED callback from the comm subsystem.

Conferencing API B sends the call notification to the graphical user interface (GUI) with a CFM_CALL_NTFY callback; if user B accepts the call via the GUI, conferencing API B proceeds with the following steps.

Conferencing API B calls AcceptConnection to accept the incoming call from conferencing API A.

Conferencing APIs A and B receive a CONN_ACCEPTED callback from the comm subsystem.

Conferencing API A calls OpenChannel to open its outgoing conferencing control channel.

Conferencing API B receives the CHAN_REQUESTED callback for the incoming control channel and accepts it via AcceptChannel. Then Conferencing API B calls OpenChannel to open its outgoing conferencing control channel.

Conferencing API A receives the CHAN_ACCEPTED callback for its outgoing control channel and calls RegisterChanHandler to receive channel callbacks from the comm subsystem. Then Conferencing API A receives the CHAN_REQUESTED callback for the incoming control channel and accepts it via AcceptChannel.

Conferencing API B receives the CHAN_ACCEPTED callback for its outgoing control channel and calls RegisterChanHandler to receive channel callbacks from the comm subsystem.

Conferencing API A sends a Login Request on the control channel, which Conferencing API B receives.

Conferencing API B sends a Login Response on the control channel, which Conferencing API A receives.

Conferencing API A sends a Capabilities Request on the control channel, specifying conference requirements, which Conferencing API B receives.

Conferencing API B sends a Capabilities Response on the control channel, accepting or modifying conference requirements, which Conferencing API A receives.

Conferencing API A calls OpenChannel to open its outgoing audio channel.

Conferencing API B receives the CHAN_REQUESTED callback for the incoming audio channel and accepts it via AcceptChannel.

Conferencing API A receives the CHAN_ACCEPTED callback for the outgoing audio channel.

The last three steps are repeated for the video data channel and the video control channel.

Conferencing API B then turns around and repeats the above 4 steps (i.e., opens its outbound channels for audio/video data/video control).

Conferencing API A sends Participant Information on the control channel, which Conferencing API B receives.

Conferencing API B sends Participant Information on the control channel, which Conferencing API A receives.

Conferencing APIs A and B then notify the conferencing applications with a CFM_ACCEPT_NTFY callback.
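
The order in which the one-way channels are opened during the call establishment steps above can be summarized in a few lines of Python. This is a bookkeeping sketch only: the function `channel_open_order` and the channel-kind labels are hypothetical, and the sketch lists only which side calls OpenChannel for which channel, not the CHAN_REQUESTED and CHAN_ACCEPTED callbacks that accompany each call.

```python
def channel_open_order():
    """Return (opener, channel kind) pairs in the order OpenChannel is called."""
    media = ["audio", "video data", "video control"]
    order = [("A", "control"), ("B", "control")]
    order += [("A", kind) for kind in media]  # A opens its outgoing media channels
    order += [("B", kind) for kind in media]  # B then repeats the same steps
    return order


order = channel_open_order()
video_channels = [pair for pair in order if pair[1].startswith("video")]
```

The resulting list has eight entries: two control channels, two audio channels, and four video channels, which matches the channel counts given for channel establishment earlier in this section.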

Channel Establishment

Video and audio channel establishment is implicitly done as part of call establishment, as described above, and need not be repeated here. For establishing other channels such as data conferencing, the conferencing API passes through the request to the comm manager, and sends the comm manager's callback to the user's channel manager.

Call Termination

Termination of a call between users A and B is implemented as follows (assuming user A hangs up):

Conferencing API A unlinks local/remote video/audio streams from the network.

Conferencing API A then calls the comm manager's CloseConnection.

The comm manager implicitly closes all channels, and sends Chan_Closed callbacks to conferencing API A.

Conferencing API A closes its remote audio/video streams on receipt of the Chan_Closed callback for its inbound audio/video channels, respectively.

Conferencing API A then receives the CONN_CLOSE_RESP from the comm manager after the call is cleaned up completely. Conferencing API A notifies its application via a CFM_HANGUP_NTFY.

In the meantime, the comm manager on B would have received the hangup notification, and would have closed its end of all the channels, and notified conferencing API B via Chan_Closed.

Conferencing API B closes its remote audio/video streams on receipt of the Chan_Closed callback for its inbound audio/video channels, respectively.

Conferencing API B unlinks its local audio/video streams from the network on receipt of the Chan_Closed callback for its outbound audio/video channels, respectively.

Conferencing API B then receives a CONN_CLOSED notification from its comm manager. Conferencing API B notifies its application via CFM_HANGUP_NTFY.

Interfacing with the Audio and Video Subsystems

Conferencing API 506 supports the following services with the audio and video subsystems:

Capture/monitor/transmit local video streams.

Capture/transmit local audio streams.

Receive/play remote streams.

Control local/remote streams.

Snap an image from local video stream.

Since the video and audio streams are closely synchronized, the audio and video subsystem services are described together.

Capture/Monitor/Transmit Local Streams

The local video and audio streams are captured and monitored as follows:

Call AOpen to open the local audio stream.

Call VOpen to open the local video stream.

Call ACapture to capture the local audio stream from the local hardware.

Call VCapture to capture the local video stream from the local hardware.

Call VMonitor to monitor the local video stream.

Transmission of the local video and audio streams to the remote site is started as follows:

Call ALinkOut to connect the local audio stream to an output network channel.

Call VLinkOut to connect the local video stream to an output network channel.

Local monitoring of the local video stream is stopped as follows:

Call VMonitor(off) to stop monitoring the local video stream.
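
The ordering of the calls listed above for capturing, monitoring, and transmitting the local streams can be sketched with a stand-in that records calls instead of driving hardware. The class `RecordingSubsystem`, its `call` method, the three helper functions, and the `"on"` argument to VMonitor are hypothetical; the specification names the functions but does not give their signatures here.

```python
class RecordingSubsystem:
    """Stand-in for the audio and video APIs that only records each call."""

    def __init__(self):
        self.calls = []

    def call(self, name, *args):
        self.calls.append((name, *args))


def start_local_streams(api):
    """Open, capture, and monitor the local streams."""
    api.call("AOpen")
    api.call("VOpen")
    api.call("ACapture")
    api.call("VCapture")
    api.call("VMonitor", "on")  # "on" is an assumed argument


def send_local_streams(api):
    """Connect the local streams to output network channels."""
    api.call("ALinkOut")
    api.call("VLinkOut")


def stop_local_monitor(api):
    """Stop monitoring the local video stream."""
    api.call("VMonitor", "off")


api = RecordingSubsystem()
start_local_streams(api)
send_local_streams(api)
stop_local_monitor(api)
```

After these three helpers run, `api.calls` holds the eight calls in the same order as the steps listed above.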

Receive/Play Remote Streams

Remote streams are received from the network and played as follows:

Call AOpen to open the local audio stream.

Call VOpen to open the local video stream.

Call ALinkIn to connect the local audio stream to an input network channel.

Call VLinkIn to connect the local video stream to an input network channel.

Call APlay to play the received remote audio stream.

Call VPlay to play the received remote video stream.

Control Local/Remote Streams

The local video and audio streams are paused as follows:

Call VLinkOut(off) to stop sending local video on the network.

Call AMute to stop sending local audio on the network.

The remote video and audio streams are paused as follows:

If CF_PlayStream(off) is called, conferencing API calls APlay(off) and VPlay(off).

The local/remote video/audio streams are controlled as follows:

Call ACntl to control the gains of a local audio stream or the volume of the remote audio stream.

Call VCntl to control such parameters as the brightness, tint, contrast, and color of a local or remote video stream.

Snap an Image from Local Video Streams

A snapshot of the local video stream is taken and returned as an image to the application as follows:

Call VGrabframe to grab the most current image from the local video stream.

Conferencing API 506 supports the following function calls by conferencing applications 502 and 504 to the video, comm, and audio subsystems:
    CF_Init                Reads in the conferencing configuration parameters
                           from an initialization file; loads and initializes
                           the software of the comm, video, and audio
                           subsystems by allocating and building internal data
                           structures; allows the application to choose between
                           the message and the callback routines to return the
                           event notifications from the remote site.
    CF_MakeCall            Makes a call to the remote site to establish a
                           connection for conferencing. The call is performed
                           asynchronously.
    CF_AcceptCall          Accepts call initiated from the remote site based on
                           information received in the CFM_CALL_NTFY message as
                           delivered to the graphical user interface.
    CF_RejectCall          Rejects incoming call, if appropriate, upon
                           receiving a CFM_CALL_NTFY message as delivered to
                           the GUI.
    CF_HangupCall          Hangs up a call that was previously established;
                           releases all resources, including all types of
                           streams and data structures, allocated during the
                           call.
    CF_GetCallInfo         Returns the information about the specified call,
                           including its current state.
    CF_CapMon              Starts the capture of analog video signals from the
                           local camera and displays the video in the local
                           video window which is pre-opened by the application.
                           This function allows the user to preview his/her
                           appearance before sending the signals out to the
                           remote site.
    CF_PlayRcvd            Starts the reception and display of remote video
                           signals in the remote video window, which is
                           pre-opened by the application; starts the reception
                           and play of remote audio signals through the local
                           speaker.
    CF_DestroyStream       Destroys the specified stream group that was created
                           by CF_CapMon or CF_PlayRcvd. As part of the destroy
                           process, all operations (e.g., sending/playing)
                           being performed on the stream group will be stopped
                           and all allocated system resources will be freed.
    CF_Mute                Uses AMute to turn on/off the mute function being
                           performed on the audio stream of a specified stream
                           group. This function will temporarily stop or
                           restart the related operations, including playing
                           and sending, being performed on this stream group.
                           This function may be used to hold temporarily one
                           audio stream and provide more bandwidth for other
                           streams to use.
    CF_SnapStream          Takes a snapshot of the video stream of the
                           specified stream group and returns a still image
                           (reference) frame to the application buffers
                           indicated by the hBuffer handle.
    CF_ControlStream       Controls the capture or playback functions of the
                           local or remote video and audio stream groups.
    CF_SendStream          Uses ALinkOut to pause/unpause audio.
    CF_GetStreamInfo       Returns the current state and the audio video
                           control block (AVCB) data structure, preallocated by
                           the application, of the specified stream groups.
    CF_PlayStream          Stops/starts the playback of the remote audio/video
                           streams by calling APlay/VPlay.
    CF_GetAudVidStream     Returns the audio and video stream handles for the
                           specified stream group.
    CF_RegisterChanMgr     Registers a callback or an application window whose
                           message processing function will handle
                           notifications generated by network channel
                           initialization operations. This function is invoked
                           before any CF_OpenChannel calls are made.
    CF_OpenChannel         Requests to open a network channel with the peer
                           application. The result of the action is given to
                           the application by invoking the callback routine
                           specified by the call to CF_RegisterChanMgr. The
                           application specifies an ID for this transaction.
                           This ID is passed to the callback routine or posted
                           in a message.
    CF_AcceptChannel       A peer application can issue CF_AcceptChannel in
                           response to a CFM_CHAN_OPEN_NTFY message that has
                           been received. The result of the CF_AcceptChannel
                           call is a one-way network channel for receiving
                           data.
    CF_RejectChannel       This routine rejects a CFM_CHAN_OPEN_NTFY from the
                           peer.
    CF_RegisterChanHandler This function registers a callback or an application
                           window whose message processing function will handle
                           notifications generated by TII network channel IO
                           activities. The channels that are opened will
                           receive TII CHAN_DATA_SENT notifications, and the
                           channels that are accepted will receive TII
                           CHAN_RCV_COMPLETE notifications.
    CF_CloseChannel        This routine will close a network channel that was
                           opened by CF_AcceptChannel or CF_OpenChannel. The
                           handler for this channel is automatically
                           de-registered.
    CF_SendData            Send data to peer. If the channel is not reliable
                           and there are no receive buffers posted on the peer
                           machine, the data will be lost.
    CF_RecvData            Data is received through this mechanism. Normally
                           this call is issued in order to post receive buffers
                           to the system. When the system has received data in
                           the given buffers, the Channel Handler will receive
                           the TII CHAN_RCV_COMPLETE notification.
    CF_GetChanInfo         This function will return various statistical
                           information about a channel. For example: bandwidth
                           information, number of sends/second, number of
                           receives/second, etc.


These functions are defined in further detail later in this specification in APPENDIX A entitled "Conference Manager API."

In addition, conferencing API 506 supports the following messages returned to conferencing applications 502 and 504 from the video, comm, and audio subsystems in response to some of the above-listed functions:
    CFM_CALL_NTFY          Indicates that a call request initiated from the
                           remote site has been received.
    CFM_PROGRESS_NTFY      Indicates that a call state/progress notification
                           has been received from the local phone system
                           support.
    CFM_ACCEPT_NTFY        Indicates that the remote site has accepted the call
                           request issued locally. Also sent to the accepting
                           application when CF_AcceptCall completes.
    CFM_REJECT_NTFY        Indicates that the remote site has rejected or the
                           local site has failed to make the call.
    CFM_HANGUP_NTFY        Indicates that the local or remote site has hung up
                           the call.
    CFM_UNINIT_NTFY        Indicates that uninitialization of comm subsystem
                           has completed.
    CFM_ERROR_NTFY         Indicates that a SESS_ERROR was received from comm
                           subsystem.


Referring now to FIG. 8, there is shown a representation of the conferencing call finite state machine (FSM) for a conferencing session between a local conferencing system (i.e., caller) and a remote conferencing system (i.e., callee), according to a preferred embodiment of the present invention. The possible conferencing call states are as follows:
    CCST_NULL          Null State - state of uninitialized caller/callee.
    CCST_IDLE          Idle State - state of caller/callee ready to make/
                       receive calls.
    CCST_CALLING       Calling state - state of caller trying to call callee.
    CCST_CALLED        Called state - state of callee being called by caller.
    CCST_ACCEPTING     Accepting state - state of accepting call from
                       caller.
    CCST_CONNECTED     Call state - state of caller and callee during
                       conferencing session.
    CCST_CLOSING       A hangup or call cleanup is in progress.


At the CCST_CONNECTED state, the local application may begin capturing, monitoring, and/or sending the local audio/video signals to the remote application. At the same time, the local application may be receiving and playing the remote audio/video signals.

Referring now to FIG. 9, there is shown a representation of the conferencing stream FSM for each conferencing system participating in a conferencing session, according to a preferred embodiment of the present invention. The possible conferencing stream states are as follows:
    CSST_INIT       Initialization state - state of local and remote streams
                    after CCST_CONNECTED state is first reached.
    CSST_ACTIVE     Capture state - state of local stream being captured.
                    Receive state - state of remote stream being received.
    CSST_FAILURE    Fail state - state of local/remote stream after resource
                    failure.


Conferencing stream FSM represents the states of both the local and remote streams of each conferencing system. Note that the local stream for one conferencing system is the remote stream for the other conferencing system.

In a typical conferencing session between a caller and a callee, both the caller and callee begin in the CCST_NULL call state of FIG. 8. The conferencing session is initiated by both the caller and callee calling the function CF_Init to initialize their own conferencing systems. Initialization involves initializing internal data structures, initializing communication and configuration information, and verifying the local user's identity. The CF_Init function takes both the caller and callee from the CCST_NULL call state to the CCST_IDLE call state. The CF_Init function also places both the local and remote streams of both the caller and callee in the CSST_INIT stream state of FIG. 9.

Both the caller and callee call the CF_CapMon function to start capturing local video and audio signals and playing them locally, taking both the caller and callee local streams from the CSST_INIT stream state to the CSST_ACTIVE stream state. Both the caller and callee may then call the CF_ControlStream function to control the local video and audio signals, leaving all states unchanged.

The caller then calls the CF_MakeCall function to initiate a call to the callee, taking the caller from the CCST_IDLE call state to the CCST_CALLING call state. The callee receives and processes a CFM_CALL_NTFY message indicating that a call has been placed from the caller, taking the callee from the CCST_IDLE call state to the CCST_CALLED call state. The callee calls the CF_AcceptCall function to accept the call from the caller, taking the callee from the CCST_CALLED call state to the CCST_ACCEPTING call state. The caller and callee receive and process a CFM_ACCEPT_NTFY message indicating that the callee accepted the call, taking the caller and callee from the CCST_CALLING/CCST_ACCEPTING call states to the CCST_CONNECTED call state.

Both the caller and callee then call the CF_PlayRcvd function to begin reception and play of the video and audio streams from the remote site, leaving all states unchanged. Both the caller and callee call the CF_SendStream function to start sending the locally captured video and audio streams to the remote site, leaving all states unchanged. If necessary, both the caller and callee may then call the CF_ControlStream function to control the remote video and audio streams, again leaving all states unchanged. The conferencing session then proceeds with no changes to the call and stream states. During the conferencing session, the application may call CF_Mute, CF_PlayStream, or CF_SendStream. These affect the state of the streams in the audio/video managers, but not the state of the stream group.

When the conferencing session is to be terminated, the caller calls the CF_HangupCall function to end the conferencing session, taking the caller from the CCST_CONNECTED call state to the CCST_IDLE call state. The callee receives and processes a CFM_HANGUP_NTFY message from the caller indicating that the caller has hung up, taking the callee from the CCST_CONNECTED call state to the CCST_IDLE call state.
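
The call-state transitions walked through in the scenario above can be collected into a transition table. The Python sketch below is an illustration, not the FSM 702 table itself: the dictionary `CALL_TRANSITIONS` and the function `run_call_fsm` are hypothetical names, and the table contains only the transitions stated in this scenario. Functions and messages that the scenario describes as leaving the call state unchanged (for example CF_CapMon and CF_PlayRcvd) are omitted.

```python
# (current call state, function call or received message) -> next call state.
# Entries are limited to the transitions described in the typical scenario.
CALL_TRANSITIONS = {
    ("CCST_NULL", "CF_Init"): "CCST_IDLE",
    ("CCST_IDLE", "CF_MakeCall"): "CCST_CALLING",
    ("CCST_IDLE", "CFM_CALL_NTFY"): "CCST_CALLED",
    ("CCST_CALLED", "CF_AcceptCall"): "CCST_ACCEPTING",
    ("CCST_CALLING", "CFM_ACCEPT_NTFY"): "CCST_CONNECTED",
    ("CCST_ACCEPTING", "CFM_ACCEPT_NTFY"): "CCST_CONNECTED",
    ("CCST_CONNECTED", "CF_HangupCall"): "CCST_IDLE",
    ("CCST_CONNECTED", "CFM_HANGUP_NTFY"): "CCST_IDLE",
}


def run_call_fsm(events, state="CCST_NULL"):
    """Apply each event in turn and return the final call state."""
    for event in events:
        key = (state, event)
        if key not in CALL_TRANSITIONS:
            raise ValueError(f"{event} is not valid in state {state}")
        state = CALL_TRANSITIONS[key]
    return state


caller_events = ["CF_Init", "CF_MakeCall", "CFM_ACCEPT_NTFY", "CF_HangupCall"]
callee_events = ["CF_Init", "CFM_CALL_NTFY", "CF_AcceptCall",
                 "CFM_ACCEPT_NTFY", "CFM_HANGUP_NTFY"]
caller_final = run_call_fsm(caller_events)
callee_final = run_call_fsm(callee_events)
```

Running either event list from CCST_NULL ends in CCST_IDLE, as the scenario describes. An event that has no entry for the current state, such as CF_MakeCall before CF_Init, raises an error in this sketch.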

Both the caller and callee call the CF_DestroyStream function to stop playing the remote video and audio signals, taking both the caller and callee remote streams from the CSST_ACTIVE stream state to the CSST_INIT stream state. Both the caller and callee also call the CF_DestroyStream function to stop capturing the local video and audio signals, taking both the caller and callee local streams from the CSST_ACTIVE stream state to the CSST_INIT stream state.

This described scenario is just one possible scenario. Those skilled in the art will understand that other scenarios may be constructed using the following additional functions and state transitions:

If the callee does not answer within a specified time period, the caller automatically calls the CF_HangupCall function to hang up, taking the caller from the CCST_CALLING call state to the CCST_IDLE call state.

The callee calls the CF_RejectCall function to reject a call from the caller, taking the callee from the CCST_CALLED call state to the CCST_IDLE call state. The caller then receives and processes a CFM_REJECT_NTFY message indicating that the callee has rejected the caller's call, taking t