Coherency (e.g., same view to multiple users)

System and methods for synchronizing two or more datasets

6295541

Abstract

Synchronization system and associated methods provide synchronization of an arbitrary number of datasets, including more than two datasets. To achieve this, a reference dataset is used to store a super-set of the latest or most-recent data from all user datasets to provide a repository of information that is available at all times. Therefore, if the user later wishes to synchronize a new user dataset, such as one in a server computer that stores user information, the system already has all the information necessary for synchronizing the new dataset, regardless of whether any of the other datasets are then available. Further, to simplify use, a unified user interface is provided that allows the user to easily determine which of his or her datasets are currently set to be synchronized and allows the user to conveniently alter the current settings to select one, two, or even more than two clients for synchronization. Various "conflict" or "duplicate" resolution strategies are described for intelligently handling complexities resulting from allowing synchronization for an arbitrary number of datasets and allowing synchronization using even data from datasets that are not available. Architectural support for "plug-in" client accessors and type modules is also provided. This allows support to be added for new datasets or new types of data merely by developing and plugging in new, compact client accessors or type modules, without updating or replacing the core synchronization engine.


Claims

What is claimed is:

1. In an information processing system, a method for synchronizing an arbitrary number of multiple datasets residing on different devices, the method comprising:

accepting a designation of an arbitrary number of multiple datasets comprising more than two datasets from different devices to be synchronized, wherein at least one of the designated datasets includes one record to be synchronized with all others of the designated datasets;

in response to said designation, creating a reference dataset that is used to store a super-set of most-recent data from all of the designated datasets, said reference dataset having records that are uniquely identified in a manner that is independent of how records are identified in any given one of the designated datasets, thereby providing a repository of information that is available independent of unavailability of one of said devices at a given point in time; and

in response to the designation and based on the information stored by said reference dataset, synchronizing all of the designated datasets without requiring further designating of datasets to be synchronized, wherein, after the step of synchronizing, each of the designated datasets includes a record that corresponds to, and is in a synchronized state with, the one record.

2. The method of claim 1 wherein the number of designated datasets is N, and the step of synchronizing the N designated datasets comprises performing a sequence of binary reconciliations, each involving at least one of the N designated datasets, to obtain at least one dataset that is globally-synchronized, wherein any globally-synchronized dataset includes propagated changes, if any exist and were not discarded due to conflict resolution, from each of the N designated datasets, and any dataset by definition includes changes, if any, propagated from itself.

3. The method of claim 2 wherein the step of synchronizing the N designated datasets further comprises, after the step of performing the sequence of binary reconciliations, synchronizing each dataset, of the N designated datasets, that is not yet globally-synchronized with another dataset that is already globally-synchronized.

4. The method of claim 3 wherein a reference dataset is involved in each of at least N-1 binary reconciliations in the sequence of binary reconciliations.

5. The method of claim 3 wherein the sequence of binary reconciliations comprises at most N binary reconciliations, wherein at least (N-1) of the at most N binary reconciliations each involves at least one of the N designated datasets.

6. The method of claim 3 wherein:

the sequence of binary reconciliations includes a first binary reconciliation involving a first dataset and a second, later binary reconciliation;

the first dataset includes a record that has a modification time; and

the step of performing the sequence of binary reconciliations comprises causing, during the first binary reconciliation, a recording of the modification time such that the modification time is available for use after start of the second binary reconciliation.

7. The method of claim 6 wherein:

the second binary reconciliation involves a second and a third dataset, each of which is not the first dataset; and

the step of performing the sequence of binary reconciliations further comprises comparing the recorded modification time to another time in resolving a conflict between a record from the second dataset and a record that is already in a synchronized state with regard to the first record.

8. The method of claim 1 wherein the step of synchronizing the more than two designated datasets comprises:

providing a synchronizer dataset that contains records, wherein the synchronizer dataset reflects a result of an earlier synchronization;

comparing each of the more than two designated datasets to the synchronizer dataset to identify an addition, update, or deletion of a record in the each designated dataset;

changing at least two of the designated datasets based on the identified addition, update, or deletion of a record; and

changing the synchronizer dataset based on the identified addition, update, or deletion of a record.

9. An information processing system for synchronizing an arbitrary number of multiple datasets from different devices, the system comprising:

means for accepting a designation of an arbitrary number of datasets comprising more than two datasets from different devices to be synchronized, wherein at least one of the designated datasets includes one record to be synchronized with all others of the designated datasets;

means, responsive to said designation, for creating a reference dataset that is used to store a super-set of most-recent data from all of the designated datasets, said data comprising records uniquely identified within the reference dataset in a manner that is independent of how data is identified within the designated datasets, for providing a repository of information that is available independent of unavailability of one of said devices at a given point in time; and

means for synchronizing, in response to the designation and based on the information stored by said reference dataset, the more than two designated datasets without requiring further designating of datasets to be synchronized, wherein, after the step of synchronizing, each of the more than two designated datasets includes a record that corresponds to, and is in a synchronized state with, the one record.

10. A computer program product for use with a computer system comprising:

a computer-readable storage medium;

computer code on the storage medium for instructing the computer system to accept a designation of an arbitrary number of datasets comprising more than two datasets from different devices to be synchronized, wherein at least one of the designated datasets includes one record to be synchronized with all others of the designated datasets;

computer code on the storage medium, responsive to said designation, for creating a reference dataset that is used to store a super-set of most-recent data from all of the designated datasets, said data comprising records uniquely identified within the reference dataset in a manner that is independent of how data is identified within the designated datasets, for providing a repository of information that is available independent of unavailability of one of said devices at a given point in time; and

computer code on the storage medium for instructing the computer system to synchronize, in response to the designation and based on the information stored by said reference dataset, the more than two designated datasets without requiring further designating of datasets to be synchronized, wherein, after the step of synchronizing, each of the more than two designated datasets includes a record that corresponds to, and is in a synchronized state with, the one record.

11. A method capable of synchronizing more than two designated datasets, the method comprising:

providing a synchronizer dataset that contains records, wherein the synchronizer dataset reflects a result of an earlier synchronization of the designated datasets, and wherein each record in the synchronizer dataset is identified by a globally-unique record identifier that is independent of how any corresponding records from the designated datasets are identified;

creating a table that associates each globally-unique record identifier to any corresponding records residing on the designated datasets;

using said synchronizer dataset and said table, comparing values derived from each of the more than two designated datasets to values derived from the synchronizer dataset to identify a change of a record in the each designated dataset since an earlier synchronization involving the each designated dataset; and

changing at least two of the designated datasets based on the identified change of a record.

12. The method of claim 11 wherein the step of comparing each of the designated datasets to the synchronizer dataset comprises, for each of the designated datasets, comparing values derived from records from the each designated dataset to values derived from the synchronizer dataset to determine whether any records of the each designated dataset are updates or additions, or whether any record of the each designated dataset has been deleted, since a prior synchronization involving the each designated dataset.

13. The method of claim 12 wherein the step of comparing values derived from records comprises determining a particular record in the each designated dataset to be an update, if the particular record possesses a modification time that is later than a particular time.

14. The method of claim 13 wherein the particular time is a priority time associated with the corresponding record in the synchronizer dataset.

15. The method of claim 14 wherein the priority time is indicative of a modification time, herein referred to as the original priority time, of a record in a particular dataset, wherein values derived from the record in the particular dataset were used to change the corresponding record in the synchronizer dataset due to an earlier synchronization.

16. The method of claim 13 wherein the particular time is a time established based on a most recent earlier synchronization that involved the each designated dataset.

17. The method of claim 16 wherein the particular time is indicative of a time no earlier than a latest modification time, from any record of the each designated dataset, that existed prior to and was seen during the most recent earlier synchronization that involved the each designated dataset.

18. The method of claim 12 further comprising resolving conflicts among at least determined updates or deletions corresponding to a single record of the synchronizer dataset.

19. The method of claim 18 wherein:

the step of providing a synchronizer dataset comprises providing status information regarding contents of at least a particular dataset, of the designated datasets, according to an earlier synchronization involving the particular dataset; and

the step of comparing values derived from records further comprises:

determining a particular record in the each designated dataset to be an addition, if the particular record has no corresponding record in the synchronizer dataset; and

determining that the particular dataset includes a deletion, if the particular dataset no longer includes a valid record that corresponds to a particular record in the synchronizer dataset, wherein the particular dataset did have such a valid record at the end of an earlier synchronization, according to the status information.

20. The method of claim 19 wherein the step of comparing values derived from records further comprises:

comparing a CRC-type result derived from the particular record in the each designated dataset with a CRC-type result derived from the particular record during the earlier synchronization involving the each designated dataset; and

determining the particular record in the each designated dataset to be an update, if the CRC-type results differ.

21. The method of claim 19 wherein the step of comparing values derived from records comprises determining the particular record in the each designated dataset to be an update only if the particular record in the each designated dataset possesses a modification time that is later than a particular time.

22. The method of claim 19 further comprising:

comparing at least one value derived from a determined addition of a record with values derived from at least one other record to determine whether the determined addition represents a duplication of records.

23. The method of claim 22 further comprising:

comparing at least one value derived from a determined update of a record with values derived from at least one other record to determine whether the determined update represents a duplication of records; and

resolving any determined duplication of records.

24. The method of claim 19 further comprising, for at least one of the designated datasets:

identifying a change of a record in the synchronizer dataset since an earlier synchronization involving the at least one designated dataset; and

changing the at least one designated dataset based on the identified change in the synchronizer dataset.

25. The method of claim 24 wherein the step of resolving conflicts comprises resolving conflicts among a plurality of determined updates or deletions corresponding to the single record of the synchronizer dataset, wherein the plurality of determined updates or deletions includes an update or deletion identified in the synchronizer dataset and an update or deletion identified in one of the designated datasets.

26. The method of claim 24 wherein the step of resolving conflicts comprises resolving conflicts among more than two updates or deletions that correspond to the single record of the synchronizer dataset by giving precedence to one of the more than two updates or deletions over others of the more than two updates or deletions.

27. The method of claim 11 wherein at least one of the designated datasets is a non-participant in the earlier synchronization the result of which is reflected in the provided synchronizer dataset.

28. The method of claim 11 wherein all of the designated datasets are non-participants in the earlier synchronization the result of which is reflected in the provided synchronizer dataset.

29. The method of claim 28 further comprising, in the earlier synchronization, changing a first record in the synchronizer dataset; and after the earlier synchronization, in synchronizing the designated datasets:

identifying the changed first record in the synchronizer dataset to be an addition, update, or deletion; and

changing the designated datasets based on the first record in the synchronizer dataset.

30. A system for synchronizing more than two designated datasets, the system comprising:

means for providing a synchronizer dataset that contains records, wherein the synchronizer dataset reflects a result of an earlier synchronization and wherein each record in the synchronizer dataset is identified by a globally-unique record identifier that is independent of how any corresponding records from the designated datasets are identified;

means for creating a table that associates each globally-unique record identifier to any corresponding records residing on the designated datasets;

means, responsive to said synchronizer dataset and said table, for comparing each of the more than two designated datasets to the synchronizer dataset to identify an addition, update, or deletion of a record in the each designated dataset since an earlier synchronization involving the each designated dataset;

means for updating at least two of the designated datasets based on the identified addition, update, or deletion of a record; and

means for updating the synchronizer dataset based on the identified addition, update, or deletion of a record.
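
The CRC-type comparison recited in claim 20 can be illustrated with a minimal sketch. The claims do not specify a checksum algorithm or a record layout, so the use of `zlib.crc32`, the field-sorting canonical form, and the names `record_crc` and `is_update` are all assumptions made for illustration only.

```python
import zlib


def record_crc(fields: dict) -> int:
    """Return a CRC-32 over a canonical, order-independent rendering of a record.

    The canonical form (sorted "key=value" pairs joined by a separator byte)
    is a hypothetical choice; any stable serialization would serve.
    """
    canonical = "\x1f".join(f"{key}={fields[key]}" for key in sorted(fields))
    return zlib.crc32(canonical.encode("utf-8"))


def is_update(current_fields: dict, crc_from_earlier_sync: int) -> bool:
    """Treat the record as an update if its CRC differs from the stored one."""
    return record_crc(current_fields) != crc_from_earlier_sync


# CRC recorded at the end of an earlier synchronization.
stored_crc = record_crc({"name": "Ada", "phone": "555-0100"})

# Same content in a different field order: not an update.
unchanged = is_update({"phone": "555-0100", "name": "Ada"}, stored_crc)

# One field edited since the earlier synchronization: an update.
changed = is_update({"name": "Ada", "phone": "555-0199"}, stored_crc)
```

A checksum comparison of this kind does not depend on the client maintaining reliable modification times, which may be one reason to prefer it for clients that lack them.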


Description

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

The present invention relates generally to management of information or datasets stored on electronic devices and, more particularly, to a system implementing methods for maintaining synchronization of datasets among one, two, or more such devices.

With each passing day, there is ever increasing interest in providing synchronization solutions for connected information appliances. Here, the general environment includes "appliances" in the form of electronic devices including, but not limited to, cellular phones, pagers, other hand-held devices (e.g., REX™, PalmPilot™, and Windows™ CE devices), personal computers (PCs) of all types and sizes, and Internet or intranet access devices (e.g., PCs or embedded computers running, for example, Java virtual machines or browsers or Internet Protocol (IP) handlers).

A problem facing such an environment today is that these devices, and the software applications running on these devices, do not communicate well with one another and are typically not designed with data synchronization in mind. In particular, a problem exists as to how one integrates information--such as calendaring, scheduling, and contact information--among disparate devices and software applications. Consider, for instance, a user who has his or her appointments on a desktop PC at work, but also has a notebook computer at home and a battery-powered, hand-held device for use in the field. What the user really wants is for the information (e.g., appointments) in each device to remain synchronized with corresponding information in all devices in a convenient, transparent manner. Still further, some devices (e.g., PCs) are typically connected at least occasionally to a server computer, e.g., an Internet server, which stores information for the user. The user would of course like the information on the server computer to participate in the synchronization, so that the server also remains synchronized.

There have been attempts to solve the problem of synchronizing datasets across different devices or software applications. An early approach to maintaining consistency between datasets was to import or copy one dataset on top of another. This simple "one-way" approach, one which overwrites a target dataset without any attempt at reconciling any differences, is inadequate for all but the simplest of applications. Expectedly, more sophisticated synchronization techniques were developed. In particular, techniques were developed for synchronization of exactly two datasets by attempting to reproduce in each dataset the changes found in the other dataset since a previous synchronization.

The conventional synchronization techniques, which are limited to such pair-wise, or binary, synchronization, may satisfy an information appliance user who wishes to synchronize exactly two datasets. However, these conventional techniques do not adequately satisfy the modern information appliance user, who is accumulating ever more information appliances and software applications and frequently wants to synchronize information from more than just two datasets. To synchronize more than two datasets, such a user is forced by the conventional synchronization techniques to try to achieve the correct synchronization result using successive binary synchronizations. This approach is not only inefficient and generally inconvenient and confusing for the user, but is also fraught with potential for human error, and, further, may produce incorrect results in certain situations.

As an example, consider a user who needs to synchronize information, e.g., scheduling information, from three sources: a PalmPilot organizer device ("Pilot"), a Windows CE device ("CE Device"), and a Microsoft Outlook PIM (Personal Information Manager) software application ("Outlook") on a PC. The user may first perform a Pilot-and-Outlook synchronization, typically using a first synchronization software program that is designed and sold specifically for the Pilot. Next, the user may perform an Outlook-and-CE-Device synchronization, typically using a second synchronization program that is designed and sold specifically for the CE Device. At this point, the user may think he or she is finished. However, he or she in reality must still perform another Pilot-and-Outlook synchronization to ensure that the Pilot correctly includes the latest information from the CE Device. Unfortunately, conventional synchronization programs do not tell the user of the need for this additional synchronization.
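
The three-source example above can be traced with a minimal sketch. It assumes a deliberately simplified binary synchronization that only copies missing records in both directions (no updates, deletions, or conflicts); the function name `binary_sync` and the sample records are hypothetical.

```python
def binary_sync(a: dict, b: dict) -> None:
    """Naive two-way sync: copy records that are missing on either side."""
    for key, value in list(a.items()):
        b.setdefault(key, value)
    for key, value in list(b.items()):
        a.setdefault(key, value)


# Each dataset starts with one record of its own.
pilot = {"dentist": "Mon 9am"}
outlook = {"review": "Tue 2pm"}
ce_device = {"lunch": "Wed noon"}

binary_sync(pilot, outlook)       # first: Pilot-and-Outlook
binary_sync(outlook, ce_device)   # second: Outlook-and-CE-Device

# The Pilot has not yet seen the record that originated on the CE Device.
pilot_missing_after_two = sorted(set(ce_device) - set(pilot))

binary_sync(pilot, outlook)       # third: Pilot-and-Outlook again
all_equal = pilot == outlook == ce_device
```

After the first two synchronizations the Pilot still lacks the "lunch" record; only the third, repeated Pilot-and-Outlook synchronization brings all three datasets into agreement.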

In general, the conventional synchronization programs do not adequately help the user in keeping more than two datasets synchronized. For example, the conventional programs do not provide a single user interface (UI) that allows the user to select more than two datasets to be synchronized with every other selected dataset. The conventional programs also do not automatically perform the synchronization of data in more than two datasets for the user without further user input. At most, a conventional synchronization program may permit the user to select a pair of datasets having data of one type (e.g., contacts data) for synchronization and a possibly different pair of datasets having data of a different type (e.g., calendar data) for synchronization. Although the user selects more than two datasets in total, any actual synchronization involves only two datasets. More particularly, synchronization of similar data involves only two datasets. Put another way, no single data record on a particular dataset will effect changes in more than one other dataset in response to a user's selection of datasets using the conventional program. In short, the conventional synchronization scheme is inconvenient and susceptible to human error, especially by less technically-savvy users, and especially as the number of datasets above two increases for a user.

The conventional synchronization scheme suffers from a further problem in that when more than two datasets are synchronized using multiple binary synchronizations, dataset status information (e.g., data modification times) that may be available during earlier-performed synchronizations may not be available during later-performed synchronizations. The failure to retain this status information may be intentional and also reasonable under an assumption that only two datasets need ever be synchronized. The failure to retain this status information may also be caused in part by the design limitations of particular information devices or software applications. Whatever the cause, the unavailability of status information during later-performed synchronizations can lead to erroneous results when, for example, conflicts are resolved during later synchronizations, as will be described in a later section.

Another characteristic of the modern environment for connected information appliances is that new appliances are introduced with ever increasing frequency. Conventionally, an entirely new synchronization software application is released with each new appliance. Such a synchronization application is dedicated to synchronizing data in a single particular information appliance with data in one or several popular PIM applications on a PC. Consequently, the user must buy and install a separate synchronization application, which typically is multi-megabytes long, for each new information appliance. This approach is undesirable for a variety of reasons, including the reason that such large software applications typically cannot be quickly downloaded from the Internet by telephone modem or low-bandwidth channels. Furthermore, should it become desirable for the synchronization programs to synchronize additional types of data, e.g., electronic mail data or expense logging data, or for the synchronization programs to handle additional PC or other software applications, the synchronization software provider typically must upgrade and re-release every affected software application, and the user typically must obtain and install every affected and updated software application.

What is needed is a system and methods for synchronizing data among two or more datasets that is efficient, correct, and resistant to human error, even when more than two datasets need to be synchronized. What is also needed is such a system and method that can easily be expanded to handle new datasets and new data types. At the same time, the approach should be automated so that the user is provided with "one-click" convenience. The present invention fulfills these and other needs.

SUMMARY OF THE INVENTION

The present invention provides a versatile synchronization system and associated methods that provide user-configurable, easily-extensible synchronization among one, two, or more than two user datasets. One aspect of the present invention is that it presents a unified user interface to the user for synchronizing an arbitrary number of the user's datasets (or, "synchronization clients"). Preferably, this user interface includes a client map that allows a user to quickly determine which of his or her clients are currently set to be synchronized and allows the user to conveniently alter the current settings to select one, two, or more than two clients for synchronization.

Another aspect of the present invention is that it preferably controls a reference dataset, sometimes called the Grand Unification Database or GUD, to store a super-set of data from the user's datasets. In this way, the system of the present invention provides a repository of information that is available at all times and does not require that any other user dataset be connected. Suppose, for instance, that a user has two datasets: a first that resides on a desktop computer and a second that resides on a hand-held device. If the user later wishes to synchronize a third user dataset, such as one in a server computer that stores user information, the system of the present invention has, in the GUD, all the information necessary for synchronizing the new dataset, regardless of whether any of the other datasets are then available. The system of the present invention can, therefore, correctly propagate information to any appropriate user dataset without having to "go back" to (i.e., connect to) the original user dataset from which that data originated. The system of the present invention includes various "conflict" or "duplicate" resolution strategies that handle the increased complexities of allowing synchronization for an arbitrary number of datasets and including in the synchronization even data from datasets that are not available.
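
The role of the reference dataset can be illustrated with a minimal sketch. It assumes an in-memory store in which the reference dataset assigns its own record identifiers and keeps a table mapping each client's identifiers to them; the class `ReferenceDataset`, its methods `absorb` and `populate`, and the identifier formats are hypothetical, and duplicate and conflict resolution are omitted.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ReferenceDataset:
    records: dict = field(default_factory=dict)   # reference ID -> record data
    mapping: dict = field(default_factory=dict)   # (client, client ID) -> reference ID
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def absorb(self, client: str, client_records: dict) -> None:
        """Store a client's records under IDs assigned by the reference dataset."""
        for client_id, data in client_records.items():
            key = (client, client_id)
            if key not in self.mapping:
                self.mapping[key] = next(self._ids)
            self.records[self.mapping[key]] = data

    def populate(self, client: str) -> dict:
        """Build records for a client using only data held in the reference dataset."""
        known = {gid for (name, _), gid in self.mapping.items() if name == client}
        new_records = {}
        for gid, data in self.records.items():
            if gid not in known:
                new_id = f"{client}-{gid}"          # hypothetical client-side ID
                self.mapping[(client, new_id)] = gid
                new_records[new_id] = data
        return new_records


gud = ReferenceDataset()
gud.absorb("desktop", {"A1": "Call Bob"})
gud.absorb("handheld", {"7": "Buy milk"})

# Later, with the desktop and hand-held device both disconnected,
# a third dataset is brought up to date from the reference dataset alone.
server_records = gud.populate("server")
```

Because the reference identifiers do not depend on how any client names its records, the third dataset can be populated without contacting the datasets where the data originated.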

Another aspect of the present invention is that, internally, it employs "type plug-in" modules, each one for supporting a particular data type. Since the core synchronization engine treats data generically (e.g., as undifferentiated data), type-specific support is provided by the corresponding plug-in module. Each plug-in module is a type-specific module having an embedded interface (e.g., an API--application programming interface) that each synchronization client may link to, for providing type-specific interpretation of undifferentiated data. For instance, the system may include one type-specific record API for contact information, another for calendar information, and yet another for memo information. In this manner, each client may employ a type-specific API for correctly interpreting and processing particular data. The engine, on the other hand, is concerned with correct propagation of data, not interpretation of that data. It therefore treats the data itself generically. In this fashion, the present invention provides a generic framework supporting synchronization of an arbitrary number of synchronization clients or devices.
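
The division of labor between the engine and the type plug-in modules can be sketched as follows. This is a minimal illustration, not the interface of the described system: the registry `TYPE_MODULES`, the decorator `type_module`, the pipe-delimited contact encoding, and the function names are all assumptions.

```python
from typing import Callable, Dict, List

# Registry of type plug-ins: data type name -> interpreter for raw record bytes.
TYPE_MODULES: Dict[str, Callable[[bytes], dict]] = {}


def type_module(name: str):
    """Register a function as the plug-in for one data type."""
    def register(fn: Callable[[bytes], dict]):
        TYPE_MODULES[name] = fn
        return fn
    return register


@type_module("contact")
def parse_contact(raw: bytes) -> dict:
    # Hypothetical encoding: "name|phone".
    name, phone = raw.decode("utf-8").split("|")
    return {"name": name, "phone": phone}


@type_module("memo")
def parse_memo(raw: bytes) -> dict:
    return {"text": raw.decode("utf-8")}


def engine_propagate(raw: bytes, targets: List[list]) -> None:
    """The engine copies record bytes to each target without interpreting them."""
    for target in targets:
        target.append(raw)


def client_interpret(type_name: str, raw: bytes) -> dict:
    """A client uses the matching type plug-in to interpret a record."""
    return TYPE_MODULES[type_name](raw)


inbox: list = []
engine_propagate(b"Ada|555-0100", [inbox])
contact = client_interpret("contact", inbox[0])
```

Under this arrangement, supporting a new data type means registering one more plug-in; `engine_propagate` is unchanged.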

An exemplary method for providing synchronization of more than two datasets includes, in an information processing system, accepting a designation of more than two datasets to be synchronized. At least one of the designated datasets includes one record to be synchronized with all others of the designated datasets. The method further includes, in response to the designation, synchronizing the more than two designated datasets without requiring further designating of datasets to be synchronized. After the step of synchronizing, each of the more than two designated datasets includes a record that corresponds to, and is in a synchronized state with, the one record.

Another exemplary method capable of synchronizing more than two designated datasets includes providing a synchronizer dataset that contains records. The synchronizer dataset reflects a result of an earlier synchronization. The method further includes comparing values derived from each of the more than two designated datasets to values derived from the synchronizer dataset to identify a change of a record in the each designated dataset since an earlier synchronization involving the each designated dataset. The method further includes changing at least two of the designated datasets based on the identified change of a record.
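
The comparison step of this method can be sketched in a few lines. The sketch assumes that the synchronizer dataset remembers which record identifiers a client held at the end of its last synchronization and that modification times are comparable numbers; the function `detect_changes` and its parameters are hypothetical, and conflict resolution is not shown.

```python
def detect_changes(client_records: dict, ids_at_last_sync: set, last_sync_time: float) -> dict:
    """Classify a client's changes since its last synchronization.

    client_records maps record ID -> (data, modification time).
    ids_at_last_sync is the set of record IDs the client held when it last synced.
    """
    changes = {}
    for record_id, (_data, mod_time) in client_records.items():
        if record_id not in ids_at_last_sync:
            changes[record_id] = "addition"       # no counterpart in the synchronizer dataset
        elif mod_time > last_sync_time:
            changes[record_id] = "update"         # modified after the reference time
    for record_id in ids_at_last_sync:
        if record_id not in client_records:
            changes[record_id] = "deletion"       # was present at last sync, now gone
    return changes


client = {"r1": ("a", 5), "r2": ("b", 12), "r4": ("d", 11)}
result = detect_changes(client, {"r1", "r2", "r3"}, last_sync_time=10)
```

Here `r1` is unchanged, `r2` is an update, `r4` is an addition, and `r3` is a deletion. Running the same comparison for every designated dataset yields the set of changes to apply to the others.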

An exemplary method is provided for interacting with a user prior to a synchronization of datasets selected from a plurality of more than two datasets, in which the synchronization will leave each of the selected datasets with a record that is synchronized with respect to a corresponding record in each other one of the selected datasets. The method includes presenting an indicator of each of at least the selected datasets; presenting indicators to indicate possible flow of changes among the selected datasets; and accepting a change to the selection of datasets such that, after the step of accepting, the selected datasets to be synchronized include more than two of the plurality of more than two datasets.

Another exemplary method is provided for interacting with a user prior to a synchronization of at least one dataset from a plurality of datasets. The method includes presenting an indicator, referred to as the dataset indicator, of each of at least the at least one dataset to be synchronized; presenting a nexus indicator in addition to the dataset indicators; and presenting path indicators that schematically link the nexus indicator to the dataset indicators of at least the at least one dataset to be synchronized.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram that shows the conventional approach to synchronization of datasets.

FIG. 2 is a block diagram that shows a synchronization system according to an embodiment of the present invention in an example environment.

FIG. 3A is a block diagram of a computer system in which the present invention may be embodied.

FIG. 3B is a block diagram of a software system of the present invention for controlling operation of the system of FIG. 3A.

FIG. 4A is a screen-shot that shows a graphical user interface provided by the Synchronizer, according to a preferred embodiment of the invention.

FIG. 4B is a screen-shot that shows the graphical user interface of FIG. 4A at a different moment.

FIG. 5A is a screen-shot of an options dialog box for changing synchronization and other settings for an example client, the REX.TM. organizer.

FIG. 5B is a screen-shot that shows a dialog box for changing options related to the Contacts data type.

FIG. 5C is a screen-shot that shows a data-field-mapping table displayed by the Synchronizer for changing data-field mappings for a particular mapping of record files from different clients.

FIG. 6 is a flowchart that summarizes a method used by the Synchronizer for interacting with a user.

FIG. 7A is a flowchart that illustrates a method used by a "binary-based" embodiment of the Synchronizer for synchronizing two or more than two datasets using multiple binary synchronizations.

FIG. 7B is a flowchart that illustrates a particular implementation of the method of FIG. 7A.

FIG. 7C is a table that shows a faulty result produced by the basic binary-based embodiment of the Synchronizer that can be remedied in an improved system.

FIG. 9A is a block diagram that shows the architecture of a Synchronizer according to the preferred embodiment of the invention.

FIG. 9B is a block diagram that schematically shows the mechanism by which the engine, the client accessors, and the type modules of the Synchronizer communicate with one another.

FIG. 10A is a block diagram that shows components of a Synchronizer dataset.

FIG. 10B is a table that shows records of the Synchronizer dataset, including their data and some of their status information.

FIG. 10C is a table that shows portions of the mapping table (of FIG. 10A) that describes a particular client's records.

FIG. 11A is a flowchart that illustrates the Synchronizer's methodology for performing a synchronization.

FIG. 11B is a flowchart that shows an expansion of the step (of FIG. 11A) that determines fresh changes and proposes actions in response.

FIG. 11C is a flowchart that shows an expansion of the step (of FIG. 11B) that determines all fresh updates and adds from a particular client and proposes GUD_Updates and GUD_Adds, respectively, in response.

FIG. 11D is a flowchart that shows an expansion of the step (of FIG. 11B) that determines all fresh deletions from a particular client and proposes a GUD_Delete in response.

FIG. 11E is a flowchart that shows an expansion of the step (of FIG. 11B) that determines all fresh changes in the GUD with respect to the client and proposes to propagate the changes in response.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

The detailed description is separated into the following chapters:

I. Introduction

II. System Hardware

III. System Software

IV. An External View: The User Interface

V. "Binary-Based" System For Automatic Synchronization of Two Or More Datasets

VI. Comparison of Timestamps From Different Datasets

VII. A Plug-In Architecture For the Synchronizer

VIII. Core of The Synchronizer

IX. Synchronization Methods

X. Additional Embodiment(s)

I. Introduction

A. Datasets, Records, and Synchronization

Datasets are collections of data. According to the present invention, the purpose of synchronizing two, or more than two, datasets is to update them as necessary with data from one another so that they contain the same or equivalent data (generally, the latest data), at least in the portions of the datasets that the user has designated for synchronization. Each dataset may be organized into individual data records. For example, a dataset having contact information may be organized into records, including a record listing a "Bill Smith's" phone numbers and addresses and another record listing a "Ted Brown's" phone numbers and addresses. In general, if records have been added to any dataset before a synchronization, then equivalent records are added to the other datasets as a result of the synchronization. Also, generally, if modifications or deletions of records have been made to one dataset before the synchronization, then equivalent modifications and deletions of corresponding records are made to the other datasets as a result of the synchronization. (The preceding discussion of synchronization according to the present invention becomes more complicated if conflicts or duplicates are present. Conflicts and duplicates are further described in later sections.)

B. Record Files, Data Types, Data Fields, etc.

In synchronizing two, or more than two, datasets, a correspondence is generally established between particular records across the datasets. For example, a contact record for "Bob Smith, of Acme Widgets" may exist in every dataset (perhaps as a result of synchronization), and these records in different datasets may correspond to one another. The records in a dataset may be of various data types, for example, a time-zone type, a contact type, a calendar-entry type, a task (or "to do"-list-entry) type, a memo type, an electronic-mail type, or other types. In general, each record may include data organized into one or more data fields. For example, a contact-type record may include data for a "last name" field, a "first name" field, a "company" field, and many other fields. For many typical data types, it is not necessary for each record of the data type to have data for every possible field. For synchronization, a correspondence is typically established between particular data fields across datasets. For example, a "title" field for contact records in one dataset may correspond to a "Job Title" field for contact records in another dataset. In general, the systems and methodologies of the present invention can be adapted to work with any one type of data, or with any multiple types of data, and with arbitrarily defined or named data fields.

Within a dataset, the records of a particular data type may further be organized into one or more groups that are here referred to as record files. Examples of record files include "Cardfiles" in Starfish's Sidekick.RTM. PIM or "folders" in Microsoft's Outlook PIM. A preferred embodiment of the invention allows the user to specify an arbitrary correspondence, or mapping, of particular record files of the same data type in different datasets to each other. For example, the user may specify that a record file named "Business Contacts" in a first dataset, a record file named "Contacts" in a second dataset, and a record file named "Customer List" in a third dataset be mapped to one another. Separately, the user may specify that only a record file named "Calendar" in the first dataset and a record file also named "Calendar" in the third dataset be mapped to each other. As demonstrated by the preceding example, a user-specified synchronization of multiple datasets by the preferred embodiment may include a number of separate synchronizations, each of which synchronizes a set of mutually-mapped, or corresponding, record files. Each synchronization of corresponding record files does not necessarily involve all of the multiple datasets. Each synchronization of corresponding record files also need not necessarily involve the same datasets as other synchronizations of other record files. For simplicity only, unless otherwise stated or unless context demands otherwise, discussion of synchronizing datasets may use language as if to assume that all datasets involved in the synchronization each contains exactly one record file that is mapped to the one record file of all other datasets involved. It is to be understood that this simplification, and other simplifications made for ease of description, are not meant to limit the scope of the invention.
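The record-file mapping example above can be pictured as a small data structure. The following is only an illustrative sketch: the names `RecordFileRef`, `MAPPING_GROUPS`, and `datasets_in_group`, and the dataset labels "first", "second", and "third", are hypothetical and are not part of the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordFileRef:
    dataset: str      # hypothetical label for the dataset holding the file
    record_file: str  # name of the record file inside that dataset


# Each group lists record files of one data type that are mapped to one another.
MAPPING_GROUPS = [
    {
        "data_type": "contact",
        "files": [
            RecordFileRef("first", "Business Contacts"),
            RecordFileRef("second", "Contacts"),
            RecordFileRef("third", "Customer List"),
        ],
    },
    {
        "data_type": "calendar",
        "files": [
            RecordFileRef("first", "Calendar"),
            RecordFileRef("third", "Calendar"),
        ],
    },
]


def datasets_in_group(group):
    """Return the datasets taking part in the synchronization of one group."""
    return sorted({ref.dataset for ref in group["files"]})


if __name__ == "__main__":
    for group in MAPPING_GROUPS:
        print(group["data_type"], datasets_in_group(group))
```

In this sketch the contact group involves all three datasets while the calendar group involves only two, mirroring the point that separate synchronizations of corresponding record files need not involve the same datasets.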

C. Record Transformations

When performing synchronization, a synchronization system transforms records from one dataset's representation into another dataset's representation. For example, the system may transform from an Internet Sidekick.RTM. cardfile for business contacts into a synchronization-system-internal representation. Typically, there is a one-to-one relationship between records in the source and target datasets. If this is not the case, however, the component of the system that interacts with a non-conforming dataset (e.g., a dataset accessor, which will be further described) includes logic to handle this non-conformity.

D. Field Mapping Types and Field Conversion Types

Record transformations are a combination of field mappings and conversions from a source record to a target record. It is often the case that there are significant differences in the number, size, type and usage of fields between two datasets in a synchronization relationship. The specification of transformations generally depends on the particular datasets involved, and may be user configurable, with the synchronization system providing defaults.

In a specific embodiment, the following types of field mappings are supported.

    1. Null           Source field has no equivalent field in the target
                      dataset and is ignored during synchronization.
    2. One-to-One     Map exactly one field in the target to one field in
                      the source.
    3. One-to-Many    Map one field in the target to many fields in the
                      source, such as parse a single address line to
                      fields for number, direction, street,
                      suite/apartment, or the like.
    4. Many-to-One    Map several fields in the target to one field in the
                      source, such as reverse the address line mapping
                      above.
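The four mapping types listed above might be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function names, the field names other than "title" and "Job Title", and the naive whitespace-based address parsing are all assumptions made for the example.

```python
def map_null(source, field):
    """Null mapping: the field has no counterpart, so nothing is produced."""
    return {}


def map_one_to_one(source, src_field, dst_field):
    """One-to-one mapping: copy one field under its counterpart's name."""
    return {dst_field: source[src_field]} if src_field in source else {}


def map_one_to_many(source, src_field, dst_fields):
    """One-to-many mapping: split one field into several.

    Assumption: a naive whitespace split, with the last field taking the
    remainder. A real address parser would need far more care.
    """
    parts = source.get(src_field, "").split(None, len(dst_fields) - 1)
    return dict(zip(dst_fields, parts))


def map_many_to_one(source, src_fields, dst_field):
    """Many-to-one mapping: join several fields into one, skipping blanks."""
    values = [source[f] for f in src_fields if source.get(f)]
    return {dst_field: " ".join(values)}


def transform(source):
    """Apply a hypothetical set of mappings to one contact record."""
    target = {}
    target.update(map_one_to_one(source, "title", "Job Title"))
    target.update(map_one_to_many(source, "address",
                                  ["number", "direction", "street"]))
    target.update(map_null(source, "pager"))  # ignored during synchronization
    return target


if __name__ == "__main__":
    record = {"title": "Sales Manager", "address": "123 N Main St",
              "pager": "unused"}
    print(transform(record))
```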


The following types of field conversions are supported.

    1. Size            Source field may be larger or smaller in size than
                       the target field.
    2. Type            Data types may be different, such as float/integer,
                       character vs. numeric dates, or the like.
    3. Discrete Values A field's values may be limited to a known set.
                       These sets may be different from target to source
                       and may be user defined.
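The three conversion types above can be illustrated with a short sketch. The function names, the truncation policy for oversized values, the integer date encoding, and the `PRIORITY_MAP` value set are assumptions chosen for the example, not details taken from the embodiment.

```python
from datetime import date


def convert_size(value, max_len):
    """Size conversion: truncate text that exceeds the target field's size."""
    return value[:max_len]


def convert_date_to_number(text):
    """Type conversion: ISO 'YYYY-MM-DD' text to an integer like 19990131."""
    d = date.fromisoformat(text)
    return d.year * 10000 + d.month * 100 + d.day


# Hypothetical discrete value sets: one dataset uses words, the other numbers.
PRIORITY_MAP = {"High": 1, "Normal": 2, "Low": 3}


def convert_discrete(value, value_map, default=None):
    """Discrete-value conversion: translate between two known value sets."""
    return value_map.get(value, default)


if __name__ == "__main__":
    print(convert_size("Acme Widgets International", 12))
    print(convert_date_to_number("1999-01-31"))
    print(convert_discrete("High", PRIORITY_MAP))
```

Truncation loses data, so a real system would presumably need a policy for mapping a truncated value back to its longer original; the sketch does not address that.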


E. Conflicts and Duplicate Records

In general, the user may make arbitrary changes to individual datasets and later synchronize the datasets. In general, each change made to a dataset (for example, addition, modification, or deletion of a record) by its user is propagated to other datasets as a result of a subsequent synchronization. However, it sometimes happens that two, or more than two, changes are in conflict with one another such that the changes cannot all be propagated without one change's undoing or otherwise interfering with another. Such changes give rise to a "conflict." For example, a conflict exists when a user has made a modification to a record in a first dataset, and has separately made a conflicting modification to the record's corresponding record in a second dataset. For a specific example, the user may have set a contact's (e.g., Bob Smith's) "title" field to "salesperson" in his handheld organizer device and separately set the corresponding contact's (Bob Smith's) "title" field to "Sales Manager" on the user's desktop PIM. Automatic and user-assisted methods for resolving conflicts according to the present invention are discussed in later sections.
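The "title" example above can be sketched as a generic three-way comparison against the record as it stood at the last synchronization. This is an assumption-laden illustration: the function `find_conflicts`, the field names, and the sample values other than the two titles are hypothetical, and the resolution methods of the present invention are described in later sections rather than here.

```python
def find_conflicts(base, a, b):
    """Return fields changed in both a and b, relative to base, to different
    values. Fields changed on only one side are not conflicts."""
    conflicts = {}
    for field in sorted(set(base) | set(a) | set(b)):
        old, va, vb = base.get(field), a.get(field), b.get(field)
        if va != old and vb != old and va != vb:
            conflicts[field] = (va, vb)
    return conflicts


if __name__ == "__main__":
    # Hypothetical state of Bob Smith's record at the last synchronization.
    base = {"last_name": "Smith", "title": "", "company": "Acme Widgets"}
    handheld = dict(base, title="salesperson")
    desktop = dict(base, title="Sales Manager", company="Acme Widgets, Inc.")
    print(find_conflicts(base, handheld, desktop))
```

Here only "title" is reported, because the "company" field was changed on the desktop alone and can be propagated without interfering with any other change.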

Occasionally, the user may cause the same, or matching, information to exist in different datasets without using the present invention, and then use the present invention to synchronize the datasets. For example, the user may cause records to exist for a "Bob Smith, of Acme Widgets" in multiple datasets, either by adding such records or by modifying existing records into such records. If the definition of the contact data type requires that the first name, last name, and company information for each contact be unique, then the example records would by definition match one another. In such a situation, simpleminded propagation of each added or modified record in each dataset to all other datasets would result in a duplication of records. Therefore, the present invention performs duplicate resolution to prevent such duplication. Automatic and user-assisted methods for resolving duplicates according to the present invention are discussed in later sections.
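A minimal sketch of duplicate detection under the uniqueness rule described above (first name, last name, and company) follows. The names `match_key` and `merge_without_duplicates`, the field names, the case-insensitive comparison, and the keep-the-first-copy policy are assumptions for illustration only; the actual duplicate-resolution methods are discussed in later sections.

```python
KEY_FIELDS = ("first_name", "last_name", "company")


def match_key(record):
    """Build the key under which two contact records are deemed to match.

    Assumption: matching ignores case and surrounding whitespace.
    """
    return tuple(record.get(f, "").strip().lower() for f in KEY_FIELDS)


def merge_without_duplicates(*datasets):
    """Combine records from several datasets, keeping one record per key.

    Assumption: the first record seen for a key is kept as-is; a fuller
    system would also reconcile the non-key fields of matching records.
    """
    merged = {}
    for records in datasets:
        for record in records:
            merged.setdefault(match_key(record), record)
    return list(merged.values())


if __name__ == "__main__":
    handheld = [{"first_name": "Bob", "last_name": "Smith",
                 "company": "Acme Widgets"}]
    desktop = [{"first_name": "bob", "last_name": "smith",
                "company": "Acme Widgets "},
               {"first_name": "Ted", "last_name": "Brown", "company": ""}]
    print(len(merge_without_duplicates(handheld, desktop)))
```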

F. Timestamps

The present invention often will make processing decisions based on comparing the time at which past events occurred. For example, the system may want to know whether a record in a dataset was modified before or after a most recent synchronization. Therefore, the time of various events should be recorded. One or more "timestamp" values in record fields are dedicated to this purpose. Typically, datasets involved in synchronization can be assumed to support a "last-modification-time" timestamp. Datasets that do not have timestamps, however, can still be synchronized using the present invention, but may require more processing by the present invention (for example, to perform exhaustive record comparisons) or more intervention by the user (for example, during conflict resolution).
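The two situations described above, datasets with and without a last-modification timestamp, might be sketched as follows. The function names, the "modified" field name, and the idea of keeping a snapshot from the previous synchronization for the exhaustive comparison are assumptions made for the example.

```python
from datetime import datetime


def changed_since(records, last_sync):
    """With timestamps: ids of records modified after the last synchronization."""
    return sorted(rid for rid, rec in records.items()
                  if rec["modified"] > last_sync)


def changed_by_comparison(records, snapshot):
    """Without timestamps: compare every record to the copy saved at the
    last synchronization. New records are reported as changed too."""
    return sorted(rid for rid, data in records.items()
                  if snapshot.get(rid) != data)


if __name__ == "__main__":
    last_sync = datetime(1999, 1, 10, 12, 0)
    stamped = {
        "r1": {"modified": datetime(1999, 1, 9, 8, 0)},
        "r2": {"modified": datetime(1999, 1, 11, 9, 30)},
    }
    print(changed_since(stamped, last_sync))

    unstamped = {"r1": {"title": "a"}, "r2": {"title": "b"}}
    saved = {"r1": {"title": "a"}, "r2": {"title": "old"}}
    print(changed_by_comparison(unstamped, saved))
```

The second function touches every record on every synchronization, which illustrates why datasets lacking timestamps may require more processing.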

In conjunction with the use of timestamps to compare the relative timing of record creation or modification or synchronization, etc., the clocks on the datasets' respective devices may themselves be kept synchronized, or assumed to be synchronized, either to the same value, or to equivalent values, or to values having a constant offset. Equivalent clock values include clock values plus clock time-zone information showing that the clock values correspond to a common time, for example a common Greenwich Mean Time (GMT). Clock values having a constant offset to one another may exist for example if devices that do not include time zone information have clocks set for different time zones. Clock values having a constant offset may also exist if two devices do not have their clocks synchronized, and the user does not wish to, or cannot, synchronize them. For example, the clocks on a server computer and a local computer may not be synchronized, and the user may be unable to reset the server clock, even though it is off by, for example, five minutes. In specific situations, the present invention will work directly with timestamps from the clock of a particular dataset's device without first converting such timestamps to a common time such as time according to the synchronization system's own clock or GMT. This is done, when possible, to minimize problems due to any relative drift in the devices' clocks, such as drifts caused by clock inaccuracies or drifts caused by the user's re-setting of a clock on a device. Comparison of timestamps will be further discussed in later sections.
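The constant-offset case above (a server clock that is five minutes off) can be sketched as follows. The function names and the convention that the offset equals device time minus common time are assumptions; the sketch also shows the alternative of comparing two readings of the same device clock, which needs no conversion at all.

```python
from datetime import datetime, timedelta


def to_common_time(device_stamp, device_offset):
    """Convert a device timestamp to common time.

    Assumption: device_offset is (device clock - common clock) and is constant.
    """
    return device_stamp - device_offset


def modified_after_sync(record_modified, last_sync_on_same_clock):
    """Both values were read from the same device clock, so they can be
    compared directly without converting either one."""
    return record_modified > last_sync_on_same_clock


if __name__ == "__main__":
    server_offset = timedelta(minutes=5)          # server runs 5 minutes fast
    local_edit = datetime(1999, 1, 10, 12, 2)     # local clock = common time
    server_edit = datetime(1999, 1, 10, 12, 5)    # as stamped by the server

    # A naive comparison would treat the server edit as the later one.
    print(server_edit > local_edit)
    # After correcting for the offset, the local edit is the later one.
    print(to_common_time(server_edit, server_offset) > local_edit)
```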

G. The Conventional Fragmented, Binary Approach

FIG. 1 is a block diagram that shows the conventional approach to synchronization of datasets, as was introduced in the Background section. An example environment 5 includes a main PC 10, a secondary PC 15, a Pilot organizer 20, and a REX.RTM. organizer 25. PIM applications Starfish Sidekick.RTM. 30 and Microsoft Outlook 35 reside on the main PC 10. PIM applications Microsoft Outlook 40 and Lotus Organizer 45 reside on the other PC 15. The Pilot organizer maintains a dataset 22 and includes an interface 21 (e.g., device software) to the dataset 22. The REX.TM. organizer maintains a dataset 27 and includes an interface 26 (e.g., device software) to the dataset 27. The PIM applications Sidekick.RTM. 30, Outlook 35, Outlook 40, and Organizer 45 maintain the datasets 32, 37, 42, and 47, respectively. A REX.TM.-specific synchronization software module 50 performs binary synchronization between the REX.TM. organizer's dataset 27 and another dataset, namely, Sidekick.RTM.'s dataset 32. A Pilot-specific synchronization software program 55 performs binary synchronization between the Pilot organizer's dataset 22 and a selected one of the datasets 32 and 37 on the main PC 10. A PC-to-PC synchronization software program 60 performs binary synchronization between two datasets selected from the datasets 32, 37, 42, and 47 on the PCs 10 and 15. The synchronization programs 55 and 60 are drawn schematically as rotary contact switches to emphasize that they can select at most two datasets for any one synchronization.

H. A Unified, Not-Necessarily-Binary Approach

FIG. 2 is a block diagram that shows a synchronization system 200 ("Synchronizer") according to an embodiment of the present invention in an example environment 205. As shown, the environment 205 includes a number of elements from the environment 5 of FIG. 1, and other elements. The shared elements retain their numeric labels from FIG. 1 and need not be specifically introduced again. The Synchronizer is a centralized system that can access when available an arbitrary number of datasets, for example, a dataset 267 and the datasets 22, 27, 32, 37, 42, and 47, and other datasets (not pictured), and any sub-combination of such datasets. The arbitrarily many datasets reside in various devices, for example, the main PC 10, the secondary PC 15, the Pilot organizer 20, the REX.TM. organizer 25, and a cyberspace 265 which includes, for example, the Internet. The dataset 267 resides in the cyberspace 265 and is accessible via an Internet server 266 (for example, an online PIM provided by the TrueSync.com Service of the present assignee, Starfish Software, Inc.; TrueSync.RTM. is a registered trademark of Starfish Software, Inc.). The Synchronizer communicates with certain devices, for example, the secondary PC 15 and the cyberspace 265 through communication channels, including for example local or wide area computer networks, telephone networks, infrared or radio-frequency or other wireless networks, or networks that use the Internet Protocol (IP), and others. Datasets do not need to be directly connected but, instead, can be connected via a store-and-forward transport, such as electronic mail. The datasets and devices shown in FIG. 2 are merely examples; the Synchronizer can in general be used with arbitrary other types of datasets and devices, through appropriate interfaces.

The datasets (e.g., 32, 267, etc.) or their control or access logic (e.g., Sidekick.RTM. 30, TrueSync.com Internet server 266, etc.) are called clients. Unlike the conventional synchronization systems shown in FIG. 1, the Synchronizer is capable of automatically synchronizing data among an arbitrary number of accessible datasets in a single synchronization session following user selection of datasets, instead of being able to synchronize data between only two datasets in such a session.

I. Synchronization Options and the Synchronizer Dataset

The Synchronizer allows the user to choose the extent to which a particular client participates, or does not participate, in a synchronization. The choices include, for example, no participation (e.g., "do not synchronize"), full participation (e.g., "synchronize"), or one-way participation toward the client (e.g., "overwrite the client"), but are not limited to these choices. "Do not synchronize" means that the client simply will not be involved in the synchronization and generally need not even be present or accessible during the synchronization. "Synchronize" means that the client will be synchronized at least with those other datasets that are also set to "synchronize" in the synchronization. More particularly, "synchronize" implies that changes (for example, additions, modifications, or deletions), if any, may flow into and may flow out of the client as a result of the synchronization. Even more particularly, "synchronize" implies at least that (1) changes may be made to the client as a result of the synchronization, and (2) changes may be made to other datasets (for example, clients) in response to data in the client as a result of the synchronization. "Overwrite the client" means that the client's data will be overwritten with the result of the synchronization, but the synchronization does not otherwise include the client. Thus, "overwrite the client" implies that changes, if any, may flow into but will not flow out of the client as a result of the synchronization. In general, unless stated otherwise or unless context demands otherwise, discussion of synchronization may assume for simplicity that participating clients are set to fully participate in synchronization.
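The three example participation choices and the direction in which changes may flow under each can be summarized in a short sketch. The `Participation` enum and the two helper functions are hypothetical names introduced for illustration; the choices available in an actual embodiment are not limited to these.

```python
from enum import Enum


class Participation(Enum):
    DO_NOT_SYNCHRONIZE = "do not synchronize"
    SYNCHRONIZE = "synchronize"
    OVERWRITE_CLIENT = "overwrite the client"


def changes_may_flow_in(mode):
    """True if the synchronization may change the client's own data."""
    return mode in (Participation.SYNCHRONIZE, Participation.OVERWRITE_CLIENT)


def changes_may_flow_out(mode):
    """True if the client's data may cause changes in other datasets."""
    return mode is Participation.SYNCHRONIZE


if __name__ == "__main__":
    for mode in Participation:
        print(mode.value, changes_may_flow_in(mode), changes_may_flow_out(mode))
```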

In a preferred embodiment, the Synchronizer itself includes a dataset, called the Synchronizer dataset, that contains the most up-to-date data as seen during any synchronization of datasets for a user since the Synchronizer dataset was created or last reset. The Synchronizer dataset may also contain much status information about the records within it. The Synchronizer is capable of having multiple Synchronizer datasets, each being used for a particular user of the Synchronizer. A single person may choose to be more than one Synchronizer user (i.e., own multiple Synchronizer datasets) if he or she wishes for some reason to have multiple, separate "worlds" of data. For example, the person may want one world of data for a first set of clients (e.g., REX.TM., Pilot, and Outlook) and another world for a second set of clients (e.g., Sidekick.RTM. and TrueSync.com Server) that, for example, does not overlap with the first set. For another example, the person may want additional worlds of data for experimental or archival (back-up) purposes. In general, unless otherwise stated or unless context demands otherwise, discussion will refer for simplicity to synchronization for a single particular user, and reference will be made, for example, to "the Synchronizer dataset" to mean "the Synchronizer dataset corresponding to the single particular user."

The Synchronizer optionally allows the user to reset the Synchronizer dataset to be empty, such that it will later be filled anew with the result of a next synchronization of one or more clients. The Synchronizer may also allow the user to overwrite the contents of the Synchronizer dataset with the contents of a single client (this is akin to resetting the Synchronizer dataset to be empty and then synchronizing just the single client "by itself"). In a specific embodiment, allowing the Synchronizer dataset to be reset or overwritten is considered to be an advanced feature that is reserved for advanced users or is accompanied by warnings or requests for confirmation. In connection with specific embodiments of the invention, the Synchronizer dataset is also known as the Unified Database, or the "Grand Unification Database," or "GUD."

To illustrate one benefit of having the Synchronizer dataset, consider the following example scenario. A first client (e.g., the REX.TM. organizer) participates in a first synchronization with a first group of other clients (even zero other clients) and contributes a particular change (e.g., addition, modification, or deletion of a record) to the other participating clients and also automatically to the Synchronizer dataset. A second client (e.g., the TrueSync.com server) does not participate in the first synchronization. Later, the second client participates in a second synchronization with a second group of other clients (even zero other clients), none of which other clients has yet received the particular change via synchronization or via independent data entry. In this second synchronization, the second client can still receive the particular change because the particular change is in the Synchronizer dataset, which is preferably always available for full participation in synchronization. (One way to ensure that the Synchronizer dataset is always available is to have the Synchronizer store the Synchronizer dataset on the Synchronizer's local machine, or on another machine that is known or assumed to be always available.) The synchronization options and the Synchronizer dataset will be further discussed in later sections.
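The scenario above can be walked through with a toy model. This sketch assumes a simple newest-version-wins rule keyed on an integer "modified" value, ignores deletions, conflicts, and duplicates, and uses hypothetical names (`synchronize`, `gud`, `rex`, `server`); it is meant only to show why a change survives in the reference dataset between sessions, not how the Synchronizer is implemented.

```python
def synchronize(gud, clients):
    """Synchronize the participating clients through the reference dataset."""
    # 1. Pull fresher records from each participating client into the
    #    reference dataset.
    for client in clients:
        for rid, rec in client.items():
            if rid not in gud or rec["modified"] > gud[rid]["modified"]:
                gud[rid] = dict(rec)
    # 2. Push the reference dataset's records out to each participating client.
    for client in clients:
        for rid, rec in gud.items():
            if rid not in client or rec["modified"] > client[rid]["modified"]:
                client[rid] = dict(rec)


if __name__ == "__main__":
    gud = {}                                     # reference dataset
    rex = {"r1": {"modified": 1, "title": "Sales Manager"}}
    server = {}

    synchronize(gud, [rex])      # first session: the server does not take part
    print("r1" in gud, "r1" in server)

    synchronize(gud, [server])   # second session: the organizer is absent
    print(server["r1"]["title"])
```

The second session delivers the organizer's change to the server even though the organizer itself is unavailable, because the change was retained in the reference dataset.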

II. System Hardware

The present invention may be embodied on an information processing system such as the system 300 of FIG. 3A, which comprises a central processor 301, a main memory 302, an input/output (I/O) controller 303, a keyboard 304, a pointing device 305, pen device (or the like), a screen or display device 306, a mass storage 307 (e.g., hard disk, removable floppy disk, optical disk, magneto-optical disk, or flash memory, etc.), one or more optional output device(s) 308, and an interface 309. Although not shown separately, a real-time system clock is included with the system 300, in a conventional manner. The various components of the system 300 communicate through a system bus 310 or similar architecture. In addition, the system 300 may communicate with other devices through the interface or communication port 309, which may be an RS-232 serial port or the like. Devices which will be commonly connected to the interface 309 include a network 351 (e.g., LANs or the Internet), a laptop 352, a handheld organizer 354 (e.g., the REX.TM. organizer, available from Franklin Electronic Publishers of Burlington, N.J.), a modem 353, and the like.

In operation, program logic (implementing the methodology described below) is loaded from the storage device or mass storage 307 into the main memory 302, for execution by the processor 301. During operation of the program (logic), the user enters commands through the keyboard 304 and/or pointing device 305 which is typically a mouse, a track ball, or the like. The computer system displays text and/or graphic images and other data on the display device 306, such as a cathode-ray tube or an LCD display. A hard copy of the displayed information, or other information within the system 300, may be obtained from the output device 308 (e.g., a printer). In a preferred embodiment, the computer system 300 includes an IBM PC-compatible personal computer (available from a variety of vendors, including IBM of Armonk, N.Y.) running Windows 9x or Windows NT (available from Microsoft Corporation of Redmond, Wash.). In a specific embodiment, the system 300 is an Internet or intranet or other type of network server and receives input from and sends output to a remote user via the interface 309 according to standard techniques and protocols.

III. System Software

Illustrated in FIG. 3B, a computer software system 320 is provided for directing the operation of the computer system 300. Software system 320, which is stored in system memory 302 and on storage (e.g., disk memory) 307, includes a kernel or operating system (OS) 340 and a windows shell 350. One or more application programs, such as client application software or "programs" 345 may be "loaded" (i.e., transferred from storage 307 into memory 302) for execution by the system 300.

System 320 includes a user interface (UI) 360, preferably a Graphical User Interface (GUI), for receiving user commands and data and for producing output to the user. These inputs, in turn, may be acted upon by the system 300 in accordance with instructions from operating system module 340, windows module 350, and/or client application module(s) 345. The UI 360 also serves to display the user prompts and results of operation from the OS 340, windows 350, and application(s) 345, whereupon the user may supply additional inputs or terminate the session. In the preferred embodiment, OS 340 and windows 350 together comprise Microsoft Windows software (e.g., Windows 9x or Windows NT). Although shown conceptually as a separate module, the UI is typically provided by interaction of the application modules with the windows shell and the OS 340. One application program 200 is the Synchronizer according to embodiments of the present invention, which will now be described in further detail.

IV. An External View: The User Interface

A. Overview

FIG. 4A is a screen-shot that shows a graphical user interface 400 provided by the Synchronizer, according to the preferred embodiment of the invention. The GUI 400 includes a title bar 405, a menu bar 410, a tool bar 415, a client-chart window 420, a user log window 425, and a status bar 430. The Synchronizer displays its own name 435 and a current user's name 437 on the title bar. The Synchronizer provides several pull-down menus 440 on the menu bar. The pull-down menus may include, for example, "Client," "Data," "View," "Synchronize," and "Help" menus. The Synchronizer provides several pushbuttons 444 on the tool bar as shortcuts for common or important menu commands. These pushbuttons may include, for example, a start-synchronize button 445 or a halt-synchronize button 446. The pushbuttons may also include buttons for initiating interactive configuration of options for individual data types, such as a contact data type, by the user. Examples of such buttons include a configure-time-zones button 447, a configure-contacts button 448, a configure-calendar button 449, a configure-memos button 450, or a configure-task-list button 451. The Synchronizer displays a client chart 453 in the client-chart window 420. The client chart includes representations of the clients and representations of the flow of dataset changes (somewhat informally, "data flow"). The representations of clients include, for example, visual client indicators 455, 456, 457, 458, 459, 460, 461, or 462. The representations of the data flow include, for example, visual client data-flow direction indicators 465, 466, or 469, client data-path indicators 467 or 468, and a client nexus indicator 470, which all may be called data-flow indicators. The Synchronizer displays summary information about each synchronization session in the user log window. 
Examples of summary information include counts of records added, modified, or deleted from each client during a synchronization, and any warning or error messages. In general, summary information for each synchronization session persists as a permanent record, but the user may set a time period (e.g., a week) after which summary information will be deleted. The Synchronizer displays run-time status information in the status bar. Examples of displayed run-time status information include, but are not limited to, "initializing REX," "determining changes from REX," "distributing record modifications," and "uninitializing REX."

FIG. 4B is a screen-shot that shows the graphical user interface 400 at a different moment, for example, after the user has modified the settings for the Synchronizer. In particular, the client chart 453 of FIG. 4A has been re-drawn by the Synchronizer as a client chart 453B to reflect the modified user settings. More particularly, the client chart 453B includes certain visual representations of the data flow that are changed as compared to the representations found in FIG. 4A. Even more particularly, the client data-flow direction indicators 465, 466, and 469 and the client data-path indicators 467 and 468 of FIG. 4A have been replaced in FIG. 4B by the indicators 465B, 466B, 469B, 467B, and 468B, respectively. The client charts 453 and 453B and their components will be further explained below. Unless reference is specifically made to FIG. 4B, the following explanation of the client charts and their components will refer to FIG. 4A. In the following discussion, the numeric labels (e.g., 455, 456, . . . , 462) from FIG. 4A for client indicators may also be used to accompany references to the client indicators' respective corresponding clients, but it will be apparent from the context whether the client indicator or the client is meant and whether the distinction is important for the particular discussion.

B. Visual Representation of Clients

The Synchronizer displays client indicators, for example the client indicators 455-462, in the client chart 453, each representing a known client (e.g., a dataset). Particular client indicators may represent local or network-connected software applications, such as PC-style applications. Examples include the Schedule+ PIM indicator 455, the Organizer PIM indicator 456, the ACT! PIM indicator 457, the TrueSync.RTM. Information Manager PIM indicator 458, or the Outlook PIM indicator 459. Client indicators may also represent Internet-based or intranet-based services (e.g., World Wide Web sites, and the like). An example is the TrueSync.com server visual indicator 460. Client indicators may also represent small or special-purpose devices or applications on such devices. Examples include the PalmPilot indicator 461 or the REX.RTM. indicator 462. The Synchronizer according to the preferred embodiment displays the client indicators grouped according to client type. For example, the Synchronizer may display the client indicators 455, 456, 457, 458, and 459, which correspond to local or PC-style applications, adjacent to one another and in a group, as is shown. Similarly, the Synchronizer may display the client indicators 461 and 462, which correspond to small devices, adjacent to one another and in a group, as is shown. Likewise, the Synchronizer may display the TrueSync.com server indicator 460, which corresponds to Internet-based services, in its own group, as is shown, with other services, if any. The example client indicators 455-462 are shown as labeled rectangles, but other shapes or visual features may also be used.

C. Visual Representation of the Flow of Dataset Changes

The Synchronizer displays data-flow indicators (for example, the indicators 465, 466, 467, or 468) to illustrate and summarize the possible flow of changes (e.g., additions, modifications, or deletions) into or out of each client or among clients according to current settings. Examples of data flow indicators include data-path indicators (for example, the indicators 467 or 468) which include lines (or another form) that schematically depict paths for changes to flow to or from individual clients, which clients are represented by their corresponding client indicators. Examples of data-flow indicators also include data-flow direction indicators (for example, the arrowheads 465 or 466, or the arrow stumps 465B and 466B of FIG. 4B) which indicate possible or impossible direction(s) in which changes may flow for particular clients according to current settings. For example, the arrowhead 465 points away from its corresponding client indicator 461 to indicate that changes may flow from the client 461 during a next synchronization according to current settings. Conversely, the arrowhead 466 points toward its corresponding client indicator 461 to indicate that changes may (also) flow into the client 461 during the next synchronization according to current settings. Separately, the arrow stumps 465B and 466B (both from FIG. 4B) respectively indicate that changes will neither flow out of nor into the client 461.

If the current settings call for a particular client to participate fully in synchronization, then the Synchronizer would display data-flow direction indicators showing that changes may flow both into and out of the particular client. If the current settings call for a particular client's dataset to be overwritten during synchronization, then the Synchronizer would display data flow indicators showing that changes may flow only into the particular client. If the current settings call for a particular client not to participate in synchronization, then the Synchronizer would display data flow indicators showing that changes do not flow to or from the particular client. Preferably, the data-path indicator, if shown at all, for the particular client would be shown in a different color (e.g., "greyed out") as compared to a data-path indicator for a client that is set to participate in synchronization. If the current settings call for a particular client's data to overwrite the Synchronizer dataset (and recall that this may be considered an advanced and generally rarely used setting), then the Synchronizer would display data flow indicators showing that changes may flow only out of the particular client, and preferably the Synchronizer would require that all other clients be set to be overwritten or to "not participate" in the next synchronization. (This restriction avoids complicated-looking client charts, which may confuse the user, especially if the user is not knowledgeable about the Synchronizer.) A particular embodiment makes an additional simplifying restriction that if the current settings call for a particular client's data to overwrite the Synchronizer dataset, then at most one other client can be set to be overwritten and all remaining clients must be set to "not participate" in the next synchronization. Another particular embodiment makes a further simplification that if the current settings call for a particular client's data to overwrite the Synchronizer dataset, then there must be one and only one other client, and that other client must be set to be overwritten.
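The relationship between a client's setting and its data-flow direction indicators, together with the basic restriction on overwriting the Synchronizer dataset, can be sketched as follows. This is only an illustrative sketch: the names SyncMode, flow_directions, and settings_are_valid are hypothetical and do not appear in the described embodiment, and only the general restriction (all other clients overwritten or not participating) is modeled, not the two stricter variants.

```python
from enum import Enum


class SyncMode(Enum):
    """Hypothetical names for the four per-client settings described above."""
    SYNCHRONIZE = "synchronize"                    # participates fully
    OVERWRITE_CLIENT = "overwrite client"          # client dataset is overwritten
    DO_NOT_SYNCHRONIZE = "do not synchronize"      # client does not participate
    OVERWRITE_SYNCHRONIZER = "overwrite synchronizer dataset"  # advanced setting


def flow_directions(mode):
    """Return (changes_may_flow_into_client, changes_may_flow_out_of_client)."""
    return {
        SyncMode.SYNCHRONIZE: (True, True),
        SyncMode.OVERWRITE_CLIENT: (True, False),
        SyncMode.DO_NOT_SYNCHRONIZE: (False, False),
        SyncMode.OVERWRITE_SYNCHRONIZER: (False, True),
    }[mode]


def settings_are_valid(settings):
    """Check the restriction on overwriting the Synchronizer dataset.

    settings maps a client name to its SyncMode. If any client is set to
    overwrite the Synchronizer dataset, every other client must be set to
    be overwritten or to not participate.
    """
    allowed_for_others = {SyncMode.OVERWRITE_CLIENT, SyncMode.DO_NOT_SYNCHRONIZE}
    for source, mode in settings.items():
        if mode is not SyncMode.OVERWRITE_SYNCHRONIZER:
            continue
        for other, other_mode in settings.items():
            if other != source and other_mode not in allowed_for_others:
                return False
    return True
```

A user interface built along these lines could call settings_are_valid before accepting a change and redraw the arrowheads from flow_directions afterward.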

In general, each client indicator (e.g., Pilot indicator 461) may have an associated data-path indicator(s) (e.g., line 467) which may include associated data-flow direction indicator(s) (e.g., arrowheads 465 and 466). Preferably, data-flow direction indicators are arrowheads or the absence of such arrowhead(s) (e.g., arrow stumps without arrowheads or greyed-out arrowheads, or the like). Preferably, each data-path indicator is a line which may be bi-directional or unidirectional (e.g., has one or two associated arrowheads), depending on the current settings. Other configurations are possible. For example, each client indicator may have up to two lines (e.g., parallel lines) as its data-path indicator, wherein each line is a unidirectional arrow and thus is also a data-flow direction indicator, and wherein the entire line is absent or greyed-out if changes cannot flow along the indicated direction based on current settings. For another example, data-flow direction indicators may be colors (e.g., green for two-way, yellow for one-way, red for zero-way), letters, numbers, or other graphical indicia. For yet another example, data-flow direction indicators may be animations, for example, "moving lights" (e.g., successively lit-up "light bulbs," or the like) or other animated graphics. These animations may be near or along the data-path indicators. A light or other graphic "moving" away from a client (e.g., a string of "light bulbs") and a light or other graphic "moving" toward the same client may, or may not, overlap in time or space. For example, a graphic that moves away from a client, and a graphic that moves toward the same client may each begin from opposite ends of a line at about the same time and traverse the line at about the same speed (thus, greatly or completely overlapping in time), and either pass by (non-overlapping in space) or pass through (overlapping in space) each other along the line. Still other configurations or schemes are possible.

As shown in FIG. 4A for an embodiment of the invention, the data-flow indicators (e.g., the indicators 465, 466, 467, or 468) describing flow from individual client indicators (e.g., the indicators 461 and 462) converge at the nexus indicator 470. By way of this configuration, the user may visually trace data flow from client indicator to client indicator (e.g., from indicator 461 to indicator 462). As shown, the nexus indicator 470 is displayed as a generally centrally located "hub," preferably as a circle, and preferably with the image of at least one arrow upon it. (Two arrows, both pointing generally clockwise, or both pointing generally counterclockwise, are preferred.) Other configurations or schemes are possible. For example, the nexus indicator 470 may alternatively simply be a general vicinity of empty display-screen space that serves as a nexus for flow indicators from clients to converge. Such an empty space may be, for example, a generally centrally located space, for example, of approximately the same size as a client indicator. For another example, the nexus indicator 470 may alternatively be displayed as a straight or winding or bent line (e.g., a "street"), and the individual data path indicators (e.g., "driveways") may tap into the line at separate points, preferably at approximately right angles. The "street" may have driveways on both sides or only one side. Also, the street may be located generally in the center of the client chart (e.g., as a horizontal line segment), or the street may form a border that surrounds the client indicators. Still other configurations or schemes are possible.

D. Visual Representation of the Synchronizer

The nexus indicator 470 according to the preferred embodiment not only represents the conceptual place where data paths from individual clients converge but also represents the Synchronizer itself. Thus, the nexus indicator 470 is also a synchronizer indicator 470 in the preferred embodiment. Preferably, when the Synchronizer is actively synchronizing, the nexus indicator 470 indicates this activity by changing its appearance. In the preferred embodiment, the nexus indicator 470 is a circle imprinted with two arrows, as described earlier, and indicates that synchronization is ongoing by rotating, preferably in the direction of the two arrows on the indicator 470, i.e., clockwise about its center.

In the client chart 453 shown in FIGS. 4A and 4B, the Synchronizer dataset is represented along with the rest of the Synchronizer by the nexus indicator 470. In an alternative embodiment of the Synchronizer (not shown), the Synchronizer displays a client chart that separately represents the Synchronizer dataset as if it were another client (e.g., as another satellite around the Synchronizer hub). In particular, the client chart would include a Synchronizer dataset indicator (not pictured) that is like a client indicator but may be highlighted in some way, for example, by having special appearance, including special position, or by having data flow indicators that have a special appearance or position, and so forth, as compared to client indicators. For example, the Synchronizer dataset indicator may be closer to the Synchronizer indicator 470 than the client indicators.

In general, the Synchronizer dataset is utilized in every synchronization, and therefore the Synchronizer dataset indicator would generally be displayed as if representing a client that participates in synchronization, e.g., with at least one data-flow direction indicator (e.g., arrow) in the "on" or "present" state. For example, data-flow direction indicators showing possible two-way flow from the Synchronizer dataset imply that at least one client is set to be synchronized; data-flow direction indicators showing possible one-way flow only, into the Synchronizer dataset, show that the Synchronizer dataset has been set to be overwritten with the result of synchronization; and data-flow direction indicators showing possible one-way flow only, out of the Synchronizer dataset, show that all participating clients are set to be overwritten. Particular embodiments may permit special synchronization modes, in which the Synchronizer dataset indicator may instead be displayed as a client that does not participate in synchronization. An example of such a special synchronization mode is one in which the user specifies that the Synchronizer dataset is to be cleared (e.g., reset) both before and after synchronization.

E. Accepting User Input

1. Using the Client Chart to Accept User Input

The client indicators (e.g., indicators 455 or 462, etc.) and the synchronizer indicator 470 are useful in accepting user input. In particular, the Synchronizer is programmed to allow the user to "select" or "shift focus to" such an indicator for changing the settings relevant to the indicated client or the indicated Synchronizer. For example, in the preferred embodiment, the Synchronizer permits a user to first position a cursor (not shown) over an indicator using a pointing device 305 (from FIG. 3A) and then hit a key (e.g., a left mouse button) or other similar input to select the indicator to modify its options. This action is commonly termed "clicking on" the indicator (or "left-clicking on" the indicator, if the left mouse button is used). In the preferred embodiment, the Synchronizer responds to a left-click on a client indicator (e.g., indicator 462) by displaying an options dialog box by which the user can change synchronization and other settings for the indicated client (e.g., the REX.TM. organizer). Similarly, the Synchronizer responds to a user click on the right mouse key, when the cursor is over a client indicator, by bringing up a more terse dialog box having commonly chosen commands from the options dialog box (e.g., "Overwrite this client" or "Overwrite the <Synchronizer dataset>") for quick access by the user, and an "Options . . . " command (or a similar command) that brings up the options dialog box in case the user desires access to the full range of choices from the options dialog box. In the preferred embodiment, the Synchronizer responds to a left-click on the Synchronizer indicator by starting synchronization. The Synchronizer may also respond to a click on the Synchronizer indicator (or on the Synchronizer dataset indicator, if any) by displaying an options dialog box by which the user can change synchronization and other settings for the Synchronizer in general or for the Synchronizer dataset.

Other schemes for using the graphical client indicators and the synchronizer indicator for accepting user input are possible. For example, the Synchronizer can, in an alternative embodiment, display a "focus" indicator near a single client or synchronizer indicator, and allow the user to shift the focus to another client or synchronizer indicator using, e.g., forward or backward cursor keys or the tab or shift-tab keys on the keyboard, or similar inputs. The Synchronizer can then allow the user to select a highlighted indicator for modifying its options by depressing an "enter" or space key on the keyboard, or another input on any user input device. Focus indicators may take many forms, including for example a highlight such as a border, symbol, "backlighting," and/or color change, etc.

2. Choosing Client and Synchronizer Settings

FIG. 5A is a screen-shot of the options dialog box 505 for changing synchronization and other settings for an example client, the REX.TM. organizer. In general, each client will have an options dialog box, and these boxes may differ for each client according to the particular capabilities of each client or according to the necessary or available options for each client. (Incidentally, such client-specific behavior is easy to implement, in light of the Synchronizer's modular underlying architecture, as will be described in other sections.) In certain embodiments of the invention, an options dialog box is also implemented for changing synchronization and other settings for the Synchronizer dataset.

The options box 505 includes a number of tabbed panels, including for example, a synchronization panel 506, a preferences panel 507, or a backup/restore panel 508. The synchronization panel 506 includes a listing of synchronization settings for the client which may be chosen by the user (e.g., by mouse-clicking). In particular, the possible settings include a do-not-synchronize setting 511, a synchronize setting 512, an always-overwrite-client setting 514, and an overwrite-client-once setting 513. These possible settings have been discussed in an earlier section. The synchronization settings 511, 512, 513, and 514 are generally applicable to typical clients, and thus are typically found in the options dialog box for any client, but selected settings may be omitted from the options dialog box for specific clients or for the Synchronizer. The synchronization panel 506 also includes other settings that are more client-specific, such as a connection-type setting 510.

The connection-type setting 510 is a choice of a connection type (e.g., PC-Card slot or a serial cradle) to be used for communicating with REX.TM., the example client. For example, the user may select the connection-type setting 510 from available choices "PC-Card slot" or "Docking station on COM port 1," or "Docking station on COM port 2," etc. (The choices may be displayed on, for example, a pop-up list, as shown, or another type of selector.) For other clients on other devices, connection-type settings could include infrared port, modem, Internet server, etc. The preferences panel 507 includes other settings (generally client-specific settings) for the user's adjustment. The backup/restore panel 508 permits the user to generate a back-up copy of the client's dataset or to select previously generated back-up cop(ies) of the client's dataset for restoration to the client.

As discussed in a previous section, the options dialog box 505 may be invoked by, e.g., clicking on the corresponding client indicator on the client chart. The options dialog box 505 may also be invoked by navigating the pull-down menus 440 (shown in FIG. 4A). In particular, the options dialog box 505 may be invoked by pulling open the "Client" menu and selecting a revealed "REX Options . . . " menu command (not shown). Other commands on the "Client" menu include commands for invoking options dialog boxes for the other clients. For example, these other commands include "Schedule+ Options . . . ," "Outlook 98 Options . . . ," etc. The other commands on the "Client" menu also include an "Add" command for adding new clients to the Synchronizer's awareness, a "Remove" command for removing clients, and a "Backup/Restore" command for generating back-up copies of various client(s)' data or for restoring previously backed-up copies of data to particular client(s).

3. Choosing Data-Type Settings

The "Data" menu, when pulled open, includes commands for changing settings related to particular data types. These commands include, for example, a "Time Zones . . . " command, a "Contacts . . . " command, a "Calendar . . . " command, a "Memos . . . " command, and a "To-Do List" command. (Recall that these commands can also be invoked using the shortcut pushbuttons 447, 448, 449, 450, and 451, respectively, from the tool bar 415, as shown in FIG. 4A.) Settings for a particular data type include user choices for mapping record files of the particular data type in each dataset to record files of the particular data type in every other dataset. Settings for a particular data type also include user choices for mapping data fields across datasets for each mapping of record files across datasets. The mappings (both of record files and of data fields) may be attempted automatically by the Synchronizer (e.g., by string-matching and thesaurus-string-matching filenames or field names across datasets and matching files or fields to one another if their names are the closest string-match or thesaurus-string-match available). The mappings (both of record files and of data fields) may also be chosen directly by the user, perhaps with the automatically-generated mappings as a starting point or default choice. In the preferred embodiment, automatic mapping of data fields is performed as described, for example, in commonly-owned U.S. patent application Ser. No. 09/020,047, entitled METHODS FOR MAPPING DATA FIELDS FROM ONE DATA SET TO ANOTHER IN A DATA PROCESSING ENVIRONMENT, the disclosure of which has been incorporated by reference. Automatic record file mapping may be performed by analogous methods, except that record file names are used instead of data-field names. Settings for a particular data type also include a user choice of whether to exclude records of the particular data type from synchronization.
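By way of illustration only, closest-name matching with a thesaurus step might be sketched as below. This is not the method of the incorporated application Ser. No. 09/020,047; the THESAURUS table, the similarity measure (Python's difflib.SequenceMatcher), the 0.6 threshold, and the greedy matching order are all assumptions made for the sketch.

```python
from difflib import SequenceMatcher

# Hypothetical thesaurus: maps a lower-cased name to a canonical synonym.
THESAURUS = {"phone": "telephone", "surname": "last name"}


def normalize(name):
    """Lower-case a name and replace it with its thesaurus synonym, if any."""
    key = name.strip().lower()
    return THESAURUS.get(key, key)


def similarity(a, b):
    """Similarity in [0, 1] between two names after thesaurus normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def auto_map(source_names, target_names, threshold=0.6):
    """Map each source name to its closest still-unused target name.

    A source name whose best match scores below the threshold is left
    unmapped (None). Matching is greedy, in the order of source_names.
    """
    mapping = {}
    unused = list(target_names)
    for name in source_names:
        best = max(unused, key=lambda t: similarity(name, t), default=None)
        if best is not None and similarity(name, best) >= threshold:
            mapping[name] = best
            unused.remove(best)
        else:
            mapping[name] = None
    return mapping
```

The same routine could serve for record-file names or data-field names, and its output would be only a starting point that the user may then adjust.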

a. Choosing Mappings of Record Files

FIG. 5B is a screen-shot that shows a view of a dialog box 515 displayed by the Synchronizer for changing options related to the Contacts data type. The Synchronizer displays the dialog box 515 in response to the "Contacts . . . " command. The dialog box 515 includes a toggle box 516 by which the user can choose whether the Synchronizer is to exclude the particular data type from synchronization. The view includes an active, or exposed, tabbed record-file-mapping panel 517 that displays current mappings for record files and accepts user input for modifying these mappings. Another tabbed panel, panel 518, is not active in the view of FIG. 5B. As shown, a record-file-mapping table 519 (e.g., a two dimensional matrix) is used to represent the record file mappings. The columns of the table 519 represent clients. The rows of the table each represents a mapping of record files, such that the record files named in each row are mapped to each other. In each row (i.e., each mapping), certain cells may contain "<not mapped>" (or another indicator) to show that the particular mapping does not include any record file from the cell's corresponding dataset (i.e., column). Note that a single table, namely table 519, is used to display, and accept user changes to, current mappings for an arbitrary number of clients, for example, for more than two clients. If the table 519 is too large to be fully displayed in the dialog box, then the table 519 is a virtual table, only a portion of which is viewable at one time through a viewing window 520 in the dialog box, and vertical and/or horizontal scroll bars (for example, a horizontal scroll bar 521) will be used to move the virtual table 519 vertically and/or horizontally with respect to the viewing window.

The method of accepting user input for changing record file mappings is as follows. Consider a particular cell (for example, a cell 523) in the table 519. This particular cell corresponds to a particular client, or column. If a user clicks on the particular cell, the Synchronizer displays a pop-up list (for example, a pop-up list 525), preferably near the particular cell, of candidates for a record-file name to appear in the particular cell. The candidates include the names of all record files in the particular client. If other cells in the particular cell's row contain record-file names (from other clients) that differ from the record file names in the particular cell's client, then those different names, appended with the text "<add>" (or another indicator), are also included in the candidates. The addition of the text "<add>" to a candidate indicates that, if the candidate is chosen, the Synchronizer will add a record file having the candidate name to the particular client.
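The construction of the pop-up candidate list can be sketched as follows. The function name popup_candidates, the dictionary-based representation of a table row, and the exact spacing of the "<add>" suffix are assumptions for illustration; the sketch also leaves out any "<not mapped>" choice, since the text does not say whether the pop-up list offers one.

```python
NOT_MAPPED = "<not mapped>"
ADD_SUFFIX = " <add>"   # assumption: the suffix is separated by one space


def popup_candidates(client, row, record_files):
    """Build the candidate list for the cell of `client` in one mapping row.

    row          -- dict: client name -> record-file name shown in that cell
    record_files -- dict: client name -> list of that client's record files
    """
    own_files = list(record_files[client])
    candidates = list(own_files)          # all record files of this client
    for other, name in row.items():
        if other == client or name == NOT_MAPPED:
            continue
        labeled = name + ADD_SUFFIX
        # Names used by other clients in this row, but absent from this
        # client, are offered as files the Synchronizer would create.
        if name not in own_files and labeled not in candidates:
            candidates.append(labeled)
    return candidates
```

For example, if the REX client has only a "Contacts" file and the row maps Outlook's "Business" file to the PalmPilot's "Address" file, the REX cell would offer "Contacts", "Business <add>", and "Address <add>".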

b. Choosing Mappings of Data Fields

The tabbed panel 518 is used to initiate choosing of data-field mappings. When active, or exposed, the tabbed panel 518 contains a copy (not shown) of the record-file-mapping table 519. When the user clicks on any row in this copy (which represents a particular record-file mapping), the Synchronizer responds by displaying a data-field-mapping table 530, in a separate pop-up window 532, as shown in FIG. 5C. This data-field-mapping table displays, and accepts user changes to, current data-field mappings for the record files (from different clients) that map to each other under the particular record file mapping. The data-field-mapping table 530 operates in essentially the same manner as the record-file-mapping table 519 of FIG. 5B, except that the rows in table 530 represent mappings of data fields instead of mappings of record files as in table 519.

4. Choosing Other Settings and Commands

Other settings can be set by the user by using, for example, the pull down menus. For example, the user may use a "Set Current User" command to select a particular existing Synchronizer dataset to be used in the next synchronization; the user may use an "Add User" command to create an additional Synchronizer dataset; or the user may use other commands to select conflict resolution modality, or, optionally, to select duplicate resolution modality, or to reset the Synchronizer dataset, or to choose to overwrite the Synchronizer dataset, or to begin synchronization (including overwriting). Conflict resolution and duplicate resolution will be further described in subsequent section(s). For now, it suffices to note that these modalities may each include the choice of automatic versus manual resolution, and optionally the particular rule(s), e.g., precedence rule(s), to be used by the Synchronizer to perform automatic resolution. These various user commands may also be invoked by the user via shortcuts--e.g., by clicking on relevant indicators on the client chart, for example by clicking on the Synchronizer indicator to bring up a dialog box having the choices "Conflict Resolution . . . " and "Duplicate Resolution . . . ", or simply "Conflict/Duplicate Resolution . . . " and clicking on the desired choice.

F. Methodology for User Interaction

FIG. 6 is a flowchart 600 that summarizes a method used by the Synchronizer for interacting with a user. Many elements within the flowchart have already been discussed above, or will be described in later sections, and need not be described again in detail. In the flowchart, in a step 605, the Synchronizer depicts (e.g., visually) its clients. In an (optional) step 610, the Synchronizer depicts a nexus. Next, in a step 615, the Synchronizer depicts possible data flow for each client. The step 615 may include: a step 616 of depicting, e.g., a single arrow toward a client indicator for a client to be overwritten, a step 617 of depicting, e.g., a double arrow for a client to be synchronized, a step 618 of depicting, e.g., a naked line for a client not to participate in synchronization, or other steps 619 for depicting indicators for other synchronization modes for a client. After the step 615, in a step 625, the Synchronizer accepts user input, for example, in the manner described in previous sections. If the user input changes the synchronization mode for any client, the Synchronizer re-performs the step 615. If the user input is an instruction to begin synchronization, then, in a step 630, the Synchronizer begins the synchronization and depicts an indication of the ongoing synchronization. For other user input, in steps 635, the Synchronizer may depict changes caused by the user input, for example by displaying or re-displaying visual elements of the GUI to reflect changes caused by the user input.

G. Other Embodiments of the UI

1. Small-Display Embodiment

In a specific embodiment, the Synchronizer is implemented to present output not on a PC-type display screen, but on a much smaller display, for example, a display having about 200-by-160 pixels or fewer, about 160-by-100 pixels or fewer, or about 100-by-60 pixels or fewer. For example, the Synchronizer may be implemented on a REX.TM. organizer-type device or an even smaller device, for example, a wireless pager or a wrist-watch-sized device. In this embodiment, the Synchronizer displays the client chart in the form of multiple display frames that the user can view, one frame or screenful at a time. The user pushes a button (or performs another action) to advance to the next frame. Depending on the number of clients, each frame may include just a subset of all client indicators (e.g., just one client indicator) and the data-flow direction indicator(s) for that subset. One frame may include a synchronizer indicator and associated data-flow indicators. One frame may be a selector or "home" frame that gives a global count or summary of the clients participating in synchronization, and allows the user to select into particular clients' frames, perhaps by clicking on miniature client indicators in a miniature copy of the client chart, wherein a miniature client indicator does not necessarily include the client's full name, but may include an icon or a first one or two letters of the client's name.
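A minimal sketch of how such frames might be assembled is given below. The frame layout, the ordering (home frame, then synchronizer frame, then client frames), the two-letter abbreviations, and the wrap-around on advancing are all assumptions chosen for illustration, not details of the described embodiment.

```python
def build_frames(participation, per_frame=1):
    """Split a client chart into frames for a very small display.

    participation -- dict: client name -> True if the client participates
    per_frame     -- number of client indicators shown on one frame
    """
    names = list(participation)
    home = {
        "kind": "home",
        # Global summary of the clients participating in synchronization.
        "participating": sum(1 for p in participation.values() if p),
        "total": len(names),
        # Miniature indicators: the first two letters of each client's name.
        "icons": [name[:2] for name in names],
    }
    frames = [home, {"kind": "synchronizer"}]
    for start in range(0, len(names), per_frame):
        frames.append({"kind": "clients",
                       "clients": names[start:start + per_frame]})
    return frames


def next_frame(index, frames):
    """Advance to the next frame on a button push, wrapping to the home frame."""
    return (index + 1) % len(frames)
```

With three clients and per_frame=2, this yields four frames: the home frame, the synchronizer frame, and two client frames.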

2. Audio/Telephone-Interface Embodiment

In another specific embodiment, the Synchronizer's user interface uses audio output to the user, and optionally, audio input from the user as well. This embodiment is useful for a telephone-based synchronization system, or for a system for the visually handicapped. In this embodiment, the user interface outputs frames that include client indicators and data-flow indicators as described above for the small-display embodiment, but the frames comprise snippets of audio. In particular, each client indicator contains a recorded or synthesized utterance of the client's name, and the data-flow indicator(s) for each client contain an utterance of the current synchronization setting for the client, for example, "is set to synchronize" or "will be overwritten," etc. The Synchronizer uses these indicators to accept user input by allowing the user to enter input relevant to a particular client after the user hears the client indicator and its associated data-flow indicator(s). For user input, the Synchronizer accepts either keypad input (e.g., from a telephone keypad) or speech input. Keypad input is prompted by voice-menu choices, similar to those found in telephone voice-mail systems. Speech input is recognized into commands using standard speech-recognition engines (e.g., software engines). Speech recognition software is available from, for example, Dragon Systems of Newton, Mass. or from IBM. After the user changes a synchronization option for a particular client, the Synchronizer repeats the (updated) data-flow indicator for the client, preferably prefaced with a repeating of the client indicator--for example, "REX will be overwritten."

V. "Binary-Based" System for Automatic Synchronization of Two or More Datasets

A. Binary-Based System is Implemented Using Binary Synchronizations

A UI has been discussed above that allows a user to conveniently specify the clients (e.g., two, or even more than two clients) that are to be synchronized, and the manner in which each such client is to be synchronized (e.g., using reconciliation of changes or mere overwriting of data). Given the user's specification of participating clients, a goal of the Synchronizer is to automatically achieve the requested synchronization result for the user, without requiring further user input. A particular, binary-based embodiment of the Synchronizer achieves the desired synchronization result (at least for certain data) by intelligently and automatically utilizing successive pair-wise synchronizations. The Synchronizer can even effect these pair-wise synchronizations by invoking the prior art synchronization systems (e.g., programs 55 or 60 of FIG. 1). (The preferred embodiment, which does not rely on successive pair-wise synchronizations and is more efficient, will be further described in later sections.)

FIG. 7A is a flowchart that illustrates a method 700 used by the binary-based embodiment of the Synchronizer for synchronizing two or more than two datasets. The method 700 includes a user-input step 702, followed by a first-pass step 705, followed by a second-pass step 710. In the user-input step 702, the Synchronizer accepts a user specification of the clients to be synchronized. The user specification may be accepted interactively, for example according to the flowchart 600 of FIG. 6. The user specification may also be accepted without interactive user input, for example from a default configuration file or programmatically from an invoking software process that may not require interactive user input. In the first-pass step 705, the Synchronizer synchronizes pairs of the user-specified clients in a sequence so as to obtain two globally-synchronized clients--i.e., two clients that include changes (if any) from all clients. Preferably, for N clients, (N-1) binary synchronizations are performed in the first-pass step, including a final binary synchronization that occurs after all other binary synchronizations in the sequence. The two clients from the final binary synchronization are the globally-synchronized clients. In the second-pass step 710, the Synchronizer binary-synchronizes every not-globally-synchronized client with a globally-synchronized client. Preferably, for N clients, (N-2) binary synchronizations are performed in the second-pass step.

FIG. 7B is a flowchart that illustrates a particular implementation 700A of the method 700 of FIG. 7A. As shown, according to the implementation 700A, the first-pass step 705A implements the first-pass step 705 of FIG. 7A, and the second-pass step 710A implements the second-pass step 710 of FIG. 7A to synchronize N clients, C_1, C_2, . . . , C_N. (The user-input step 702 is unchanged from FIG. 7A.) The first-pass step 705A includes a binary-synchronization step 707A, which is performed (N-1) times. The second-pass step 710A includes a binary-synchronization step 712A, which is performed (N-2) times. In the first-pass step 705A, the Synchronizer binary-synchronizes one particular client, C_1, with every other client, one at a time, to obtain two globally-synchronized clients, C_1 and C_N. The sequence of binary synchronizations is C_1-C_2, C_1-C_3, . . . , C_1-C_N. In the second-pass step 710A, the Synchronizer binary-synchronizes each of the clients C_2, . . . , C_(N-1) with either the client C_1 or the client C_N. Other implementations are possible. For example, the first-pass step could instead use C_1-C_2, C_2-C_3, . . . , C_(N-1)-C_N as its sequence of binary synchronizations to obtain C_(N-1) and C_N as the globally-synchronized clients, and the corresponding second-pass step could then use, for example, C_N-C_1, C_N-C_2, . . . , C_N-C_(N-2) as its sequence of binary synchronizations.
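The two orderings described above can be generated and checked with a short simulation. This is a sketch under simplifying assumptions: each client is modeled as a set of change identifiers, a binary synchronization is modeled as a plain set union written back to both clients, and conflict resolution is not modeled at all. The function names star_schedule, chain_schedule, and simulate are hypothetical.

```python
def star_schedule(n):
    """First pass C_1-C_2, ..., C_1-C_N; second pass C_1-C_2, ..., C_1-C_(N-1)."""
    first = [(1, k) for k in range(2, n + 1)]
    second = [(1, k) for k in range(2, n)]
    return first, second


def chain_schedule(n):
    """First pass C_1-C_2, C_2-C_3, ..., C_(N-1)-C_N; second pass C_N-C_1, ..., C_N-C_(N-2)."""
    first = [(k, k + 1) for k in range(1, n)]
    second = [(n, k) for k in range(1, n - 1)]
    return first, second


def simulate(n, schedule):
    """Run a schedule on n clients that each start with one unique change.

    Returns a dict: client number -> set of changes held at the end.
    """
    held = {k: {k} for k in range(1, n + 1)}
    first, second = schedule
    for a, b in first + second:
        merged = held[a] | held[b]   # stand-in for one binary synchronization
        held[a] = set(merged)
        held[b] = set(merged)
    return held
```

Either schedule uses (N-1) + (N-2) = 2N-3 binary synchronizations in total, and under the stated model every client ends up holding the changes of all N clients.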

An optimization may be made to the method 700 if the user has requested that certain clients be "synchronized" by being overwritten instead of by being synchronized in the two-way-reconciliation sense during the synchronization. In such a situation, the Synchronizer first withholds clients to be overwritten from the method 700, and synchronizes the remaining clients using the method 700. Afterward, the withheld clients are overwritten with the synchronization result, which can be found in any of the clients synchronized using the method 700.

In an alternative version of the binary-based embodiment, the Synchronizer does not automatically invoke all binary synchronizations of the method 700, but instead requests that the user invoke some or all of the binary synchronizations. For example, the Synchronizer may maintain a window on a computer display screen from which the Synchronizer displays instructions to the user. The Synchronizer may display a listing or representation of the entire sequence of needed binary synchronizations in the window (with window scrolling, if the window is too small to simultaneously display all the instructions). Alternatively, the Synchronizer may display a listing of one or a few binary-synchronizations in the window at a time (for example, "please synchronize REX and Outlook, now, and click OK when finished") and await the user's confirmation before displaying a listing of further binary-synchronizations to be performed.

B. Basic Binary-Based System May Resolve Conflicts Incorrectly

As alluded to in the Background section, a problem with using conventional binary synchronizations in sequence to effect synchronization of more than two datasets is that data-modification times that are available during earlier-performed binary synchronizations may not be available during later-performed binary synchronizations. Thus, conflict resolution that relies on comparing data-modification times may fail during later-performed synchronizations and thereby cause errors. (Even user-assisted conflict resolution may fail because the synchronization system will be unable to provide the user with data-modification times for use in the user's decision-making process.) The binary-based system as described in the previous section would also suffer such failure, if implemented using invocations of prior-art binary synchronizers such as programs 55 or 60 of FIG. 1 to binary-synchronize the clients. FIG. 7C demonstrates an example of such a failure.

FIG. 7C is a table that shows the contents of three example clients at various times during a sequence of binary synchronizations initiated by the binary-based embodiment of the Synchronizer. In the example, the binary synchronizations are set to resolve conflicts by giving effect to the most recent (later) change and ignoring the less recent (earlier) conflicting change. As shown by upper column headers 720, the example clients are referred to as Clients A, B, and C. Each client contains three records, labeled R1, R2, and R3, that each corresponds to the identically-labeled record in each other client.

Initially, at Time 0 (row 724), corresponding records R1, R2, and R3 in all clients have identical (e.g., synchronized) values of 1, 2, and 3, respectively. Next, at Time 1 (row 726), the user changes the value of record R3 in Client B to 3B. Next, at Time 2 (row 728), the user changes the value of record R1 in Client A to 1A, the value of record R2 in Client B to 2B, and the value of record R3 in Client C to 3C. Each client records the time of modification of each record. Next, the binary-based Synchronizer begins to synchronize Clients A, B, and C using repeated binary synchronizations according to the steps 705A and 710A of FIG. 7B. The sequence of binary synchronizations will be A-B at Time 3, followed by A-C at Time 4, followed by A-B again at Time 5. At Time 3, Clients A and B are binary-synchronized such that they both contain the values 1A, 2B, and 3B (row 730). In particular, Client A's record R3 has been modified to have the value 3B, and Client A has dutifully recorded the modification time of its record R3 as Time 3. At Time 4, Clients A and C are binary-synchronized such that they both contain the values 1A, 2B, and 3B (row 732). In particular, Client C's record R3 has been modified to have the value 3B as a result of conflict resolution of the value 3B in Client A and the value 3C in Client C. The value 3B prevailed in the conflict resolution because its modification time of Time 3 in Client A is more recent than 3C's modification time of Time 2 in Client C. This is the wrong result, because the user actually entered the value 3C (Time 2 in Client C) more recently than he or she entered the value of 3B (Time 1 in Client B). Thus, because the original (user-) modification time (Time 1) of the value 3B was not available for use in conflict resolution during the binary A-C synchronization, the wrong result is obtained. At Time 5, the second-pass A-B binary synchronization produces no further changes (row 734). 
In summary, a wrong result (summarized in row 736) results from the binary-based Synchronizer because it relies on prior-art binary synchronizations of clients to try to effect a synchronization of more than two clients. The desired, correct result of 1A, 2B, and 3C is shown in row 738.
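The failure of FIG. 7C can be reproduced with a small simulation. The sketch below is a simplified stand-in for a prior-art binary synchronizer, not an actual implementation: the `Record` type and `binary_sync` function are hypothetical, every record carries only a client-stamped last-modification time, and any difference between two records is settled by "latest change wins."

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: str
    modified: int  # last-modification time, stamped by the client itself


def binary_sync(x, y, now):
    """Latest-change-wins sync of two clients (dicts of record id -> Record)."""
    for key in x:
        rx, ry = x[key], y[key]
        if rx.value == ry.value:
            continue
        winner = rx if rx.modified >= ry.modified else ry
        target = y if winner is rx else x
        # The receiving client stamps the write with the sync time, so the
        # original user-modification time is lost at this point.
        target[key] = Record(winner.value, now)


def make_client():
    return {"R1": Record("1", 0), "R2": Record("2", 0), "R3": Record("3", 0)}


a, b, c = make_client(), make_client(), make_client()
b["R3"] = Record("3B", 1)   # Time 1
a["R1"] = Record("1A", 2)   # Time 2
b["R2"] = Record("2B", 2)   # Time 2
c["R3"] = Record("3C", 2)   # Time 2

binary_sync(a, b, now=3)    # first pass: A-B
binary_sync(a, c, now=4)    # first pass: A-C
binary_sync(a, b, now=5)    # second pass: A-B

result = {key: rec.value for key, rec in a.items()}
print(result)
```

In this simulation all three clients end with R3 equal to 3B, even though 3C was the later user entry, because Client A's copy of 3B carried the sync time (Time 3) into the A-C comparison.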

C. An Improved Binary-based System Resolves Conflict Better

As shown in the preceding example, the problem with conflict resolution occurs because when a record in a client is updated by a binary synchronization, the record's last-modification time is set to the time of the binary synchronization. Thereafter, if the updated record is used in subsequent binary synchronization, its last-modification time, which is the "priority time" that is used in conflict resolution, is not the updated record's original modification time. (The original modification time is the time that the user, or other non-Synchronizer entity, first entered a change into a corresponding record to thereby originally cause the updating of the updated record in the first place.) One might be tempted to try to repair this defect by setting the last-modification-time field of records updated or added during binary synchronization to the original modification timestamp as obtained from the other client. However, this simpleminded approach would fail for a first and a second reason. The first reason is that most clients produce their own timestamps and will not let a synchronization program set the last-modification-time field of records to arbitrary values. The second reason is that the last-modification-time field of records in a client is used not only in conflict resolution but also to determine the changes (for example, additions, deletions, or modifications) that have been made since a previous synchronization of the client. Therefore, setting a last-modification-time field for a record to an earlier time may cause the last-modification-time field to have a value that is earlier than a previous synchronization of the client. If such a situation occurs, the record may be erroneously ignored during a subsequent synchronization with another dataset and its value may thereby erroneously not be propagated to the other dataset.

An improved binary-based Synchronizer repairs the conflict-resolution problem of FIG. 7C by storing two timestamps for each record: a last-modification-time for determining the changes since a prior synchronization, and an original-modification-time to be used as the "priority time" that is compared during automatic conflict resolution or displayed to the user during user-assisted conflict resolution. In part because it is impractical to try to add a new, original-modification-time field to all records of all possible clients, the improved binary-based Synchronizer requires only that a single dataset, here referred to as the reference dataset, which is to be available always for the Synchronizer's use, include the new, original-modification-time field for all its records. If it is not convenient or possible for one of the user's datasets to be used for this purpose, the improved binary-based Synchronizer can create its own private reference dataset that has the new, original-modification-time field. The reference dataset will always be used as the "client" C.sub.1 in the steps 705A and 710A of FIG. 7B. For conflict resolution, the priority time that is used for the reference dataset C.sub.1 is its original-modification-time field, and the priority time used for the other datasets C.sub.2, . . . , C.sub.N need not be changed by the improvement (i.e., can be the last-modification-time field).
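A minimal sketch of the two-timestamp idea follows, replaying the FIG. 7C scenario with Client A acting as the reference dataset C.sub.1. The names `RefRecord`, `ClientRecord`, and `sync_with_reference` are hypothetical, and the fresh-change test (a client record counts as changed only if its last-modification time is later than that client's previous synchronization) is an assumed simplification of the change detection described above.

```python
from dataclasses import dataclass


@dataclass
class RefRecord:
    value: str
    original: int  # original-modification-time: the reference's priority time


@dataclass
class ClientRecord:
    value: str
    modified: int  # client-stamped last-modification time


def sync_with_reference(ref, client, last_sync, now):
    """Binary-sync the reference with one client; return the new last-sync time."""
    for key, cr in client.items():
        rr = ref[key]
        if cr.value == rr.value:
            continue
        client_changed = cr.modified > last_sync
        if client_changed and cr.modified >= rr.original:
            # Client wins: the reference keeps the user-entered time as priority.
            ref[key] = RefRecord(cr.value, cr.modified)
        else:
            # Reference wins: the client stamps the write with the sync time.
            client[key] = ClientRecord(rr.value, now)
    return now


def make_client():
    return {k: ClientRecord(v, 0) for k, v in (("R1", "1"), ("R2", "2"), ("R3", "3"))}


# Client A doubles as the reference dataset; its user change (1A) was at Time 2.
ref = {"R1": RefRecord("1A", 2), "R2": RefRecord("2", 0), "R3": RefRecord("3", 0)}
b, c = make_client(), make_client()
b["R3"] = ClientRecord("3B", 1)
b["R2"] = ClientRecord("2B", 2)
c["R3"] = ClientRecord("3C", 2)

last_b = sync_with_reference(ref, b, last_sync=0, now=3)
last_c = sync_with_reference(ref, c, last_sync=0, now=4)
last_b = sync_with_reference(ref, b, last_sync=last_b, now=5)

result = {key: rec.value for key, rec in ref.items()}
print(result)
```

Under these assumptions the value 3B enters the reference with priority Time 1, so 3C (Time 2) prevails in the later comparison and is then propagated back to Client B in the second pass.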

Even though the improved binary-based Synchronizer solves the conflict-resolution problem of FIG. 7C, it pays a price. This price is that the improved binary-based Synchronizer cannot easily be implemented using repeated calls to existing, prior-art binary-synchronization systems. This is because the improved binary-based Synchronizer requires a new timestamp field (the original-modification-time) in at least one dataset (perhaps a brand new dataset) and requires this new timestamp field to be the priority time used in conflict resolution for at least the reference dataset. Thus, the improved binary-based Synchronizer loses an advantage of the non-improved binary-based Synchronizer, which is that existing prior-art binary-synchronizers could be intelligently invoked to effect a novel automatic synchronization result for even more than two datasets.

VI. Comparison of Timestamps from Different Datasets

A. Conversion Into a Common Time

In comparing the timestamps from different clients, for example, during conflict resolution (automatic or user-assisted) in any embodiment of the present invention, the timestamps are preferably first converted into a common time (e.g., GMT) in the synchronizer's memory, if possible, for comparison. This conversion is performed by, for example, the improved binary-based embodiment described above or the preferred embodiment that is further discussed in other sections. Each client's timestamps are converted into a common time in a way that is appropriate to the client. For example, if the client's timestamps individually include time-zone information (e.g., "timezone-stamps"), then the client's timestamps are directly converted into the common time using simple time-zone conversion. Otherwise, if the client's clock includes current time-zone information and the client's timestamps are assumed to share the clock's current time zone, then the client's timestamps are converted into the common time using simple time-zone conversion from the clock's current time zone. Otherwise, if the client's clock is known to have a constant offset with respect to another clock for which time-zone information is known or can be presumed, and the client's timestamps are assumed to have the same constant offset, then the client's timestamps are converted into the common time by shifting by the offset and by simply converting into the other clock's time zone. The value of the offset is preferably the value as observed, computed, and recorded by the Synchronizer at the end of a most recent previous synchronization (for improved compatibility with an optional clock-drift compensation scheme further described in another section), but may also be other values. 
For example, the offset value may be a current offset observed and computed by the Synchronizer at the beginning of the current synchronization (especially if this is a first synchronization of the dataset), or an offset supplied by the user.
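The three conversion cases described above can be sketched as a single helper. This is an illustrative sketch only: the function name `to_common_time` and its parameters are hypothetical, UTC stands in for the common time, and the offset is assumed to be defined as the client clock's reading minus the other clock's reading.

```python
from datetime import datetime, timedelta, timezone


def to_common_time(stamp, clock_zone=None, known_offset=None, known_zone=None):
    """Convert a client timestamp to UTC using the first applicable rule."""
    if stamp.tzinfo is not None:
        # Case 1: the timestamp carries its own time-zone information.
        return stamp.astimezone(timezone.utc)
    if clock_zone is not None:
        # Case 2: assume the timestamp shares the client clock's current zone.
        return stamp.replace(tzinfo=clock_zone).astimezone(timezone.utc)
    if known_offset is not None and known_zone is not None:
        # Case 3: shift by the constant offset (client clock minus other clock),
        # then convert from the other clock's zone.
        shifted = stamp - known_offset
        return shifted.replace(tzinfo=known_zone).astimezone(timezone.utc)
    raise ValueError("no basis for converting this timestamp to a common time")


zone = timezone(timedelta(hours=-5))
print(to_common_time(datetime(2000, 1, 1, 12, 0), clock_zone=zone))
```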

B. Clock Drift

In practice, clocks on separate devices may run at different rates due, for example, to clock imperfections. If the system clocks of multiple datasets drift apart from one synchronization to the next, comparison across datasets of timestamps generated after the previous synchronization is potentially suspect. A user may re-set the clock used by a particular dataset, for example to reflect a new time zone on a device that does not explicitly support time zones. Such clock re-setting can also constitute an artificial type of drift that can also endanger timestamp comparisons. Additionally, a user may change the active time zone without actually re-setting the underlying time on a device that supports time zones. If the device's dataset has adequate protection of its existing timestamps against such a change of the active time zone, then comparison using those timestamps should be safe against such a change. However, if the device's dataset does not provide adequate protection, the time-zone change again constitutes a kind of drift that can endanger comparison using those timestamps. (Examples of adequate protection include timestamps that explicitly include timezone-stamps, or existing timestamps that are automatically updated throughout the dataset to reflect any newly chosen active time zone.) Embodiments of the present invention (including the previously-described improved binary-based embodiment and the preferred embodiment that is further described in other sections) take steps to detect, and if appropriate, try to compensate for such clock drift. The following description of clock-drift detection and compensation methods applies primarily to clock drift caused by clock imperfection, but is also applicable to, and may briefly discuss, the other types of timestamp-endangering clock drift.

C. Clock-Drift Detection

FIG. 8A is a flowchart that illustrates a method 800 that is optionally used by the Synchronizer to detect clock drift for a particular dataset's clock since the most recent previous synchronization involving the dataset (here referred to as "the previous synchronization"). The method 800 includes steps 805, 810, 815, 817, 820, and 825. In the step 805, the Synchronizer has simultaneously read the dataset clock and the reference clock at the end of the previous synchronization and recorded the dataset clock's reading, the reference clock's reading, and the dataset clock's active time zone, if any, for possible use in a later (i.e., a current) synchronization. (In an alternative embodiment, the dataset clock's reading is not directly recorded but an offset of that reading versus the reference clock's reading is recorded.) In the subsequent step 810, at the beginning of the current synchronization, the Synchronizer simultaneously reads the dataset clock and the reference clock. Preferably, the reference clock has full time-zone support and has not been re-set since the previous synchronization. In the subsequent step 815, the Synchronizer obtains (e.g., computes) the offsets between the dataset clock and the reference clock both from the end of the previous synchronization (previous offset) and from the beginning of the current synchronization (current offset). The Synchronizer also computes in the step 815 the difference between the current offset and the previous offset. This difference (which may be negative) is here referred to as the observed clock drift, or T.sub.DRIFT. The Synchronizer also computes in the step 815 the difference, or elapsed time, or T.sub.ELAPSED, between the two reference clock readings (i.e., between the end of the previous synchronization and the beginning of the current synchronization). 
To give a concrete example, at the beginning of a synchronization, T.sub.ELAPSED =1,000,000 seconds according to the reference clock may have elapsed since the end of a previous synchronization, and a dataset's clock may have gained T.sub.DRIFT =100 seconds versus the reference clock since then.
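The arithmetic of the step 815 can be sketched as follows. The function name `observe_drift` and the specific clock readings are hypothetical; the readings are assumed to be already converted to a common time and expressed in seconds.

```python
def observe_drift(prev_dataset, prev_reference, cur_dataset, cur_reference):
    """Return (T_DRIFT, T_ELAPSED) from two pairs of simultaneous clock readings."""
    previous_offset = prev_dataset - prev_reference  # end of previous sync
    current_offset = cur_dataset - cur_reference     # start of current sync
    t_drift = current_offset - previous_offset       # may be negative
    t_elapsed = cur_reference - prev_reference       # measured on the reference clock
    return t_drift, t_elapsed


# Readings chosen to match the concrete example: the dataset clock was 5 s
# ahead at the previous sync and is 105 s ahead now.
t_drift, t_elapsed = observe_drift(
    prev_dataset=5, prev_reference=0,
    cur_dataset=1_000_105, cur_reference=1_000_000,
)
print(t_drift, t_elapsed)
```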

In general, for detecting in the step 815 drifts caused by clock imperfection or actual clock-resetting by the user, the Synchronizer computes the offsets after converting the dataset clock's reading to a common time, e.g., GMT, if possible, as described in the previous section. For detecting in the step 815 drifts caused by a change of active time-zone in a dataset that doesn't protect existing timestamps, the Synchronizer computes the current offset after converting the dataset's current clock reading to a common time, but under a forced pretense that the dataset's current clock reading represents the dataset's time zone at the end of the previous synchronization. The Synchronizer performs this latter type of conversion if it detects that the dataset's active time zone has changed since the previous synchronization and that it is of the type that does not protect existing timestamps.

In the subsequent step 817, the Synchronizer determines the presumed range (maximum and minimum) by which the dataset clock may have drifted at any time since the prior synchronization. For simplicity, it is currently preferred that the Synchronizer implicitly assume that the observed clock drift is one extreme of this "drift range" and that the other extreme is zero drift, and hence the step 817 is inherently completed as soon as the step 815 is completed. (This assumption is reasonable, for example, if the drift is due only to clock imperfection.) In alternative embodiments, the Synchronizer may seek or have additional information about drift characteristics of the dataset's clock. For example, the Synchronizer may ask the user for the greatest-magnitude amount in both positive and negative directions by which the user or other factors may have shifted the dataset's clock since the last synchronization. This query may be accomplished, for example, by asking the user to check off all the time zones that he or she has used as an active time zone since the last synchronization. For another example, the Synchronizer may access a known history of the clock's re-settings by its user, for example, from the client if the client has been programmed to record such information according to an aspect of the present invention. For yet another example, the Synchronizer may access interim readings of the clock (calibrated to the reference clock) which may exist, according to an embodiment of the present invention, because the Synchronizer automatically and periodically reads the client's clock even when no synchronization is being performed. If the observed clock drift from the step 815 exceeds one extreme of the user-supplied or otherwise-obtained range, then the observed clock drift replaces the exceeded extreme in the drift range. 
Preferably, the range must include zero, and if it doesn't (i.e., if both drift range boundaries somehow have the same sign), then the Synchronizer replaces the drift-range boundary nearest to zero with zero.
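The rules of the step 817 for building the drift range can be sketched as below. The function name `drift_range` and the tuple representation (minimum, maximum) are hypothetical choices made for illustration; values are in seconds.

```python
def drift_range(observed, supplied=None):
    """Return (minimum, maximum) presumed drift since the previous sync."""
    # Default: the range spans zero drift and the observed drift.
    low, high = supplied if supplied is not None else (0, 0)
    # The observed drift replaces any extreme that it exceeds.
    low, high = min(low, observed), max(high, observed)
    # The range must include zero: if both boundaries share a sign,
    # the boundary nearest to zero is replaced with zero.
    if low > 0:
        low = 0
    if high < 0:
        high = 0
    return low, high


print(drift_range(100))                 # default case
print(drift_range(100, (-3600, 50)))    # user-supplied range widened by observation
```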

To flag the presumed range of possible clock drift since the previous synchronization (i.e., the drift range), the Synchronizer in the step 820 optionally produces a warning message to the user, for example, via a synchronization log, if the maximum clock drift (either positive or negative) exceeds in magnitude a user-preset or Synchronizer-default clock-drift threshold value. Examples of useful default clock-drift threshold values include about zero, about ten seconds, about five minutes, about one hour (a difference due to daylight saving time), about twenty-four hours (the maximum difference due to time zone difference), or about twenty-five hours. Separate threshold values may be used for positive versus negative drift or for different types of drift. (Different types include, for example, drift thought to be caused only by clock imperfection, drift thought to be caused at least in part by user clock-resetting, or drift thought to be caused at least in part by a change of a time zone in a dataset that does not adequately protect existing timestamps). After any warnings have been produced, the Synchronizer proceeds to perform synchronization, in the step 825, preferably using clock-drift compensation according to an aspect of the present invention, as is further described in the next section.

D. Clock-Drift Compensation

FIG. 8B is a flowchart that illustrates an optional method 830 used by embodiments of the Synchronizer to attempt to compensate for clock drift in the event that any timestamp ("the timestamp") from a particular dataset ("the dataset") needs to be compared to timestamp(s) made by other clock(s). The method 830 includes steps 835, 840, 845, and 850. The method 830 assumes that the steps of the clock-drift detection method 800 of FIG. 8A have already been performed. In particular, the method 830 assumes that the presumed drift range (i.e., possible clock drift since the previous synchronization) has already been determined for the dataset's clock ("the clock"), according to the step 817 of FIG. 8A.

Recall from a previous section that comparison of timestamps across datasets is preferably preceded by a conversion of the timestamps, at least in the Synchronizer's memory, into a common time such as GMT for comparison. Under the clock-drift compensation scheme, this conversion is accompanied by other actions. In particular, in the step 835, the Synchronizer converts the timestamp not merely into a common time but (also) into a range of possible "true," or drift-compensated common times. The range is here referred to as the "timestamp range" and is defined by an upper and a lower timestamp-range boundary. Each timestamp-range boundary is simply the timestamp's common time minus a respective one of the boundaries of the clock's presumed drift range. (If one of the boundaries is zero, then the timestamp's common time is itself one of the timestamp-range boundaries.) The subtraction of the drift-range boundary can be performed either before or preferably after conversion of the timestamp to the common time. In general, whenever the Synchronizer seeks a latest (or earliest) timestamp from a set of timestamps from different clocks, each of the timestamps will have a range of possible values in the common time. In the step 840, the Synchronizer compares the timestamp ranges of all timestamps to seek a latest (or an earliest) timestamp (for example, as a part of conflict resolution). If the Synchronizer can find a latest (or earliest) timestamp whose possible values are uniformly higher (or lower) than the possible values of any other timestamp (i.e., the found timestamp's possible values do not overlap with those of any other timestamp), then, in the step 845, that found timestamp can be used as the correct latest (or earliest) one.
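The steps 835 through 845 can be sketched as follows. The function names `timestamp_range` and `clear_latest` are hypothetical, times are common-time seconds, and only the "latest" search is shown (the "earliest" search is symmetric).

```python
def timestamp_range(common_time, drift_low, drift_high):
    """Step 835: subtract each drift-range boundary from the common time."""
    return common_time - drift_high, common_time - drift_low


def clear_latest(ranges):
    """Steps 840/845: return the name of the clear-winner latest timestamp,
    or None if the top candidates overlap. `ranges` maps name -> (low, high)."""
    for name, (low, _high) in ranges.items():
        others = (high for other, (_low, high) in ranges.items() if other != name)
        if all(low > other_high for other_high in others):
            return name
    return None


ranges = {
    "client_b": timestamp_range(1_000, 0, 100),   # clock gained up to 100 s
    "client_c": timestamp_range(1_200, -50, 0),   # clock lost up to 50 s
}
print(ranges, clear_latest(ranges))
```

When `clear_latest` returns None, the candidates overlap and a winner has to be chosen by another rule, as described next.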

However, if no such single, "clear-winner" timestamp can be found, then a winner must be selected, in the step 850, from multiple candidate timestamps whose possible values overlap. The Synchronizer may use a pre-set rule (e.g., a rule selected by the user) to determine the winner. The Synchronizer may also optionally display the candidate timestamps' corresponding records to the user and allow the user to select the winner. In allowing the user to select the winner, the Synchronizer may make one or more recommendations based on one or more pre-set rules. The preferred rule for automatically determining or recommending a winner is for the Synchronizer to convert each candidate timestamp into a "model-drift-compensated" time, according to any known or assumed model of the candidate timestamp's clock's drift. Detailed models of the clock drift may be built according to any known drift characteristics of the dataset's clock, for example, characteristics as described in connection with the step 817 of FIG. 8A. An example of a detailed model is one built by the Synchronizer using piecewise-linear connecting of clock drift observed from any interim calibrated readings of the clock. Typically, there is no particular known drift characteristic of the dataset's clock, and, as described earlier, each candidate timestamp's drift range is merely between zero and the observed clock drift, T.sub.DRIFT. In this situation, the dataset's clock drift is assumed to be constantly monotonic--i.e., the clock drift is modeled as being linear with respect to elapsed reference time--i.e., the model is built using a linear connecting of the present synchronization's observed clock drift and the previous synchronization's zero clock drift. Therefore, the model-drift-compensated time, T.sub.MCOMP, is set to the timestamp with a linearly interpolated portion of the observed clock drift, T.sub.DRIFT, subtracted out. The model-drift-compensated time is shown by the following equation:

T.sub.MCOMP =T.sub.STAMP -T.sub.DRIFT *(T.sub.STAMP -T.sub.PREV)/(T.sub.ELAPSED +T.sub.DRIFT)

where T.sub.PREV is the timestamp of the dataset clock at the end of the previous synchronization (i.e., when the clock drift was zero). For the concrete example given earlier, T.sub.MCOMP =T.sub.STAMP -100*(T.sub.STAMP -T.sub.PREV)/(1,000,100). For other types of models (for example, multi-piece, piecewise-linear models), analogous interpolation techniques or other techniques can be used, as appropriate to the particular model. Thereafter, the Synchronizer compares the model-drift-compensated time of the candidate timestamps (preferably, each converted to a common time such as GMT) and chooses the latest (or earliest) as the winner, and uses the winner in step 845.
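The linear model can be checked numerically with a short sketch. The function name `model_compensated` is hypothetical; all quantities are in seconds, and T.sub.PREV is taken as 0 purely for convenience.

```python
def model_compensated(t_stamp, t_prev, t_elapsed, t_drift):
    """Subtract a linearly interpolated share of the observed drift."""
    return t_stamp - t_drift * (t_stamp - t_prev) / (t_elapsed + t_drift)


# Concrete example: 1,000,000 s elapsed on the reference clock while the
# dataset clock gained 100 s, so it advanced 1,000,100 s in its own time.
at_current_sync = model_compensated(1_000_100, 0, 1_000_000, 100)
at_midpoint = model_compensated(500_050, 0, 1_000_000, 100)
at_previous_sync = model_compensated(0, 0, 1_000_000, 100)
print(at_current_sync, at_midpoint, at_previous_sync)
```

A timestamp made at the current synchronization has the full 100 seconds removed, one made halfway through has 50 seconds removed, and one made at the previous synchronization is unchanged.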

E. Clock-Drift Avoidance

1. Example: Keep "Pre-Fresh Threshold" in Client's Time

The Synchronizer avoids some pitfalls and complexities of clock drift by avoiding comparison of timestamps from different clocks in certain circumstances, if possible. To understand this, note that one aspect of synchronization is that "fresh" changes generally need to be requested or determined from the client at the start of a synchronization. Here, fresh changes refer to changes in the client (for example, additions, modifications, and deletions) that have not been previously seen by the Synchronizer and are therefore fresh with respect to the Synchronizer. For this purpose of identifying fresh changes, the Synchronizer records a time, which may be called a "pre-fresh threshold," during each synchronization for use in the next synchronization involving the client. During the next such synchronization, changes occurring later than the recorded pre-fresh threshold will be considered "fresh" changes and will be examined by the Synchronizer for possible propagation to other dataset(s). The Synchronizer records a pre-fresh threshold for every particular client involved in a synchronization, and records the pre-fresh threshold for each client according to the particular client's clock, and not merely according to the Synchronizer's own clock. In this way, at the next synchronization involving any particular client, the pre-fresh threshold for the particular client can be recovered from the recorded information in the client's clock's own time. Thereafter, changed records in the client can be identified as fresh by comparing the changed records' modification timestamps directly with the pre-fresh threshold for the client without need for time conversion (assuming that no truly-fresh change has nevertheless received a timestamp earlier than the pre-fresh threshold as a result of the client's clock being artificially re-set, after the previous synchronization, to a time earlier than the client's pre-fresh threshold).
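A minimal sketch of the fresh-change test follows. The function name `fresh_changes` and the dictionary layout are hypothetical; the point illustrated is that both the record timestamps and the threshold are in the same client's clock time, so no cross-clock conversion is needed.

```python
def fresh_changes(last_modified, pre_fresh_threshold):
    """Return ids of records changed after the client's pre-fresh threshold.

    `last_modified` maps record id -> timestamp in the client's own clock time;
    `pre_fresh_threshold` was recorded in that same clock's time.
    """
    return sorted(
        rid for rid, stamp in last_modified.items() if stamp > pre_fresh_threshold
    )


print(fresh_changes({"R1": 90, "R2": 120, "R3": 100}, pre_fresh_threshold=100))
```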

2. An Improved Pre-Fresh Threshold

The time of synchronization (e.g., a clock reading during a synchronization) may be used as the pre-fresh threshold. However, in the preferred embodiment, the Synchronizer will record, as the pre-fresh threshold, the latest last-modified timestamp seen on any record in the client during the synchronization (except timestamps caused by the synchronization itself). Using this timestamp instead of the synchronization time can produce a more reliable result in the next synchronization for certain clients, in the situation that a change was made in those clients (slightly) earlier than the time of the synchronization, but the client was somehow not able to produce the changed record for the Synchronizer's use during the synchronization.
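The preferred threshold choice can be sketched as below. The function name `next_pre_fresh_threshold` is hypothetical, and falling back to the previous threshold when no eligible timestamp is seen is an assumption of this sketch rather than something the text specifies.

```python
def next_pre_fresh_threshold(last_modified, written_by_sync, previous_threshold):
    """Pick the latest last-modified timestamp seen during this sync,
    ignoring records whose timestamps were caused by the sync itself."""
    seen = [
        stamp for rid, stamp in last_modified.items() if rid not in written_by_sync
    ]
    # Assumption: never move the threshold backward if nothing newer was seen.
    return max(seen + [previous_threshold])


threshold = next_pre_fresh_threshold(
    {"R1": 90, "R2": 120, "R3": 130},
    written_by_sync={"R3"},       # R3 was just written by the Synchronizer
    previous_threshold=100,
)
print(threshold)
```

With this choice, a record that the client stamped slightly before the synchronization but failed to hand over still has a timestamp later than the recorded threshold, so it is picked up as fresh next time.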

F. Clock-Synchronization

The Synchronizer may optionally synchronize the datasets' devices' system clocks by re-setting them to match a clock that is considered to be a reference clock, for example by re-setting them to have a same offset (perhaps zero) from the reference clock as they had during a most recent previous synchronization. Preferably, each clock is re-set, if at all, after any reading of the device's clock for drift detection and/or drift compensation, as described earlier, and before synchronization begins for the device's dataset(s), or at least before the time when the time of synchronization is recorded for use during a subsequent synchronization. The precise timing of re-setting the clocks may be varied depending on the particular characteristics of the devices and datasets involved. The reference clock may be the system clock of the Synchronizer dataset, or the system clock of the reference dataset in the improved binary-based embodiment, or another reference clock. Preferably, the Synchronizer will