Database dependency resolution method and system for identifying related data files
5734886
Abstract
A method and system for displaying names of data files in a collection of data files represented by a corresponding symbol. According to one embodiment of the present invention, a user may display a listing of subroutine library files required to execute a particular subroutine. In such an embodiment, the user may enter the subroutine name as the symbol of interest and the system would display the library file containing that subroutine as well as those data files that contain subroutines called by that subroutine of interest. The present invention uses a transitive closure technique to traverse a data structure generated from a database and retrieve the data file list. The transitive closure technique enables the use of a compact database that contains only the data file names, the corresponding symbol names, and, for each data file, the symbol names of only those data files that are directly related to it.
Claims
We claim:
Description
FIELD OF THE INVENTION
TABLE A
______________________________________
file V;   def A, B;   ref D, E;
file W;   def C;
file X;   def E;
file Y;   def F;
file Z;   def D, G;   ref E, F.
______________________________________
The relationship database of Table A contains information fields for symbol names and related data file information for five data files named V, W, X, Y and Z. The data file names appear in a data file name field preceded by the word "file". The corresponding symbol names appear in symbol name fields in the same row as the respective data file names. Each symbol name field is preceded by the term "def". For example, the symbol name corresponding to the data file W is C. Further, if a particular data file entry is directly related to, or dependent on, one or more data files, then the symbol names for those data files are contained in a corresponding related symbol field preceded by the term "ref". For example, the data file V is directly related to the data files having the corresponding symbols D and E. Conversely, if a data file is directly related to no other data file, such as the data file W in Table A, then no related symbol field exists for that data file entry.

A data file may be represented by more than one symbol, as shown with respect to data files V and Z in Table A. The symbol names for data file V are A and B, and the symbol names for data file Z are D and G. The ability to assign more than one symbol name to a data file is important in many applications, including those where a particular data file is a subroutine library with each subroutine corresponding to a different task. In such an application, each of the symbol names representing that library data file may correspond to a particular one of the subroutines contained in the data file.

The relationship database of Table A further identifies only those symbols corresponding to data files to which a particular data file is directly related, and not symbols for data files to which the particular data file is indirectly related. For example, in Table A the data file V is directly related to the data files corresponding to symbols D and E, which are data files Z and X, respectively.
The data file Z is further directly related to the data files corresponding to symbols E and F, which are data files X and Y, respectively. Therefore, the data file V is indirectly related to the data file Y through its direct relation to the data file Z.

The relationship database configuration shown in Table A minimizes the memory required and the administrative burden of maintaining the relationship database, regardless of the number of data files or the hierarchical depth of the dependency relationships. Further, the configuration of the relationship database of Table A is such that a simple routine may be created to automatically generate a relationship database. Such a routine may scan each data file maintained by the file server to determine the corresponding subroutine names as symbols and the names of the called subroutines as the directly related symbol names. A small relationship database is shown in Table A for illustration purposes only. In many applications of the present invention, including typical file servers, the number of data files and related data files would be substantially larger and may be on the order of thousands.

An exemplary database dependency resolution method in accordance with the present invention that is capable of using the relationship database of Table A is shown in FIG. 3. According to one aspect of the present invention, the method of FIG. 3 may replace or supplement typical operating system routines for directory and data file listings. According to a second aspect of the present invention, the method of FIG. 3 may be employed in an implementation of a communications protocol, such as File Transfer Protocol ("FTP") or World Wide Web ("WWW"), for transferring files to remote users. A detailed discussion of FTP and WWW and their usage is provided in, for example, Gibbs, M. and Smith, R., Navigating the Internet, Chaps. 3, 4 and 9, pp. 35-86, and 158-172 (Sams Publishing, IN, 1993).

Referring to FIG. 3, the database dependency resolution method first identifies and reads the proper relationship database in step 210. For instance, if the file server 100 employs a file based operating system, such as MS-DOS or Unix, the data files may be organized into corresponding file directories with corresponding relationship databases existing for each of the file directories. In such a file server 100 the proper relationship database would be the one residing in the particular directory that the user has accessed or is viewing. In an alternative arrangement in the file server 100, relationship databases may be maintained for a particular directory and all or particular ones of its respective subdirectories. In addition, a single relationship database can be used for all the data files in the file server 100 whether or not the file server 100 maintains data files within a directory format.

The database dependency resolution method then generates a data structure representing the relationships identified in the relationship database in step 220. The data structure is effectively organized as a relationship graph, such as that illustrated in FIG. 4, which is described in detail below. Suitable mapping routines for generating such a data structure could be readily written, as is known in the art. Such a mapping routine is contained in the Appendix attached hereto. The mapping routine in the attached Appendix generates an array of files, with each file containing a list of corresponding related symbols, and a hash table providing a cross-reference between the symbols and the array files.

After generating the data structure, the database dependency resolution method then retrieves the symbol of interest entered or selected by the user in step 230. The symbol of interest may have been entered by the user at the initiation of the database dependency resolution method.
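The data structure of step 220 can be pictured with a minimal sketch that holds the contents of Table A in an array of file entries and maps a symbol to the file defining it. The names `FileEntry`, `table_a` and `find_defining_file` are illustrative assumptions and do not appear in the Appendix; a linear scan stands in here for the Appendix's hash table purely to keep the sketch short.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 4   /* assumed upper bound on symbols per field, incl. terminator */

/* One entry of the relationship database: a data file, the symbols it
   defines ("def") and the symbols it is directly related to ("ref"). */
typedef struct {
    const char *name;
    const char *def[MAX_SYMS];   /* NULL-terminated */
    const char *ref[MAX_SYMS];   /* NULL-terminated */
} FileEntry;

/* The five entries of Table A. */
static const FileEntry table_a[] = {
    { "V", { "A", "B", NULL }, { "D", "E", NULL } },
    { "W", { "C", NULL },      { NULL } },
    { "X", { "E", NULL },      { NULL } },
    { "Y", { "F", NULL },      { NULL } },
    { "Z", { "D", "G", NULL }, { "E", "F", NULL } },
};
enum { N_FILES = sizeof table_a / sizeof table_a[0] };

/* Return the index in table_a of the file defining `symbol`,
   or -1 if no file defines it. */
static int find_defining_file(const char *symbol)
{
    for (int i = 0; i < N_FILES; i++)
        for (int k = 0; k < MAX_SYMS && table_a[i].def[k] != NULL; k++)
            if (strcmp(table_a[i].def[k], symbol) == 0)
                return i;
    return -1;
}
```

With this layout, looking up the symbol B yields the entry for data file V, whose `ref` list (D and E) gives the symbols to follow next.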
Then, in step 240, the database dependency resolution method traverses the data structure based on the symbol of interest using a transitive closure technique to identify and retrieve the corresponding collection of data file names. Transitive closure techniques are known graph algorithms which have been previously used by program linkers when building software programs that call precompiled library subroutines. One suitable transitive closure routine is described in Aho, A., Hopcroft, J., and Ullman, J., The Design and Analysis of Computer Algorithms, Sect. 5.7, pp. 198-199 (Addison-Wesley, 1974), which is incorporated by reference herein. The data file names retrieved in step 240 include the data file represented by the symbol of interest as well as all data files that are directly and indirectly related to that data file.

The retrieved data file names may then be used to obtain additional information about the corresponding data files, such as file size, file type and creation date, as indicated in the optional step 250. This information may be obtained by making the necessary calls to the appropriate file server operating system routines in a manner well known in the art. Then, in step 260, the symbol of interest, the retrieved data file names and any additionally obtained information are arranged in a predetermined display format. The arranged information is then displayed on the user's display in step 270. The database dependency resolution method of FIG. 3 may be performed while a user is on-line with the file server, so that the user can identify substantially instantaneously which data files correspond to a particular symbol.

In applying the database dependency resolution method of FIG. 3 to the above relationship database in Table A, the method would first read the relationship database as indicated in step 210. Then, the method generates a data structure corresponding to the relationship database as indicated in step 220. A graph 300 representing such a data structure is illustrated in FIG. 4. In FIG. 4, each data file is represented by a box, such as the box 310, with the data file name appearing in a region in the upper-left corner of the box, such as region 320, and the corresponding symbol names appearing in the center of the box. For example, box 340 of FIG. 4 represents the data file Z that is represented by symbols D and G as indicated in Table A. The dependency or relation of one data file to another is represented by a broken line, such as line 330, between the corresponding data file boxes. Further, a data file that is dependent on another data file appears higher on the page than the other data file. For example, data file Z 340 is dependent on and appears higher on the page than data files 350 and 360 represented by the symbols E and F. Note that data file W 370 is not dependent on any other file and therefore no relationship lines extend from it.

After generation of the data structure, the method retrieves the symbol of interest entered or selected by the user, such as the symbol B. The symbol B represents the data file V 310, as indicated in step 220. The symbol of interest is used by a transitive closure technique to traverse the data structure and retrieve the corresponding file name and the names of any related data files as stated in step 240. Since the symbol B corresponds to the data file V 310, the data file name V will be retrieved. Also, since the data file V 310 is directly dependent on or related to the symbols D and E as indicated by relationship lines 330 and 335, the corresponding data file names Z and X are also retrieved. Further, since the data file Z is dependent on the symbol F as indicated by the relationship line 345, the corresponding data file name Y is also retrieved. Note that the data file Z is also dependent on the symbol E, which corresponds to the data file X; that data file has already been retrieved due to its direct relation to the data file V 310.
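The traversal just described can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the Appendix routine: the names `Node`, `graph`, `lookup` and `closure` are hypothetical, the graph is hard-coded from Table A, and a fixed-size stack of unresolved symbols replaces the growable stacks of the Appendix.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS  4   /* assumed bound on symbols per field */
#define MAX_FILES 5   /* number of data files in Table A */

typedef struct {
    const char *name;
    const char *def[MAX_SYMS];   /* symbols defined by the file */
    const char *ref[MAX_SYMS];   /* symbols of directly related files */
    int done;                    /* already placed in the result? */
} Node;

/* Table A as a graph; unused slots are NULL. */
static Node graph[MAX_FILES] = {
    { "V", { "A", "B" }, { "D", "E" }, 0 },
    { "W", { "C" },      { NULL },     0 },
    { "X", { "E" },      { NULL },     0 },
    { "Y", { "F" },      { NULL },     0 },
    { "Z", { "D", "G" }, { "E", "F" }, 0 },
};

/* Find the file that defines `symbol`, or NULL if none does. */
static Node *lookup(const char *symbol)
{
    for (int i = 0; i < MAX_FILES; i++)
        for (int k = 0; k < MAX_SYMS && graph[i].def[k] != NULL; k++)
            if (strcmp(graph[i].def[k], symbol) == 0)
                return &graph[i];
    return NULL;
}

/* Store in out[] the name of the file defining `symbol` and of every file
   directly or indirectly related to it; return the number stored. */
static int closure(const char *symbol, const char *out[], int max_out)
{
    /* Each file is expanded at most once, so this stack cannot overflow. */
    const char *need[MAX_FILES * MAX_SYMS + 1];
    int n_need = 0, n_out = 0;

    for (int i = 0; i < MAX_FILES; i++)
        graph[i].done = 0;
    need[n_need++] = symbol;

    while (n_need > 0) {
        Node *f = lookup(need[--n_need]);
        if (f == NULL || f->done)
            continue;            /* undefined symbol, or file already retrieved */
        f->done = 1;
        if (n_out < max_out)
            out[n_out++] = f->name;
        for (int k = 0; k < MAX_SYMS && f->ref[k] != NULL; k++)
            need[n_need++] = f->ref[k];
    }
    return n_out;
}
```

Calling `closure("B", out, MAX_FILES)` retrieves the four data files V, Z, X and Y named in the text, although in the order the stack happens to visit them. The `done` flag is what prevents the data file X from being retrieved a second time when it is reached again through Z.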
The retrieved list of data file names for the symbol B is therefore V, Z, X and Y. Information concerning each of the file names is then obtained as indicated by step 250. The obtained data file information may include the creation dates and sizes corresponding to the retrieved data file names. The retrieved data file names and the corresponding obtained information are then arranged into a predetermined format in step 260. The predetermined format may be a format suitable for direct display on the file server 100 or a data format for transmission over the Internet to a remote user. Lastly, the information in the predetermined format is displayed or transmitted for display, as indicated in step 270. The information displayed may be as follows:
______________________________________
Symbol   Data Files   Creation Date   Size (KB)
______________________________________
B        V            1994            2.8
         Z            1993            1.2
         X            1993            4.3
         Y            1992            9.2
______________________________________
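Steps 250 and 260 might be sketched as follows, assuming a POSIX-style file server where `stat()` is available. The function name `format_row` and the column widths are illustrative assumptions. POSIX `stat()` does not report a creation date, so the year of last modification is used here as a stand-in.

```c
#include <stdio.h>
#include <time.h>
#include <sys/stat.h>

/* Build one row of the listing (symbol, file name, year, size in KB)
   for the data file at `path`. Returns 0 on success, or -1 if the
   file cannot be examined. */
static int format_row(const char *symbol, const char *path,
                      char *row, size_t rowlen)
{
    struct stat st;
    struct tm *when;

    if (stat(path, &st) != 0)
        return -1;                      /* step 250 failed: no such file */

    /* Modification time stands in for the creation date (assumption). */
    when = localtime(&st.st_mtime);
    if (when == NULL)
        return -1;

    /* Step 260: arrange the fields in a fixed column layout. */
    snprintf(row, rowlen, "%-8s%-12s%-6d%.1f",
             symbol, path, when->tm_year + 1900,
             (double)st.st_size / 1024.0);
    return 0;
}
```

A caller would invoke `format_row` once per retrieved data file name, passing the symbol of interest for the first row and an empty string for the remaining rows to reproduce the layout shown above.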
The use of a transitive closure technique for identifying names of related data files enables the present invention to employ a relationship database of minimal size as well as simple reading and data structure mapping routines. The relationship database requires minimal memory space because, for each data file, only the symbols of directly related data files need be maintained, and not those of indirectly related data files. Since the database needs to contain only the directly related symbols, the database can be automatically generated by a simple routine which scans each data file and identifies the symbols of the related data files indicated therein. Thus, the present invention not only provides a convenience to users in identifying the names of data files contained in a particular collection but also utilizes a compact relationship database that is a minimal burden to the file server administrator in its creation and maintenance.

The database dependency resolution method of the present invention may be implemented in a shell script and be accessible to remote users accessing the file server 100 of FIG. 1 in a substantially similar manner as standard FTP or WWW commands. In the alternative, the method may be implemented as a computer program, such as a C language program, that operates above the FTP or WWW command programs to retrieve the commands received from the user's remote computer. In such an arrangement, all standard FTP or WWW commands are passed along to the appropriate routines, while the virtual file listing command is directed to, and processed by, the database dependency resolution method. In addition, the method may be implemented in an operating system of the processing system to be accessible to users in a substantially similar manner as standard directory file listing routines.
Although one embodiment of the method and system of the present invention has been described in detail above, many modifications to the described embodiment are possible without departing from the teaching of the present invention. All such modifications are intended to be encompassed by the claimed invention. For example, the method of the present invention may be used to display a list of corresponding data files for each available symbol by performing the method on each symbol listed in the relationship database. Further, although the present invention was described with respect to a file server on the Internet, any processing system in which data files are maintained that may depend on other data files may be used with the present invention.
APPENDIX
______________________________________
/* This reads from stdin a relationship database file and builds 1) an
array of files, each with a list of symbols referenced, and 2) a hash table
mapping defined symbols into files. Now, to compute the transitive
closure of a list of symbols given as arguments on the command line,
keep a stack of symbols yet to be resolved, a stack of files yet to be
expanded, and a list of files processed. */
#include <string.h>
#include <stdio.h>
#include <stdlib.h> /* size_t, free, qsort, exit */
struct HashEntry {char* Key; int Code;};
typedef struct HashEntry *HashTable;
/* The following helper routines are defined elsewhere and are not
   part of this listing. */
extern char*
Fgets(char*,int,FILE*);
extern void
hashdel(HashTable*,int);
extern int
hashins(HashTable*,int,char*,int);
extern int
hashasu(char*,int);
extern int
hashpjw(char*,int);
extern int
hashsrch(HashTable*,char*);
extern void*
Malloc(size_t);
extern void*
Realloc(void*,int);
extern char*
Strdup(char*);
typedef struct{
    char *name;
    char **ref; /* array of symbols referenced by this file */
    int nref;
    int done; /* has this already been added to hit? */
} File;
typedef struct{
    File *file;
    int nfile;
    HashTable Def; /* symbol -> index of file defining it */
} Dependencies;
static void
save_refs(File *f, int nref, char **ref)
{
    /* copy the accumulated reference list into the file record */
    f->ref = (char**)Malloc(nref*sizeof(*f->ref));
    memcpy(f->ref,ref,nref*sizeof(*f->ref));
    f->nref = nref;
}
void
getdepend(Dependencies *d)
{
    int i; /* HashTable index */
    int nf = 0; /* index of file being read */
    int maxf = 1000; /* guess at an upper bound for nf */
    int nref = 0, maxref = 5000; /* guess at an upper bound for nref */
    int linelen;
    char *name;
    char line[1000];
    char **ref = Malloc(maxref*sizeof(*ref));
    File *f = Malloc(maxf*sizeof(*f));
    HashTable D = 0;
    while(Fgets(line,sizeof(line),stdin)!=NULL){
        if(line[0]==0 || line[0]=='#') continue;
        linelen = strlen(line);
        if(line[linelen-1]=='\n')
            line[--linelen] = '\0';
        name = Strdup(line+2);
        switch(line[0]){
        case 'F':
            if(nf>0)
                save_refs(&f[nf],nref,ref);
            nref = 0;
            nf++;
            if(nf>=maxf){
                maxf *= 2;
                f=Realloc(f,maxf*sizeof(*f));
            }
            f[nf].name = name;
            f[nf].done = 0;
            /* FALL THROUGH (file implicitly defines own name) */
        case 'D': i = hashsrch(&D,name);
            if(i<0) hashins(&D,i,name,nf);
            break;
        case 'R':
            nref++;
            if(nref>=maxref){
                maxref *= 2;
                ref=Realloc(ref,maxref*sizeof(*ref));
            }
            ref[nref-1] = name;
            break;
        /* ignore other lines */
        }
    }
    /* save the last file's references even when it has none, so that
       its ref and nref fields are always initialized */
    if(nf>0)
        save_refs(&f[nf],nref,ref);
    free(ref);
    d->file = f;
    d->nfile = nf;
    d->Def = D;
}
typedef struct{
    char **p;
    int np, maxp;
} stack;
static void
new_stack(stack *s)
{
    s->maxp = 1000;
    s->p = (char**)Malloc(s->maxp*sizeof(char*));
    s->np = 0;
}
static void
push(char *p, stack *s)
{
    if(s->np==s->maxp){
        s->maxp *= 2;
        s->p = (char**)Realloc(s->p,s->maxp*sizeof(char*));
    }
    s->p[s->np++] = p;
}
static char*
pop(stack *s)
{
    if(s->np==0) return 0;
    return(s->p[--s->np]);
}
static void
resolve(Dependencies *d, stack *hit, stack *need)
{
    int i, k;
    char *r;
    File *f;
    while((r = pop(need)) != NULL){
        i = hashsrch(&d->Def,r);
        if(i>0){
            f = &d->file[d->Def[i].Code];
            if(!f->done){
                f->done = 1;
                push(f->name,hit);
                for(k = 0; k<f->nref; k++)
                    push(f->ref[k],need);
            }
        } /* else unsatisfied external */
    }
}
static int
cmp(const void*a, const void*b)
{
    return(strcmp(*(char **)a, *(char **)b));
}
int
main(int argc, char**argv)
{
    Dependencies D;
    stack hit, need;
    int i;
    new_stack(&hit);
    new_stack(&need);
    /* every command-line argument is a symbol still to be resolved */
    while(argc>1)
        push(argv[--argc],&need);
    getdepend(&D);
    resolve(&D,&hit,&need);
    /* print the retrieved file names in sorted order */
    qsort(hit.p,hit.np,sizeof(char*),cmp);
    for(i = 0; i<hit.np; i++)
        printf("%s\n",hit.p[i]);
    return 0;
}
______________________________________
