Internet computer system with methods for dynamic filtering of hypertext tags and content
6615266
Abstract
An Internet computer system with methods for dynamic filtering of hypertext tags and content is described. The system includes one or more Web clients, each operating a Web browser (e.g., Netscape Navigator or Microsoft Internet Explorer) with an Internet connection to one or more Web servers. Each client includes, interposed between its browser and communication layer, a Filter module of the present invention which traps and processes all communications between the browser and the communication layer. The Filter module, which implements client-side methodology at each individual Web client for dynamic filtering of hypertext tags and content, includes an output stream, a processing loop, a Filter method, and an input stream. During system operation, the Web browser generates multiple requests for retrieving content. More particularly, particular content is retrieved by a fetch or GET command (e.g., using HTTP protocol) transmitted to a target server from the client-side communication layer (e.g., Winsock driver). The command is, however, trapped by the Filter module. The "real" request or command is at this point processed by the Filter method of the Filter module. Accordingly, the system can modify the command, delete the command, synthesize new commands, or pass through unchanged the existing command. In an exemplary embodiment, the Filter method provides handlers for specific processing of various HTML (Hypertext Markup Language) tags, all operating according to user-configurable filtering preferences.
Claims
What is claimed is:
Description
COPYRIGHT NOTICE
<HTML>
<HEAD>
<TITLE>Title of the Web page </TITLE>
</HEAD>
<BODY>
An example of a simple
<B>Web</B>
page.
</BODY>
</HTML>
As illustrated, required elements include the <HTML>, <HEAD>, <TITLE>, and <BODY> tags, together with any corresponding end tags. The tags function as follows. The first pair of tags, <HTML></HTML>, defines the extent of the HTML markup text. The <HEAD></HEAD> tag pair contains descriptions of the HTML page; this meta information is not displayed as part of the Web page. The <TITLE></TITLE> tag pair describes the title of the page. This description is usually displayed by the browser as the title of the window in which the Web page is displayed. This information is also used by some search engines to compile an index of Web pages. The next tag pair, <BODY></BODY>, delimits the body of the Web page. In the body is the text to be displayed as well as HTML markup tags to hint at the format of the text. For example, the <B></B> tag pair displays the enclosed text in a bold typeface. Further description of HTML documents is available in the technical and trade literature; see e.g., Ray Duncan, Power Programming: An HTML Primer, PC Magazine, Jun. 13, 1995, the disclosure of which is hereby incorporated by reference. 2. HTTP Communication HTTP is the foundation of the World Wide Web. This request/response protocol, used on top of TCP (Transmission Control Protocol), carries commands from browsers to servers and responses from servers back to browsers. HTTP is a protocol, not for transferring hypertext per se, but for transmitting information with the efficiency necessary to make hypertext jumps. The data transferred by the protocol can be plain text, hypertext, audio, images, or any Internet-accessible information. HTTP is a transaction-oriented client/server protocol; it treats each transaction independently. A typical implementation creates a new TCP connection between a client and a server for each transaction, then terminates the connection as soon as the transaction completes.
Since the protocol does not require this one-to-one relationship between transaction and connection lifetimes, however, the connection can stay open so that more transactions can be made. The transaction-based approach of HTTP is well-suited to its typical application. A normal Web session involves retrieving a sequence of pages and documents. The sequence is, ideally, performed rapidly, and the locations of the various pages and documents may be widely distributed among a number of servers, located across the country or around the globe. In a typical HTTP configuration, a client, such as a Web browser, initiates a request (HTTP message) for a resource, for instance, from a Web server where a desired home page is located. The client opens a direct, end-to-end connection between the client and the server. The client then issues an HTTP request. The request consists of a specific command (referred to as a method), a URL, and a message containing request parameters, information about the client, and perhaps additional content information. When the server receives the request, it attempts to perform the requested action and returns an HTTP response. The response includes status information, a success/error code, and a message containing information about the server, information about the response itself, and possible body content. The TCP connection is then closed. Instead of the end-to-end TCP connection between a client and a server, an alternative configuration employs one or more intermediary systems with TCP connections between (logically) adjacent systems. Each intermediary system acts as a relay, so that a request initiated by the client is relayed through the intermediary system(s) to the server, and the response from the server is relayed back to the client. A "proxy," for example, is an intermediary system which acts on behalf of other clients and presents requests from other clients to a server.
There are several scenarios that call for the use of a proxy. In one scenario, the proxy acts as an intermediary through a firewall. In this case, the server must authenticate itself to the firewall to set up a connection with the proxy. The proxy accepts responses after they have passed through the firewall. Clients and servers communicate using two types of HTTP messages: request and response. A request message is sent by a client to a server to initiate some action. Exemplary actions include the following.
GET: A request to fetch or retrieve information.
POST: A request to accept the attached entity as a new subordinate to
the identified URL.
PUT: A request to accept the attached entity and store it under the
supplied URL. This may be a new resource with a new URL,
or it may be a replacement of the contents of an existing
resource with an existing URL.
DELETE: Requests that the origin server delete a resource.
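As a rough illustration of the request structure described above (a method, a URL, and a message carrying request parameters), a request might be composed as in the following sketch. The helper name build_request, the HTTP/1.0 version string, and the Host header are assumptions made for illustration; they are not taken from the described system.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the described system): formats a
 * minimal HTTP request line plus a Host header into out.
 * Returns the number of characters written, or -1 if the request does
 * not fit in outsize bytes. */
int build_request(char *out, size_t outsize,
                  const char *method, const char *host, const char *path)
{
    int n = snprintf(out, outsize,
                     "%s %s HTTP/1.0\r\nHost: %s\r\n\r\n",
                     method, path, host);
    if (n < 0 || (size_t)n >= outsize)
        return -1;              /* truncated or formatting error */
    return n;
}
```

A Filter of the kind described here would see such a request as plain bytes on its way to the communication layer, which is what makes it possible to modify, delete, or pass the request through.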
The server, in response to a request, returns a response message. A response message may include an entity body containing hypertext-based information. In addition, the response message must specify a status code, which indicates the action taken on the corresponding request. Status codes are organized into the following categories:
INFORMATIONAL: The request has been received and processing
continues. No entity body accompanies this
response.
SUCCESSFUL: The request was successfully received,
understood, and accepted.
REDIRECTION: Further action is required to complete the request.
CLIENT ERROR: Request contains a syntax error or request cannot
be fulfilled.
SERVER ERROR: The server failed to fulfill an apparently valid
request.
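The five categories above correspond to the leading digit of the numeric HTTP status code (1xx through 5xx). A minimal sketch of that mapping follows; the function name status_category is a hypothetical helper introduced only for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: maps a numeric HTTP status code to the category
 * names used in the list above. Returns NULL for codes outside the
 * 100-599 range. */
const char *status_category(int code)
{
    switch (code / 100) {
    case 1:  return "INFORMATIONAL";
    case 2:  return "SUCCESSFUL";
    case 3:  return "REDIRECTION";
    case 4:  return "CLIENT ERROR";
    case 5:  return "SERVER ERROR";
    default: return NULL;       /* not a recognized status class */
    }
}
```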
Further description of HTTP is available in the technical and trade literature; see e.g., William Stallings, The Backbone of the Web, BYTE, October 1996, the disclosure of which is hereby incorporated by reference. As the explosive growth of the Web continues, and as new features are added to both browsers and servers, a standardized transfer protocol is essential to maintain the Web's growing functions and interoperability. HTTP provides the standardized definition required to meet these needs. B. Approaches to Filtering One approach to addressing the problem of spurious Web content is to adopt a server-based solution. For instance, one approach would be to employ a "proxy server" with the capability to perform text-based parsing (e.g., using PERL or AWK text processing). That approach is problematic, however. In particular, the proxy server (or set of proxy servers) presents a bottleneck through which Web content would have to be funneled. In addition to resource limitations (e.g., limited bandwidth), the approach also raises the issue of content ownership, such as the issue of mirror storage of copyrighted content at a proxy server. A better approach, therefore, is a client-side solution, one which can be implemented at each individual Web client. FIG. 2B is a block diagram illustrating the approach. The figure illustrates a Web client 245a with an Internet connection to one or more Web servers 280. More particularly, the client 245a comprises a Web browser (e.g., Netscape Navigator or Microsoft Internet Explorer) operating on a personal computer (e.g., system 100) or workstation which communicates with the Internet via a communication layer 241, such as Microsoft Winsock (Winsock.dll)--a Windows implementation of Transmission Control Protocol/Internet Protocol (TCP/IP). Interposed (functionally) between the browser 245a and the communication layer 241 is the Filter module 225 of the present invention. 
In this fashion, the Filter module 225 can trap and process all communications between the browser 245a and the communication layer 241. With the advent of Winsock 2 (Microsoft Corp. of Redmond, Wash.), a third party module can register itself with the Winsock driver and, thereby, trap and process communication in a manner which has the support of the driver. C. Filter Module 1. Internal Architecture FIG. 3 is a block diagram illustrating detailed internal architecture of the Filter module 225. As shown, the Filter module 225 includes an output stream 301, a dispatching loop 311, Filter logic 313, and an input stream 321. For assisting with user configuration of its operation, the Filter module 225 includes a graphical user interface (GUI) administration module 325. The input stream 321 is responsible for getting input; it interfaces directly with the Winsock communication driver. In a corresponding manner, the output stream 301 communicates with the (client) browser; it is responsible for providing output to the browser which is to be ultimately rendered on screen for the user. Accordingly, the output stream 301 represents the data pool right before it is sent to the browser. The Filter logic 313, on the other hand, represents the workhorse or core module for performing the actual filtering. Its functionality is described in further detail below. At a high level, the module 225 operates as follows. The Web browser operates by generating requests for content, both for retrieving an initial Web page as well as for retrieving objects (e.g., bitmaps) intended for display on the page. In operation, the system issues fetch or GET commands, which are communicated to the server via the communication driver. Any such command is, however, trapped by the Filter module 225. The "real" request is at this point processed by the Filter core logic (Filter method) 313. 
At this point, the system can modify the command, delete the command, synthesize new commands, or pass through unchanged the existing command. 2. Filter Construction The detailed construction of the Filter core logic 313 is as follows. The Filter is implemented as a C language routine having an internal message or dispatcher loop which "switches" on different (HTML) tag types. Based on the particular tag being processed, the loop in turn dispatches the information to a particular handler, for performing the desired processing on that HTML tag type. As an example of this approach, consider the processing of an image tag type. Upon encountering an image tag, the system dispatches the tag to the appropriate handler, an image tag handler. At this point, the handler can now proceed to process the information. For instance, the image handler could determine whether the image tag includes a reference to material which is to be filtered (e.g., the image tag references an image stored in a "/ad/" directory). In the event that the handler "kills" the tag, the system employs a "kill" routine for correctly managing the buffer, including incrementing the current buffer position beyond the tag (so that it can locate the next tag). Alternatively, the system can synthesize new tags, or pass through unchanged the existing tags. Certain tag types require more complex processing. Consider, for instance, a href or "hyper reference" tag type, which is employed for establishing a hyperlink. An instruction to kill an href tag is, instead, an instruction to kill the image contained within the href tag. Accordingly, the corresponding handler must include logic not for killing the href tag but, instead, for setting a status flag indicating that the system should cycle through (in the dispatcher loop) the tag and kill the image tag contained within the href tag.
3. Filter Methodology In accordance with the present invention, the Filter comprises a core routine--the Filter method--for providing filtering functionality. In an exemplary embodiment, the Filter method may be constructed as follows (e.g., in the C programming language).
1: int Filter (char *Buffer, char *BaseURL, int BuffSize, BOOL CRCOn,
2: BOOL * INJavaScript, BOOL * KillNextIMG)
3: {
4: int decrement_val; // How much for the next buffer receive
5: char * Found; // Where was it found
6: char * IMGFound; // Where was it found
7: char * BlinkFound; // Where was it found
8: char * JavaFound;
9: char * ScriptFound;
10: char * HREFFound;
11: char * Full_Tag;
12: char * Left_Less_Than; // Keep up with that left less than....
13: char * Right_Greater_Than; // Keep up with the other tag...
14: char * Found_IMG_URL; // found img url...
15: char* THE_END = Buffer + BuffSize; // The real end of the buffer
16: char * getstring;
17: int Length;
18: int count=0;
19: int DoKill; // Are we at the end of the buffer
20: pURL imgURL_struct;
21: pURL Base_URL;
22: DWORD Xor; // DWORD to do the XOR compare
23: SOCKET hControlChannel; // Socket handle for the control channel
24: HFILE hFile; // Handle for gif download save file
25: BOOL Bad_Tag;
26: Chunk imgloc;
27: Chunk javabinloc;
28: int imgsize;
29:
30:
31: imgURL_struct = (pURL) malloc (sizeof (URL));
32: if ( imgURL_struct == NULL)
33: {
34: MessageBox(NULL, "Malloc Failed in Filter(), "
35: "imgURL_struct\nExpect a Crash!",
36: "Malloc Failed", MB_OK|MB_ICONSTOP);
37: }
38: Base_URL = (pURL) malloc (sizeof (URL));
39: if ( Base_URL == NULL)
40: {
41: MessageBox(NULL, "Malloc Failed in Filter(), Base_URL\nExpect a Crash!",
42: "Malloc Failed", MB_OK|MB_ICONSTOP);
43: }
44:
45: // SpawnOnOff = 1; /* use for testing */
46:
47: ////////////////////////////////////////////////////////////////
48: //
49: // In Java Script situation stuff
50: //
51: ////////////////////////////////////////////////////////////////
52:
53: if (*INJavaScript)
54: {
55:
56: ScriptFound = strstri (Buffer, "/script");
57: if (ScriptFound == NULL) // didn't find the end
58: {
59: Right_Greater_Than = (char *)memchr(Buffer, '<', (THE_END - Buffer));
60: if (Right_Greater_Than == NULL)
61: {
62: decrement_val = (THE_END - Buffer);
63: *INJavaScript = FALSE;
64: if (Base_URL)
65: free (Base_URL);
66: if (imgURL_struct)
67: free (imgURL_struct);
68: return decrement_val; //RETURN
69: }
70: Buffer = Right_Greater_Than - 1;
71: #ifdef scriptd
72: vErrorOut (fg_pink, "INJavaScript = False due to no Found /script and a found >\n");
73: #endif
74: *INJavaScript = FALSE;
75: }
76: else
77: {
78: #ifdef scriptd
79: vErrorOut (fg_pink, "INJavaScript = False due found /script\n");
80: #endif
81: *INJavaScript = FALSE;
82: Buffer = ScriptFound + 7;// just move on past the /script...
83: }
84: }
85: /////////////////////////////////////////////////
86:
87:
88: while (TRUE)
89: {
90:
91:
92: Xor = 0;
93: imgsize = 0;
94: Found = NULL;
95: IMGFound = NULL;
96: BlinkFound = NULL;
97: JavaFound = NULL;
98: HREFFound = NULL;
99: ScriptFound = NULL;
100: Full_Tag = NULL;
101: Bad_Tag = FALSE;
102: Left_Less_Than = (char *)memchr (Buffer, '<', (THE_END - Buffer + 1));
103: if (Left_Less_Than == NULL)
104: /* Can't find a tag in the text at all -- This means we are done */
105: {
106: if (Base_URL)
107: free (Base_URL);
108: if (imgURL_struct)
109: free (imgURL_struct);
110:
111: return 0; //RETURN
112: }
113:
114:
115: if (*(Left_Less_Than+1) == '!'
116: && *(Left_Less_Than+2) == '-'
117: && *(Left_Less_Than+3) == '-')
118: {
119: Right_Greater_Than = strstr (Left_Less_Than, "-->");
120: if (Right_Greater_Than == NULL)
121: {
122: Right_Greater_Than
123: = (char *)memchr(Left_Less_Than, '>', (THE_END - Left_Less_Than));
124: if (Right_Greater_Than == NULL)
125: {
126: decrement_val = (THE_END - Left_Less_Than);
127: if (decrement_val > 0)
128: {
129: if (Full_Tag)
130: free (Full_Tag);
131: if (Base_URL)
132: free (Base_URL);
133: if (imgURL_struct)
134: free (imgURL_struct);
135: return 0; //RETURN
136: }
137:
138: if (Full_Tag)
139: free (Full_Tag);
140: if (Base_URL)
141: free (Base_URL);
142: if (imgURL_struct)
143: free (imgURL_struct);
144:
145: return decrement_val; //RETURN
146: }
147: }
148: Right_Greater_Than +=2;
149: /* Buffer is going to be assigned Right_Greater_Than + 1 */
150: IMGFound = NULL;
151: /* This is to cause a break and a continue to the next buffer */
152: }
153: ////////////////
154: else // Its not a comment
155: {
156: Right_Greater_Than
157: = (char *)memchr(Left_Less_Than, '>', (THE_END - Left_Less_Than));
158: if (Right_Greater_Than == NULL)
159: /* if Couldn't find a right side to the current tag
160: We are done but have stuff hanging
161: */
162: {
163: decrement_val = (THE_END - Left_Less_Than);
164: if (decrement_val > 0)
165: {
166: if (Full_Tag)
167: free (Full_Tag);
168: if (Base_URL)
169: free (Base_URL);
170: if (imgURL_struct)
171: free (imgURL_struct);
172: return 0; //RETURN
173: }
174: if (Full_Tag)
175: free (Full_Tag);
176: if (Base_URL)
177: free (Base_URL);
178: if (imgURL_struct)
179: free (imgURL_struct);
180:
181: return decrement_val; //RETURN
182: }
183: Length = (Right_Greater_Than - Left_Less_Than + 2);
184: Full_Tag = (char *)malloc(Length + 2);
185: if ( Full_Tag == NULL)
186: {
187: MessageBox(NULL, "Malloc Failed in Filter(), Full_Tag\nExpect a Crash!",
188: "Malloc Failed", MB_OK|MB_ICONSTOP);
189: }
190: lstrcpyn (Full_Tag, Left_Less_Than, (Length));
191: }
192: //
193: // TAG HAS BEEN FOUND!!!!!
194: //
195: /////////////////
196: IMGFound = strstri ((char *)Full_Tag, "img");
197: /* Actually try to find the image.... */
198: if (IMGFound == NULL)
199: {
200: Buffer = Right_Greater_Than +1;
201: *KillNextIMG = FALSE;
202: if (Full_Tag != NULL)
203: {
204: BlinkFound = strstri ( (char *)Full_Tag, "blink");
205: JavaFound = strstri ( (char *) Full_Tag, "applet");
206: ScriptFound = strstri ( (char *) Full_Tag, "script");
207: HREFFound = strstri ( (char *) Full_Tag, "href");
208: // FrameFound = strstri ( (char *) Full_Tag, "frameset");
209: CheckForBase (Full_Tag, BaseURL);
210: }
211: // else
212:
213:
214:
215:
216: // Tag processing
217:
218: ///////////////////////////////////////////////////////////////BLINK
219: if (BlinkFound != NULL && BlinkOnOff)
220: {
221: count = (Right_Greater_Than - Left_Less_Than + 1);
222: memset (Left_Less_Than, 0x20, (count));
223: num_blink_killed++;
224: #ifdef debug
225: vErrorOut (fg_blue, "Killed a blink tag\n");
226: #endif
227: free(Full_Tag);
228: }
229: /////////////////////////////////////////////////////////JAVA
230: if (JavaFound != NULL && (*(char *) (JavaFound - 1) != '/') && AdsOnOff
231: && !isalpha(*(JavaFound + 6)) && Full_Tag) //found a java app
232: {
233: #ifdef debug
The description which follows will focus on the use of the Filter for deleting or "killing" unwanted tags, such as image tags (and image content contained therein). Those skilled in the art will appreciate that once tags have been fully qualified, in accordance with the methodology described herein, identified tags (commands) can be modified (e.g., to point to a new URL, or to contain new content), replaced with new tags (e.g., replacing "blink" tags with "bold" tags synthesized on-the-fly), or simply passed through unchanged. The Filter method or routine is invoked by the output stream, an internal client responding to requests of the browser. The method is invoked with six parameters. The first parameter, Buffer, is a pointer to the memory buffer containing the data of interest--that is, the raw buffer received from the network. The second parameter, BaseURL, comprises a (pointer to) character string storing the base URL (address) of the page for the request. If the page were Netscape's home page, for instance, the base URL data member would point to www.netscape.com as the base URL string. The third parameter, BuffSize, simply stores the size of the buffer; it is used for housekeeping purposes. The fourth parameter, CRCOn, is a Boolean data member indicating whether Cyclic Redundancy Checking (CRC)--a well-known checksum technique--is activated; a simple checksum can be constructed, for instance, by simply adding all the units together which comprise the content of interest, such as adding together all of the byte values of a particular image in an HTML document. The CRC checksum is used to control certain conditional branches of the Filter routine. The fifth parameter, INJavaScript, is a (pointer to) Boolean which can be modified within the Filter method. INJavaScript addresses the following problem. 
When the current buffer is entirely a tag of type <script> (which can be 3-5K in length), the tag might require multiple passes (e.g., three passes) through the dispatcher loop to process. If INJavaScript is true, the dispatcher loop should attempt to find the end of the script instead of initially looking for a next tag. When parsing HTML, the Filter loop (described below) may iterate before the end of the current HTML tag has been found. Therefore, both the INJavaScript and KillNextIMG Boolean parameters serve as housekeeping flags facilitating this process. After declaring local variables (lines 4-28), the method allocates memory at line 31 for storing an imgURL_struct. This data structure serves to characterize a URL. In an exemplary embodiment, the data structure may be created as follows.
typedef struct {
char protocol[15];
char server[SERVER_STRUCT_SIZE];
int port;
char URI[URI_STRUCT_SIZE];
} URL, *pURL;
As shown, the structure stores four data members. The first data member, protocol, is a character string (array) storing a particular protocol; it is not used within the Filter method. The second data member, server, stores a text string identifying the server (i.e., Web server). The third data member, port, stores a port number (i.e., IP port address). The fourth data member, URI, stores a character string indicating the actual URL address. If the URL data structure cannot be allocated (tested at line 32), the method displays an error message (lines 34-36). In a similar manner, the method allocates a URL structure for storing the base URL. This information is helpful in the event that relative URL addressing is employed. Again, if the allocation fails, the method displays an error message. At line 53, the method begins processing for JavaScript. Specifically, the method first tests whether the INJavaScript Boolean flag is set to "true." This flag will be set, for instance, when processing a lengthy JavaScript. Recall that the network delivers a sequence of blocks. As a result, a lengthy JavaScript may span multiple blocks or packets. Therefore, the flag tested at line 53 determines whether the dispatcher loop is still examining a JavaScript (as a result of having begun inspecting the current JavaScript segment in a prior buffer block). Specific processing in the event that the method is still in JavaScript is as follows. At line 56, the method searches for the string "/script" in the buffer. If this is not found (the "if" statement at line 57 evaluates to "true"), the method searches the buffer for a "<" symbol at line 59. If no such symbol is found (tested at line 60), the method calculates a decrement value at line 62; this value is the difference between the current pointer position in the buffer (cursor) and the end of the buffer. At line 63, the method resets the INJavaScript Boolean to "false." At lines 64-67, the method frees the two previously-allocated URL structures. Then, at line 68, the method returns the decrement value.
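The caller's side of the "decrement" convention is not shown in the listing. One plausible way for a caller to honor it, sketched here purely as an assumption, is to move the unprocessed tail of the buffer to the front so that the next network block can be appended behind it. The function name carry_over_tail and its parameters are hypothetical.

```c
#include <string.h>

/* Hypothetical caller-side helper (not in the listing): after a filtering
 * pass reports that the last `decrement` bytes of buf (which currently
 * holds `used` bytes) belong to an incomplete tag or script, move that
 * tail to the front of buf. Returns the number of bytes now held in buf;
 * the next received block would be appended starting at that offset. */
size_t carry_over_tail(char *buf, size_t used, size_t decrement)
{
    if (decrement > used)
        decrement = used;       /* defensive clamp */
    memmove(buf, buf + (used - decrement), decrement);
    return decrement;
}
```

memmove is used rather than memcpy because the source and destination regions may overlap when the tail is longer than half the buffer contents.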
By returning a "decrement," the method is indicating that it desires to keep the data around (so that it can complete processing once all relevant JavaScript blocks have been received). If, on the other hand, the end of the script is found (the test at line 57 evaluates to "false"), the method proceeds to execute the "else" statement at lines 76-83. At this point, the method sets the INJavaScript Boolean to "false," at line 81. Then, at line 82, the method adjusts the current buffer position (cursor) to move past the JavaScript (i.e., past where the "/script" tag is found). In the event that the "/script" tag is not found, the method searches for the next tag, by searching for a "<" symbol; this is performed at line 102. If the next tag cannot be found (the result is tested at line 103), the method returns, at line 111. If a next tag is found, however, the system will continue parsing from that point. Specifically, the method changes its position in the buffer and returns a decrement value accordingly. Another possibility, however, is that no end tag is found. In such a case, the system requires more information. In the event that the "/script" tag is found, the system can move past the JavaScript and continue parsing. Having dealt with JavaScript issues, the Filter method can now proceed to parse HTML. This is done by entering the "while" loop, at line 88. The method proceeds as follows. At line 102, the method searches for a "<" symbol, for locating the beginning of an HTML tag. If it cannot find a tag at all, the method has processed the buffer and may return (line 111). At this point, the method must also handle any comments which might be encountered within the HTML. If a comment is found, the method then proceeds to find the end of the comment. If the end is not found, the method is again faced with a decrement scenario (which can be processed in a manner similar to that described above).
Otherwise, the method proceeds to the end of the comment and continues parsing from that point onward. In the event that this is not a comment, the method continues execution at the "else" statement starting at line 154. The method at this point now begins looking for the right side of the tag (i.e., ">" symbol), having found a valid left side. If the right side is not found (i.e., a hanging tag scenario), the method proceeds in a manner similar to that described above for obtaining more data. If a full tag has been found, however, the method allocates memory for the tag's data, including the tag delimiters. Now the method has a fully qualified tag and is ready to apply filtering methodology in accordance with the present invention. At this point, the method searches through the full tag, first trying to locate any embedded image tags. This is done by searching for the substring "IMG" using a case insensitive substring match. If an image tag is not found (the test at line 198 evaluates to "true"), the method at this point tests whether the tag is one of the following types: blink, applet, script, href, and frameset (the frameset test appears commented out at line 208 of the listing). Respective flags are set for each one encountered; these flags are employed later in the method. Now, the method will undertake specific processing for individual tags. For instance, the blink tag is processed at lines 219-228. If a blink tag has been found and the user has configured the Filter to turn off blinking text, the method will at this point kill the blink tag. Thus at this point, the system has already begun tag fixup based on user configuration of the Filter. Actual handling of a Java tag begins after line 229. The data structure javabinloc stores information indicating the location for the Java binary. At this point, the method looks for the end of the Java applet, by searching for the string "applet" at line 237. The method includes at this point error handling steps in the event that the end of the applet cannot be found.
Ordinarily, however, the end is found and the method may proceed to kill the Java applet. This is done by passing the Java binary to a specialized handler, Killjava, at line 286. The subroutine call returns "true" if the Java applet is one to kill. The lookup can be performed, for instance, by comparing the applet name (text string) against a database of Java applets to kill. In the event that the applet is one to kill ("true" at line 286), the method overwrites the corresponding memory location of the applet with white space (e.g., space character), at line 293. For user interface purposes, the method increments a counter ("number of ads killed"), at line 295, which is available for user feedback of filtering activity. Processing of JavaScript begins at line 307. At the outset, the method verifies that it is in fact dealing with JavaScript at this point. This can be accomplished, for instance, by testing that the first six characters following the script tag are not alphabetic, as shown at line 311. Additionally, at line 314, the method finds the end of the script, by locating the "/script" tag. If the end of the script is not found, tested at line 316, the method asks for more data, by returning a decrement value (in a manner similar to that previously described). Note also at this point that the INJavaScript Boolean is set to "true," at line 322. Eventually (if no error occurs), the end of the script will be found. In such a case, the method executes the steps beginning at line 357, for moving past the script (i.e., moving beyond the ">" symbol) and continuing filter processing from that point on. Processing for hypertext reference or href tags begins at line 363. An href tag defines a hypertext link or "hyperlink" to another object. For instance, the tag: <A HREF="http://adsl.zdnet.com/adverts/SampleAd.html "></A> defines a link to a document, SampleAd.html. 
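The inspection of an href destination for a known ad site can be sketched as follows. The listing's strstri routine is not shown, so find_nocase below is a stand-in written for this example; the ad_hosts table and the function href_points_to_ad are likewise hypothetical, since the described system consults its own lookup.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive substring search; a stand-in for the listing's
 * strstri, whose implementation is not shown. */
const char *find_nocase(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0)
        return hay;
    for (; *hay != '\0'; hay++) {
        size_t i = 0;
        while (i < n && hay[i] != '\0' &&
               tolower((unsigned char)hay[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return hay;         /* all n characters matched */
    }
    return NULL;
}

/* Hypothetical ad-site patterns, for illustration only. */
static const char *const ad_hosts[] = { "adsl.zdnet.com", "/adverts/" };

/* Returns 1 if the tag carries an href that mentions a listed ad site. */
int href_points_to_ad(const char *tag)
{
    const char *href = find_nocase(tag, "href");
    size_t i;
    if (href == NULL)
        return 0;
    for (i = 0; i < sizeof ad_hosts / sizeof ad_hosts[0]; i++)
        if (find_nocase(href, ad_hosts[i]) != NULL)
            return 1;
    return 0;
}
```

In a filter structured like the listing, a positive result would correspond to setting the KillNextIMG flag so that the image enclosed by the link is removed on the next pass through the loop.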
At line 365, the method confirms that an href tag has been found (by examining the previously-set Boolean) and confirms that the tag is fully qualified. An href or hypertext reference tag is often associated with an image. Therefore, the method must process the href tag to kill any image tags contained within it. By examining the href itself, the system can often discern whether images contained within it are ads. Specifically, the system can discern a hypertext jump destination which is invoked from a particular image. If the destination is an ad site (e.g., such as the adsl destination shown above), the system can block the image(s) associated with the href tag for the hypertext reference to that site. The actual subroutine call to lookup the image as one referencing an ad site occurs at line 370. In the event that the image is identified as an ad, the method sets the KillNextIMG flag to "true" at line 377 and increments the "number (of ads) killed" counter at line 380. Since the kill next image flag has been set, the next iteration through the loop will kill any image which it follows within this href tag. At line 399, the method filters any spawning activity--that is, when a URL site "spawns" a new browser window. Since this action may have undesirable user interface consequences, the user can turn spawning off. In such a case, the Filter method kills spawning when found, as shown at lines 401-414. Beginning at line 429, the method begins processing of an image tag. If the "kill next image" flag has been set to "true" (such as described above), the method proceeds to perform setup for killing the image, at lines 432-437. Before actually killing an image, however, the method verifies that the image tag does in fact have an image. This is shown at line 438, where the method searches for the image source (src). If an actual image is not located within the image tag, then the tag is a "bad tag" and, thus, requires error handling. 
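The check for an image source inside an image tag can be illustrated with the sketch below. It is a simplified assumption of how such a check could work, not the routine used at line 438; the function name extract_img_src is hypothetical, and the matching is deliberately naive (it accepts the first "src" it finds, even inside another attribute value).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copies the value of the src attribute of an image
 * tag into out. Returns 1 on success, or 0 if no usable src is present
 * (the "bad tag" case) or the value does not fit in outsize bytes. */
int extract_img_src(const char *tag, char *out, size_t outsize)
{
    const char *p;
    size_t len = 0;
    char quote = '\0';

    /* Locate "src" without regard to case. */
    for (p = tag; *p != '\0'; p++) {
        if (tolower((unsigned char)p[0]) == 's' &&
            tolower((unsigned char)p[1]) == 'r' &&
            tolower((unsigned char)p[2]) == 'c')
            break;
    }
    if (*p == '\0')
        return 0;
    p += 3;
    while (*p == ' ')
        p++;
    if (*p != '=')
        return 0;
    p++;
    while (*p == ' ')
        p++;
    if (*p == '"' || *p == '\'')
        quote = *p++;           /* remember the opening quote, if any */

    /* Copy up to the closing quote, or to a space or '>' when unquoted. */
    while (*p != '\0' && *p != '>' &&
           (quote ? *p != quote : *p != ' ')) {
        if (len + 1 >= outsize)
            return 0;
        out[len++] = *p++;
    }
    if (len == 0)
        return 0;
    out[len] = '\0';
    return 1;
}
```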
Normally, however, an image tag will not be bad (tested at line 489) and the method can proceed to kill the image. To kill or filter the image, the method allocates a memory buffer at line 458, for storing an image URL data structure. At line 475, the full tag is copied into the image URL structure. Then, at line 477, the method parses the image URL, and then parses the base URL at line 478. This setup allows the system to establish a network connection with the site, if needed. At lines 483-484, the method creates a fully qualified image URL (structure), converting from relative addressing if needed. How the image is to be killed is determined at line 491, by referencing a per-session cache storing results on how to process images. If the image is on the user's personal kill list (tested at line 492), the method proceeds to kill the image. The specific call for killing the image occurs at line 501. If the Filter is configured to kill ads or kill images larger than a preselected image size (tested at line 506), the method proceeds as follows. The method establishes a network connection. At this point, the server is queried for determining the image size. If the image size exceeds a maximum image size desired by the user, the image will be killed. For ads, if the browser is at a site which requires further inspection, the method undertakes a CRC check of the image. Specifically, at this point, the system connects to the Web site, grabs the image file from the site, and then proceeds to perform a CRC calculation (line 584) on the image. The calculated CRC value is then compared against a list of image signatures or IDs. In other words, at this point the method creates a dummy signature for the image, for identifying the image's contents. If the computed ID is determined to represent a "bad" image--that is, an unwanted image which is targeted for filtering (e.g., ad image)--the method proceeds to kill the image at lines 620-631.
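The signature comparison can be sketched with the simple additive checksum that the description mentions (adding together all of the byte values of an image). This is an illustrative assumption rather than the calculation performed at line 584, and a byte sum is not a true CRC; the function names byte_sum and is_known_bad are hypothetical.

```c
#include <stddef.h>

/* Simple additive checksum over the image bytes. A byte sum is weaker
 * than a true CRC: images with the same bytes in a different order
 * produce the same value. */
unsigned long byte_sum(const unsigned char *data, size_t len)
{
    unsigned long sum = 0;
    size_t i;
    for (i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* Returns 1 if sum appears in the list of signatures of unwanted images. */
int is_known_bad(unsigned long sum, const unsigned long *bad, size_t nbad)
{
    size_t i;
    for (i = 0; i < nbad; i++)
        if (bad[i] == sum)
            return 1;
    return 0;
}
```

A linear scan is adequate for a short signature list; a sorted table with binary search would be a natural substitute if the list grew large.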
Additionally, the image is killed at lines 611-617 in the event that it exceeds a maximum image size. If, on the other hand, the signature value for the image does not indicate a "bad" image and the image does not exceed a maximum size, the image is passed through without filtering. After processing the image tag (with or without filtering of the image), the method may loop on the next tag. Note, however, that if a corrupt or "bad" tag had been encountered during the loop (i.e., the Bad_Tag flag is set to "true"), the method sets the current buffer position beyond the bad tag before looping, as shown at lines 660-663. At this point, the method will loop for another iteration of the "while" loop for further processing of tags. While the invention is described in some detail with specific reference to a single preferred embodiment and certain alternatives, there is no intent to limit the invention to that particular embodiment or those specific alternatives. Thus, the true scope of the present invention is not limited to any one of the foregoing exemplary embodiments but is instead defined by the appended claims.
