Abstract: Method for aggregating syndicated Web content, comprising the steps of: - Retrieving (100) updated content from predetermined Web feeds; - Comparing (120) said updated content with stored content previously retrieved; - Storing (160) the updated content if it is considered different from the stored content; - Deleting (170) the updated content if it is considered identical to the stored content. FIGURE 2
Method for aggregating Web feed minimizing redundancies
FIELD OF THE INVENTION
The invention relates to the aggregation of Web feeds.
BACKGROUND OF THE INVENTION
Since its beginnings in the early 1990s, the World Wide Web has become very popular and it now comprises several billions of Web pages including various contents such as texts, images, videos, and links (also referred to as hyperlinks) to other Web pages. The World Wide Web is used daily by billions of Web surfers.
Getting online nowadays is quite simple and requires neither particular skills nor particular proceedings before a national or international office, which in fact does not exist. Surprisingly, no attempt was made to classify the Web in order to group Web sites within families (based upon predetermined criteria), although anyone would benefit from such a classification. Therefore, it is becoming more and more difficult for the Web surfers to retrieve substantive and reliable updated information. Web browsers are of help, of course, but with the increasing number of Web pages, numerous semantic search requests result in raw content which is mostly unclassified, often redundant, inexplicit, and, in the end, simply unworkable.
In the early 2000s, however, a solution called syndication was provided to help surfers get the right information at the proper moment. In syndication, a section of a Web site is made available for other Web sites to use. More specifically, in Web syndication, content (commonly referred to as a Web feed) is put on a Web site in a particular format - often XML-based (XML stands for Extensible Markup Language), such as RSS (Really Simple Syndication) or Atom - and associated with a feed link to which another user (client) can subscribe in order to retrieve the corresponding content by means of a particular application called a feed aggregator, also referred to as a feed reader or a news reader, running locally on the client's terminal or server.
Having subscribed to a feed, a feed aggregator may be configured to check for and retrieve updated content at predetermined intervals (which may be user-defined). Modern Web browsers often include built-in aggregators, such as iGoogle™ and My Yahoo™. US patent applications No. US 2008/0034058 (Assigned to Marchex, Inc.) and US 2008/0046543 (Assigned to RealNetworks) both illustrate methods for obtaining Web feeds.
Although feed aggregators are a powerful resource for retrieving updated information from the World Wide Web and making it available to an end user via a user-friendly graphical interface (GUI), the volume of articles can sometimes be overwhelming, especially when the user has subscribed to many Web feeds. To address this problem, some feed aggregators include functionalities to allow users to tag the feeds with keywords in order to sort and filter the available articles into easily navigable categories. However, this solution is time-consuming, since the user has to pre-classify the feeds from which he wishes to obtain updated content. In addition, tagging Web feeds is simply useless when the content to be retrieved changes subject with each update (such as on newspaper Web sites).
SUMMARY OF THE INVENTION
Clearly there is a need for a solution allowing Web surfers to get, in an automated way, information retrieved from Web feeds in a fully workable manner.
It is an object of the invention to provide such a solution. Accordingly, in one aspect, the invention provides a method for aggregating syndicated Web content, comprising the steps of:
- Retrieving updated content from predetermined Web feeds;
- Comparing said updated content with stored content previously retrieved (and stored e.g. as an entry within a feeds historic database);
- Storing the updated content if it is considered different from the stored content;
- Deleting the updated content if it is considered identical to the stored content.
Further steps may be provided, i.e.:
- a step of adding the updated content to a stored content if it is considered complementary thereto;
- a step of computing a similarity index Sjk to denote a degree of similarity between the updated content and the stored content;
- a step of comparing said similarity index Sjk to one or more thresholds.
More specifically, the similarity index Sjk is compared to two thresholds Smin and Smax, whereby:
- If the similarity index Sjk is lower than Smin, the updated content is considered different from the stored content;
- If the similarity index Sjk is greater than Smax, the updated content is considered identical to the stored content and may therefore be deleted;
- If the similarity index Sjk is comprised between Smin and Smax, the updated content is considered complementary to the stored content and may therefore be added thereto.
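The two-threshold rule can be illustrated with a minimal sketch. It follows the storing, deleting and adding steps listed above: different content is stored, identical content is deleted, complementary content is added to the stored content. The names `Decision` and `classify`, the normalisation of Sjk to the range 0 to 1, and the default threshold values 0.3 and 0.9 are assumptions made for illustration only; the text does not prescribe any of them. Values exactly equal to a threshold are treated here as lying between Smin and Smax.

```python
from enum import Enum


class Decision(Enum):
    """Hypothetical labels for the three possible outcomes."""
    DIFFERENT = "store as new entry"
    COMPLEMENTARY = "add to stored entry"
    IDENTICAL = "delete"


def classify(s_jk: float, s_min: float = 0.3, s_max: float = 0.9) -> Decision:
    """Map a similarity index in [0, 1] to an outcome.

    The default thresholds are illustrative assumptions, not values
    taken from the specification.
    """
    if not 0.0 <= s_min <= s_max <= 1.0:
        raise ValueError("expected 0 <= s_min <= s_max <= 1")
    if s_jk < s_min:
        return Decision.DIFFERENT      # lower than Smin
    if s_jk > s_max:
        return Decision.IDENTICAL      # greater than Smax
    return Decision.COMPLEMENTARY      # comprised between Smin and Smax
```

With these assumed thresholds, `classify(0.5)` yields `Decision.COMPLEMENTARY`, whereas `classify(0.95)` yields `Decision.IDENTICAL`.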
According to another aspect, the invention provides a feed aggregator comprising:
- A feed reader configured for checking for and retrieving updated content from predetermined Web feeds;
- A filtering module configured for managing comparison of said updated content to stored content previously retrieved;
- A feeds historic database for storing content.
The feed aggregator may also comprise one or more entry analyzers linked to the filtering module, configured for comparing said updated content to stored content. More specifically, the one or more entry analyzers may be configured for:
- computing a similarity index Sjk denoting a degree of similarity between the updated content and the stored content, and
- comparing said similarity index to one or more thresholds to determine whether the updated content is to be removed or added to the feeds historic database.
The above and other objects and advantages of the invention will become apparent from the detailed description of preferred embodiments, considered in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagrammatic view showing the structure of a feed aggregator according to the invention.
FIG. 2 is a flow chart of a method for aggregating syndicated Web content according to the invention.
DESCRIPTION OF PREFERRED EMBODIMENTS
Turning now to the drawings, there is shown in FIG. 1 a feed aggregator 1, implemented as a computer program on a processing unit of a computer device such as a personal computer (PC), a server, a communicating personal digital assistant (PDA), a Smartphone, etc.
The feed aggregator 1 comprises a feed reader module 2, configured to check for and retrieve updated syndicated Web content from Web feeds 3 provided on distant Web sites. Although the feed aggregator 1 is preferably configured to deal with RSS feeds, any other format may be used (such as Atom). Three Web feeds 3 are drawn in FIG. 1 for illustrative purposes only, since the feed aggregator 1 may be linked to as many Web feeds as the user configures.
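As an illustration of what the feed reader module might extract from an RSS feed, the following minimal sketch parses the items of an RSS 2.0 document with the Python standard library. The sample feed, the function name `parse_rss_items` and the choice of fields (title, link, description) are assumptions made for illustration; the actual retrieval over the network and the handling of Atom feeds are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only to make the sketch self-contained.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
      <description>Body of the first item.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
      <description>Body of the second item.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text: str) -> list[dict[str, str]]:
    """Return one dict per <item> element of an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the child element is absent
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items
```

Each returned dictionary would correspond to one piece of updated content handed over to the filtering module 4.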
The feed aggregator 1 also comprises a filtering module 4, linked to the feed reader 2 and to which the feed reader 2 transfers the retrieved updated content to be further processed so as to eliminate redundancies, as will be disclosed hereinafter.
The feed aggregator 1 further comprises a feeds historic database 5, wherein feed content previously retrieved is stored as entries.
The feed aggregator 1 further comprises at least one entry analyzer 6, linked to the filtering module 4, configured to compare updated content with content previously retrieved and stored as entries within the feeds historic database 5, in order to determine whether the updated content should be:
- removed from the feed aggregator 1,
- stored within the database 5 as a new entry,
- or added to an existing entry within the database 5.
The feed aggregator 1 also comprises a historic access module 7, interposed between the filtering module 4 and the feeds historic database 5, and configured to access and manage the feeds historic database 5 in order to store the updated content which has been determined by the filtering module 4 as suitable for being stored as a new entry or added to an existing entry.
Precise operation of the feed aggregator 1 will now be described.
As often as configured by the user, the feed reader 2 checks for and retrieves updated content from predetermined Web feeds 3 subscribed to by the user (100). In the following description, it is assumed that the feed reader 2 is configured with P subscriptions, P being an integer number greater than or equal to 1 (P≥1). Fj, where j is an integer number smaller than or equal to P (1≤j≤P), denotes the updated content retrieved from the jth Web feed 3. Fj may comprise complete Web pages or parts thereof: text, images, videos, hyperlinks, etc. It is further assumed that the feeds historic database 5 comprises a number N of entries, N being an integer number greater than or equal to 1 (N≥1). Ek, where k is an integer number smaller than or equal to N (1≤k≤N), denotes the kth entry stored in the feeds historic database 5.
Updated content Fj is transferred by the feed reader 2 to the filtering module 4, where it is temporarily stored. The filtering module 4 retrieves, through the historic access module 7, the list of stored entries Ek to which the updated content Fj shall be compared. More precisely, the filtering module 4 iterates over the list of stored entries Ek, to which the updated content Fj is to be compared in order to be classified among the following categories: to be removed (deleted); to be added to a stored entry Ek; to be stored as a new entry EN+1.
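The iteration described above can be sketched as follows. This is a minimal illustration under several assumptions that the text does not state: entries are plain strings, the loop stops at the first stored entry found identical or complementary, complementary content is merged by simple concatenation, and the thresholds 0.3 and 0.9 are arbitrary. The names `process_update` and `word_overlap` are hypothetical, and the word-overlap measure merely stands in for the entry analyzer 6.

```python
from typing import Callable, List


def word_overlap(a: str, b: str) -> float:
    """Toy similarity: share of distinct words common to both texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 1.0


def process_update(
    f_j: str,
    entries: List[str],
    similarity: Callable[[str, str], float],
    s_min: float = 0.3,   # assumed threshold
    s_max: float = 0.9,   # assumed threshold
) -> str:
    """Compare f_j with each stored entry and update `entries` in place.

    Returns "deleted", "added" or "stored".
    """
    for k, e_k in enumerate(entries):
        s_jk = similarity(f_j, e_k)
        if s_jk > s_max:
            return "deleted"                 # identical: discard f_j
        if s_jk >= s_min:
            entries[k] = e_k + "\n" + f_j    # complementary: add to entry k
            return "added"
    entries.append(f_j)                      # different from every entry
    return "stored"
```

Note that content is stored as a new entry only after it has been found different from every stored entry, which is why the append sits after the loop.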
More precisely, with k initially set equal to 1 (110), the updated content Fj is compared (120) with the stored entry Ek by at least one entry analyzer 6. Comparison may be achieved through various methods:
- Basic string comparison of titles;
- Syntax analysis, to determine common keywords between the compared contents;
- Semantic-based analysis, to determine common ontological concepts used in the compared contents. In order to limit the field of comparison, the ontology domains may be restricted by predetermined tags associated with the entries Ek.
These comparison methods may be used alone, or combined as different steps of a whole comparison process. A combined use of the comparison methods may be hierarchic. In other words, the syntax analysis step may be launched only if the basic string comparison of the titles has led to an assertion that the titles are identical, in order to determine whether contents having the same title may nevertheless be different. In turn, the semantic-based analysis step may be launched only if the syntax analysis step has determined that the syntax is similar, in order to further increase the degree of precision of the comparison. Each step of the comparison may be run on a different entry analyzer 6.
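The hierarchic combination can be sketched for the first two stages. The sketch below is an illustration only: the function names, the stop-word list, the use of a keyword overlap ratio (Jaccard index) as the syntax stage, and the convention that differing titles yield a similarity of 0 are all assumptions, since the text does not specify how the stages contribute to the result. The semantic-based stage is omitted because it would require an ontology.

```python
import re

# Small illustrative stop-word list (an assumption, not from the text).
STOPWORDS = {"the", "and", "for", "with", "that", "this", "from"}


def titles_identical(a_title: str, b_title: str) -> bool:
    """Stage 1: basic string comparison of titles (case-insensitive)."""
    return a_title.strip().lower() == b_title.strip().lower()


def keywords(text: str) -> set:
    """Extract lower-case words longer than two letters, minus stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def keyword_similarity(a: str, b: str) -> float:
    """Stage 2: share of keywords common to both texts (Jaccard index)."""
    ka, kb = keywords(a), keywords(b)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 1.0


def hierarchical_similarity(update: dict, entry: dict) -> float:
    """Run stage 2 only when stage 1 found identical titles."""
    if not titles_identical(update["title"], entry["title"]):
        return 0.0  # assumed convention: different titles, different content
    return keyword_similarity(update["body"], entry["body"])
```

Running the cheap title comparison first means the costlier keyword analysis is performed only on candidates that already share a title.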
Having achieved comparison between updated content Fj and stored entry Ek, the entry analyzer 6 computes (130) a similarity index Sjk, which denotes a degree of similarity between Fj and Ek. The similarity index Sjk is provided to the filtering module 4, which, firstly, determines, based upon the value of Sjk, whether Fj and Ek are to be considered identical or different, and, secondly, takes the corresponding decision regarding Fj. Basically, Sjk = 0% means that Fj and Ek are to be considered different, whereas Sjk = 100% means they are to be considered identical. However, Sjk may differ from both 0% and 100%, meaning that, although Fj and Ek may not be considered identical, they may not be considered different either. In order to make an appropriate decision regarding the updated content Fj, the filtering module 4 may be implemented with at least one threshold S1 such that:
- If Sjk