Abstract: A method and system for enhancing the relevance and usefulness of information searches, such as web searches, by introducing individual and shared user judgment: first, to define the universe of the search, automatically internalizing the content of that universe (via a copyright-compliant system) in an automatically updated repository that can integrate other (internally generated or imported) content and enable sharing according to user preferences; and, secondly, to organize the internalized content through tagging, bookmarking and filtering.
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
PROVISIONAL SPECIFICATION
[See section 10, Rule 13]
METHOD AND SYSTEM FOR ENHANCING THE RELEVANCE AND USABILITY OF SEARCH RESULTS, SUCH AS THOSE OF WEB SEARCHES, THROUGH THE APPLICATION OF USER JUDGMENT.
THE INFORMATION COMPANY PVT LTD, A COMPANY INCORPORATED UNDER THE COMPANIES ACT, 1956, WHOSE ADDRESS IS 606, CHANDERMUKHI TOWER A, SECTOR 11, BELAPUR, NAVI MUMBAI 400 614, MAHARASHTRA, INDIA.
THE FOLLOWING SPECIFICATION
DESCRIBES THE NATURE OF THIS INVENTION.
DESCRIPTION OF THE BACKGROUND ART:
An unprecedented volume of business information is available today on the Internet, and the volume is growing every day. Web search engines have made it possible for users to search through very large volumes of information, and this has opened up fantastic opportunities for people seeking information from known and unknown sources across the world. However, web search engines have their limitations.
Web search engines offer the advantage that the wider they search, the greater the chance that they will turn up information on a website you did not know existed, or had forgotten about. The drawback is that the wider they search, the greater the proportion of irrelevant links in the search results.
For certain purposes - for example, when you are looking for something and you don't know where to look - such wide-ranging searches are useful. Where you know broadly where to look, such wide-ranging search becomes overkill, causing people to waste time wading through a mix of some relevant and mostly irrelevant web content.
Search engines also do not provide the means to aggregate documents in one place and integrate them with repositories of documents created internally or imported previously through search engines and other means. This has added to the serious problem of information overload, and has made it harder for enterprise users to manage information, share it with others and add value to it.
There is a heavy cost to all this. Research has shown that companies are losing millions of dollars every week or month or year (depending on their size) as a result of their employees wasting hours of time searching for business information on the Internet, half the time not finding it and not being able to locate content previously downloaded from the Internet.
Despite the vast amount of readily available information on the 'free' Internet, employees are spending an inordinate and unproductive amount of time searching the Internet for answers to everyday business challenges; time that could be better spent making smarter, faster business decisions.
In its 2004 report on taxonomy and enterprise search issues, "Information Intelligence: Content Classification and the Enterprise Taxonomy Practice", Delphi Research addresses the question of the time professionals spend in computer-based search, and how they feel about it. According to a Delphi Group summary of this report, "The results of a new survey of over 300 companies shows that a surprising number of people spend at least the equivalent of a full work day per week trying to find electronic information."
"For example, 30% reported spending more than 8 hours per week in search activities, or more than a full day per week. Over 40% reported spending 7 or more hours. Another 30% reported spending between 4 and 8 hours, or over half a day. These findings indicate once again that the delivered search experience for most professionals is still a long way from the visions of sub-second relevance and enhanced productivity, which often galvanize new search technology investments.
"This finding appears to drive respondents' level of satisfaction with their search
experience as expressed in the survey. Over 60% say they are dissatisfied or
very dissatisfied with their search experience."
(http://www.delphiweb.com/knowledgebase/newsflash_guest.htm?nid=953)
Matters have got worse since 2004. According to the Outsell Information Industry Outlook 2006, the time users spend searching for (but not necessarily finding) business information on the internet has risen by three hours per week over the past four years; that's an aggregate productivity drain on U.S. employees of more than 5.4 billion hours wasted in 2005.
This is what the Outsell Information Industry Outlook 2006 says: "Over the past four years, the time users spend on information tasks has risen by three hours per week, from 8 to 11 hours per average workweek. What's more, the proportion of that time spent looking for information rather than analyzing and applying it has flip-flopped: users now spend more time looking than applying, whereas in 2001 it was the opposite. This is an aggregate productivity drain on U.S. knowledge workers - 5.4 billion hours - that the information industry has yet to address with more targeted and efficient products, and more content integrated into work tasks."
Search engines are free, but employee time is not. According to the Society of Competitive Intelligence, the average senior analyst salary is about $70,000 per year. If this analyst spends 11 hours per week searching for information, that's an investment of roughly $500 per week, $2,000 per month, or $24,000 per year, not including overhead and lost opportunity costs.
If we go by the Outsell calculations, we are talking of an avoidable salary cost of over $150 billion in the US at an average knowledge worker salary level of $4,600 per month. The cost globally would be at least three times that.
Here is what Bill Gates, chairman of Microsoft, had to say about this problem (at a Microsoft meeting on 17 May 2006): "The problem, really, is twofold. The first is information overload. Faced with the endless deluge of data that is generated every second of every day, how can we hope to keep up? And in the struggle to keep up, how can we stay focused on the tasks that are most important and deliver the greatest value?
"The other problem is something I call information under load. We're flooded with information, but that doesn't mean we have tools that let us use the information effectively.
"Companies pay a high price for information overload and under load. Estimates are that information workers spend as much as 30 percent of their time searching for information, at a cost of $18,000 each year per employee in lost productivity. Meanwhile, the University of California, Berkeley predicts that the volume of digital data we store will nearly double in the next two years."
The massive proportion of irrelevance thrown up by massively wide web searches has caused some developers to use concepts such as clustering and progressively narrowing the search within a given set of search results. While these do provide a means to reduce the levels of irrelevance in the search results, they deal with only a small part of the problem. Other methods, such as 'federated searches' (which use more than one search engine at the same time to provide combined results from such search engines), actually compound the problem rather than solve it.
As of now, business (or academic or government) users have no option but to continue to depend heavily on web search engines for their information needs. Organisations are stuck between a rock and a hard place; in today's intensely competitive world, they have to stay updated with the latest information, and have no choice but to have their employees scour the Internet for it.
An inordinate amount of time is wasted by otherwise busy users either on manual housekeeping of the content (if they have worked out some sort of system for doing this) or (in its absence) on revisiting the World Wide Web repeatedly for the same content because they are unable to figure out where they had saved it the first time.
The problem is that business, academic or government users do not have a method that ensures systematic and automatic tracking of all the websites they are interested in. The result is that the effectiveness and efficiency of their information monitoring efforts are constantly compromised by their being tied up in meetings or other work, or because of travel or vacation.
There have been attempts in the past to address the problem; but they have not solved the problem. For example, enterprise searches allow some level of integration, but when it comes to the web, they function just as regular web search engines do; they do not download content automatically; nor do they offer the ability to organise and share content.
'Web crawlers', some of which do enable downloads, do not refine the organisation and management of the downloaded content, let alone integrate it with content created internally or imported through other means.
Search programs that use clustering techniques also do not allow users to use their expertise and judgment to refine searches and to manage content. Knowledge management systems allow some level of integration, communication and management of content, letting the user apply his judgment once content has been found, but they do not allow the application of user judgment at the search level.
Despite the availability of a variety of solutions, the problem of information overload has become worse, as is reflected in surveys and reports by research firms. Given the serious levels of information overload suffered by business, academic and government users, there is a need for a system that helps organisations reduce this dependence on web search engines by providing an alternative to massively wide-ranging searches: allowing users to select the sources they think are important to search through routinely, automating the search for and download of newly created content at these sources, and making its management much more friendly for the individual user as well as amenable to central control.
BRIEF DESCRIPTION OF DRAWINGS:
Figure 1 is an overview of the application of user judgment in defining sources and the subsequent crawling of the sources to extract content into a repository in a user-defined way.
Figure 2 is an overview of the internal processing used to apply user judgement and enhance value after web content has been downloaded.
Figure 3 is an overview of the various processes used to apply individual and shared user judgment in defining sources and in managing content.
Figure 4 is an illustration of the process of defining the search universe by choosing the sources.
Figure 5 is an illustration of the process of defining or profiling a source.
Figure 6 is an illustration of the process of defining or profiling a section of the source.
Figure 7 is an illustration of the process of internalizing the user-defined content from external sources.
Figure 8 is an illustration of the process of displaying the internalized content via a copyrighted-content filter.
Figure 9 is an illustration of a display of the internalized content in a user-defined manner along with a display of associated content.
Figure 10 is an illustration of the process of attaching centralized labels to the external content.
Figure 11 is an illustration of the process of attaching personal labels to the external content.
Figure 12 is an illustration of the process of attaching bookmarks to the external content.
Figure 13 is an illustration of the viewing of a list of documents that have a particular label attached to them.
Figure 14 is an illustration of the first part of the process of associating other content with the external content.
Figure 15 is an illustration of the second part of the process of associating other content with the external content.
Figure 16 is an illustration of the process of forwarding annotated documents to other users (or persons outside the system).
Figure 17 is an illustration of real-time conferences related to a particular item of content.
Figure 18 is an illustration of the process of finally searching through the combined and organized content.
Figure 19 is an illustration of the display of updates to the content through a personal dashboard.
Figure 20 is an illustration of the process by which users can incorporate documents found through conventional web searches into the system.
Figure 21 is an illustration of the process by which the system can be implemented on the users' (both individuals and organizations) own computers.
DESCRIPTION OF INVENTION:
What is described is one embodiment of a method and system, henceforth called "Informachine", for improving the relevance and usefulness of web information searches through the introduction of user judgment.
1. Overview
Figure 1 gives a bird's-eye view of the process by which user judgment is introduced at the first stage of choosing, defining and downloading content from the sources to include in the search universe.
Informachine allows (101) users (100) to define all the sources (such as company websites) they believe will offer them content relevant to their interests, and to add them to a database of web sources after tagging them with descriptors. It also allows users to define which portions (such as the titles, dates and main text of pages in the press release section) of the sources they will find most relevant. The Informachine web crawler (102) then uses the source profiles created by the users to visit the web sources, look for fresh content of the type described by the user and download the content as described by the user into the Informachine content repository (103), which also contains content imported from users' own computers (206) and content created through the Informachine Content Management System (205).
This process is also represented in figure 3, which is a closer look at the variety of ways in which individual and shared user judgement can be applied, by 300 (individual user judgement used to define sources), 301 (shared user judgement used to define sources) and 304 (sharing of individual user judgement used to define sources).
Informachine also allows (as shown in figure 20), the importing of external documents found through conventional web search engines into the system for the purpose of storing, organizing, combining with other content, sharing and searching through. This content can be searched and sorted through (105) as shown in figure 18, with facilities to allow the user to make use of the descriptors attached to the sources in the search.
Figure 2 gives an illustration of the process by which user judgment is introduced after the content has been downloaded and stored in a repository for search and retrieval at the user's convenience.
To allow the application of user judgment to the content in the repository and make it more usable, Informachine introduces an assemblage of processes (201).
Through the Informachine Content Management System (205), users (200) can create communicable content such as comments, notes, blog posts, forum posts and conference chats, and associate it with the external content so as to discuss and analyze it.
Users can also import (206) content from their own computers into Informachine.
User judgment can be applied at this stage in three ways:
- through a document management system (202) that allows the labeling/tagging, bookmarking and filtering of the repository content
- through the combination or association (203) of different types of other content (such as that created with the content management system and the content imported from the users' own computers) with the content downloaded from external sources, a process which acts in a way similar to tagging by allowing users to locate the external content through the content associated with it
- through the sharing (204) of (the combinations of) content and the labels used to organize it within an organization or community, so that the judgment of other users can be drawn upon
These processes are also represented in figure 3 by 302 (individual user judgement used to manage content), 303 (shared user judgement used to manage content) and 304 (sharing of individual user judgement used to manage content).
After the external (web) content has been downloaded, extracted, organized, combined with other content and shared within the organization or community, Informachine provides the search and sorting tools (207) to exploit all the user judgment applied to the web content and find more relevant information.
Users can also make use of plugged-in tools such as currency and other converters, diaries and planners, etc.
2. INTRODUCING USER JUDGMENT TO DEFINE THE SEARCH UNIVERSE
Informachine enables organisations and individual users to use their knowledge and judgement to choose and add to a database all the sources, such as websites, from which they are likely to find content of relevance to their needs and from which they would like the system to regularly download fresh content so that it can be managed and searched as required.
As illustrated in figure 4, Informachine allows the user to profile each source by:
- identifying the sections of the source that need to be profiled and identifying portions of the pages of that section, such as the title and main content, to be extracted, as shown by process 401 in figure 4 and in figures 5-6,
- assigning attributes to these sources through different styles of tagging, as illustrated by processes 300 and 301 in figure 3, and processes 402 and 403 in figure 4.
The above method of source management, indicated by process 100, forms the first stage of the process delineated in figure 1.
Figure 4 illustrates the process in greater detail. When a user chooses a particular source, the system checks whether the source already exists in the database. If it exists, then the source is added to the user's profile (process 400). If it is not in the database, then the user or a knowledge officer/librarian is given the facility to add the source to the database by profiling it as described by figures 5-6 and assigning two types of tags/labels to it: source categories, which are personalised labels specific to an individual user, and source areas, which are centrally administered source labels common to all users in a community. The source areas may be administered by a knowledge officer or librarian.
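By way of illustration only, the source-registration check described above could be sketched as follows; the table and column names and the register_source helper are hypothetical assumptions, not part of this specification, and the tables are assumed to exist already:

    # Illustrative sketch of the source-registration step of figure 4.
    # Table/field names (sources, user_sources, source_area, category) are assumptions.
    import sqlite3

    def register_source(db, user_id, url, name, source_categories, source_area):
        cur = db.cursor()
        row = cur.execute("SELECT id FROM sources WHERE url = ?", (url,)).fetchone()
        if row is None:
            # Source not yet in the shared database: add and profile it, and attach
            # the centrally administered 'source area' label (see figures 5-6).
            cur.execute("INSERT INTO sources (url, name, source_area) VALUES (?, ?, ?)",
                        (url, name, source_area))
            source_id = cur.lastrowid
        else:
            source_id = row[0]
        # Add the source to the individual user's own profile with his/her
        # personal 'source category' labels (process 400).
        for category in source_categories:
            cur.execute("INSERT INTO user_sources (user_id, source_id, category) VALUES (?, ?, ?)",
                        (user_id, source_id, category))
        db.commit()
        return source_id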
Figure 5 is an example of the kind of information that might be entered while adding and profiling a new source such as a corporate website: the company's name (500), the company's website address or universal resource locator (URL) (501), and the name of the folder in the repository (web server or a computer on the local network) in which the files (such as images or .doc, .xls, .ppt or .pdf documents) downloaded from the website will be stored (502).
Figure 6 is an example of the kind of information that might be entered in profiling a new section of a chosen source (such as the 'news release' or 'white papers' sections of a corporate website):
- the name of the section (600), for example, "ABC company news releases";
- the web address or URL of the section (601), e.g. http://www.ABCcompany.com/news;
- the type of document the content downloaded from the section will be (602), e.g. press release or white paper;
- the index page qualifier start (603), a fragment of HTML that the system will use to identify the beginning of the portion of the section index page that contains all the hyperlinks that need to be read and visited;
- the index page qualifier end (604), a fragment of HTML that the system will use to identify the end of that portion of the section index page;
- the hyperlink identifier (605), which identifies which hyperlinks on the section index page the system's web crawler should visit to download content, and which could be a fragment of the HTML code of the web page, for example, a part of the full path that will be present in all hyperlinks of that type ("/newsrelease" from "http://www.ABCcompany.com/news/newsrelease/filename.html");
- the title start identifier (606), which identifies the start of the title of the content to be downloaded once the link has been identified and visited, and which could again be a fragment of HTML code that is always present in that type of page and can always be relied on to identify the start of the title;
- the title end identifier (607), a fragment of HTML code which can be used to identify the end of the title of the content to be downloaded;
- the main text start identifier (608), a fragment of HTML code which can be used to identify the start of the main text to be downloaded;
- the main text end identifier (609), a fragment of HTML code which can be used to identify the end of the main text of the content to be downloaded.
In a similar manner, other identifiers can be included if other portions of content from the web page, such as the published date of the content, have to be downloaded.
Information will also need to be added about the nature of the content: whether it is an ordinary web page or a syndicated feed (610), for instance; whether the source content is copyright-governed or not (611); and also whether the content requires subscription or registration and the user has to log in using a user name and password (612).
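As an illustrative sketch only (the class, field and function names below are assumptions, not part of this specification), the section profile of figure 6 and the use of the start/end HTML fragments to cut the desired portions out of a downloaded page might look like this:

    # Illustrative sketch of a section profile (fields 600-612 of figure 6) and of
    # how the start/end HTML fragments could be used to extract the desired portions.
    from dataclasses import dataclass

    @dataclass
    class SectionProfile:
        name: str                    # 600
        url: str                     # 601
        document_type: str           # 602, e.g. "press release"
        index_start: str             # 603 index page qualifier start
        index_end: str               # 604 index page qualifier end
        link_identifier: str         # 605, e.g. "/newsrelease"
        title_start: str             # 606
        title_end: str               # 607
        text_start: str              # 608
        text_end: str                # 609
        is_feed: bool = False        # 610 syndicated feed or ordinary web page
        copyrighted: bool = False    # 611 copyright-governed content
        needs_login: bool = False    # 612 subscription/registration required

    def cut_between(html: str, start: str, end: str) -> str:
        """Return the text between the first occurrence of `start` and the
        following occurrence of `end`; empty string if the markers are absent."""
        i = html.find(start)
        if i < 0:
            return ""
        i += len(start)
        j = html.find(end, i)
        return html[i:j] if j >= 0 else ""

    def extract_portions(page_html: str, profile: SectionProfile) -> dict:
        # Keep only the user-desired portions; the rest of the page is stripped.
        return {
            "title": cut_between(page_html, profile.title_start, profile.title_end),
            "main_text": cut_between(page_html, profile.text_start, profile.text_end),
        }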
3. CRAWLING THROUGH THE DEFINED SOURCES TO EXTRACT FRESH CONTENT
Once these profiles have been added to the database, the web crawler, on a regular, cyclical basis, uses the identifiers entered to identify freshly added web pages through the new hyperlinks it notices on a section index page, then visits those fresh pages to identify and download the user-desired portions of the pages.
Figure 7 describes the process followed by the web crawler once the sources have been added into the database.
The crawler obtains (700) source profiles from the database and checks (701) if the content of the section is a syndicated feed or an ordinary web page. If the content is a syndicated feed, the crawler reads (702) the syndicated feed and checks (703) if the URLs or web addresses provided in the feed are already in the web source database. If they are not present in the database, the web addresses are visited and the content found is downloaded (705). If the content found is a web page, identifiers (606-609) are used to identify the portions to be extracted from it and the rest of the web page is stripped so that the extracted content can be stored (706) in the Informachine database. If the content found at the web address is a file other than an HTML file (e.g. a .pdf, .doc, .ppt, .gif, .jpg, .xls, .mp3, .wav, .mpeg or .dat file), it is downloaded (707) into the folder specified (502) in the section profile (figure 5).
If the content is not a syndicated feed, the crawler visits the section of the source specified by using the URL provided (601) in the section profile and, in the page code, uses the index page qualifiers (603 and 604) and the hyperlink identifier (605) to identify (704) hyperlinks of the type that the user desires, and checks (703) if each URL identified in this way is present in the database or not. If a URL doesn't exist in the database, the system first checks (710) if the content requires subscription or registration and requires the user to log in (as specified in the source section profile (612)). If it does, the full content is not downloaded into the repository. Instead only the titles, web addresses and publishing dates of the content (as defined by the user in the source profile) are downloaded into the database (711), so that the user can go to the original web page to enter subscription or registration details before downloading the full content for personal use. If it does not require the user to log in, the source section is visited and the content found is downloaded (705). If the content is a web page, identifiers (606-609) are used to identify the portions to be extracted from it and the rest of the web page is stripped so that the extracted content can be stored (706) in the Informachine database. If the content found at a web address is a file (e.g. a .pdf, .doc, .ppt, .gif, .jpg or .xls file), it is downloaded (707) into the folder specified (502) in the section profile (figure 5).
The date of the download is recorded.
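A minimal sketch of one such crawl cycle over a profiled section is given below. It reuses the SectionProfile, cut_between and extract_portions helpers sketched earlier, and assumes hypothetical known_urls, store_extracted and store_stub callbacks; the real system would add scheduling, error tracking (714) and handling of non-HTML files (707).

    # Illustrative sketch of one crawl cycle over a profiled section (figure 7).
    import re
    import urllib.request

    def fetch(url: str) -> str:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def crawl_section(profile, known_urls, store_extracted, store_stub):
        if profile.is_feed:
            # 702: read the syndicated feed and collect the item links (naive parse).
            feed = fetch(profile.url)
            links = re.findall(r"<link>(.*?)</link>", feed)
        else:
            # 704: use the index page qualifiers and the hyperlink identifier
            # to find the hyperlinks of the type the user desires.
            index = cut_between(fetch(profile.url), profile.index_start, profile.index_end)
            links = [href for href in re.findall(r'href="([^"]+)"', index)
                     if profile.link_identifier in href]
        for url in links:
            if url in known_urls:            # 703: already downloaded in an earlier cycle
                continue
            if profile.needs_login:          # 710: subscription/registration required
                store_stub(url)              # 711: keep only title/URL/date for later use
                continue
            page = fetch(url)                # 705: download the fresh page
            store_extracted(url, extract_portions(page, profile))  # 706: strip and store
            known_urls.add(url)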
When all content downloads for a particular cycle are complete, the web crawler generates (708) an XML (or any other similar extensible marked-up format) file residing on the web server and containing profile information, such as URL, title, date and description, about the freshly downloaded content. This allows embodiments of Informachine that have the application installed on a client company's local network (see figure 21) to independently download content using the profiles stored in XML form. This process (as described by figure 21), by which each independent individual or organisation using Informachine is required to download content afresh from copyright-protected websites, helps to ensure that laws that prevent the unauthorised distribution of copyrighted content are not flouted.
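A minimal sketch of such a per-cycle index file follows; the element names (informachine-updates, item) are illustrative assumptions rather than a format prescribed by this specification.

    # Illustrative sketch of the per-cycle index file (708): an XML document
    # listing profile information about the freshly downloaded items.
    import xml.etree.ElementTree as ET

    def write_cycle_index(items, path="fresh_content.xml"):
        root = ET.Element("informachine-updates")
        for item in items:   # each item: dict with url, title, date, description
            entry = ET.SubElement(root, "item")
            for key in ("url", "title", "date", "description"):
                ET.SubElement(entry, key).text = item.get(key, "")
        ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)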
Each cycle of the web crawler also includes a process (714) for tracking errors arising out of a mismatch between the identifiers used to identify portions of a source, such as a web page, and the structure of the content (if and when such structure is modified by the owner of the source website), and notifying the system of the errors.
4. DISPLAYING THE EXTRACTED CONTENT IN A USER-DEFINED FORMAT
Once the content is downloaded, as described in figure 8, the system checks (800) in the profile whether the use and distribution of source content is restricted by copyright protection. If it is, then the copyright-protected portions (the main text) of the downloaded content are not displayed to the user. The user is instead shown (801) only the titles and short descriptions of the content, and when the user clicks on the title of the downloaded content, he/she is taken directly to the original version of the web page on the source website.
If the content requires subscription or registration, again, only the titles, web addresses and publishing dates of the content (as defined by the user in the source profile) are displayed, so that the user can go to the original web page to enter subscription or registration details before viewing the content in its original form on the Internet. Once the user has entered the subscription details, she/he can download the content for personal use by clicking on the 'download this item' button on the display page of such content. The system will check if the user has already entered subscription information or not before downloading it.
The content extracted and downloaded from copyright-protected sources and stored in the Informachine database (or external content) can be used by the user for search (802) and management (803) purposes, but cannot be viewed.
If the content is in the public domain (as in the case of company press releases), the content extracted and stored in the Informachine database is displayed (804) in a visual display designed to suit the user's tastes and usability preferences as shown in figures 9-10 (a summary description can also be provided) and the user can view it without having to visit the source website on the internet.
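Purely for illustration, and with assumed item field names, the display decision of figure 8 might be sketched as:

    # Illustrative sketch of the display decision of figure 8. The item fields
    # (copyrighted, needs_login, title, description, date, main_text, url) are assumptions.
    def render_item(item, user_has_subscribed=False):
        if item["copyrighted"]:
            # 801: show only the title and short description; the title links to
            # the original page on the source website.
            return {"title": item["title"], "description": item["description"],
                    "link": item["url"], "body": None}
        if item["needs_login"] and not user_has_subscribed:
            # Subscription/registration content: show title, web address and date only.
            return {"title": item["title"], "link": item["url"],
                    "date": item["date"], "body": None}
        # 804: public-domain content is displayed in full, in the user's preferred layout.
        return {"title": item["title"], "link": item["url"], "body": item["main_text"]}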
The content can be displayed through a browser on the user's computer, or, if the user desires it, on other devices and applications capable of reading the content, such as a PDA or mobile phone.
The viewer can also view the original version of the content on the source website through the Internet, if he/she so chooses, using the link provided (805) (by clicking on the website address displayed above the title in the display shown in figure 9, or by clicking on the 'Online' link next to each item of content in the display shown in figures 10, 11, 12 and 13).
5. INTRODUCING USER JUDGMENT TO ORGANISE THE EXTRACTED CONTENT
Whether the content is copyright-protected or not, Informachine allows users to organise it once it has been downloaded. In figure 3, 302 and 303 represent the ways in which users can organise content through Informachine:
- the application of individual user judgement through personalised labelling or tagging, as shown in figures 11 and 13, and bookmarking, as shown in figure 12 (both managed by the individual user himself/herself), which can be shared through searches of the type shown in figure 18 and represented by 304 in figure 3.
- the application of shared judgment through hierarchical centralised labelling that allows an organisation or community (through perhaps a knowledge officer or librarian) to apply a set of labels (managed collectively), as shown in figure 10, to the content that will be common to all users in the community.
- the automatic filtering of freshly downloaded content using a pre-defined keyword search, as shown in figures 18 and 19 (see "Your preferred search filters"), so that content is automatically organised by keyword or by a (user-defined) combination of keywords and several other descriptors, such as source, source area and category, and users are alerted whenever there is fresh content that contains particular keywords and is from particular sources or source types.
Both types of labelling systems can be managed by adding, deleting or renaming labels. In the case of centralised labelling, the labels are arranged in a hierarchical manner and are managed centrally by users authorised to do so.
Figure 10 illustrates the process by which the user can apply a 'central label'. First the user selects the documents to be labelled by clicking on a checkbox next to them. Then the user chooses the label he/she wants to attach to or detach from the document.
A similar process, illustrated by figure 11, can be used to apply 'personal labels'.
Bookmarking, as shown by figure 12, can be done by first selecting the documents to be bookmarked and then clicking on the bookmark toggle icon.
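For illustration only, with simple in-memory structures standing in for the document management system (202), label attachment and the bookmark toggle of figures 10-12 might be sketched as:

    # Illustrative sketch of label attachment and the bookmark toggle (figures 10-12).
    from collections import defaultdict

    personal_labels = defaultdict(set)   # (user_id, doc_id) -> personal labels
    central_labels = defaultdict(set)    # doc_id -> centrally administered labels
    bookmarks = set()                    # (user_id, doc_id) pairs

    def attach_personal_label(user_id, doc_ids, label):
        for doc_id in doc_ids:
            personal_labels[(user_id, doc_id)].add(label)

    def attach_central_label(doc_ids, label):
        # Applied by an authorised user such as a knowledge officer or librarian.
        for doc_id in doc_ids:
            central_labels[doc_id].add(label)

    def toggle_bookmark(user_id, doc_id):
        key = (user_id, doc_id)
        bookmarks.symmetric_difference_update({key})   # add if absent, remove if present
        return key in bookmarks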
To save a search as a filter, Informachine allows users (see figure 18) to click on 'Save search as filter', creating a new filter that will consist of all the parameters entered in the search that are applicable at the source level. These filters can be modified (by changing their component elements) or deleted by individual users.
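A minimal sketch of such a saved filter, keeping only parameters applicable at the source level and using assumed field names, might be:

    # Illustrative sketch of a saved search filter and its matching against
    # freshly downloaded content (used to drive the dashboard alerts of figure 19).
    from dataclasses import dataclass, field

    @dataclass
    class SavedFilter:
        keywords: list = field(default_factory=list)
        sources: set = field(default_factory=set)
        source_areas: set = field(default_factory=set)
        document_types: set = field(default_factory=set)

    def matches(filt: SavedFilter, item: dict) -> bool:
        text = (item.get("title", "") + " " + item.get("main_text", "")).lower()
        if filt.keywords and not any(k.lower() in text for k in filt.keywords):
            return False
        if filt.sources and item.get("source") not in filt.sources:
            return False
        if filt.source_areas and item.get("source_area") not in filt.source_areas:
            return False
        if filt.document_types and item.get("document_type") not in filt.document_types:
            return False
        return True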
6. INTRODUCING USER JUDGMENT TO COMBINE THE EXTRACTED CONTENT WITH OTHER CONTENT AND SHARE AND DISCUSS THE OUTPUT
To allow users to add value to the downloaded content and hold discussions around it, Informachine allows the combination of the content with other types of content:
- with content created through Informachine's content management system (e.g. blog posts, discussion forum posts, notes and memos) as shown in figures 14, 15 and 16. As shown in figures 14 and 15, after the content to be created has been entered, the user can attach the content downloaded from external sources by clicking on 'browse' (figure 14), selecting the documents to be attached after searching for them (figure 15), and clicking on 'attach selected documents to '. As shown by figure 16, the user can also forward the downloaded content to other users with attached comments
- with conference chats: Informachine allows users to discuss particular documents on a real-time basis with other users through document-related conference chats as shown in figure 17
- with content imported into Informachine through other means, such as from the user's local computer: Informachine allows a search for content on the user's personal computer or local computer network, its incorporation into the Informachine system and its association with content downloaded from web sources.
These associations (including the archived conference chats) are displayed along with the external document itself as shown in figure 9.
7. ALLOWING THE SHARING OF THE EXTRACTED, ORGANISED AND COMBINED CONTENT WITH OTHER USERS IN A COMMUNITY
This combined content and the labels attached to it can be shared between users in a community. This allows not only the sharing of user judgment, which results in easier location of content in a community or organisation, but also the use and discussion of the web content.
Sharing is done either through direct forwarding, as shown in figure 16, or through combination with items of communication (notes, blog posts, forum posts, etc.), as shown in figures 14, 15 and 16.
Informachine's user management system controls access rights given to users and allows only authorised users to see any specific content. Informachine's contacts management system allows users to manage their contacts list - including organising them into groups or communities of practice - and they can share content with individuals or groups of persons in their contacts list.
Documents forwarded to such other users will appear in such other users' 'inboxes' and they can click on and read the content and the comments or notes forwarded (or just the comments). Documents can also be forwarded to users' email addresses and mobile phones, especially if the user is not a part of the community or organisation.
Informachine allows users to share labels attached to documents by other users in the community by allowing them to search through these labels for keywords, as shown in figure 18. This is an important way of sharing user judgement in the system.
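For illustration, with assumed data structures, forwarding a document with a comment subject to these access rights might be sketched as:

    # Illustrative sketch of forwarding content with comments (figure 16) subject
    # to role-based access rights; access_control and inboxes are assumed structures.
    def forward_document(doc_id, comment, sender, recipients, access_control, inboxes):
        delivered = []
        for user in recipients:
            # Only users authorised to see this specific item receive it in their
            # inbox; others (e.g. outside the community) could be sent a link by email.
            if access_control.get((user, doc_id), False):
                inboxes.setdefault(user, []).append(
                    {"doc_id": doc_id, "from": sender, "comment": comment})
                delivered.append(user)
        return delivered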
8. ALLOWING A SEARCH OF THE EXTRACTED CONTENT, MAKING USE OF INDIVIDUAL AND SHARED USER JUDGMENT USED TO ORGANISE IT
Users can ultimately make use of the user judgment that has been applied in various ways (as described above) to content from web sources to find information more easily in two ways:
- sorting and sifting through content: as shown in figure 15, the user can sort through the external content using the tagging done at the source level (source areas, source categories, document types), the date of the download, and the sources themselves, to find the content they are looking for
- searching through content in a variety of ways: as figure 18 shows, the user can look for a particular document by simultaneously searching for particular keywords in the external content, for particular keywords in associated (attached) documents, for content labelled with particular source and document labels, for particular keywords in other users' source and document labels, for content from particular sources, only within bookmarked content, for content filtered through specific filters, for content downloaded between particular dates ('download dates'), and for content having particular publishing dates ('document dates')
As shown at the bottom of figure 18, Informachine allows users to save their searches as filters, so that whenever new content downloaded from external sources fits the saved search parameters (only those that are applicable at the source level) the user can be alerted.
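A minimal sketch of such a combined, progressively narrowing search over the repository, with assumed field names, might be:

    # Illustrative sketch of the combined search of figure 18 over the repository.
    # Each supplied parameter narrows the result set; empty parameters are ignored.
    def search(repository, keywords=None, labels=None, sources=None,
               bookmarked_only=False, download_from=None, download_to=None,
               user_id=None, bookmarks=frozenset()):
        results = []
        for item in repository:           # item: dict describing one stored document
            text = (item.get("title", "") + " " + item.get("main_text", "")).lower()
            if keywords and not all(k.lower() in text for k in keywords):
                continue
            if labels and not set(labels) & set(item.get("labels", [])):
                continue
            if sources and item.get("source") not in sources:
                continue
            if bookmarked_only and (user_id, item["id"]) not in bookmarks:
                continue
            d = item.get("download_date")
            if download_from and d and d < download_from:
                continue
            if download_to and d and d > download_to:
                continue
            results.append(item)
        return results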
9. ALERTING USERS ABOUT NEW CONTENT THAT ACCORDS WITH THEIR PREFERENCES
Through the Informachine dashboard (see figure 19), users can see the latest updates in web content (and also internal content) in the areas they are interested in. These alerts are made immediately after the content has been downloaded and extracted; therefore, they are organised only according to the labels and other descriptors applied at the source level.
The user can choose search filters, sources, source areas, source categories, document types (and also communication formats) for which they would like to get summaries of updates. They can also choose another set of download and document dates to view the updates that took place in that period.
Users can choose to receive the same update summaries in their areas of interest by email or on their mobiles or PDAs. The content would either be sent to the PDA or mobile, if the user wishes so, or just a hyperlink would be sent to him/her so that he/she can follow it and, after logging into the Informachine system with a user name and password, view the content within the system.
10. INCLUSION OF DOCUMENTS FOUND THROUGH WEB SEARCH ENGINES
Informachine allows users to use a conventional web search engine (such as Google, Yahoo or MSN) to search the Internet and then displays the search results in the manner shown in figure 20, with checkboxes next to each item to allow users to select the items they find relevant. Once users have selected documents in this way, they can click on 'download selected documents', as shown in figure 20, and the content is downloaded into the repository to be displayed and managed as shown in figures 9-19.
To conform to copyright laws, this content will be visible only to the user who conducted the search. If she/he shares it with other users, they will only be able to see the original online version as in the case of copyright protected content in general.
11. TOOLS TO FURTHER AID USE OF THE CONTENT
Informachine offers plugged-in tools such as currency and other converters, calculators, a dictionary, a thesaurus, and diaries and planners for easier analysis and use of the content.
12. VARIETY OF WAYS OF STRUCTURING THE SYSTEM
Making the system available to individuals and organisations (through a multi-level, multi-user, role-based system) on the Web: In this version of Informachine, as shown in figure 7, both the repository of content (712) and the tools (713) to manage, share, search and retrieve the content in the repository reside on the Web, and are made available to users, both independent and within organisations, through a device (such as a desktop or laptop computer or PDA or mobile phone) and an application (such as a web browser) with access to the Web and capable of reading the content.
Making the system available to individuals and organisations (through a multi-level, multi-user, role-based system) on their own computers: In this version of Informachine, as shown in figure 21, both the repository of content (2101) and the tools (2102) to manage, share, search and retrieve the content in the repository reside on the individual's computer or the organisation's network of computers, and are made available to users, both independent and within organisations, through a device (such as a desktop or laptop computer or PDA or mobile phone) and an application (such as a web browser) capable of reading the content.
As shown in figure 7, and explained earlier, when all content downloads for a particular cycle are complete, the web crawler generates (708) an XML (or any other similar extensible marked-up format) file residing on the web server containing profile information (such as URL, title, date and description) about the freshly downloaded content. Installations of the Informachine system on the users' own computers or computer network then independently download content into their own repositories using the profiles stored in XML form (see figure 21).
First, the system installed on the users' computers reads (2100 and 2101) the XML file residing on the web server to pick up profiles of the latest updates. Then it checks (2102) to see if the URL already exists in the database, and then follows the same procedure as that followed in the case of the web version to accommodate content that requires the user to subscribe or register in order to view it (see figure 7), before downloading the content, stripping irrelevant elements from it (2103) and storing it (2104) in the users' repository (2105).
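A minimal sketch of this client-side cycle follows, assuming the illustrative index format sketched earlier and a hypothetical download_and_store callback for the fetch, strip and store steps (2103-2104):

    # Illustrative sketch of the local-installation cycle of figure 21: the client
    # reads the XML index written by the central crawler, then downloads each new
    # item afresh from its original source, so that copyrighted content is never
    # redistributed from a central copy.
    import urllib.request
    import xml.etree.ElementTree as ET

    def sync_local_repository(index_url, known_urls, download_and_store):
        with urllib.request.urlopen(index_url, timeout=30) as resp:
            root = ET.parse(resp).getroot()
        for entry in root.findall("item"):
            url = entry.findtext("url", "")
            if not url or url in known_urls:      # 2102: already in the local repository
                continue
            download_and_store(url)               # 2103-2104: fetch, strip and store locally
            known_urls.add(url)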
After this the user can use the tools described above (but now residing on the users' machines) to manage, share, search and retrieve the content in the repository (2106). Copyright laws are respected through the same process described in figure 8.
This process (as described by figure 21), by which each independent individual or organisation using Informachine is forced to download content afresh from copyright-protected websites, helps to ensure that laws that prevent the unauthorised distribution of copyrighted content are not flouted through a centralised dissemination of that content.
Thus, the disclosed invention describes a method and system for sharply enhancing the relevance of search results, such as those of web searches, by introducing individual and shared user judgement: first, to define the universe of the search, automatically internalising the content of that universe (via a copyright compliance system) in an automatically updated repository that can integrate other (internally generated or imported) content and enable sharing according to user preferences; and, secondly, to organise the internalised content through tagging, bookmarking and filtering, comprising:
a. enabling organisations and individual users to create a database of sources such as websites and sections of websites;
b. enabling the user to profile each specific source, such as a web page, by identifying portions of it, such as title, main content and inserted images, and specifying those portions to be extracted;
c. allowing units/departments of organisations and individual users in them to create their own profiles by assigning sources to themselves, and by assigning attributes to these sources through different styles of tagging;
d. crawling through the selected sources to identify fresh content and downloading it; using the source profiles to identify user-desired portions of the fresh content and stripping the rest;
e. tracking the process for errors arising out of a mismatch between the identifiers used to identify portions of a source, such as a web page, and the structure of the content (if and when such structure is modified by the owner of the source website), and notifying the system of the errors;
f. storing the content in an automatically created central repository in a user-defined taxonomy;
g. filtering the updated content according to predefined search parameters and displaying the results of the process;
h. displaying the content in an individual user-defined visual format, including summary analysis, according to his/her self-defined profile on a device (such as a desktop or laptop computer or PDA or mobile phone) and application (such as a web browser) with access to the Web and capable of reading the content;
i. enabling organisations to distinguish between content that can be legally downloaded and distributed internally, and that which cannot be legally downloaded and distributed internally without permission or payment, and to display each type of content in a manner that complies with IPR requirements;
j. enabling organisations to distinguish between content that requires subscription or registration and content that does not and displaying the former only after the user has entered subscription or registration details;
k. enabling a knowledge officer/librarian to tag the repository content through a hierarchical central labelling system while enabling the individual user to tag the content with personal labels that can be modified at will;
l. providing the user with the ability to associate or combine the downloaded content of the repository with other content created either through the system embodying the invention or imported into it from a directory of internally generated and other, including previously and currently imported, documents;
m. providing the user with the ability to associate or combine the repository content with the output of communication events, such as annotation, forwarding of documents with comments, forums, chats, conferences and notes;
n. providing the user with the ability to share the combined content and the labels used to organise it with other users in particular communities of practice using a role-based user management system;
o. providing the facility to search through the combined and organised content making use of a multiplicity of search and query parameters to widen or narrow the search in order to enhance the relevance of the results;
p. enabling users to create alerts and newsletters for individuals, communities of interest within the organisation, or wider groups, and to broadcast these in formats such as desktop alerts, email and mobile messages;
q. providing plugged-in tools such as a currency converter, a facility to export external content to content management systems so as to be able to create documents (such as HTML, .doc, .xls, .ppt files) from it, and diaries and planners to help integrate the content with time-bound processes;
r. the ability to manage the system described on a multi-level, multi-user, role-based system through the internet for an organisation;
s. the ability to manage the system described on a multi-level, multi-user, role-based system through an organisation's local network for an organisation;
t. the ability to manage the system described on a multi-level, multi-user, role-based system through the internet for an independent individual;
u. the ability to manage the system described on a multi-level, multi-user, role-based system through an organisation's local network for an independent individual;
v. a method and system to include external documents found through conventional search engines into the system described.
Dated this 11th day of January, 2007.
FOR THE INFORMATION COMPANY PVT.LTD By their Agent
(GIRISH VIJAYANAND SHETH) KRISHNA & SAURASTRI