FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
RESULT MINING SYSTEM AND METHOD IN A QUERY SEARCH PROCESS
Applicant
TATA Consultancy Services Limited, a Company Incorporated in India under The Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
FIELD OF THE INVENTION
The present invention relates to result mining for one or more search queries. More particularly, the invention relates to identification of a search area in a data repository for mining search results.
BACKGROUND OF THE INVENTION
A large amount of data is stored in the knowledge repositories of an enterprise. This data is generally stored in a semi-structured or unstructured format. The data belongs to one or more industries of the enterprise, and its various dimensions may not be categorized or classified accordingly.
Result mining through such a large amount of uncategorized data is a tedious and time-consuming task. Various existing search techniques give output depending upon the keyword entered as a search query. The results displayed may contain thousands of entries, which makes analysis a daunting task. The existing techniques fail to provide for searching through data of only a specific industry type.
The available methods are more focused on customization of the query input by providing one or more options of query classification. However, this query customization is automatic and cannot be modified as per the requirement of a user. After the query is customized, the search is performed throughout the database based on keyword mapping. Such a search may return a large number of results for which relevancy cannot be assured. Therefore, it is very important to configure the data repository in a manner where data is pre-categorized, so that it becomes easy to search in a particular portion rather than searching through the complete data repository.
Further, the search results displayed after the implementation of a search query are not categorized according to the data type. To enable easier analysis of the displayed search results, categorization according to industry, data type and any other defined metadata is an essential requisite. The requirements may vary from one user to another; therefore the user interface displaying the search results needs to be user customizable. The existing user interfaces do not provide for user customization. Moreover, the configured databases are not updated regularly, which is needed in order to retrieve the best results every time.
Therefore, there is a need for configuring a comprehensive system to implement a result mining strategy intended to provide optimized search results and, more particularly, to provide a user customizable interface enabling the categorization of the displayed search results.
OBJECTS OF THE INVENTION
It is the primary object of the present invention to provide a system to facilitate result mining for one or more search queries.
It is another object of the invention to categorize the data with respect to one or more industries, thus identifying a query search area in a database.
It is yet another object of the invention to allow the users to modify the results obtained for a particular query.
It is yet another object of the invention to receive feedback from the user in order to improve the obtaining of search results, the suggesting of keywords and the fetching of data from external sources.
It is yet another object of the invention to generate one or more statistical reports with respect to a search query to be analyzed for future reference.
It is yet another object of the invention to provide a step of search within the search to provide a detailed and relevant search.
BRIEF DESCRIPTION OF THE DRAWINGS
Further objects, embodiments, features and advantages of the present invention will become more apparent and may be better understood when read together with the detailed description and the accompanying drawings. The components of the figures are not necessarily to scale, emphasis instead being placed on better illustration of the underlying principle of the subject matter. Different numeral references on figures designate corresponding elements throughout different views. However, the manner in which the above depicted features, aspects, and advantages of the present subject matter are accomplished does not limit the scope of the subject matter, for the subject matter may admit to other equally effective embodiments.
Figure 1 illustrates the system architecture for mining query results in accordance with an embodiment of the invention.
Figure 1(b) illustrates the user customization of one or more query in accordance with an alternate embodiment of the invention.
Figure 2 illustrates the detailed components of the system architecture in accordance with an alternate embodiment of the system.
Figure 3 illustrates the flow chart for result mining in accordance with an exemplary embodiment of the invention.
DETAILED DESCRIPTION OF THE INVENTION
Some embodiments of this invention, illustrating its features, will now be discussed:
The words "comprising", "having", "containing", and "including", and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items.
It must also be noted that as used herein and in the appended claims, the singular forms "a", "an", and "the" include plural references unless the context clearly dictates otherwise. Although any systems, methods, apparatuses, and devices similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, the preferred systems and parts are now described. In the following description, for the purpose of explanation and understanding, reference has been made to numerous embodiments for which the intent is not to limit the scope of the invention.
One or more components of the invention are described as modules for the understanding of the specification. For example, a module may include a self-contained component in a hardware circuit comprising logic gates, semiconductor devices, integrated circuits or any other discrete components. A module may also be a part of any software program executed by any hardware entity, for example a processor. The implementation of a module as a software program may include a set of logical instructions to be executed by the processor or any other hardware entity. Further, a module may be incorporated with the set of instructions or a program by means of an interface.
The disclosed embodiments are merely exemplary of the invention, which may be embodied in various forms.
The present invention provides a system and method for mining results obtained with respect to one or more search queries executed by a user. For a particular query (for a particular attribute type), data flow is maintained by referring to one or more data repositories. The stored data is pre-categorized for proper query identification. The pre-configured data is then processed and filtered in order to obtain the optimized
common keyword to search for related assets, then the "transition" keyword will be added into Auto suggestion. Similarly, the search words used by users are analyzed and, based on frequency, added into the list of auto suggestions. In addition, based on sales team feedback or suggestion, we consider a "special" auto-suggested keyword for addition into the system.
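By way of a non-limiting illustration, the frequency-based addition of search words to the auto-suggestion list may be sketched as follows. The function name `build_suggestions`, the `min_count` threshold and the sample search log are assumptions made only for this sketch; the specification does not prescribe any particular threshold or data format.

```python
from collections import Counter


def build_suggestions(search_log, min_count=2, curated=()):
    """Return auto-suggestion keywords: frequently searched words plus curated ones.

    search_log: past query strings entered by users (hypothetical format).
    min_count:  assumed frequency threshold for promoting a word.
    curated:    keywords proposed by feedback, e.g. from the sales team.
    """
    counts = Counter(
        word.lower() for query in search_log for word in query.split()
    )
    frequent = [word for word, n in counts.most_common() if n >= min_count]
    # Curated keywords are appended only if not already suggested.
    extras = [k.lower() for k in curated if k.lower() not in frequent]
    return frequent + extras


log = ["transition plan", "transition checklist", "pricing model"]
suggestions = build_suggestions(log, min_count=2, curated=["special"])
```

In this sketch only "transition" occurs often enough to be promoted, and the curated keyword "special" is appended after it.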
The search query executed by the user may include but is not limited to search based on keyword, search based on the data asset type, search based on the category type, search based on metadata definition, search based on context and semantics, or a combination thereof.
Referring to figure 1(b), various types of search categories are explained below:
a) Keyword Search: Here the user can enter any keyword(s) and search for assets. The results are displayed in a grid and grouped according to Asset Category with Sorting and Filtering capabilities.
b) Asset Type Search: Here the user can select a specific asset type with / without a keyword and search for assets. The results are displayed in a single grid with Sorting and Filtering capabilities.
c) OU Search: Here the user can select a relevant OU from existing OU dropdown list and search for assets with / without any keyword. OU stands for 'Organizational Unit'.
d) Advanced Search: Here a user can select a desired combination of metadata from a predetermined set and search for assets.
e) Browse by Category: Here a user can browse through a list of available options and select a specific topic. The selection will trigger an implicit search and display a set of results in multiple categories with Sorting and Filtering capabilities.
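The search categories listed above may be viewed as one query object in which each unset field places no restriction on the search. The following is a minimal sketch under that assumption; the class `SearchQuery`, its field names and the sample assets are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchQuery:
    """One query; fields left unset do not restrict the search."""
    keyword: str = ""                  # Keyword Search
    asset_type: Optional[str] = None   # Asset Type Search
    ou: Optional[str] = None           # OU (Organizational Unit) Search
    metadata: dict = field(default_factory=dict)  # Advanced Search

    def matches(self, asset: dict) -> bool:
        if self.keyword and self.keyword.lower() not in asset["title"].lower():
            return False
        if self.asset_type and asset.get("asset_type") != self.asset_type:
            return False
        if self.ou and asset.get("ou") != self.ou:
            return False
        # Every selected metadata value must match the asset.
        return all(asset.get(k) == v for k, v in self.metadata.items())


assets = [
    {"title": "Cloud transition case study", "asset_type": "Case Study", "ou": "Banking"},
    {"title": "Cloud pricing proposal", "asset_type": "Proposal", "ou": "Retail"},
]
keyword_hits = [a for a in assets if SearchQuery(keyword="cloud").matches(a)]
asset_type_hits = [
    a for a in assets if SearchQuery("cloud", asset_type="Proposal").matches(a)
]
```

A keyword-only query matches both sample assets, while adding an asset type narrows the result to the proposal.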
The system (100) is further provided with an integration module (106) which is configured to update the corresponding data repository (104) with the latest and useful data. The integration module (106), being scalable, refers to one or more external data sources (101) in order to update the data repository (104) in a continuous manner. The external data sources (101) may include, but are not limited to, asset detailing from multiple sources, the type/category of asset, user keyword feedback and sales solution related information.
Referring to figure 2, the data repository (104) stores only structured data. Since the integration module (106) fetches or receives data from external sources (101), this data should be similar in terms of structure. The system (100) further comprises a conversion module (116) configured to convert unstructured and semi-structured industry specific data into structured data to be stored in the data repository (104).
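As a minimal sketch of such a conversion, semi-structured text of the form "Field: value" may be mapped onto a record with a fixed set of fields. The function `to_structured`, the input layout and the required field names are assumptions made for illustration; the specification does not define the input format handled by the conversion module (116).

```python
def to_structured(raw_text, required=("title", "industry", "asset_type")):
    """Convert 'Field: value' lines into a record with a fixed set of keys."""
    record = {}
    for line in raw_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon carry no field and are skipped
            record[key.strip().lower().replace(" ", "_")] = value.strip()
    # Every stored record exposes the same fields, even when a value is missing.
    for name in required:
        record.setdefault(name, None)
    return record


raw = "Title: Migration case study\nIndustry: Finance\nfree text without a field"
record = to_structured(raw)
```

Filling missing fields with `None` keeps every record similar in structure, which is what allows the repository to store only structured data.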
Referring again to figure 1, with respect to the query received from the user, the search process is initiated by searching the data in the corresponding data repository (104). The system (100) further comprises the classification module (108) which is configured to pre-categorize the data stored in the data repository (104) with respect to one or more attributes.
The attributes may include, but are not limited to, industry (sales and pre-sales, finance, production, and human resources), metadata, asset type, category type, etc. This will identify a portion of the repository storing the most relevant results with respect to the search query. This portion may be referred to as a query search area.
By way of example, the data repository (104) may have a categorized portion where finance related data is stored. Now, when the query is received, the classification module (108) will identify that the query is related to finance and will directly link the query data to a respective portion of the data repository (104) where the stored data is related to finance.
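The finance example above may be sketched as follows, assuming for illustration that each stored asset carries an "industry" attribute and that a query is linked to an industry through a small vocabulary of terms. The names `INDUSTRY_TERMS`, `pre_categorize` and `identify_search_area`, and the vocabulary itself, are hypothetical.

```python
from collections import defaultdict

# Hypothetical vocabulary linking query words to industries.
INDUSTRY_TERMS = {
    "finance": {"budget", "invoice", "revenue"},
    "sales": {"proposal", "pitch", "customer"},
}


def pre_categorize(assets):
    """Group stored assets by their industry attribute."""
    partitions = defaultdict(list)
    for asset in assets:
        partitions[asset["industry"]].append(asset)
    return partitions


def identify_search_area(query, industry_terms=INDUSTRY_TERMS):
    """Return the industries (repository portions) the query relates to."""
    words = set(query.lower().split())
    return sorted(ind for ind, terms in industry_terms.items() if words & terms)


partitions = pre_categorize([
    {"id": 1, "industry": "finance", "title": "Revenue forecast"},
    {"id": 2, "industry": "sales", "title": "Customer pitch deck"},
])
area = identify_search_area("quarterly revenue budget")
```

Here the query is linked only to the finance portion, so only `partitions["finance"]` needs to be searched.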
The system (100) further comprises the search engine (110) configured to initiate the process of search by using the keywords and other search parameters in order to fetch optimized results with respect to the one or more search queries thus executed. The search engine (110) will search based on the pre-categorized data provided by the classification module (108).
The search engine (110) further comprises the data filtration module (112) which is configured to filter the data. The data filtration module (112) will filter the query results through a relevance based converging schema, which means that the query will be searched only in the query search area thus identified and not in the entire data repository (104). This will result in quicker retrieval of query results with better accuracy. The results will be obtained by linking the data from one category to another in order to ensure that, if more than one portion of the data repository (104) has been identified as the query search area, the results obtained are not repetitive and are relevant with respect to the query thus executed. This will provide a second stage of mining query results by sorting the results to avoid redundancy.
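A minimal sketch of this second stage is given below, assuming that the repository is held as a mapping from category to assets and that each asset has a unique "id". The function `search_in_area` and the sample data are hypothetical; plain keyword containment stands in for whatever relevance measure an actual embodiment uses.

```python
def search_in_area(partitions, search_area, keyword):
    """Search only the partitions named in search_area, skipping repeats."""
    seen_ids = set()
    results = []
    for category in search_area:
        for asset in partitions.get(category, []):
            if keyword.lower() not in asset["title"].lower():
                continue
            if asset["id"] in seen_ids:  # same asset filed under two categories
                continue
            seen_ids.add(asset["id"])
            results.append(asset)
    return results


shared = {"id": 7, "title": "Budget proposal template"}
partitions = {
    "finance": [shared, {"id": 8, "title": "Invoice checklist"}],
    "sales": [shared, {"id": 9, "title": "Pitch proposal deck"}],
    "production": [{"id": 10, "title": "Plant proposal"}],
}
hits = search_in_area(partitions, ["finance", "sales"], "proposal")
```

The asset filed under both finance and sales is returned once, and the production portion is never scanned because it lies outside the identified search area.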
Once the search process is completed, results are obtained and are displayed to the user. To provide a last stage of mining, the system (100) further comprises a customization module (114) configured to allow a user to modify the results as per his requirement. The user can customize the query results by selecting one or more options. For example, if a user is only interested in results related to case law of a particular subject matter, he can not only select the case law option but can also provide his input in terms of the subject matter he is interested in. This will automatically list the query results with respect to the user's requirements, providing him a better display with good visualization.
The customization module (114) also provides the user an option of providing his feedback with respect to the results thus obtained. This feedback may be used for future reference. The user may also categorize results as per their relevance so that, when the same subject matter is searched again, the results are obtained by referring to the user's categorization and feedback. This feedback may also include keywords suggested by the user at the time of preparing the search string.
The search results may be displayed in a plurality of manners and can be used for one or more user specific purposes.
The various categories in which the search results may be displayed are listed below. It is to be understood that these display categories should not limit the scope of the invention.
i. Delivery Experience - Provides a relevant list of Case Studies, Best Practices and Lessons Learnt that can be used to answer the question "Where have we done this before?"
ii. Sales Experience - Provides a relevant list of past proposals and proposal fragments that can be used to answer the question "Where have we pitched this before?"
iii. Sales Collaterals - Provides a relevant list of Domain & Technology Offerings that can be used to answer the question "What are our capabilities and/or offerings in this area?"
iv. Proposal Accelerators - Provides a relevant list of Frameworks & Methodologies, Proposal FAQs, Guidelines & Standards, Checklists etc. that can be used to answer the question "Is there something that can be reused?"
v. Partner Artifacts - Provides a list of relevant artifacts created with/by our partners that can be used to bolster our pitch in new pursuits.
vi. Wiki - Provides a list of wikis of FAQs and similar knowledge assets based on content search, and displays brief information.
The display of search results may be in the form of a report where a user may directly view related information. For example, the report may include timelines in a particular section, references in some other section, and contact details of linked persons in a different section. The system (100) will also search in the background for information on the linked persons and will update the report accordingly for the utility of the user. This step may be performed by way of a reverse search engine and may be termed search within the search.
Referring to figure 2, the system (100) further comprises a self-learning module (118) communicating with the customization module (114) and configured to enable a self-learning capability of the system (100) by storing and analyzing feedback provided by the user with respect to a particular query type. This will improve the search process every time the same subject area is searched.
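One simple way to store and apply such feedback is sketched below, assuming that feedback is a relevant/not-relevant vote per result and per query type. The class `FeedbackStore` and this voting scheme are assumptions made for illustration; the specification does not fix how the self-learning module (118) analyzes feedback.

```python
from collections import defaultdict


class FeedbackStore:
    """Keeps per-query-type votes and uses them to reorder later results."""

    def __init__(self):
        # query type -> result id -> net number of "relevant" votes
        self._votes = defaultdict(lambda: defaultdict(int))

    def record(self, query_type, result_id, relevant):
        self._votes[query_type][result_id] += 1 if relevant else -1

    def rerank(self, query_type, result_ids):
        votes = self._votes[query_type]
        # Stable sort: results with equal votes keep their original order.
        return sorted(result_ids, key=lambda rid: -votes[rid])


store = FeedbackStore()
store.record("finance", "r2", relevant=True)
store.record("finance", "r1", relevant=False)
reranked = store.rerank("finance", ["r1", "r2", "r3"])
```

Results marked relevant move ahead of unrated ones, and results marked not relevant move behind them, the next time the same query type is searched.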
The system (100) may also store the user's contact details for future reference while maintaining confidentiality. For example, if a user provides his email address, then he will receive updates for a particular search query whenever the data repository is updated with respect to the corresponding search query. The user may then access the system (100) in order to perform the detailed search for said query.
As shown in figure 2, the system (100) further comprises a selection module (120) communicatively coupled to the search engine (110) and the self-learning module (118) to select a pre-determined number of closest results with respect to a particular query. The selection module (120), based on repetitive search for a query type, automatically displays to the user the pre-determined number of closest results. These closest results may include, but are not limited to, results retrieved the maximum number of times, results which are suggested a plurality of times by different users, or a combination thereof.
The pre-determined number may include but is not limited to displaying 3 results in three different tabs or windows as requested by the user.
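A minimal sketch of such a selection is given below, assuming that the system keeps a log of result identifiers retrieved for a query type and a log of result identifiers suggested by users, and that the two counts are simply added. The function `closest_results` and the equal weighting of the two logs are assumptions made for illustration.

```python
from collections import Counter


def closest_results(retrieval_log, suggestion_log, n=3):
    """Pick the n results with the highest combined retrieval and suggestion count."""
    score = Counter(retrieval_log)   # how often each result was retrieved
    score.update(suggestion_log)     # plus how often users suggested it
    return [result_id for result_id, _ in score.most_common(n)]


retrievals = ["a", "b", "a", "c", "a", "b", "d"]
suggestions = ["c", "c", "c"]
top_three = closest_results(retrievals, suggestions, n=3)
```

With the sample logs, result "c" ranks first on the strength of user suggestions, followed by the most frequently retrieved results "a" and "b".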
The categorization of results is done automatically by using Microsoft programming technology. For example, the first categorized tab in the results, named "Delivery Experience", provides information under the group types Best Practices, Case Studies, Caselets, Lessons Learnt and Success Stories. The program is built such that if result information for a group type is not available, then the corresponding tab will not be displayed. For example, if there is no white paper or wiki group in the results for the keyword used, then the "white paper" or "wiki" categorization tab will not be displayed in the results.
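The rule that a tab is shown only when at least one of its group types has results may be sketched as follows. The mapping `TAB_GROUPS` uses the "Delivery Experience" group types named above; the "Wiki" entry, the function `visible_tabs` and the sample results are hypothetical.

```python
TAB_GROUPS = {
    "Delivery Experience": [
        "Best Practices", "Case Studies", "Caselets",
        "Lessons Learnt", "Success Stories",
    ],
    "Wiki": ["Wiki"],  # assumed single-group tab, for illustration only
}


def visible_tabs(results, tab_groups=TAB_GROUPS):
    """Return only those tabs that have at least one result to show."""
    tabs = {}
    for tab, groups in tab_groups.items():
        hits = [r for r in results if r["group"] in groups]
        if hits:  # a tab with no results is not displayed
            tabs[tab] = hits
    return tabs


results = [
    {"title": "Bank migration", "group": "Case Studies"},
    {"title": "Rollout retrospective", "group": "Lessons Learnt"},
]
tabs = visible_tabs(results)
```

Because the sample results contain no wiki entries, only the "Delivery Experience" tab is produced.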
In the proposed system, the technique used is XML parsing technology, where result data mining is done based on custom business rules and the data is categorized to show the result. Other techniques may include, but are not limited to, using a DB table, saving data into different external file formats, or transferring control to the user to choose the type of categorization.
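As a hedged illustration of rule-based categorization over XML results, the sketch below parses a small result document with Python's standard `xml.etree.ElementTree` module and maps each asset's group to a display category. The XML layout, the `RULES` table and the function `categorize` are assumptions made for this sketch and do not reproduce the schema or rules of the proposed system.

```python
import xml.etree.ElementTree as ET

# Hypothetical result document; the real schema is not given in the specification.
XML = """<results>
  <asset group="Case Studies"><title>Bank migration</title></asset>
  <asset group="Past Proposals"><title>Retail RFP response</title></asset>
  <asset group="Brochure"><title>Company overview</title></asset>
</results>"""

# Hypothetical business rules: asset group -> display category.
RULES = {
    "Case Studies": "Delivery Experience",
    "Past Proposals": "Sales Experience",
}


def categorize(xml_text, rules=RULES):
    """Parse the result XML and group asset titles by display category."""
    categories = {}
    for asset in ET.fromstring(xml_text).iter("asset"):
        category = rules.get(asset.get("group"), "Uncategorized")
        categories.setdefault(category, []).append(asset.findtext("title"))
    return categories


by_category = categorize(XML)
```

Keeping the rules in a separate table means the categorization can be changed without touching the parsing code, which is one plausible reading of "custom business rules".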
The system (100) further comprises a reporting module (122) (as shown in figure 2) configured to generate one or more reports displaying statistical data related to the use of the system (100). The reports are generated for offline or future reference in order to perform one or more analytical operations. These reports may include, but are not limited to, reports on what type of search is performed, when a search is performed, the number of results retrieved, improvement in result retrieval based on feedback, etc.
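A usage report of this kind may be sketched as an aggregation over logged search events. The event fields `search_type`, `date` and `result_count` and the function `usage_report` are hypothetical; an actual embodiment may log different data.

```python
from collections import Counter
from statistics import mean


def usage_report(events):
    """Summarize logged searches by type, by day, and by results retrieved."""
    return {
        "searches_by_type": dict(Counter(e["search_type"] for e in events)),
        "searches_by_day": dict(Counter(e["date"] for e in events)),
        "avg_results": mean(e["result_count"] for e in events),
    }


events = [
    {"search_type": "keyword", "date": "2012-03-01", "result_count": 10},
    {"search_type": "keyword", "date": "2012-03-01", "result_count": 20},
    {"search_type": "advanced", "date": "2012-03-02", "result_count": 30},
]
report = usage_report(events)
```

The resulting dictionary can be exported or stored for offline analysis.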
The system (100) is further platform independent, wherein the search methodology may be performed on a mobile device, iPad, etc.
BEST MODE/EXAMPLE FOR WORKING OF THE INVENTION
The system and method for mining search results in a query execution are illustrated by the working example stated in the following paragraphs; the process is not restricted to the said example only:
In accordance with an exemplary embodiment, referring to figure 3, the query is entered by the user (step 300). The query is related to the sales industry, for example, sale of TATA Indica in 2010 in India. The query is generated by providing inputs for query customization (step 302).
Now, this sales related query is entered to a server (data repository) storing all the details of TATA Indica sales in a classified manner. Since the user has customized his query by restricting the timeline to the year 2010, the query starts processing inside the server (step 304) only in the identified search area which stores information about TATA Indica for the year 2010. The system fetches all the data regarding TATA Indica sales in 2010 in tabular form (in an Excel sheet). Further, the results may also include names of parties in the purchase category along with contact details. These contact details may be used for taking product performance feedback. At client side processing (step 306), this additional information may be updated by entering the feedback of the purchasers as obtained by the user by using the contact information.
Now, the next time the same query is executed, the results will include the feedback of the purchasers (after proper validation by referring to external sources) and will therefore be updated results. The same results may be exported in a report form for internal analysis of the product sale and performance.
We claim:
1. A system to facilitate result mining for one or more search query, the system comprising:
an integrating module, being scalable and configured to refer one or more data repository by integrating data types from the one or more external data sources in a continuous manner;
an interface configured to receive said one or more search query from one or more user in a user customized manner;
a classification module configured to pre-categorize the data with respect to one or more attributes thereby identifying a query search area in the said data repository; and
a search engine configured to process the said categorized data to fetch optimized results with respect to the one or more search query thus executed, the search engine further comprising:
a data filtration module configured to filter the data from the data repository by relevance based converging schema thereby searching results only in the identified query search area in the said data repository; and
a customization module configured to allow a user to modify the results thus obtained for a particular query, in order to provide a customized display of the search results according to the user's requirement.
2. The system as claimed in claim 1, wherein the system further comprises a conversion module configured to convert the unstructured and semi-structured industry specific data into the structured data to be stored in the data repository, wherein the said data may be of large volume.
3. The system as claimed in claim 1, wherein the interface is further coupled to a customizable dictionary module configured to automatically suggest one or more keywords to the user with respect to the said search query.
4. The system as claimed in claim 1, wherein the search engine further comprises a selection module configured to select and display separately three or more closest results based on relevance parameter with respect to the said query.
5. The system as claimed in claim 1, wherein the system further comprises a reporting module configured to generate reports displaying the statistical data related to the use of the framework.
6. The system as claimed in claim 1, wherein the industry may include but is not limited to sales and pre-sales, finance, production, human resources.
7. The system as claimed in claim 1, wherein the search query executed by the user may include but is not limited to search based on keyword, search based on the data asset type, search based on the category type, search based on metadata definition, search based on context and semantics, or a combination thereof.
8. The system as claimed in claim 1, wherein the system further comprises a self-learning module which further helps in filtering data by analyzing results viewed by the user with respect to the query and searching for common results for a repetitive query.
9. A method to facilitate mining for one or more search query, the method comprising the steps of:
updating one or more data repository by integrating the data from one or more external resources in a continuous manner;
receiving the one or more search query from one or more users in a customized manner;
categorizing the data with respect to one or more attributes, thus identifying a query search area in the data repository; and
processing the categorized data to fetch optimized results with respect to the one or more query thus executed, the processing further comprising:
filtering data from the data repository by searching results only in the identified query search area present therein the data repository; and
allowing a user to modify the results thus obtained for a particular query, in order to provide a customized display of search results according to the user's requirements.
10. The method as claimed in claim 9, wherein the method further comprises converting unstructured or semi-structured industry specific data into structured data to be stored in the data repository.
11. The method as claimed in claim 9, wherein the method further comprises suggesting of the one or more keywords to the user with respect to the searched query.
12. The method as claimed in claim 9, wherein the method further comprises steps of self-learning which further helps in filtering data by analyzing results viewed by the user with respect to the query and searching for common results for a repetitive query.