Abstract: The present subject matter discloses a system and method for optimizing a query formation for a search engine. The system captures user-queries from a user in the user's own language. Further, keywords and context are extracted from the user-queries using rules, background knowledge, and ontologies. The system transforms the keywords into a single SPARQL query or a set of SPARQL queries for further processing. Further, verbalization may be performed, i.e., the SPARQL query is transformed back into the natural language query in order to reconfirm whether the user intended that query. After the confirmation, the system further processes the SPARQL query and visualizes the results of the query to the user in an appropriate format. Further, the system also keeps a log of every user activity, learns from these activities, and uses that information for better performance and more focused results.
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
SYSTEM AND METHOD FOR OPTIMIZING A QUERY FORMATION FOR A SEARCH ENGINE
APPLICANT:
Tata Consultancy Services Limited
A company Incorporated in India under The Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
[001] The present application claims priority to Indian Provisional Patent Application No. 3441/MUM/2014, filed on October 30, 2014, the entirety of which is hereby incorporated by reference.
TECHNICAL FIELD
[002] The present subject matter described herein, in general, relates to a system and a method for optimizing a query formation for a search engine.
BACKGROUND
[003] A cyber-physical system (CPS) is a seamless integration of computational algorithms and physical components. The CPS provides better interaction with the physical world by collaborating computational elements that control physical entities of the physical world. The existing concerns in CPS systems are resolving user queries and providing proper and focused output for those user queries. In the currently known art, SPARQL, i.e., an RDF (Resource Description Framework) query language, is used to resolve such user queries.
[004] The user has to pose queries to the system in a machine readable/understandable format, for example the SPARQL format, for the queries to be resolved. To form queries in this format, the user is expected to have exact knowledge of the domain data structure and the keywords related to the queries. This expectation limits the use and scalability of systems that resolve queries formed by the end user. Also, such systems provide little encouragement to the user for building queries in the user's own language; no such guidance or support is provided by these query resolving systems.
[005] Moreover, re-converting queries built in the machine readable format back into the user's language, so that the queries can be confirmed with the end user, is another point of interest lacking in this query resolving environment. Further, other concerns involved in query resolution are the lack of automatic learning from past experience and the failure to take into account user context, past user logs, user demographics, and other data sources.
SUMMARY
[006] This summary is provided to introduce aspects related to systems and methods for optimizing a query formation for a search engine, and the concepts are further described below in the detailed description. This summary is not intended to identify essential features of the subject matter, nor is it intended for use in determining or limiting the scope of the subject matter.
[007] In one implementation, a system for optimizing a query formation for a search engine is disclosed. The system may comprise a processor and a memory coupled to the processor for executing a plurality of modules stored in the memory. The plurality of modules may comprise a creating module, a matrix generating module, a computing module, query generating module, transforming module, executing module, and displaying module. At first, the creating module may determine ontology of a first set of keywords input by a user. Further, the creating module may fetch a second set of keywords and one or more categories based on the ontology of the first set of keywords. The second set of keywords may comprise words related to the first set of keywords. Further, the second set of keywords may be associated to one or more categories. The creating module may further map the second set of keywords with the one or more categories. Further, the creating module may facilitate creation of a user-query based upon the mapping, wherein the user-query is created in at least one of a natural language format and a syntax-defined format. Further, the matrix generating module may generate a matrix comprising words present in the user-query. Further, the computing module may compute a path length between each pair of words present in the matrix. Further, the query generating module may generate an optimized user-query string in a query language based upon the path length between each pair of words, thereby optimizing the query formation for a search engine.
[008] In another implementation, a method for optimizing a query formation for a search engine is disclosed. The method may comprise determining, by a processor, ontology of a first set of keywords input by a user. The method may further comprise a step of fetching, by the processor, a second set of keywords and one or more categories based on the ontology of the first set of keywords. The second set of keywords may comprise words related to the first set of keywords. Further, the second set of keywords may be associated to one or more categories. The method may further comprise a step of mapping, by the processor, the second set of keywords with the one or more categories. Further, the method may comprise a step of facilitating, by the processor, a creation of a user-query based upon the mapping. Further, the user-query may be created in at least one of a natural language format and a syntax-defined format. Further, the method may comprise a step of generating, by the processor, a matrix comprising words present in the user-query. The method may further comprise a step of computing, by the processor, a path length between each pair of words present in the matrix. Further, the method may comprise a step of generating, by the processor, an optimized user-query string in a query language based upon the path length between each pair of words, thereby optimizing the query formation for a search engine.
[009] In yet another implementation, a non-transitory computer readable medium embodying a program executable in a computing device for optimizing a query formation for a search engine is disclosed. The program may comprise a program code for determining ontology of a first set of keywords input by a user. Further, the program may comprise a program code for fetching a second set of keywords and one or more categories based on the ontology of the first set of keywords. Further, the second set of keywords may comprise words related to the first set of keywords. Further, the second set of keywords may be associated to one or more categories. Further, the program may comprise a program code for mapping the second set of keywords with the one or more categories. The program may further comprise a program code for facilitating a creation of a user-query based upon the mapping. The user-query may be created in at least one of a natural language format and a syntax-defined format. The program may further comprise a program code for generating a matrix comprising words present in the user-query. Further, the program may comprise a program code for computing a path length between each pair of words present in the matrix. Further, the program may comprise a program code for generating an optimized user-query string in a query language based upon the path length between each pair of words, thereby optimizing the query formation for a search engine.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
[0011] Figure 1 illustrates a network implementation of a system for providing query resolution, in accordance with an embodiment of the present subject matter.
[0012] Figure 2 illustrates the system, in accordance with an embodiment of the present subject matter.
[0013] Figures 3A-3C illustrate the detailed working of the system, in accordance with an embodiment of the present subject matter.
[0014] Figure 4 illustrates a method for optimizing the query formation for a search engine, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0015] Systems and methods for optimizing a query formation for a search engine are disclosed. The present disclosure facilitates the query formation semantically by converting a user-query into a query string. The query string may be formed in SPARQL Protocol and RDF Query Language (SPARQL), i.e., in a machine understandable format. During the query formation, the system may take a set of keywords, context, data, and background knowledge as input and produce the desired result as output. Further, the system may also keep a log of every activity of the user, learn from that log, and use that information for better performance and more focused results. The system may further keep the underlying semantic web technologies as transparent as possible to the user.
[0016] At first, the system may take a set of keywords indicating a query as an input from the user. From the set of keywords, the system determines the ontology associated with the set of keywords. Further, the system fetches a richer set of keywords having words related to the set of keywords input by the user. The system may also fetch concepts or categories associated with the richer set of keywords. Further, the richer set of keywords is mapped with the concepts or the categories. Based on the mapping, the system facilitates the creation of the user-query in a natural language format or a syntax-defined format. While creating the user-query, the system may allow only a restricted set of keywords taken from the richer set of keywords. According to aspects, the present disclosure describes a question-answering model on the sensor web combined with the semantic web.
[0017] Further, the system generates a matrix comprising the words present in the user-query created by the system. Further, a path length may be computed corresponding to each pair of words present in the matrix. In the next step, the system processes each pair of words along with their path length to generate an optimized user-query string in a query language such as SPARQL. According to aspects of the present disclosure, the system may provide verbalization, i.e., transforming the user-query string back into the user-query in the natural language format or the syntax-defined format, in order to confirm consistency between the user-query and the optimized user-query string. After confirming the consistency, the system may execute the user-query string and display the results of the execution in a format based on the nature of the user-query input by the user. Further, the system may also learn from past data.
[0018] While aspects of described system and method for optimizing a query formation for a search engine may be implemented in any number of different computing devices, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.
[0019] Referring to Figure 1, a network implementation 100 of a system 102 for optimizing a query formation is illustrated, in accordance with an embodiment of the present subject matter. In one embodiment, the system 102 facilitates the query formation semantically for providing focused output to the user. Although the present subject matter is explained considering that the system 102 is implemented as a software application on a server, it may be understood that the system 102 may also be implemented as a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, a tablet, a mobile phone, a robot and the like. In one implementation, the system 102 may be implemented in a cloud-based environment. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 104-1, 104-2…104-N, collectively referred to as user 104 hereinafter, or applications residing on the user devices 104. Examples of the user devices 104 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation.
[0020] In one implementation, the network 106 may be a wireless network, a wired network or a combination thereof. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0021] Referring now to Figure 2, the system 102 is illustrated in accordance with an embodiment of present disclosure. In one embodiment, the system 102 may include at least one processor 202, an input/output (I/O) interface 204, and a memory 206. The at least one processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor 202 is configured to fetch and execute computer-readable instructions or modules stored in the memory 206.
[0022] The I/O interface 204 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 204 may allow the system 102 to interact with a user directly or through the client devices 104. Further, the I/O interface 204 may enable the system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 204 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface 204 may include one or more ports for connecting a number of devices to one another or to another server.
[0023] The memory 206 may include any computer-readable medium or computer program product known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, compact discs (CDs), digital versatile discs or digital video discs (DVDs), and magnetic tapes. The memory 206 may include modules 208 which may perform particular tasks or implement particular abstract data types.
[0024] The modules 208 include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the modules 208 may include a creating module 210, a matrix generating module 212, computing module 214, query generating module 216, transforming module 218, executing module 220, displaying module 222, and other modules 224. The other modules 224 may include programs or coded instructions that supplement applications and functions of the system 102.
[0025] The data 226, amongst other things, serves as a repository for storing data processed, received, and generated by one or more of the modules 208. The data 226 may also include a query database 228, and other data 230.
[0026] Referring now to Figures 3A-3C, the detailed working of the system 102 is illustrated, in accordance with embodiments of the present disclosure. As can be seen from Figure 3A, the system 102, shown at the left hand side of the figure, is coupled with a set of external data 302. The set of external data 302 may aid in the functioning of the system 102. Further, the set of external data 302 may comprise static data, dynamic data, tools, web services, ontology, and semantic web resources. The static data represents permanent or slowly changing data, which may include geographical data and user related data. The dynamic data represents data whose truth value changes drastically over time, like sensor readings. Further, the tools may be external helper tools like WordNetTM for semantic extensions of the keywords. Further, the web services may include visualization of sensor data using, say, open street map services for spatial result data. Further, the ontology may represent knowledge about concepts and their relations, useful for query formation and reasoning. Further, the semantic web resources may represent factual data that may be used for query evaluation, like DBpediaTM. The system 102 may further comprise a transactional log 304 as shown in Figure 3A.
[0027] According to embodiments of the present disclosure, the creating module 210 of the system 102 is provided to facilitate a user to create a user-query. In a first step, the creating module 210 may determine the ontology of a first set of keywords input by the user. According to embodiments of the present disclosure, the first set of keywords may be input by the user through a user input module. The user input module may be part of the creating module 210 of the system 102. In the next step, the creating module 210 may fetch a second set of keywords and one or more categories based on the ontology of the first set of keywords. Further, the second set of keywords (i.e., the richer set of keywords) may be fetched using a lexical database like WordNetTM, DBpediaTM, or another type of database. Also, the second set of keywords may be associated to one or more categories. Further, the second set of keywords may comprise words (like synonyms and similar meaning words) related to the first set of keywords. According to embodiments of the present disclosure, each of the words from this rich set (i.e., the second set of keywords) may be mapped with one or more categories or concepts (and consequently their Uniform Resource Identifiers (URIs)) from the underlying ontology.
[0028] For example, if the ontology has an object relation called "hasName", then 'Name' is the basic keyword. The second set of keywords (based on the WordNetTM search) may be fetched as Set A = {name, surname, middlename, signature, brand, moniker, tag, autograph}. Now, this second set of keywords may be mapped with the one or more categories/concepts from the ontology for which "hasName" is a relevant property. The one or more categories/concepts may comprise Set B = {company, person, pet, child, human, place, event}. Being a member of the ontology, each category provided in Set B may have a URI associated with it. In the next step, the many-to-many mapping between Set A and Set B and the corresponding URI set may be created. A richer mapping may be created from other such properties, labels, annotations, etc., contained within the ontology [as shown in Figure 3B]. It must be understood that this is technically equivalent to the "Same As" property of the Web Ontology Language (OWL), but the link with DBpediaTM or WordNetTM makes it more oriented towards real world queries. When the user enters his/her queries, he/she will be restricted to entering words defined within Set A only. Thus, the richer Set A is, the richer the query expressivity can be.
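By way of a non-limiting illustration only, the many-to-many mapping between Set A and Set B described above may be sketched in Python as follows; the concept names, URIs, and helper names used here are hypothetical placeholders and do not form part of the underlying ontology.

    # Hypothetical sketch: many-to-many mapping between keywords (Set A) and
    # ontology categories/concepts (Set B), each concept carrying an example URI.
    set_a = {"name", "surname", "middlename", "signature", "brand", "moniker", "tag", "autograph"}
    set_b = {
        "company": "http://example.org/ontology#Company",  # placeholder URIs
        "person":  "http://example.org/ontology#Person",
        "pet":     "http://example.org/ontology#Pet",
        "place":   "http://example.org/ontology#Place",
    }

    # Every keyword in Set A maps to every concept for which "hasName" is a relevant property.
    keyword_to_concepts = {keyword: dict(set_b) for keyword in set_a}

    def concepts_for(keyword):
        """Return the mapped categories/concepts (with URIs) for a keyword, or an empty dict."""
        return keyword_to_concepts.get(keyword, {})

    print(sorted(concepts_for("surname")))  # ['company', 'person', 'pet', 'place']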
[0029] Thus, based on the mapping, the creating module 210 may facilitate the creation of the user-query. According to embodiments, the creating module 210 may facilitate the user to write/create the user-query in a natural language (NL) format or in a syntax-defined format. For defining the syntax, the creating module 210 may identify related keywords from the object and data properties, annotations, and labels of the underlying ontology. The user-query created may be stored in the query database 228 of the system 102.
[0030] According to embodiments of the present disclosure, the system 102 may further provide a feature of indexing and ranking of the one or more categories and the second set of keywords. This feature may enable the user to enter the first set of keywords using words present in Set A (i.e., the second set of keywords). To help the user, the system 102 discloses an effective indexing so that when the user types the first few letters of the first set of keywords, a list of "did you mean" type suggestions from Set A is provided. Without effective indexing, this would not be possible quickly and in real-time. Further, the present disclosure may also keep a rank of the second set of keywords and the one or more categories/concepts in order to find out which are the most frequently used keywords and which are the most frequently used categories/concepts connected with a keyword of the second set of keywords. A log of each transaction that happens in the creating module 210, i.e., each keyword/query that the user selects from the suggestion list, may play a key role in such ranking.
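A minimal, non-limiting Python sketch of the prefix indexing and frequency-based ranking described above follows; the sample keywords, usage counts, and helper names (suggest, record_selection) are assumptions made only for illustration.

    # Hypothetical sketch: prefix lookup over Set A with frequency-based ranking,
    # so that "did you mean" suggestions can be served quickly as the user types.
    import bisect

    usage_count = {"name": 42, "surname": 17, "signature": 5, "brand": 9, "moniker": 1}
    sorted_keywords = sorted(usage_count)  # keep keywords sorted for fast prefix search

    def suggest(prefix, limit=3):
        """Return up to `limit` keywords starting with `prefix`, most frequently used first."""
        start = bisect.bisect_left(sorted_keywords, prefix)
        matches = []
        for keyword in sorted_keywords[start:]:
            if not keyword.startswith(prefix):
                break
            matches.append(keyword)
        return sorted(matches, key=lambda k: usage_count[k], reverse=True)[:limit]

    def record_selection(keyword):
        """Log a selection from the suggestion list so that the ranking improves over time."""
        usage_count[keyword] = usage_count.get(keyword, 0) + 1

    print(suggest("s"))  # ['surname', 'signature']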
[0031] According to embodiments of the present disclosure, the creating module 210 may further enrich the set of keywords. In order to make the second set of keywords (Set A) richer, the system 102 may include the SPARQL keywords (including operators) Set C = {Describe, Select, From, Ask, Construct, Filter, Bind, Distinct, Order by, Limit, Offset}, the SPARQL function keywords (including standard and user defined functions) Set D = Set D1 Union Set D2, where D1 = {Avg} and D2 = {far, nearby, between}, the keywords from normal set and graph operations Set E = {union, intersection, subset, within, and, or, not}, and keywords like common fillers Set F = {Where, When, Who, What, How}, thus creating the rich Set A = (A U C U D U E U F U G).
[0032] Further, the creating module 210 may provide a feature of "keyword categorization" in which the categorization of the keywords in Set A is based on the hierarchy of the related mapped categories/concepts in the ontology, external knowledge, and annotations. For example, keywords like {place, location, at, here, where, north, south, bottom, under} may be categorized under the "Geo-location" category/concept, {before, after, peak hour, in the meantime, when} can be categorized as "Temporal", {happened, at, when, results, due to, before} can be categorized as "Event", and so on. It may be observed that a single keyword can have multiple categories associated with it (like when, at, and before as shown above). Categorization of the second set of keywords may help to create the corresponding query formation or template query selection, as discussed in subsequent paragraphs of the specification.
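As a non-limiting sketch, the enrichment of Set A and the keyword categorization described above may be represented in Python roughly as follows; the exact set contents and category labels are illustrative assumptions only.

    # Hypothetical sketch: building the rich keyword set and a keyword-to-category map.
    set_a = {"name", "surname", "brand"}                        # domain keywords (illustrative subset)
    set_c = {"Select", "Ask", "Filter", "Distinct", "Limit"}    # SPARQL keywords/operators
    set_d = {"Avg"} | {"far", "nearby", "between"}              # standard and user defined functions
    set_e = {"union", "intersection", "subset", "within", "and", "or", "not"}
    set_f = {"Where", "When", "Who", "What", "How"}
    rich_set_a = set_a | set_c | set_d | set_e | set_f          # the rich Set A

    # A single keyword may belong to several categories (like "when", "at", "before").
    keyword_categories = {
        "where": {"Geo-location"}, "at": {"Geo-location", "Event"},
        "before": {"Temporal", "Event"}, "when": {"Temporal", "Event"},
        "happened": {"Event"},
    }

    def categories_of(keyword):
        """Return the categories/concepts associated with a keyword (empty set if unknown)."""
        return keyword_categories.get(keyword.lower(), set())

    print(categories_of("before"))  # {'Temporal', 'Event'}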
[0033] After creating the user-query, the matrix generating module 212 of the system 102 may generate a matrix comprising words present in the user-query. Further, the computing module 214 of the system 102 may compute a path length between each pair of words present in the matrix. The matrix and the path length computed may be further used for query formation.
[0034] Query formation can be done in many ways, as described in the state of the art. However, to meet latency requirements, pre-computation and minimal online computation are considered to be the best way. According to embodiments of the present disclosure, query templates may be used for fast mapping of the second set of keywords to query patterns. The path length computed is, in effect, a weight given to each path based on distance as well as on the frequency of usage and the data property of the words present in the second set of keywords. As can be seen in Figure 3C, the path between one pair of nodes has a length of 5, whereas the path to another node has the value -1, because the data corresponding to that node is not present in the RDF data set. So even if a query is formed, it will not be evaluated due to missing data.
[0035] So, in such cases, the query formation step is halted as soon as an edge is encountered that has no corresponding data. In other words, based on the path length computed, the system may discard one or more words from the second set of keywords during the query formation in the query language. This speeds up the query formation process. Another data property stored with each edge is the volume of data, which indicates the processing time of the query evaluation. Edges having high volume data may be kept at the end of the query string to minimize the join costs of the SPARQL operation. Another aspect is the frequency of data change of an edge; an edge having more dynamic data should be kept as the last part of the SPARQL query to give the latest results on evaluation. According to various embodiments of the present disclosure, the system 102 discloses a path storage module (not shown in the figure) for getting feedback from a learning module 306, as shown in Figure 3A, to refine the pre-computed path structure, such as the frequency of traversal and data property changes. There is an optional step for the user to confirm the queries which they want to run, and the feedback helps in learning.
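The pruning and ordering behaviour described in the two paragraphs above may be sketched, purely for illustration, as follows; the path-length values, edge volumes, node names, and helper functions are assumed and are not taken from the specification.

    # Hypothetical sketch: a path-length matrix where -1 marks an edge with no backing RDF data.
    # Edges with missing data halt query formation; the remaining edges are ordered by data
    # volume (largest last) so that costly SPARQL joins come at the end of the query string.
    path_length = {
        ("meterModel", "EnergyMeter"): 5,
        ("EnergyMeter", "manufacturer"): 3,
        ("EnergyMeter", "marketShare"): -1,   # no corresponding data in the RDF data set
    }
    edge_volume = {("meterModel", "EnergyMeter"): 1200, ("EnergyMeter", "manufacturer"): 45000}

    def usable_edges(edges):
        """Halt query formation as soon as an edge with no corresponding data (-1) is met."""
        kept = []
        for edge in edges:
            if path_length.get(edge, -1) == -1:
                return None            # missing data: do not form this query at all
            kept.append(edge)
        return kept

    def order_for_join(edges):
        """Place high-volume (and hence costly) edges at the end of the SPARQL pattern."""
        return sorted(edges, key=lambda e: edge_volume.get(e, 0))

    candidate = [("meterModel", "EnergyMeter"), ("EnergyMeter", "manufacturer")]
    kept = usable_edges(candidate)
    print(order_for_join(kept) if kept else "query discarded")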
[0036] Further, the query generating module 216 of the system 102 may generate an optimized user-query string in a query language based upon the path length between each pair of words, thereby optimizing the query formation for a search engine. According to embodiments of the present disclosure, the system 102 may further enable the user to create query templates based on the cohesiveness of the second set of keywords and the inter-relations of the corresponding one or more categories/concepts. For example, with three keywords like {company, name, sensor} and related inter-relations like {hasName, makes}, the system 102 may pre-compute template queries like {Who manufactures the sensor A, What is the name of the sensor made by company B, List all sensor manufacturing companies}. For these queries, the related SPARQL queries can easily be pre-computed and cached. Similarly, many more such templates can be created by the query formation module 212 using the keywords and the related inter-relations. For example, when the user types keywords like company, makes, etc., these queries (discussed above) will be suggested as a list to the user, and on selection of any of the queries from that list, the related pre-computed SPARQL query may directly run without any further computational steps. Thus, this whole process reduces the query execution time and the complexity of forming the queries.
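A non-limiting Python sketch of the template pre-computation and caching described above follows; the template texts, the placeholder IRIs inside the cached SPARQL strings, and the helper name suggest_templates are hypothetical.

    # Hypothetical sketch: pre-computed query templates keyed by the keywords they cover,
    # each carrying a cached SPARQL string so a selected suggestion runs with no extra work.
    template_cache = {
        frozenset({"company", "sensor"}): [
            ("Who manufactures the sensor A",
             "SELECT ?company WHERE { ?company <http://example.org/onto#makes> <http://example.org/onto#sensorA> . }"),
            ("List all sensor manufacturing companies",
             "SELECT DISTINCT ?company WHERE { ?company <http://example.org/onto#makes> ?sensor . }"),
        ],
    }

    def suggest_templates(typed_keywords):
        """Return (template text, cached SPARQL) pairs whose keywords all occur in the input."""
        typed = set(typed_keywords)
        suggestions = []
        for keywords, entries in template_cache.items():
            if keywords <= typed:
                suggestions.extend(entries)
        return suggestions

    for text, sparql in suggest_templates({"company", "sensor", "makes"}):
        print(text, "->", sparql)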
[0037] According to other embodiments of the present disclosure, based on the first set of keywords entered or the selection from the suggestive list of queries, the user-query may be formed as either concept oriented or data oriented. The concept oriented queries may be like "What is a sensor", "What does an accelerometer measure", "What are the measurement capabilities of an energy meter". For these concept oriented queries, the query generating module 216 need not enquire into the triple database; instead, it enquires into the concept, relation, and hierarchy triples in the ontology/OWL and can answer the questions from there only. Since relations and hierarchies are more or less static in nature, pre-computation may not make much of a difference here.
[0038] On the other hand, for data oriented queries like "How many new sensors have been deployed in the last month", "What is the average power consumption in an industrial area during festive seasons", etc., the query generating module 216 needs to enquire into the triple database. The data being dynamic in nature, the query outputs may vary. For example, once a SPARQL data query is created and, on execution, it is observed that there is no output data, then clearly the execution of such queries is an overhead to the system 102. The pre-computation of the data weightage in paths (i.e., the path lengths) involving data centric queries may play an important role here. When generating the SPARQL query (i.e., the optimized user-query string), the query generating module 216 needs to traverse the graph database to reach the correct nodes. During this process, if it is found that there is no data triple in any of the related graph paths, then the output of the query generating module 216 would be null, and in that case there is no point in forming the query.
[0039] According to other embodiments of the present disclosure, when there are multiple possible graph paths between two nodes and all of them hold data, then multiple queries may be pre-computed; but when they are shown as a suggestive list, the query with a heavily populated data path may get a higher rank than the query with a lowly populated data path. According to embodiments of the present disclosure, two matrices may be maintained: one containing the individual (node-edge-node) information about data volume and presence/absence, and another containing the path lengths and path information between two nodes. Alternative paths up to a limit may be stored as well. Computation of the path length may be based on these matrices, and they may help to decide query formation and query ranking.
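The two matrices mentioned above (one per edge, one per node pair) may be represented, in a purely illustrative Python sketch, as nested dictionaries; the node names, property names, and numeric values below are assumptions.

    # Hypothetical sketch: an edge matrix (node-edge-node -> data volume, 0 meaning absent)
    # and a path matrix (node pair -> alternative paths with their lengths and volumes),
    # used to rank pre-computed queries by how heavily populated their data paths are.
    edge_matrix = {
        ("sensor", "hasLocation", "place"): 50000,
        ("sensor", "hasLatLng", "latlng"): 0,        # no data triples for this edge
    }
    path_matrix = {
        ("temperature", "Z"): [
            {"path": ["temperature", "sensor", "Z"], "length": 2, "volume": 50000},
            {"path": ["temperature", "sensor", "place", "Z"], "length": 3, "volume": 12000},
        ],
    }

    def ranked_paths(src, dst):
        """Rank the stored alternative paths between two nodes, heavier data paths first."""
        return sorted(path_matrix.get((src, dst), []), key=lambda p: p["volume"], reverse=True)

    for candidate in ranked_paths("temperature", "Z"):
        print(candidate["path"], candidate["volume"])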
[0040] Further, the transforming module/verbalization module 218 of the system 102 may transform the optimized user-query string into the user-query in the at least one of the natural language format and the syntax-defined format for confirming consistency between the user-query and the optimized user-query string. In other words, once the user-query is SPARQL-ed (i.e., the optimized user-query string is generated), there might be a difference between what the user actually wanted and what the system 102 is going to SPARQL. This difference may result from inefficiency in the aforementioned steps (i.e., the generation of the optimized user-query string). In this case, the transforming/verbalization module 218 may recreate the natural language query from the SPARQL format and display it to the user for his/her approval or changes, if required. The transforming/verbalization module 218 thus acts as a rectification step which helps enhance the result correctness and query efficiency of the system 102 as a whole. Further, a sentence mapping may aid in mapping the SPARQL expression to corresponding English sentences based on patterns.
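A minimal, non-limiting sketch of the sentence mapping mentioned above is given below; the pattern names and the English sentence templates are invented for illustration and are not the verbalization rules of the specification.

    # Hypothetical sketch: mapping recognised SPARQL patterns back to English sentence templates
    # so the user can confirm that the generated query matches the intended question.
    sentence_templates = {
        "avg_value_within_place": "select average value of {concept} of sensors having location of {place}",
        "list_by_relation":       "list all {subject} that {relation} {object}",
    }

    def verbalize(pattern_name, **slots):
        """Fill the sentence template that corresponds to a recognised SPARQL pattern."""
        template = sentence_templates.get(pattern_name)
        return template.format(**slots) if template else None

    print(verbalize("avg_value_within_place", concept="temperature", place="Z"))
    # -> select average value of temperature of sensors having location of Z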
[0041] In the next step, the executing module 220 of the system 102 may take the SPARQL queries (i.e., the optimized user-query string) as input, run the optimized user-query string on the semantic triple database of a search engine, and provide output. Once the optimized user-query string is rectified, the executing module 220 may run the optimized user-query string on the semantic database (linked data in the form of RDF triples). According to embodiments of the present disclosure, the system 102 further provides a reasoning engine 308, as shown in Figure 3A, to support reasoning on sensor data, which may be required to derive the meaning of higher level semantic terms to ease query evaluation. For example, sensor location (latitude, longitude) pairs can be transformed to place names if the corresponding ontology deals with place names. Also, the reasoning engine 308 may be used to isolate one sensor providing a "low" temperature reading while other sensors in the same network provide "high" temperatures. The terms "low" and "high" are semantically defined in the knowledge base.
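Purely as an illustration of executing an optimized user-query string against RDF triples, the sketch below uses the open-source rdflib library; the namespace, the triples, and the query string are hypothetical, and the specification itself does not prescribe any particular triple store.

    # Hypothetical sketch: executing a generated SPARQL string over a small in-memory RDF graph.
    from rdflib import Graph, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/onto#")               # placeholder namespace
    g = Graph()
    g.add((URIRef("http://example.org/sensor1"), EX.hasValue, Literal(21.5)))
    g.add((URIRef("http://example.org/sensor2"), EX.hasValue, Literal(23.0)))

    sparql = """
        PREFIX ex: <http://example.org/onto#>
        SELECT (AVG(?value) AS ?avgValue) WHERE { ?sensor ex:hasValue ?value . }
    """
    for row in g.query(sparql):
        print(row.avgValue)    # the average of the sensor readings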
[0042] After executing the optimized user-query string, the displaying module 222 of the system 102 may display results of the execution of the optimized user-query string based on nature of the user-query. In other words, the output data generated after optimized user-query string execution is visualized as per user choice, data nature, and query context in order to show the data in meaningful format. For example, output of the optimized user-query string like “Locate the traffic camera sensors in Kolkata” is best visualized in a map of Kolkata traffic guard region with sensor locations brightly pointed out. Similarly a comparison data query like “what is the share of Samsung and Philips made sensor deployments in India” is best visualized with the help of a pie chart. Further, the displaying module 222 provides features like “Selection of Visual mode”, “Incremental Query Result Visualization”, and “Weightage formula based automatic visualization”. Each of these features is explained in detail in subsequent paragraphs of the specification.
[0043] In the feature "Selection of Visual mode", the hierarchy of the second set of keywords or of the one or more categories involved in the user-query, together with their annotations, may suggest the class/nature of the data that comes as output. Also, the overall nature of the user-query may decide the data nature; for example, the output of a query with the "where" keyword can be categorized as spatial data. This nature or classification may in turn suggest the visualization mode of the query output. Further, previous users' selection modes may also play a role here. The displaying module 222 may keep track of such choices and may make suggestions to future users according to that choice log.
[0044] In the next feature of the displaying module 222, i.e., "Incremental Query Result Visualization", when the user selects multiple queries from multiple suggestive lists, those queries are executed serially and the result output of each such query can be shown serially (i.e., incrementally), but the final result output may be shown as a union of all the output data from all the queries. For example, suppose a user wants to know the location of an event both in terms of latitude-longitude and in terms of the name of the place. Then two separate queries can run and two separate sets of results can be obtained (one showing latitude-longitude and the other showing the places where the event occurred). Further, if the first set of output comes first, it may be shown as it is, but when the second query result arrives, the output may be shown as a union of both sets of results, along with other relevant links, recommendations, and other relevant queries ordered according to their rank.
[0045] Further, in the next feature, i.e., "Weightage formula based automatic visualization", the visualization may depend on two main factors. One is the user selection of the visualization mode, which means that from the given set of visualization options the user selects one option manually.
[0046] According to embodiments of the present disclosure, the other process may be automatic, i.e., the displaying module 222 may decide which is the best method of visualization for a given type of query, for a given type of output data, and based on what visualization users have selected earlier for the same type of query and data. To formulate it mathematically, let W1(i, j) be the weight for the j-th visual technique for the i-th type of user-query, W2(i, j) be the weight for the j-th visual technique for the i-th type of data, and W3(i, j) be the weight for the j-th visual technique based on its selection in previous similar situations by other users. Of these, the first two are almost static, but the third one is very dynamic as it depends on the user selection. For a given user-query, its output, and the past records, the total weightage W(j) for the j-th visualization technique can be calculated as:
W(j) = W1(i, j) + W2(i, j) + W3(i, j)
[0047] Further, the maximum W(j) may decide which visualization technique can be used. Also, based on the above equation, it may be observed that the deciding factors may be more than three, and accordingly the formula may change. The weightage values for W1(i, j) and W2(i, j) can be decided by domain experts. A few examples of visual techniques are chart, text, graph, map, and the like.
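A non-limiting numeric sketch of the weightage computation described above follows; the weight tables and the visualization names are invented solely to show how the maximum W(j) selects a technique.

    # Hypothetical sketch: combining query-type, data-type, and past-selection weights
    # to pick the visualization technique with the highest total weightage W(j).
    techniques = ["chart", "text", "graph", "map"]
    w1 = {"chart": 0.6, "text": 0.1, "graph": 0.4, "map": 0.2}   # per query type (near static)
    w2 = {"chart": 0.5, "text": 0.2, "graph": 0.3, "map": 0.1}   # per output data type (near static)
    w3 = {"chart": 0.2, "text": 0.0, "graph": 0.1, "map": 0.05}  # from past user selections (dynamic)

    def total_weightage(technique):
        """W(j) = W1(i, j) + W2(i, j) + W3(i, j) for the current query type i."""
        return w1[technique] + w2[technique] + w3[technique]

    best = max(techniques, key=total_weightage)
    print(best, total_weightage(best))   # chart 1.3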
[0048] According to embodiments of the present disclosure, the system 102 may further comprise the transactional log 304 (as shown in Figure 3A). The transactional log 304 may capture all types of transactions that occur during the life cycle of each and every user-query, i.e., keyword inputs, user-query selection, mapping of the keywords and the one or more categories, selection of the user-query after confirming the consistency, and the display/visualization mode selection. Further, the transactional log 304 may be used for the betterment of user-query performance, enrichment of the knowledge base, better query selection and execution, and better and enriched visualization of the user-query output.
[0049] When the user types a keyword or a few keywords, some suggestions of user-queries may come as a list and the user may select one or some of them. Against such a selection, records associating the keywords and the selected query may be kept in the transactional log 304. Upon repeated occurrence of such keywords, the selection of the user-query may vary among different users. Thus, a ranking of the selected user-queries against a particular keyword may be kept to enhance the future suggestive list of user-queries. Similarly, keeping track of the natural language query selected against the SPARQL query, at the end of the transforming/verbalization module 218, may also be used in order to show the most frequently selected user-query and thus guide the user in the selection.
[0050] Upon receiving the output data at the end of the query cycle, the transactional log 304 may log the selection of the view mode that the user chooses. This may be important because this log data may be used for providing suggestions upon receiving such data the next time. All these data can be used to enrich the knowledge base itself using learning tools. Considering the volume of the log, it should be kept in a very efficient data structure so that quick retrieval is possible. One example data structure for user input vs. terms processed is shown in Figure 3C.
[0051] Further, along with all other logs, the system 102 may also keep track of the context in which the user asks the question. To understand the context, the system 102 analyzes the profile of the user to identify what type of queries he/she generally asks, his/her spatial and temporal location data, etc. Also, if the user is new to the system 102, then by looking at his/her profile and the query history of other users with similar profiles, the system 102 may anticipate the query the new user is going to ask. Further, the system 102 may also take into account what queries he/she has asked in the present session. The next query may be assumed to be of the same context or the same domain/type as the series of previous questions in the same session. Further, the use of some keywords in a particular query may let the system 102 know the contextual goal of the user-query. For example, if the user writes "how", then in most cases he/she is looking for a process; if he/she uses "how many", then of course it is a quantitative query.
[0052] According to the embodiments of the present disclosure, the work-flow may be explained with an example by considering a user query, i.e., "Show distribution of market share by energy meter manufacturer for the year 2013". This is considered the user-query input by the user in a natural language format. This is one way of putting the query into the query front end. Another way can be by using keywords defined by the keyword Set A. Thus, the above query can be entered as "manufacturer energy meter model no. 10 units". It may be observed that this query contains keywords that are defined in the keyword set dictated by the system 102. Further, the system 102 may also pre-compute the possible queries using these keywords. The following table shows a sample relation graph between these keywords, i.e., nodes. Thus, the possible queries can be:
1. meterModel isA EnergyMeter
EnergyMeter isMadeBy manufacturer
Manufacturer hasMarketShare marketShare
Manufacturer hasName Name
meterModel madeInYear Year
Year hasValue int
2. manufacturer establishedIn Year
meterModel isMadeBy Manufacturer
meterModel isA EnergyMeter
[0053] From the above table, it can be observed that two options can be formed as SPARQL queries, and after running the transforming/verbalization module 218, the options may be shown to the user as (1) Which are the manufacturers that made model no. 10 last year; and (2) Which are the manufacturers established last year that make model no. 10. Of these two query options, the user may select one, and accordingly the corresponding SPARQL query can be executed. These two queries being pre-computed reduces the response time of the system 102.
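For illustration only, the pre-computed SPARQL query corresponding to option (1) above might be cached as the Python string below; the property IRIs (isA, isMadeBy, madeInYear, hasName) are hypothetical stand-ins for the relations of the sample relation graph shown above.

    # Hypothetical sketch: the cached SPARQL behind suggestion (1),
    # "Which are the manufacturers that made model no. 10 last year".
    precomputed_query_1 = """
        PREFIX ex: <http://example.org/onto#>
        SELECT DISTINCT ?manufacturerName WHERE {
            ?meterModel   ex:isA        ex:EnergyMeter .
            ?meterModel   ex:isMadeBy   ?manufacturer .
            ?meterModel   ex:madeInYear ?year .
            ?manufacturer ex:hasName    ?manufacturerName .
            FILTER (?year = 2013)                  # "last year" resolved from the query context
        }
    """
    # On selection from the suggestive list, this cached string can be executed directly,
    # for example by passing it to a SPARQL engine such as rdflib's Graph.query().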
[0054] Now, after running the query, the output data would be shown through the visualization model. For example, if the user selects query 1, then it would be best represented by a comparative bar graph showing manufacturers vs. meter m. Further, the pre-computed SPARQL queries, the user selection from the query options, and other related activities may be stored in the transactional log 304 of the system 102. Now, upon entering such a keyword set, the system 102 may suggest sample queries like (a) market share of energy meter manufacturer for last year; (b) market of energy meter for the last year; and (c) energy market share in last year as metered by.
[0055] Further, another example is explained in detail to understand the functionalities and interconnectivities of the system 102. Let us assume that a user U wants to know the average temperature of sensors in a zone Z. The user would like to input into the system "average temperature in Z". But as the user input (here a browser text-box) uses a restricted vocabulary, the term "average" may come as a drop-down option as the user starts typing the first few letters. Further, the meta-data contains the information that "avg" and "average" are representations of the same operation, so either one can be selected. The term "temperature" comes from an ontology class name, while "in" is an operator that takes a "thing" as its operand; the "thing" in this case is a geo-spatial space named "Z", coming from the static geo-spatial data or via a web-service for geo-spatial place names. Further, "Z" may be represented in terms of a polygon of latitude and longitude points. Using the corresponding query template, the system 102 may determine that the computation that needs to be done is "within a place" denoted by latitude and longitude points, which corresponds to the Geo-SPARQL function "within". The system 102 also knows that it will have to use the average value of the result, which corresponds to the SPARQL "avg" operation.
[0056] Assume that the ontology contains the relevant categories/concepts. "Temperature" is a category/concept, so an operation cannot be made on it directly; hence the system 102 searches for the value field corresponding to the "temperature" concept to operate on, and hence the SPARQL pattern is formed as shown in the table below:
1 ?sensor
2 ?sensor ?value
[0057] Now, the task for the system 102 is to connect the “temperature” category/concept with a place named “Z” to form a SPARQL query. It is known that “Z” is a place name as in the present approach of restricted vocabulary, the meta-data associated with the keywords is known. So the possible SPARQL connecting patterns formed/generated by the system 102 are shown in table below.
1 ?sensor ?temperature. ?sensor
2 ?sensor ?temperature. ?sensor ?place. ?place
3 ?sensor ?temperature. ?sensor ?latlng. geo:within(latlng, {polygon of `Z'})
[0058] From the above table, it can be observed that Query 3 is written in a simple way for ease of understanding, without expanding the polygon of Z conforming to the Geo-SPARQL syntax. Let us assume that the RDF data does not contain triples of the relevant form at the current system state. So it is useless to form and execute query 3, as there is no data to support a result. As the system 102 takes care of the data property in query formation by means of the Graph Path Storage, only query patterns 1 and 2 will be taken forward. As the volume of data corresponding to an edge is also stored, the query patterns will be restructured in order of increasing volume. The restructured query patterns are shown in the table below.
1 ?sensor .?sensor
2 ?place . ?sensor ?place. ?sensor
[0059] Now, the final two queries, formed by linking the meta-data, are shown in the table below.
Q1 select avg(?value) where { ?sensor . ?sensor . ?sensor ?value. }
Q2 select avg(?value) where { ?place . ?sensor ?place. ?sensor . ?sensor ?value }
[0060] There is an option for end users (having knowledge of SPARQL) to have a look at the queries for confirmation; if none of the queries seems to convey the user's intent, the user does not proceed, and the information in the log may be used for improvement of the system 102 by manually checking for shortcomings. If the user is not proficient in SPARQL, the confirmation is done by mapping the queries Q1 and Q2 to suitable sentences for easy understanding by the user, as shown in the table below.
V1 select average value of temperature of sensors having location of Z
V2 select average value of temperature of sensors having location of place located within Z
[0061] The above verbalizations may be done based on the meta-data associated with the relations and concepts in the ontology and static data in the form of sentential templates. For example, if the user selects V1, then Q1 gets executed and the data is displayed to the user. This information may be stored in the log, so that if the same user later wants to run a query directly without verbalization, past choices like V1 are taken into account for ranking of query execution. However, if the query was about getting the location of sensors in "Z", then the end result would be (latitude, longitude) pairs or place names whose meta-data contains (latitude, longitude), and in that case the visualization will be shown on a map, as by reading the URIs of the results and the associated meta-data, the system 102 may understand that this is a list of geo-spatial co-ordinates.
[0062] Further, if the user has asked for the "mean" temperature, then, backed by WordNetTM, the user may be prompted to confirm whether they actually want the average temperature. In this way, lexical databases like WordNetTM help in extending the keyword set. If the user has entered a term, say "hotness", and the system 102 cannot map the term to any of the restricted keyword set, then the same information is stored in the log so that the ontology can be enhanced to grow the restricted vocabulary and connect "hotness" with "temperature" by a fuzzy threshold.
[0063] Further, the reasoning engine 308 (as shown in Figure 3A) applies user defined rules on streaming data, combined with background knowledge, to infer new facts. An example use may be mapping the point location (latitude, longitude) of sensor data to place names, to match query patterns of the corresponding form, as sometimes too much location granularity is not desired. Other uses are removing or flagging contradicting data; for example, if all weather sensors report a cold temperature and one does not, it is probably faulty.
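As a non-limiting sketch of the contradiction-flagging use mentioned above, the Python below marks a weather sensor whose reading deviates strongly from its peers; the readings, the threshold, and the helper name flag_outliers are assumed values for illustration.

    # Hypothetical sketch: flag a sensor whose reading contradicts the rest of its network.
    readings = {"s1": 4.0, "s2": 3.5, "s3": 4.2, "s4": 28.0}   # degrees Celsius (illustrative)

    def flag_outliers(readings, threshold=10.0):
        """Flag sensors whose reading differs from the average of the others by more than `threshold`."""
        def others_avg(name):
            return sum(v for k, v in readings.items() if k != name) / (len(readings) - 1)
        return [name for name, value in readings.items()
                if abs(value - others_avg(name)) > threshold]

    print(flag_outliers(readings))   # ['s4'] -- probably a faulty sensor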
[0064] Referring now to Figure 4, method for optimizing a query formation for a search engine is shown, in accordance with an embodiment of the present disclosure. The method 400 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 400 may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, computer executable instructions may be located in both local and remote computer storage media, including memory storage devices.
[0065] The order in which the method 400 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 400 or alternate methods. Additionally, individual blocks may be deleted from the method 400 without departing from the spirit and scope of the disclosure described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method 400 may be considered to be implemented in the above described system 102.
[0066] At block 402, ontology may be determined for a first set of keywords input by a user.
[0067] At block 404, a second set of keywords and one or more categories may be fetched based on the ontology of the first set of keywords. Further, the second set of keywords may comprise words related to the first set of keywords. Further, the second set of keywords may be associated to one or more categories.
[0068] At block 406, the second set of keywords may be mapped with the one or more categories.
[0069] At block 408, a user-query may be created, based upon the mapping, in at least one of a natural language format and a syntax-defined format.
[0070] At block 410, a matrix comprising words present in the user-query may be generated.
[0071] At block 412, a path length between each pair of words present in the matrix may be computed.
[0072] At block 414, an optimized user-query string may be generated in a query language based upon the path length between each pair of words. Thus, the query formation may be optimized for the search engine.
[0073] At block 416, the optimized user-query string is transformed into the user-query in the at least one of the natural language format and the syntax-defined format for confirming consistency between the user-query and the optimized user-query string.
[0074] At block 418, the optimized user-query string is executed in the search engine.
[0075] At block 420, the results of the execution of the optimized user-query string are displayed based on the nature of the user-query.
[0076] Although implementations for methods and systems for optimizing the query formation have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for optimizing the query formation semantically and facilitating focused output to the user.
CLAIMS
WE CLAIM:
1. A method for optimizing a query formation for a search engine, the method comprising:
determining, by a processor, ontology of a first set of keywords input by a user;
fetching, by the processor, a second set of keywords and one or more categories based on the ontology of the first set of keywords, wherein the second set of keywords comprises words related to the first set of keywords, and wherein the second set of keywords are associated to the one or more categories;
mapping, by the processor, the second set of keywords with the one or more categories;
facilitating, by the processor, a creation of a user-query based upon the mapping, wherein the user-query is created in at least one of a natural language format and a syntax-defined format;
generating, by the processor, a matrix comprising words present in the user-query;
computing, by the processor, a path length between each pair of words present in the matrix; and
generating, by the processor, an optimized user-query string in a query language based upon the path length between each pair of words, thereby optimizing the query formation for a search engine.
2. The method of claim 1, further comprising transforming the optimized user-query string into the user-query in the at least one of the natural language format and the syntax-defined format for confirming consistency between the user-query and the optimized user-query string.
3. The method of claim 1, further comprising executing the optimized user-query string in the search engine.
4. The method of claim 3, further comprising displaying results of the execution of the optimized user-query string based on nature of the user-query.
5. The method of claim 1, further comprising creating a query template based on cohesiveness of the first set of keywords and inter-relations of the one or more categories associated with the words present in the first set of keywords.
6. The method of claim 1, further comprising indexing and ranking the second set of keywords and the one or more categories for providing suggestions to the users while facilitating the generation of the user-query, wherein the indexing and the ranking is based on the frequency of usage of the second set of keywords and the one or more categories associated therewith.
7. A system 102 for optimizing a query formation for a search engine, wherein the system 102 comprises:
a processor 202;
a memory 206 coupled with the processor 202, wherein the processor 202 executes a plurality of modules 208 stored in the memory 206, and where the plurality of modules 208 comprises:
creating module 210 to
determine ontology of a first set of keywords input by a user,
fetch a second set of keywords and one or more categories based on the ontology of the first set of keywords, wherein the second set of keywords comprises words related to the first set of keywords, and wherein the second set of keywords are associated to the one or more categories,
map the second set of keywords with the one or more categories, and
facilitate creation of a user-query based upon the mapping, wherein the user-query is created in at least one of a natural language format and a syntax-defined format;
matrix generating module 212 to generate a matrix comprising words present in the user-query;
computing module 214 to compute a path length between each pair of words present in the matrix; and
query generating module 216 to generate an optimized user-query string in a query language based upon the path length between each pair of words, thereby optimizing the query formation for a search engine.
8. The system 102 of claim 7, further comprises transforming module 218 to transform the optimized user-query string into the user-query in the at least one of the natural language format and the syntax-defined format for confirming consistency between the user-query and the optimized user-query string.
9. The system 102 of claim 7, further comprises executing module 220 to execute the optimized user-query string in the search engine.
10. The system 102 of claim 9, further comprises displaying module 222 to display results of the execution of the optimized user-query string based on nature of the user-query.
11. A non-transitory computer readable medium embodying a program executable in a computing device for optimizing a query formation for a search engine, the program comprising:
a program code for determining ontology of a first set of keywords input by a user;
a program code for fetching a second set of keywords and one or more categories based on the ontology of the first set of keywords, wherein the second set of keywords comprises words related to the first set of keywords, and wherein the second set of keywords are associated to the one or more categories;
a program code for mapping the second set of keywords with the one or more categories;
a program code for facilitating a creation of a user-query based upon the mapping, wherein the user-query is created in at least one of a natural language format and a syntax-defined format;
a program code for generating a matrix comprising words present in the user-query;
a program code for computing a path length between each pair of words present in the matrix; and
a program code for generating an optimized user-query string in a query language based upon the path length between each pair of words, thereby optimizing the query formation for a search engine.