Abstract: The present disclosure relates to a system and method for query reformulation using query understanding. The system receives a search query from a user, and extracts one or more attributes associated with one or more input tokens in the search query to identify a search intent. The system retrieves one or more related tokens corresponding to each of the input tokens based on the identified search intent. The one or more related tokens may be semantically related to the corresponding input tokens. The system generates one or more candidate queries with the retrieved related tokens. The system executes the one or more candidate queries to retrieve a set of search artifacts. The system reformulates the candidate queries based on the number of search artifacts retrieved.
DESC:RESERVATION OF RIGHTS
[0001] A portion of the disclosure of this patent document contains material which is subject to intellectual property rights such as, but not limited to, copyright, design, trademark, Integrated Circuit (IC) layout design, and/or trade dress protection, belonging to Jio Platforms Limited (JPL) or its affiliates (hereinafter referred to as the owner). The owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights whatsoever. All rights to such intellectual property are fully reserved by the owner.
FIELD OF DISCLOSURE
[0002] The embodiments of the present disclosure generally relate to natural language processing. More specifically, the present disclosure relates to systems and methods for query reformulation using query understanding.
BACKGROUND OF DISCLOSURE
[0003] The following description of related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is to be used only to enhance the understanding of the reader with respect to the present disclosure, and not as an admission of prior art.
[0004] Natural Language Processing (NLP) focuses on processing text in a literal sense, i.e., what was said. Conversely, Natural Language Understanding (NLU) focuses on extracting the context and intent, or in other words, what was meant. In the world of ecommerce, n-grams coming from search queries may have different meanings or interpretations in different contexts; for example, the word ‘apple’ may refer to a fruit in one context (grocery) and to a brand in another context (electronics).
[0005] As noted in the existing art, the original search query received from a user is often inadequate in fulfilling the user’s information need when search engines use keyword matching. This is because of two aspects of language: synonymy (different words can convey the same meaning) and polysemy (a word can mean different things depending on context). The issues resulting from both synonymy and polysemy are more prevalent in shorter queries. Given that users performing web searches use 2.4 words per query on average, addressing vocabulary mismatch and intent ambiguity elegantly is important in providing a good experience.
[0006] With the increase in electronically accessible material, finding that material in an organized and efficient manner is becoming more difficult. One technique typically used for finding and organizing electronically accessible material is the use of search engines. While such searching works well for searching a single search index and obtaining an ordered query result set from that index, it does not provide an efficient means for searching multiple indexes and obtaining a single ordered query result set. Using such a technique, an entity may have to individually search each search index and then manually review each of the different query result sets. One technique that has been implemented to attempt to resolve this deficiency is the simple merging of multiple query result sets based on the ranking of individual items identified within those sets. However, because different search indexes utilize different properties, definitions, and parameters for determining the relevance and ranking of items matching a query, the simple merged query result set may not contain an accurate ordering of the item identifications contained therein; an item of higher actual relevance to the query may be ranked lower than a less relevant item from a different search index because of the different factors used in ranking those items by the different search indexes.
[0007] Further, existing techniques mainly focus on expanding queries using synonyms and similar tokens/phrases, and these expanded queries are then used to increase the recall of the original search query. Such techniques do not rely on knowledge-aware query understanding but on the query tokens themselves, i.e., the query tokens are directly replaced with synonyms or similar words without a contextual understanding of the token. Accordingly, there is a need for a system and method that at least partially addresses one or more limitations of the prior art.
[0008] There is, therefore, a need in the art to provide a system and method that can overcome the shortcomings of the existing systems and methods.
OBJECTS OF THE PRESENT DISCLOSURE
[0009] Some of the objects of the present disclosure, which at least one embodiment herein satisfies, are listed herein below.
[0010] An object of the present disclosure is to facilitate query reformulation by using query understanding to automatically transform search queries in order to better represent the searcher’s intent.
[0011] Another object of the present disclosure is to enable the implementation of query understanding as the system obtains the search query along with the user’s metadata as input and predicts the true intent of the query.
[0012] Another object of the present disclosure is to enable query reformulation by utilizing the predicted intent to reformulate the query, where the query reformulation can be achieved with different goals in mind, such as narrowing the scope of the query (reducing the number of search results) and broadening the scope of the query (increasing the number of search results).
[0013] Another object of the present disclosure is to provide query reformulation by using query understanding, which is cost-effective, time-effective, and widely adaptable.
[0014] Another object of the present disclosure is to facilitate the evaluation of query reformulation candidates to determine improved (e.g., more relevant) search results when incorporated into an original query.
[0015] Another object of the present disclosure is to provide a global approach that modifies the query regardless of the initial results, in contrast with local approaches that expand the query based on documents initially returned by the search.
[0016] Another object of the present disclosure is to eliminate outdated search techniques, which are cumbersome, time-consuming, and involve considerable user interaction.
[0017] Another object of the present disclosure is to provide query reformulation by using query understanding which enables the use of different sources of information.
[0018] Another object of the present disclosure is to provide optimized, dynamic, and flexible results for diversified application scenarios in the future.
[0019] Another object of the present disclosure is to reduce the complexity in understanding the user’s query.
SUMMARY
[0020] This section is provided to introduce certain objects and aspects of the present disclosure in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
[0021] In an aspect, the present disclosure relates to a system for query reformulation using query understanding. The system includes one or more processors, and a memory operatively coupled to the one or more processors, where the memory includes processor-executable instructions, which on execution, cause the one or more processors to receive a first set of data packets indicative of a search query having one or more input tokens from a computing device associated with a user and extract one or more attributes associated with each of the one or more input tokens from a knowledge graph to identify a search intent of the user. The one or more processors may retrieve one or more related tokens corresponding to each of the one or more input tokens from the knowledge graph based on the identified search intent, the one or more related tokens being semantically related to the one or more input tokens. The one or more processors may also generate one or more candidate queries by performing one or more edit operations on the search query based on the identified search intent. The one or more processors may further execute the one or more candidate queries on a search engine to retrieve a search result having a set of search artifacts from a database.
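By way of a non-limiting illustration, the overall flow described above (attribute extraction from a knowledge graph, retrieval of semantically related tokens, and candidate-query generation) may be sketched as follows. The toy knowledge graph, its contexts ("grocery", "electronics"), and all function names are hypothetical and are shown only to clarify the data flow, not to describe the claimed implementation:

```python
# Illustrative, non-limiting sketch of the claimed flow. The toy knowledge
# graph and the function names are hypothetical; a real system would back
# these with a full knowledge graph and a search engine.

# token -> {word-sense per context, related tokens per context}
KNOWLEDGE_GRAPH = {
    "apple": {
        "senses": {"grocery": "fruit", "electronics": "brand"},
        "related": {"grocery": ["fruit", "gala"],
                    "electronics": ["iphone", "smartphone"]},
    },
}

def identify_search_intent(tokens, context):
    """Extract the attribute (word-sense) of each input token to identify intent."""
    return {t: KNOWLEDGE_GRAPH.get(t, {}).get("senses", {}).get(context)
            for t in tokens}

def retrieve_related_tokens(tokens, context):
    """Retrieve semantically related tokens for each input token, given the intent."""
    return {t: KNOWLEDGE_GRAPH.get(t, {}).get("related", {}).get(context, [])
            for t in tokens}

def generate_candidate_queries(tokens, related):
    """Generate candidate queries by substituting each token with a related one."""
    candidates = []
    for i, tok in enumerate(tokens):
        for rel in related.get(tok, []):
            candidates.append(tokens[:i] + [rel] + tokens[i + 1:])
    return candidates
```

In this sketch the same input token yields different candidate queries depending on the identified context, which mirrors the ‘apple’ example given in the background.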
[0022] In an embodiment, the one or more processors may be configured to assign a similarity value to each of the one or more candidate queries based on similarity between the one or more candidate queries and the search query, and rank each search artifact in the set of search artifacts retrieved in each search result corresponding to the one or more candidate queries based on a relevance value determined using the similarity value assigned to said one or more candidate queries.
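As a non-limiting sketch of this ranking embodiment, one plausible similarity value is the Jaccard overlap between the token sets of the original and candidate queries, with each retrieved artifact's relevance value derived from its base score weighted by that similarity. The choice of Jaccard similarity and multiplicative weighting is an assumption for illustration only:

```python
# Hypothetical ranking sketch: each candidate query receives a similarity
# value relative to the original query, and the artifacts it retrieves are
# ranked by relevance = base_score * similarity. Jaccard similarity and the
# multiplicative weighting are illustrative assumptions.

def query_similarity(original, candidate):
    """Jaccard similarity between the token sets of two queries."""
    a, b = set(original), set(candidate)
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_artifacts(original, results):
    """results: list of (candidate_query, [(artifact, base_score), ...]).
    Returns all artifacts sorted by their computed relevance value."""
    ranked = []
    for candidate, artifacts in results:
        sim = query_similarity(original, candidate)
        for artifact, score in artifacts:
            ranked.append((artifact, score * sim))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Under this scheme, artifacts retrieved by candidates closer to the original query are promoted over equally scored artifacts from more distant candidates.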
[0023] In an embodiment, to identify the search intent from the search query, the one or more processors may be configured to generate one or more n-grams of the one or more input tokens in the search query, wherein n may be a predetermined integer indicative of a context size of each n-gram, and disambiguate the one or more attributes corresponding to each of the one or more input tokens from a context provided in the one or more n-grams having said one or more input tokens.
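A minimal sketch of this n-gram-based disambiguation follows; the cue-word table standing in for the knowledge-graph attributes is hypothetical and serves only to show how context within an n-gram can select a word-sense:

```python
# Illustrative sketch: generate n-grams of the input tokens, then pick the
# word-sense whose cue words co-occur with the ambiguous token in an n-gram.
# The sense_cues table is a hypothetical stand-in for knowledge-graph
# attributes.

def generate_ngrams(tokens, n):
    """Generate the n-grams of the input tokens; n is the context size."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def disambiguate(token, ngrams, sense_cues):
    """Return the first sense whose cues appear alongside the token."""
    for gram in ngrams:
        if token in gram:
            context = set(gram) - {token}
            for sense, cues in sense_cues.items():
                if context & cues:
                    return sense
    return None
```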
[0024] In an embodiment, to generate the one or more candidate queries, the one or more processors may be configured to perform any one or a combination of adding one or more of the retrieved related tokens to the search query, deleting one or more of the input tokens from the search query, and substituting one or more of the input tokens in the search query with the corresponding related tokens, wherein the one or more related tokens may be either added or substituted based on the search intent.
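The three edit operations named in this embodiment (addition, deletion, and substitution) may be sketched as follows, treating a query as a list of tokens; the function names are illustrative:

```python
# Non-limiting sketch of the three edit operations on a query, modeled as a
# list of tokens. Each operation returns a new candidate query and leaves
# the original query unchanged.

def add_token(query, related_token, position=None):
    """Edit operation: add a related token to the query (at the end by default)."""
    pos = len(query) if position is None else position
    return query[:pos] + [related_token] + query[pos:]

def delete_token(query, index):
    """Edit operation: delete an input token from the query."""
    return query[:index] + query[index + 1:]

def substitute_token(query, index, related_token):
    """Edit operation: substitute an input token with a related token."""
    return query[:index] + [related_token] + query[index + 1:]
```

Whether a related token is added or substituted would, per this embodiment, be decided by the identified search intent.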
[0025] In an embodiment, the one or more related tokens retrieved may either have a hypernymous semantic relation or a hyponymous semantic relation with the corresponding one or more input tokens, where the one or more candidate queries generated using hypernymous related tokens may be assigned a lower restrictiveness value compared to the one or more candidate queries generated using hyponymous related tokens, the restrictiveness value being determined based on a distance value and a direction value between the one or more related tokens from the corresponding one or more input tokens in the knowledge graph, and where the one or more processors may select the one or more related tokens having either the hypernymous or hyponymous semantic relation with the corresponding one or more input tokens based on the search intent identified for the search query to generate the one or more candidate queries.
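One plausible encoding of this embodiment is a signed restrictiveness value: the direction of the semantic relation (hyponym versus hypernym) gives the sign, and the graph distance gives the magnitude. The +1/-1 direction encoding below is an assumption for this sketch, not a definition from the disclosure:

```python
# Illustrative restrictiveness sketch: a hyponym (more specific term)
# tightens the query and a hypernym (more general term) loosens it, with
# the magnitude scaling with the number of edges between the related token
# and the input token in the knowledge graph. The +1/-1 encoding is assumed.

def restrictiveness(distance, relation):
    """Signed restrictiveness value: direction (+1 hyponym, -1 hypernym) * distance."""
    direction = {"hyponym": 1, "hypernym": -1}[relation]
    return direction * distance

def select_relation(search_intent):
    """Choose the semantic relation from the identified intent:
    hyponyms narrow the query, hypernyms broaden it."""
    return "hyponym" if search_intent == "narrow" else "hypernym"
```

Consistent with the embodiment, a candidate built from a hypernymous related token receives a lower restrictiveness value than one built from a hyponymous related token at the same graph distance.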
[0026] In an embodiment, where a number of search artifacts returned in the search result may be less than a predetermined hits threshold, the one or more processors may be configured to reformulate the one or more candidate queries by either deleting one or more of the related tokens from the one or more candidate queries or substituting one or more of the related tokens with the corresponding one or more input tokens in the one or more candidate queries to decrease a restrictiveness value of the one or more candidate queries, and execute the one or more reformulated candidate queries to obtain the corresponding search results.
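A hypothetical sketch of this fallback follows: if a candidate query returns fewer artifacts than a hits threshold, it is broadened by restoring the original input tokens and re-executed. The threshold value, the search callback, and the simple "fall back to the original tokens" strategy are all assumptions for illustration:

```python
# Non-limiting sketch of the hits-threshold fallback. search_fn is any
# callable mapping a token-list query to a list of artifacts; the threshold
# and the broadening strategy (restoring the original tokens) are assumed.

HITS_THRESHOLD = 3  # hypothetical predetermined hits threshold

def reformulate_if_sparse(candidate, original, search_fn, threshold=HITS_THRESHOLD):
    """Broaden a candidate query that returns too few search artifacts."""
    hits = search_fn(candidate)
    if len(hits) >= threshold:
        return candidate, hits
    # Decrease restrictiveness by reverting to the original input tokens.
    broadened = list(original)
    return broadened, search_fn(broadened)
```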
[0027] In an embodiment, the one or more attributes of the one or more input tokens and the one or more related tokens in the knowledge graph may include any one or a combination of named entity tags, one or more associated word-senses, dependency tags, part-of-speech tags, word embeddings, morphological features, translation and transliteration in a plurality of languages, and semantic and linguistic relationships between each of the one or more input tokens and the corresponding one or more related tokens.
[0028] In another aspect, the present disclosure relates to a method for query reformulation using query understanding. The method includes receiving, by a processor, a first set of data packets indicative of a search query having one or more input tokens from a computing device associated with a user. The method may include extracting, by the processor, one or more attributes associated with each of the one or more input tokens from a knowledge graph to identify a search intent of the user. The method may also include retrieving, by the processor, one or more related tokens corresponding to each of the one or more input tokens from the knowledge graph based on the identified search intent, the one or more related tokens being semantically related to the one or more input tokens. The method may further include generating, by the processor, one or more candidate queries by performing one or more edit operations based on the identified search intent. The method may also include executing, by the processor, the one or more candidate queries on a search engine to retrieve a search result having a set of search artifacts from a database.
[0029] In an embodiment, the method may include assigning, by the processor, a similarity value to each of the one or more candidate queries based on similarity between the one or more candidate queries and the search query, and ranking, by the processor, each search artifact in the set of search artifacts retrieved in each search result corresponding to the one or more candidate queries based on a relevance value determined using the similarity value assigned to said one or more candidate queries.
[0030] In an embodiment, identifying the search intent from the search query may include generating, by the processor, one or more n-grams of the one or more input tokens in the search query, wherein n may be a predetermined integer indicative of a context size of each n-gram, and disambiguating, by the processor, the one or more attributes corresponding to each of the one or more input tokens from a context provided in the one or more n-grams having said one or more input tokens.
[0031] In an embodiment, generating the one or more candidate queries may include any one or a combination of adding, by the processor, one or more of the retrieved related tokens to the search query, deleting, by the processor, one or more of the input tokens from the search query, and substituting, by the processor, one or more of the input tokens in the search query with the corresponding related tokens, wherein the one or more related tokens may be either added or substituted based on the search intent.
[0032] In an embodiment, the one or more related tokens retrieved may either have a hypernymous semantic relation or a hyponymous semantic relation with the corresponding one or more input tokens, where the one or more candidate queries generated using hypernymous related tokens may be assigned a lower restrictiveness value compared to the one or more candidate queries generated using hyponymous related tokens, the restrictiveness value being determined based on a distance value and a direction value between the retrieved one or more related tokens from the corresponding one or more input tokens in the knowledge graph, and where the method includes selecting the one or more related tokens having either the hypernymous or hyponymous semantic relation with the corresponding one or more input tokens based on the search intent identified for the search query to generate the one or more candidate queries.
[0033] In an embodiment, when the number of search artifacts returned in the search result is lower than a predetermined hits threshold, the method may include reformulating, by the processor, the one or more candidate queries by either deleting one or more of the related tokens from said one or more candidate queries or substituting one or more of the related tokens with the corresponding one or more input tokens in the one or more candidate queries to decrease the restrictiveness value of the one or more candidate queries, and executing, by the processor, the reformulated one or more candidate queries to obtain the corresponding search results.
[0034] In an embodiment, the one or more attributes of the one or more input tokens and the one or more related tokens in the knowledge graph may include any one or a combination of named entity tags, one or more associated word-senses, dependency tags, part-of-speech tags, word embeddings, morphological features, translation and transliteration in a plurality of languages, and semantic and linguistic relationships between each of the one or more input tokens and the corresponding one or more related tokens.
[0035] In another aspect, the present disclosure relates to a user equipment. The user equipment includes one or more processors, and a memory operatively coupled to the one or more processors, where the memory includes processor-executable instructions, which on execution, cause the one or more processors to transmit a first set of data packets indicative of a search query having one or more input tokens to a system, receive a second set of data packets from the system indicative of one or more search artifacts, and render the one or more search artifacts on a graphical user interface associated with the user equipment based on a rank assigned to said one or more search artifacts.
[0036] In an embodiment, the one or more input tokens in the search query may be indicative of any one or a combination of words, phrases, and linguistic elements expressed in a plurality of languages.
BRIEF DESCRIPTION OF DRAWINGS
[0037] The accompanying drawings, which are incorporated herein and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that such drawings include the electrical components, electronic components, or circuitry commonly used to implement such components.
[0038] FIG. 1 illustrates an exemplary network architecture (100) for implementing a proposed system for query reformulation using query understanding, in accordance with an embodiment of the present disclosure.
[0039] FIG. 2 illustrates an exemplary block diagram (200) of the proposed system for query reformulation using query understanding, in accordance with an embodiment of the present disclosure.
[0040] FIG. 3 illustrates a query reformulation pipeline (300), in accordance with an embodiment of the present disclosure.
[0041] FIG. 4 illustrates a graphical representation (400) of a knowledge base and query reformulation therewith, in accordance with an embodiment of the present disclosure.
[0042] FIG. 5A illustrates a flow chart representation (500A) of the proposed method for enabling contextual narrowing/broadening of a search query, in accordance with an embodiment of the present disclosure.
[0043] FIG. 5B represents an exemplary representation (500B) for narrowing or broadening reformulation of the search query, in accordance with an embodiment of the present disclosure.
[0044] FIGs. 6A-6B illustrate exemplary representations (600A, 600B) for contextual query reformulation in which or with which the proposed system may be implemented, in accordance with an embodiment of the present disclosure.
[0045] FIG. 7 illustrates an exemplary computer system (700) in which or with which embodiments of the present invention can be utilized, in accordance with embodiments of the present disclosure.
[0046] The foregoing shall be more apparent from the following more detailed description of the invention.
DETAILED DESCRIPTION
[0047] In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.
[0048] The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth.
[0049] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
[0050] Also, it is noted that individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0051] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive—in a manner similar to the term “comprising” as an open transition word—without precluding any additional or other elements.
[0052] Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[0053] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
[0054] Various embodiments of the present disclosure provide a system and method for query reformulation using query understanding. The present disclosure facilitates query reformulation by using query understanding to automatically transform search queries in order to better represent a user’s intent. The present disclosure enables the implementation of query understanding, as the system obtains the search query along with the user’s metadata as input to predict the true intent conveyed in the query. The present disclosure enables query reformulation by utilizing the predicted intent to reformulate the query; the query reformulation can be achieved with different goals in mind, such as narrowing the scope of the query (reducing the number of search results) and broadening the scope of the query (increasing the number of search results). Thus, the present disclosure provides query reformulation by using query understanding, which is cost-effective, time-effective, and widely adaptable.
[0055] The present disclosure facilitates the evaluation of query reformulation candidates to determine improved (e.g., more relevant) search results when incorporated into an original query. The present disclosure provides a global approach that modifies the query regardless of the initial results, in contrast with local approaches that expand the query based on documents initially returned by the search. The present disclosure eliminates outdated techniques, which are cumbersome, time-consuming, and involve considerable user interaction. The present disclosure provides query reformulation by using query understanding, which enables the use of different sources of information. The present disclosure provides optimized, dynamic, and flexible results for diversified application scenarios in the future. The present disclosure reduces the complexity in understanding the user’s query.
[0056] In an aspect, systems and methods for query reformulation using query understanding may receive a search query from a user, and may extract one or more attributes associated with one or more input tokens in the search query to identify a search intent. The system may retrieve one or more related tokens corresponding to each of the input tokens based on the identified search intent. The one or more related tokens may be semantically related to the corresponding input tokens. The system may generate one or more candidate queries with the retrieved related tokens. The system may execute the one or more candidate queries to retrieve a set of search artifacts. The system may reformulate the candidate queries based on the number of search artifacts retrieved.
[0057] Certain terms and phrases have been used throughout the disclosure and will have the following meanings in the context of the ongoing disclosure.
[0058] The term “tokens” may refer to data structures of discrete units or symbols of meaning that represent, but are not limited to, words, phrases, or linguistic elements in a corpus of words.
[0059] The various embodiments throughout the disclosure will be explained in more detail with reference to FIGs. 1-7.
[0060] FIG. 1 illustrates an exemplary network architecture (100) in which or with which a proposed system may be implemented, in accordance with an embodiment of the present disclosure.
[0061] Referring to FIG. 1, the network architecture (100) includes a system (108) for query reformulation using query understanding, in accordance with an embodiment of the present disclosure. As illustrated, the system (108) may be equipped with a search engine (110) for retrieving a set of search artifacts based on a query reformulated using query understanding by the system (108), and transmitting the retrieved set of search artifacts to users (102-1, 102-2…102-N) (individually referred to as the user (102) and collectively referred to as the users (102)) associated with one or more computing devices (104-1, 104-2…104-N) (individually referred to as the computing device or user equipment (UE) (104) and collectively referred to as the computing devices or UEs (104)). The search engine (110) may be communicatively coupled to the one or more computing devices (104). The search engine (110) may be coupled to a centralized server (112). The centralized server (112) may also be operatively coupled to the one or more computing devices (104) through a communication network (106). In an embodiment, the search engine (110) may be embedded into the system (108).
[0062] The computing devices (104) may be referred to as UEs. Those with ordinary skill in the art will appreciate that the terms “computing device(s)” and “UE(s)” may be used interchangeably throughout the disclosure. Although three computing devices (104) are depicted in FIG. 1, any number of computing devices (104) may be included without departing from the scope of the ongoing description. In an embodiment, the one or more computing devices (104) may include, but are not limited to, any electrical, electronic, or electro-mechanical equipment, or a combination of one or more of the above devices, such as a mobile phone, smartphone, Virtual Reality (VR) device, Augmented Reality (AR) device, laptop, general-purpose computer, desktop, personal digital assistant, tablet computer, mainframe computer, or any other computing device. In an embodiment, the computing device (104) may include one or more in-built or externally coupled accessories including, but not limited to, a visual aid device such as a camera, an audio aid, a microphone, a keyboard, input devices for receiving input from a user such as a touch pad, a touch-enabled screen, or an electronic pen, receiving devices for receiving any audio or visual signal in any range of frequencies, and transmitting devices that can transmit any audio or visual signal in any range of frequencies. It may be appreciated by those skilled in the art that the one or more computing devices (104) may not be restricted to the mentioned devices and various other devices may be used.
[0063] In an exemplary embodiment, the communication network (106) may include, by way of example but not limitation, at least a portion of one or more networks having one or more nodes that transmit, receive, forward, generate, buffer, store, route, switch, process, or a combination thereof, etc. one or more messages, packets, signals, waves, voltage or current levels, some combination thereof, or so forth. The communication network (106) may include, by way of example but not limitation, one or more of: a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, or some combination thereof.
[0064] In another exemplary embodiment, the centralized server (112) may include, by way of example but not limitation, one or more of: a stand-alone server, a server blade, a server rack, a bank of servers, a server farm, hardware supporting a part of a cloud service or system, a home server, hardware running a virtualized server, one or more processors executing code to function as a server, one or more machines performing server-side functionality as described herein, at least a portion of any of the above, or some combination thereof. In an embodiment, the system (108) may be coupled to the centralized server (112). In another embodiment, the centralized server (112) may also be operatively coupled to the computing devices (104). In some implementations, the system (108) may be associated with the centralized server (112).
[0065] In an embodiment, the system (108) may be configured to receive a search query having one or more input tokens from the user (102). In an embodiment, the system (108) may receive the search query via a first set of data packets transmitted from the UE (104).
[0066] In an embodiment, the system (108) may extract one or more attributes associated with each of the one or more input tokens from a knowledge graph (e.g., 402, as shown in FIG. 4) to identify the search intent of the user (102). In an embodiment, the search intent may be represented by the one or more attributes associated with each of the input tokens. In an embodiment, the one or more attributes of the tokens may include, but are not limited to, named entity tags, one or more associated word-senses, dependency tags, part-of-speech tags, word embeddings, morphological features, translation and transliteration in a plurality of languages, and semantic and linguistic relationships between each of the one or more input tokens and corresponding one or more related tokens. In an embodiment, the one or more attributes may be used to disambiguate the meaning of the tokens used by the user (102).
[0067] In an embodiment, to identify the search intent from the search query, the system (108) may generate one or more n-grams of the one or more input tokens in the search query. In an embodiment, the ‘n’ in the n-gram may be a predetermined integer indicative of the context size of each n-gram. In an embodiment, the system (108) may disambiguate the one or more attributes corresponding to each of the one or more input tokens from the context provided in the one or more n-grams having said one or more input tokens. In an example, the input token ‘apple’ can be a fruit in one context (grocery) and an electronics company in another context (electronics). In such an example, other input tokens neighbouring the input token ‘apple’ in the n-gram may allow for disambiguation of the aforementioned word-sense attributes for ‘apple.’ If the n-gram having the input token ‘apple’ is indicative of ‘apple is juicy,’ the system (108) may disambiguate the search intent of the user (102) when using the input token ‘apple’ to refer to the first word-sense.
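By way of illustration, n-gram generation and context-based word-sense disambiguation as described above may be sketched as follows. The sense inventory and cue words below are hypothetical and stand in for attributes that would be retrieved from the knowledge graph; the overlap heuristic is a minimal assumption, not the disclosed ML/AI engine.

```python
def ngrams(tokens, n):
    """Return the n-grams (as tuples) over the input tokens; n sets the context size."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical sense inventory: cue words that signal each word-sense.
SENSE_CUES = {
    "apple": {"fruit": {"juicy", "ripe", "eat"}, "company": {"phone", "laptop"}},
}

def disambiguate(token, ngram, cues=SENSE_CUES):
    """Pick the word-sense whose cue words overlap most with the n-gram context."""
    context = set(ngram) - {token}
    senses = cues.get(token, {})
    if not senses:
        return None
    return max(senses, key=lambda sense: len(senses[sense] & context))
```

For the n-gram ‘apple is juicy,’ the neighbouring token ‘juicy’ overlaps only the fruit sense, so the fruit word-sense is selected.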
[0068] In another aspect, the one or more attributes from each of the n-grams of input tokens may be disambiguated using a Machine Learning/Artificial Intelligence (ML/AI) engine. In an embodiment, the ML engine may automatically perform detection of, including but not limited to, synonyms, hypernyms, and semantically ‘nearby’ queries to generate the one or more candidate queries. In an embodiment, by disambiguating the one or more attributes of corresponding input tokens, the system (108) may be able to identify the search intent of the user (102). In an embodiment, the ML engine may identify the search intent of the search queries based on the context within which the one or more input tokens were used. In other embodiments, the one or more attributes may be disambiguated using, including but not limited to, supervised, semi-supervised, or rule-based language models.
[0069] In an embodiment, the one or more related tokens may be retrieved from a knowledge graph (402). In an embodiment, the one or more related tokens may be retrieved from the knowledge graph (402) based on the identified search intent. In an embodiment, the one or more related tokens may be semantically related to the one or more input tokens. In an embodiment, the one or more input tokens may also have corresponding tokens in the knowledge graph (402). In an embodiment, the system (108) may retrieve the one or more related tokens corresponding to each of the disambiguated input tokens from a knowledge graph (402).
[0070] In an embodiment, the knowledge graph (402) may be a graph data structure with one or more nodes having the one or more attributes assigned thereto. In an embodiment, each node may correspond to the one or more input tokens and the one or more related tokens. In an embodiment, two or more of the nodes may be connected via an arc signifying a semantic and linguistic relationship there between. In an embodiment, each node in the knowledge graph (402) may be indicative of one or more tokens representing words, phrases, and linguistic elements of a natural language. In an embodiment, the knowledge graph (402) may be indicative of, including but not limited to, WordNet, Freebase, and the like. In an embodiment, the semantic relation between the input tokens and the related tokens may include, but is not limited to, synonymy, antonymy, hypernymy, hyponymy, polysemy, meronymy, holonymy, and the like. In an embodiment, the one or more tokens in the knowledge graph (402) may include the one or more input tokens and the one or more related tokens. In an embodiment, each of the tokens in the knowledge graph (402) may be assigned the one or more attributes, including, but not limited to, one or more associated word senses, morphological features, translation and transliteration in a plurality of languages, syntactic features, semantic features, parts-of-speech tags, semantic and linguistic relationships there between, and the like.
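A minimal sketch of such a graph data structure follows: nodes carry attribute dictionaries and arcs carry semantic relation labels. The tokens, attributes, and relation names below are hypothetical examples, not the contents of the disclosed knowledge graph (402).

```python
# Illustrative knowledge graph: nodes are tokens with attributes,
# arcs are (source, destination, semantic relation) triples.
GRAPH = {
    "nodes": {
        "dress":   {"pos": "NOUN", "senses": ["garment worn on the body"]},
        "garment": {"pos": "NOUN", "senses": ["item of clothing"]},
        "gown":    {"pos": "NOUN", "senses": ["formal dress"]},
    },
    "arcs": [
        ("dress", "garment", "hypernym"),  # garment is broader than dress
        ("dress", "gown", "hyponym"),      # gown is narrower than dress
    ],
}

def related_tokens(graph, token, relation):
    """Return tokens connected to `token` by the given semantic relation."""
    return [dst for src, dst, rel in graph["arcs"]
            if src == token and rel == relation]
```

Retrieving the hypernyms of ‘dress’ from this graph yields ‘garment,’ while its hyponyms yield ‘gown.’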
[0071] In an embodiment, the system (108) may generate one or more candidate queries. In an embodiment, the one or more candidate queries may be generated by performing one or more edit operations on the search queries based on the identified search intent. In an embodiment, to generate the one or more candidate queries, the system (108) may perform any one or a combination of edit operations including, but not limited to, adding one or more of the retrieved related tokens to the search query, deleting one or more of the input tokens from the search query, or substituting the one or more input tokens with the one or more corresponding related tokens based on the identified search intent.
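The three edit operations above can be sketched on token lists as follows; this is an illustrative assumption of how candidates might be enumerated, with substitution shown as the generating operation.

```python
def add_token(query_tokens, related):
    """Edit operation 1: append a retrieved related token to the query."""
    return query_tokens + [related]

def delete_token(query_tokens, index):
    """Edit operation 2: remove the input token at the given position."""
    return query_tokens[:index] + query_tokens[index + 1:]

def substitute_token(query_tokens, index, related):
    """Edit operation 3: replace an input token with a related token."""
    return query_tokens[:index] + [related] + query_tokens[index + 1:]

def candidate_queries(query_tokens, related):
    """Generate candidates by substituting each input token with each of its
    related tokens; `related` maps token -> list of related tokens."""
    out = []
    for i, tok in enumerate(query_tokens):
        for rel in related.get(tok, []):
            out.append(substitute_token(query_tokens, i, rel))
    return out
```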
[0072] In an embodiment, the one or more related tokens retrieved for generating the one or more candidate queries may either have a hypernymous semantic relation or a hyponymous semantic relation with the corresponding input token. In an embodiment, the system (108) may assign a lower restrictiveness value to the one or more candidate queries generated using the hypernymous related tokens when compared with the one or more candidate queries generated using the hyponymous related tokens. In an embodiment, the restrictiveness value may be determined based on a distance value and a direction value between the retrieved related tokens from the corresponding input tokens in the knowledge graph (402). In an example, when the knowledge graph (402) is indicative of a tree data structure having a hierarchy of tokens, the distance value may correspond to the path distance between two or more tokens, and the ancestry direction value may correspond to whether the related token is an ancestor or descendant of the corresponding input token. In an embodiment, the direction value may indicate whether the one or more related tokens have a hypernymous or hyponymous semantic relation with the corresponding one or more input tokens. In an embodiment, the system (108) may select the one or more related tokens having either the hypernymous or hyponymous semantic relation with the corresponding one or more input tokens based on the search intent identified for the search query to generate the one or more candidate queries.
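A simple sketch of the restrictiveness value described above, combining the path distance and the ancestry direction of each substitution, might look as follows. The signed linear form is an assumption for illustration; the disclosure only requires that hypernymous substitutions score lower than hyponymous ones.

```python
def restrictiveness(distance, direction):
    """Signed restrictiveness for one substitution: direction is -1 for a
    hypernymous (ancestor) related token and +1 for a hyponymous (descendant)
    one; the path distance in the hierarchy scales the effect."""
    return direction * distance

def query_restrictiveness(substitutions):
    """Aggregate score for a candidate query; `substitutions` is a list of
    (distance, direction) pairs, one per substituted token."""
    return sum(restrictiveness(d, s) for d, s in substitutions)
```

Under this sketch a candidate built from hypernyms scores below zero (broader), and one built from hyponyms scores above zero (narrower).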
[0073] In an embodiment, the restrictiveness value may be indicative of how broad or narrow each of the candidate queries may be. In an embodiment, a low restrictiveness value may indicate the candidate query generated is broad or composed of the one or more hypernymous related tokens. In such embodiments, the candidate query may return a higher number of search artifacts when executed by the search engine (110) when compared to the search query provided by the user (102). In an embodiment, a high restrictiveness value may indicate the candidate query generated is narrow or composed of the one or more hyponymous related tokens. In such embodiments, the candidate query may return a lower number of search artifacts when executed by the search engine (110), when compared to the search query provided by the user (102).
[0074] In an embodiment, the system (108) may execute the one or more candidate queries on the search engine (110) to retrieve a search result having a set of search artifacts from a database (210). In an embodiment, the set of search artifacts may include, but not be limited to, Uniform Resource Locators (URLs), images, videos, text documents, indices of databases, and the like, that are retrieved by the search engine (110) for the executed candidate query. In an embodiment, the search engine (110) may include, but not be limited to, crawlers, indexers, human-powered engines, searchable databases, and the like.
[0075] In an embodiment, the system (108) may also be configured to assign a similarity value to each of the one or more candidate queries based on similarity between the candidate queries and the search query. Further, the system (108) may rank each search artifact in the set of search artifacts retrieved in each search result corresponding to the one or more candidate queries based on a relevance value determined using the similarity value assigned to said one or more candidate queries. In an embodiment, the relevance value may also be determined using, including but not limited to, keyword prominence, domain authority, user search history, and the like. In an embodiment, the relevance value may indicate the relative importance of each search artifact in the set of search artifacts retrieved in each search result corresponding to the one or more candidate queries. In an embodiment, the system (108) may be configured to transmit a second set of data packets having the set of search artifacts obtained from execution of the one or more candidate queries to the computing device or UE (104) associated with the user (102). In an embodiment, the UE (104) may receive the second set of data packets, and render the one or more search artifacts on a graphical user interface associated with said UE (104) based on a rank assigned to said one or more search artifacts. In an embodiment, the one or more search artifacts may be rendered in descending order of the corresponding relevance value.
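The ranking step above can be sketched as follows. Weighting each artifact's base score by the similarity value of the candidate query that retrieved it is an assumed combination rule; the disclosure also permits signals such as keyword prominence and domain authority, which are omitted here.

```python
def rank_artifacts(results, similarity):
    """results: dict mapping candidate query -> list of (artifact, base_score).
    similarity: dict mapping candidate query -> similarity value vs. the
    original search query. Returns (artifact, relevance) pairs in descending
    order of relevance, as rendered on the UE."""
    ranked = []
    for query, artifacts in results.items():
        for artifact, base in artifacts:
            ranked.append((artifact, base * similarity[query]))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Artifacts retrieved by candidates closer to the original query thus surface first, even when base scores are equal.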
[0076] In an embodiment, the system (108) may generate the one or more candidate queries with the objective to increase recall or increase precision of the search results obtained therefrom. Increasing recall may allow the search engine (110) to return a larger set of relevant results. In an embodiment, the search query may be reformulated to generate the one or more candidate queries using any one or a combination of query expansion and query relaxation. Query reformulation can also be used to increase precision, thereby reducing the number of irrelevant results. While increasing recall may be used for avoiding small or empty result sets, increasing precision may be used for queries that would otherwise return large, heterogeneous result sets. It may be appreciated by those in the art that the system (108) may be suitably adapted for applications requiring differing amounts of recall and precision.
[0077] In an aspect, query reformulation may refer to a set of techniques, where the search query may be reformulated into the one or more candidate queries to increase query retrieval performance. Query reformulation may be a useful technique both when the underlying product database is noisy and when the original query does not fully express (or over-conditions) the search intent.
[0078] In another aspect, query reformulation may be used to enhance operations including, but not limited to, search, discovery, personalization, and the like. In an embodiment, the system (108) may reformulate the search queries to generate the one or more candidate queries by either expanding the query scope or narrowing down the query scope depending on the use case.
[0079] In an embodiment, expanding the search query may cast a wider net for results that are relevant but do not match the query terms exactly. In an example, query reformulation by the system (108) may broaden the query by adding additional phrases or replacing existing phrases with new ones. These additional tokens may be related to the original query terms as synonyms, abbreviations, aliases, etc. In other examples, instead of adding tokens to the search query, the system (108) may delete one or more of the input tokens or the related tokens from candidate queries. By deleting tokens from the query, the restrictiveness value of the reformulated candidate queries may be reduced, thereby increasing recall. In an embodiment, the system (108) may only remove tokens from the candidate queries whose attributes do not align with the identified search intent.
[0080] In an embodiment, the system (108) may be configured to dynamically reformulate the candidate queries based on the number of search artifacts returned during execution of said candidate queries. In an embodiment, the system (108) may, while the number of search artifacts returned for the search query is less than a predetermined hits threshold, reformulate the one or more candidate queries by either deleting one or more of the related tokens from said candidate queries or substituting one or more of the related tokens with the corresponding input tokens in the candidate queries so as to decrease the restrictiveness value of the candidate queries. Further, the reformulating may reduce the restrictiveness value of the candidate queries such that the number of search artifacts returned on execution of said candidate queries is increased. In such embodiments, the system (108) may relax the restrictiveness of the candidate queries. In an embodiment, the system (108) may execute the reformulated candidate queries to obtain the corresponding search results. In such embodiments, the system (108) may repeatedly reformulate and execute the one or more candidate queries until the number of search artifacts returned therefrom is above the predetermined hits threshold, thereby allowing the system (108) to gracefully fall back to less specific queries. In an embodiment, the system (108) may remove one or more tokens from the candidate queries in the order of occurrence.
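The relax-and-retry loop described above can be sketched as follows; the `execute` callable stands in for the search engine (110), and dropping tokens in order of occurrence is the deletion strategy mentioned in the paragraph above (substitution back to input tokens is omitted for brevity).

```python
def relax_until_enough(candidate, execute, hits_threshold):
    """Repeatedly relax the candidate query until the search returns at least
    `hits_threshold` artifacts or only one token remains.
    `execute` maps a token list to a list of search artifacts."""
    query = list(candidate)
    results = execute(query)
    while len(results) < hits_threshold and len(query) > 1:
        query = query[1:]  # drop the next token in order of occurrence
        results = execute(query)
    return query, results
```

With a toy engine where shorter queries return more hits, the loop falls back gracefully from a specific query to a less specific one.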
[0081] In an embodiment, query reformulations may semantically narrow or broaden the scope of the candidate queries. In an embodiment, the system (108) may choose which type of reformulation may be suitable based on the search intent identified from the search query. Further, by reformulating the candidate queries based on the number of search artifacts returned, the system (108) may allow for graceful fallback and continue to provide the user (102) with relevant results.
[0082] Although FIG. 1 shows exemplary components of the network architecture (100), in other embodiments, the network architecture (100) may include fewer components, different components, differently arranged components, or additional functional components than depicted in FIG. 1. Additionally, or alternatively, one or more components of the network architecture (100) may perform functions described as being performed by one or more other components of the network architecture (100).
[0083] FIG. 2 illustrates an exemplary block diagram (200) of the proposed system for query reformulation using query understanding, in accordance with an embodiment of the present disclosure.
[0084] In an aspect, the system (108) may include one or more processor(s) (202). The one or more processor(s) (202) may be implemented as one or more microprocessors, microcomputers, microcontrollers, edge or fog microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that process data based on operational instructions. Among other capabilities, one or more processor(s) (202) may be configured to fetch and execute computer-readable instructions stored in a memory (204) of the system (108). The memory (204) may be configured to store one or more computer-readable instructions or routines in a non-transitory computer-readable storage medium, which may be fetched and executed to create or share data packets over a network service. The memory (204) may comprise any non-transitory storage device including, for example, volatile memory such as Random-Access Memory (RAM), or non-volatile memory such as Erasable Programmable Read-Only Memory (EPROM), flash memory, and the like.
[0085] In an embodiment, the system (108) may include an interface(s) (206). The interface(s) (206) may comprise a variety of interfaces, for example, interfaces for data input and output devices, referred to as I/O devices, storage devices, and the like. The interface(s) (206) may facilitate communication of the system. The interface(s) (206) may also provide a communication pathway for one or more components of the system (108). Examples of such components include, but are not limited to, processing unit/engine(s) (208) and a database (210).
[0086] In an embodiment, the processing engine(s) (208) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s) (208). In examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processing engine(s) (208) may be processor executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processing engine(s) (208) may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the machine-readable storage medium may store instructions that, when executed by the processing resource, implement the processing engine(s) (208). In such examples, the system may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate but accessible to the system (108) and the processing resource. In other examples, the processing engine(s) (208) may be implemented by electronic circuitry.
[0087] In an embodiment, the processing engine(s) (208) may include one or more engines selected from any of a query understanding engine (212), a query reformulation engine (214), the search engine (110), and other engines/units (216). The processing engine(s) (208) may further perform edge-based micro service event processing, but not limited to the like. In an embodiment, the search engine (110) may be embedded into the system (108) within the processing engines (208). In other embodiments, the search engine (110) may be configured outside the system (108), wherein the search engine (110) may be in communication with the system (108) through a communication network (e.g., 106). In an embodiment, the other engines (216) may include an AI/ML engine. The functioning of each of the processing engines (208) is explained with reference to a query reformulation pipeline (300) as shown in FIG. 3.
[0088] FIG. 3 illustrates an exemplary query reformulation pipeline (300) of the proposed system (108), in accordance with an embodiment of the present disclosure. The operation of the processing engines (208) of the system (108) is represented in FIG. 3.
[0089] Referring to FIG. 3, the query reformulation pipeline (300) includes various components such as the UE (104), the query understanding engine (212), the query reformulation engine (214) including a query expansion engine (302) and a query relaxation engine (304), the search engine (110), and the user (102).
[0090] In an embodiment, based on receiving the search query from the user (102) via the UE (104), the query understanding engine (212) may identify the search intent by retrieving one or more attributes associated with the one or more input tokens from a knowledge graph (e.g., 402). Once the search intent is identified, the query expansion engine (302) may generate the one or more candidate queries by reformulating the search query.
[0091] In an embodiment, the query expansion engine (302) may be configured to add the one or more related tokens to the search query, or substitute the one or more input tokens in the search query with the corresponding related tokens to generate the one or more candidate queries. In an embodiment, the one or more candidate queries may either broaden or narrow the scope of the search. In an example, the input tokens may be substituted by the corresponding related tokens that have a hypernymous semantic relation therewith to broaden the scope of the search. In other examples, the input tokens may be substituted by the corresponding related tokens that have a hyponymous semantic relation therewith to narrow the scope of the search. In an example, the query expansion engine (302) may reformulate the search query to generate the one or more candidate queries with an optimization objective to increase recall compared to the search query. In an embodiment, search results obtained using the one or more candidate queries may be assigned a lower relevance value compared to the relevance value for the search results obtained using the search query.
[0092] In an embodiment, the similarity value assigned to each of the candidate queries may be based on the similarity between said candidate query and the search query. In an embodiment, the similarity value may be determined using, including but not limited to, edit distance, Jaccard similarity, cosine similarity, latent semantic indexing, Latent Dirichlet Allocation (LDA), and the like. In an embodiment, the candidate queries having a higher similarity value may be assigned higher weights when compared to candidate queries having a lower similarity value. For instance, in FIG. 3, the products returned for “latest v-neck blue GHI brand top” are ranked lower than the products returned for “latest v-neck blue JKL brand top.” In an embodiment, the similarity value assigned to each of the one or more candidate queries may be used to rank the search artifacts obtained on execution of the corresponding candidate query.
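As one of the listed measures, Jaccard similarity over query tokens can be sketched as follows; treating queries as token sets is an illustrative simplification.

```python
def jaccard(query_a, query_b):
    """Jaccard similarity between two tokenized queries: |A ∩ B| / |A ∪ B|."""
    a, b = set(query_a), set(query_b)
    return len(a & b) / len(a | b) if a | b else 1.0
```

For the FIG. 3 queries, substituting the brand token leaves five of seven distinct tokens shared, so the substituted candidate scores below the original query's similarity of 1.0 and its artifacts are weighted accordingly.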
[0093] In an aspect, the search engine (110) may execute the one or more candidate queries to return the search results. In an embodiment, the search results may be indicative of a set of search artifacts returned for the executed query by the search engine (110). In an embodiment, the search engine (110) may be coupled to store and retrieve search artifacts in the database (210). In an example, the search artifacts may be indicative of products made available on an e-commerce website.
[0094] In an embodiment, when the number of search artifacts in the search result is greater than a predetermined hits threshold, the search result may be transmitted to the user (102). In an embodiment, the predetermined hits threshold may be configurable based on the one or more attributes of the one or more input tokens in the search query. In an embodiment, the predetermined hits threshold may be dynamically reconfigured by the system (108) based on, including but not limited to, complexity of the search query, the relevance of the search results, and the like. In an embodiment, the search results may be transmitted as a second set of data packets to the computing device or UE (104) associated with the user (102). In an embodiment, the search artifacts in the search result may be re-ranked based on the similarity value assigned to the corresponding candidate query.
[0095] In an embodiment, when the number of search artifacts in the search result is less than the predetermined hits threshold, the query relaxation engine (304) may be actuated. In an embodiment, the query relaxation engine (304) may reformulate the one or more candidate queries by deleting the one or more related tokens from said candidate queries or substituting the one or more related tokens with the corresponding input tokens in the candidate queries, thereby decreasing the restrictiveness value of the candidate query. In an embodiment, the reformulated candidate query may also have an increased similarity value with the search query.
[0096] In an embodiment, the query relaxation engine (304) may be configured to reformulate the one or more candidate queries such that the similarity value is increased. In an embodiment, the one or more candidate queries reformulated by the query relaxation engine (304) may be executed by the search engine (110) to obtain the corresponding search results. In an embodiment, the system (108) may repeat executing the query reformulation engine (214) until the number of search artifacts in the search results is greater than the predetermined hits threshold.
[0097] In an example shown in FIG. 3, the query reformulation engine (214) may reformulate the search query as follows:
latest v-neck blue JKL brand top -> latest blue GHI top -> blue GHI top -> GHI top -> top
[0098] FIG. 4 illustrates a graphical representation (400) of a knowledge base and query reformulation therewith, in accordance with an embodiment of the present disclosure.
[0099] In an example, the search query may be indicative of “blue polka-dot dress on sale.” The system (108) may receive the one or more input tokens associated with the search query as [“blue”, “polka-dot”, “dress”, “on”, “sale”]. The system (108) may then disambiguate the word-sense attribute associated with each of the input tokens by determining the contextual meaning from the one or more n-grams of said input tokens. For instance, the word sense of the input token “sale” is determined by its contextual meaning in the bi-gram “on sale.” Here, the word-sense attribute associated with the input token “sale” relates to promotion or provision of discount, and not to the quantity of goods sold. The search intent is then identified based on the disambiguated word-sense. Accordingly, the system (108) retrieves the one or more related tokens that are semantically related to the input token “sale” based on the identified search intent. In the foregoing example, the related token corresponding to the input token “sale” may include “promotion,” and tokens related thereto including, but not limited to, “discount,” “clearance,” and the like.
[00100] Thereon, the system (108) may generate the one or more candidate queries. In the foregoing example, the candidate queries include “blue pattern dress,” “spotted dress,” and the like.
[00101] In another example, the search query may be reformulated to generate the one or more candidate queries that may be broader or narrower than the search query as follows:
Disambiguating word sense and identifying search intent
a. “red pant” => [“crimson pant”, “maroon pant”]
Generating Narrow Candidate query
b. “ABC brand computer accessories” => [“ABC brand keyboard”]
Generating Broad Candidate query
c. “DEF brand tablet” => [“tablet”]
[00102] In an embodiment, the system (108) may be configured to reformulate queries such that the one or more candidate queries return a greater number of search results as compared to the search query provided by the user (102). In an example, query reformulation may be performed for increasing recall, that is, when the number of search results for the search query is zero or below the predetermined hits threshold. The query reformulation may be performed after query understanding such that the enriched output of query understanding is available beforehand based on the reformulation strategy selected by the system (108). In an embodiment, the search query may be reformulated using the one or more reformulation strategies to increase the number of relevant search results by broadening or narrowing the queries.
[00103] In an example, query expansion may be used in information retrieval to increase the number of relevant search results (“recall”) by adding or substituting the one or more related tokens in the search query retrieved from the knowledge graph (402). In an example, hypernyms of the input token may be used for generating one or more candidate queries that are broader than the search query. In another example, hyponyms of the input token may be used to generate one or more candidate queries that are narrower than the search query. In such examples, the related tokens may be included to generate the candidate query such that the search intent identified from the search query is preserved while returning search results with increased recall. In an embodiment, query relaxation may enable the system (108) to execute the most restrictive version of a query first, progressively relaxing the query until the required number of hits is obtained.
[00104] In another aspect, the system (108) combines query expansion and query relaxation for query reformulation, and provides the one or more candidate queries to the search and discovery engine (110).
[00105] In an embodiment, the knowledge graph (402) may be stored in the database (210). In an embodiment, the database (210) may be housed within the system (108). Alternatively, or additionally, the database (210) may be similar to a server, such as the server (112) of FIG. 1. In an embodiment, the database (210) may also be used for temporary and long-term storage of data associated with knowledge graph (402). In an embodiment, the database (210) may be queried to retrieve the one or more related tokens corresponding to the one or more input tokens, and the disambiguated attributes thereof.
[00106] FIG. 5A illustrates an exemplary flow chart (500A) of the proposed method for enabling contextual narrowing/broadening of a search query, in accordance with an embodiment of the present disclosure. FIG. 5B represents an exemplary representation (500B) for narrowing or broadening reformulation of the search query, in accordance with an embodiment of the present disclosure.
[00107] In an embodiment, the query expansion engine (302) may include broadening or narrowing related tokens in the search query based on the search intent identified by the query understanding engine (212). In an embodiment, the query expansion engine (302) may decide to broaden or narrow the scope at run time.
[00108] In an embodiment, the query expansion engine (302) may receive the candidate queries at step (502). In an embodiment, the query expansion engine (302) may, based on recall of the search results corresponding to the search query determined at step (504), choose ‘broadening’ or ‘narrowing’ related tokens to generate the one or more candidate queries. At step (506), the query expansion engine (302) may choose to include the ‘broadening’ related tokens in the candidate queries if the recall of the search result is less than a predetermined recall threshold range for the search query. At step (508), the query expansion engine (302) may choose the narrowing related tokens in the candidate queries if the recall of the search result is greater than a predetermined recall threshold range for the search query.
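The run-time decision at steps (506) and (508) can be sketched as follows; the recall threshold range is modeled as an assumed [low, high] interval, and the label strings are hypothetical.

```python
def choose_reformulation(recall, low, high):
    """Choose a reformulation direction from the recall of the search result
    against a predetermined recall threshold range [low, high]."""
    if recall < low:
        return "broaden"  # too few results: include hypernymous related tokens
    if recall > high:
        return "narrow"   # too many results: include hyponymous related tokens
    return "keep"         # recall within the threshold range: no change needed
```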
[00109] In an example, suppose the search query is ‘Capital of India.’ In such examples, the system (108) may generate a candidate query indicative of ‘Capital of Asia’ using ‘broadening’ related tokens, which may produce a recall that is greater than the predetermined recall threshold range. Furthermore, in such examples, the candidate query may deviate from the search intent identified from the search query. Hence, the system (108) may choose narrowing related tokens instead of broadening related tokens for generating the one or more candidate queries. Similarly, when the search query is indicative of ‘tangerines,’ the candidate queries may be generated using broadening related tokens such as ‘Citrus Fruits.’
[00110] In another example, where the search query is indicative of ‘ABC established XYZ,’ the input token ‘XYZ’ may be a hyponym of the related token ‘Company.’ As depicted in FIG. 5B, the system (108) may choose the broadening related token ‘Company’ to generate the candidate query ‘ABC established a company.’ In the example shown in FIG. 5B, the related token may also be chosen based on the linguistic relation between the one or more input tokens, i.e., the dependency parsing relation between the input token ‘establish’ and the related token ‘Company’ being ‘obj.’ FIG. 5B further shows the mapping of various semantic and linguistic relations to the specific form of query reformulation to be performed.
[00111] In other examples, the search query may be indicative of ‘Reptiles have scales.’ In such examples, the system (108) may choose narrowing related tokens. For instance, the query expansion engine (302) may generate the candidate query indicative of ‘Crocodiles have scales,’ as the input token ‘Reptiles’ may be a hypernym of the related token ‘crocodiles.’ In such examples, broadening related tokens (such as ‘animals,’ which are hypernyms of ‘reptiles’) may not be appropriate, as the resulting candidate query ‘Animals have scales’ may not retain the search intent identified from the search query. Thus, the system (108) may be able to choose the related tokens to generate the one or more candidate queries such that the identified search intent is retained.
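By way of a non-limiting illustration, the hypernym/hyponym selection described above may be sketched with a toy fragment of the knowledge graph (402). The edge data and function names below are hypothetical stand-ins for the engine's internal representation:

```python
# A toy knowledge graph of hypernym edges (child -> parent); hypothetical data
# standing in for the knowledge graph (402) of the present disclosure.
HYPERNYM_OF = {
    "crocodile": "reptile",
    "snake": "reptile",
    "reptile": "animal",
    "tangerine": "citrus fruit",
}

def hypernym(token):
    """Broadening related token: the parent of `token`, if any."""
    return HYPERNYM_OF.get(token)

def hyponyms(token):
    """Narrowing related tokens: all children of `token`."""
    return [child for child, parent in HYPERNYM_OF.items() if parent == token]

def candidates(query, token, strategy):
    """Substitute `token` in `query` with broadening or narrowing related tokens."""
    related = [hypernym(token)] if strategy == "broaden" else hyponyms(token)
    return [query.replace(token, r) for r in related if r]
```

For example, narrowing ‘reptile have scales’ on the token ‘reptile’ would yield candidate queries substituting ‘crocodile’ and ‘snake,’ while broadening ‘tangerine’ would yield ‘citrus fruit.’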
[00112] The following indicate other examples where the system (108) may determine the strategy chosen for generating the one or more candidate queries based on the search intent:
Actor name movies → Actor name movies | Actor son name movies (narrowing tokens work)
National Anthem → State/Country Anthem (narrowing tokens fail)
Tuberculosis medicine → respiratory disease medicine (broadening tokens fail)
Almond nuts → dry fruit nuts (approximate broadening when the number of search artifacts in the search results for the search query is less than the predetermined hits threshold)
A1B2 brand whole wheat sandwich large bread → A1B2 brand whole wheat sandwich bread → A1B2 whole wheat bread → whole wheat bread
Brown leather high backrest love seat → leather high backrest love seat → leather love seat.
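By way of a non-limiting illustration, the successive relaxation shown in the last two examples above (dropping modifiers until enough search artifacts are returned) may be sketched as follows. The hits function and the ordering of droppable modifiers are hypothetical placeholders for the engine's internal ranking:

```python
def relax(query_tokens, hits_fn, hits_threshold, droppable):
    """Successively drop the least intent-bearing modifier (as in the
    'A1B2 brand whole wheat sandwich large bread' example) until the search
    returns at least `hits_threshold` artifacts. `hits_fn` (number of search
    artifacts for a token list) and `droppable` (modifiers ordered from least
    to most important) are assumed stand-ins for the engine's internals."""
    tokens = list(query_tokens)
    for modifier in droppable:
        if hits_fn(tokens) >= hits_threshold:
            break  # enough search artifacts; stop relaxing
        if modifier in tokens:
            tokens.remove(modifier)
    return tokens
```

Each dropped modifier broadens the candidate query one step, so the reformulation stops at the least-broadened query that satisfies the predetermined hits threshold.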
[00113] FIGs. 6A-6B illustrate exemplary representations (600A, 600B) for contextual query reformulation in which or with which the proposed system may be implemented, in accordance with an embodiment of the present disclosure.
[00114] FIG. 6 illustrates an exemplary representation (600A) of contextual query reformulation in a search engine (110) associated with an e-commerce website. In an example shown in FIG. 6, the user (102) may provide a search query indicative of “vellam.” In an embodiment, the search query may contain one or more input tokens indicative of words and/or phrases in more than one language. In an embodiment, the query reformulation engine (214) may be configured to retrieve, for an input token in a first language, one or more corresponding related tokens in a second language. In an example shown in FIG. 6, the query reformulation engine (214) may be configured to retrieve the related token “jaggery” for the given input token “vellam,” where “jaggery” is the English word for the Tamil word “vellam.”
[00115] In an embodiment, the system (108) may be configured to identify the language associated with each of the input tokens, and retrieve translations and/or transliterations of said one or more input tokens. In an embodiment, the one or more attributes in the knowledge graph (402) may include translations and transliterations of each of the tokens, and may be retrieved therefrom. In other embodiments, the translations and the transliterations may be stored in a database (210), and may be retrieved therefrom. In yet other embodiments, as shown in FIG. 6B, the translations and transliterations of the tokens may be retrieved by transmitting requests to external Application Programming Interfaces (APIs), and receiving a response therefrom. In such embodiments, the system (108) may also be able to reformulate the search queries having one or more input tokens in a plurality of languages.
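By way of a non-limiting illustration, the fallback chain described above (knowledge graph, then database, then external API) for retrieving a cross-lingual related token may be sketched as follows. All three sources and the function names are hypothetical stand-ins:

```python
def translate_token(token, lang, graph, db, api_fallback=None):
    """Resolve a cross-lingual related token (e.g. Tamil 'vellam' -> English
    'jaggery'), checking the translation attributes in the knowledge graph
    first, then the database, then an external API, as in FIG. 6. `graph`,
    `db`, and `api_fallback` are assumed stand-ins for the components of the
    present disclosure."""
    attrs = graph.get(token, {})
    if lang in attrs.get("translations", {}):
        return attrs["translations"][lang]        # knowledge graph (402)
    if (token, lang) in db:
        return db[(token, lang)]                  # database (210)
    if api_fallback is not None:
        return api_fallback(token, lang)          # external API request
    return token  # no translation found; keep the original input token
```

With a graph entry mapping ‘vellam’ to the English translation ‘jaggery,’ the function returns ‘jaggery,’ which may then be used as a related token when generating candidate queries.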
[00116] While FIGs. 6A-6B illustrate the present disclosure in the context of providing search queries with Indic input tokens, it may be appreciated by those skilled in the art that the system (108) may be suitably adapted for applications in providing queries for electronic devices, movies, media, grocery, home, and living scenarios/domains, in a plurality of contexts and platforms without departing from the scope of the present disclosure.
[00117] FIG. 7 illustrates an exemplary computer system (700) in which or with which embodiments of the present disclosure may be implemented.
[00118] As shown in FIG. 7, the computer system (700) may include an external storage device (710), a bus (720), a main memory (730), a read only memory (740), a mass storage device (750), a communication port (760), and a processor (770). The processor (770) may include various modules associated with embodiments of the present disclosure. A person skilled in the art will appreciate that the computer system (700) may include more than one processor and communication ports. The communication port(s) (760) may be any of an RS-232 port for use with a modem-based dialup connection, a 10/100 Ethernet port, a Gigabit or 10 Gigabit port using copper or fiber, a serial port, a parallel port, or other existing or future ports. The communication port(s) (760) may be chosen depending on a network, such as a Local Area Network (LAN), a Wide Area Network (WAN), or any network to which the computer system (700) connects. The main memory (730) may be Random Access Memory (RAM), or any other dynamic storage device commonly known in the art. The read-only memory (740) may be any static storage device(s) such as, but not limited to, a Programmable Read Only Memory (PROM) chip for storing static information, e.g., start-up or BIOS instructions for the processor (770). The mass storage device (750) may be any current or future mass storage solution, which can be used to store information and/or instructions. Exemplary mass storage solutions include, but are not limited to, Parallel Advanced Technology Attachment (PATA) or Serial Advanced Technology Attachment (SATA) hard disk drives or solid-state drives (internal or external, e.g., having Universal Serial Bus (USB) and/or Firewire interfaces), one or more optical discs, and Redundant Array of Independent Disks (RAID) storage, e.g., an array of disks (e.g., SATA arrays).
[00119] The bus (720) may communicatively couple the processor(s) (770) with the other memory, storage, and communication blocks. The bus (720) may be, e.g., a Peripheral Component Interconnect (PCI) / PCI Extended (PCI-X) bus, Small Computer System Interface (SCSI), Universal Serial Bus (USB) or the like, for connecting expansion cards, drives, and other subsystems, as well as other buses, such as a front side bus (FSB), which connects the processor (770) to the rest of the computer system (700).
[00120] Optionally, operator and administrative interfaces, e.g., a display, keyboard, and a cursor control device, may also be coupled to the bus (720) to support direct operator interaction with the computer system (700). Other operator and administrative interfaces may be provided through network connections connected through the communication port(s) (760). In no way should the aforementioned exemplary computer system (700) limit the scope of the present disclosure.
[00121] While considerable emphasis has been placed herein on the preferred embodiments, it will be appreciated that many other embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
ADVANTAGES OF THE PRESENT DISCLOSURE
[00122] The present disclosure provides a system and a method for automatically transforming search queries in order to better represent the searcher’s intent.
[00123] The present disclosure predicts the true intent of a query.
[00124] The present disclosure facilitates query reformulation using query understanding, which is cost-effective, time-effective, and widely adaptable.
[00125] The present disclosure eliminates outdated search techniques, which are cumbersome, time-consuming, and involve considerable user interaction.
[00126] The present disclosure eliminates complexity in understanding a user's query.
[00127] The present disclosure provides optimized, dynamic, and flexible communication services for diversified application scenarios in the future.
CLAIMS:
1. A system (108) for query reformulation using query understanding, the system (108) comprising:
one or more processors (202); and
a memory (204) operatively coupled to the one or more processors (202), wherein the memory (204) comprises processor-executable instructions, which on execution, cause the one or more processors (202) to:
receive a first set of data packets indicative of a search query having one or more input tokens from a computing device (104) associated with a user (102);
extract one or more attributes associated with each of the one or more input tokens from a knowledge graph (402) to identify a search intent of the user (102);
retrieve one or more related tokens corresponding to each of the one or more input tokens from the knowledge graph (402) based on the identified search intent, the one or more related tokens being semantically related to the one or more input tokens;
generate one or more candidate queries by performing one or more edit operations on the search query based on the identified search intent; and
execute the one or more candidate queries on a search engine (410) to retrieve a search result having a set of search artifacts from a database (210).
2. The system (108) as claimed in claim 1, wherein the one or more processors (202) are configured to:
assign a similarity value to each of the one or more candidate queries based on similarity between the one or more candidate queries and the search query; and
rank each search artifact in the set of search artifacts retrieved in each search result corresponding to the one or more candidate queries based on a relevance value determined using the similarity value assigned to said one or more candidate queries.
3. The system (108) as claimed in claim 1, wherein to identify the search intent from the search query, the one or more processors (202) are configured to:
generate one or more n-grams of the one or more input tokens in the search query, wherein n is a predetermined integer indicative of a context size of each n-gram; and
disambiguate the one or more attributes corresponding to each of the one or more input tokens from a context provided in the one or more n-grams having said one or more input tokens.
4. The system (108) as claimed in claim 1, wherein to generate the one or more candidate queries, the one or more processors (202) are configured to perform any one or a combination of:
adding one or more of the retrieved related tokens to the search query;
deleting one or more of the input tokens from the search query; and
substituting one or more of the input tokens in the search query with the corresponding related tokens, wherein the one or more related tokens are either added or substituted based on the search intent.
5. The system (108) as claimed in claim 1, wherein the one or more related tokens either have a hypernymous semantic relation or a hyponymous semantic relation with the corresponding one or more input tokens, wherein the one or more candidate queries generated using hypernymous related tokens are assigned lower restrictiveness value compared to the one or more candidate queries generated using hyponymous related tokens, the restrictiveness value being determined based on a distance value and a direction value between the one or more related tokens from the corresponding one or more input tokens in the knowledge graph (402), and wherein the one or more processors (202) are configured to select the one or more related tokens having either the hypernymous or hyponymous semantic relation with the corresponding one or more input tokens based on the search intent identified for the search query to generate the one or more candidate queries.
6. The system (108) as claimed in claim 1, wherein when a number of search artifacts returned in the search result is less than a predetermined hits threshold, the one or more processors (202) are configured to:
reformulate the one or more candidate queries by either deleting one or more of the related tokens from the one or more candidate queries or substituting one or more of the related tokens with the corresponding one or more input tokens in the one or more candidate queries to decrease a restrictiveness value between the one or more candidate queries; and
execute the reformulated one or more candidate queries to obtain the corresponding search results.
7. The system (108) as claimed in claim 1, wherein the one or more attributes of the one or more input tokens and the one or more related tokens in the knowledge graph (402) comprise any one or combination of: named entity tags, one or more associated word-senses, dependency tags, part-of-speech tags, word embeddings, morphological features, translation and transliteration in a plurality of languages, and semantic and linguistic relationships between each of the one or more input tokens and the corresponding one or more related tokens.
8. A method for query reformulation using query understanding, the method comprising:
receiving, by a processor (202), a first set of data packets indicative of a search query having one or more input tokens from a computing device (104) associated with a user (102);
extracting, by the processor (202), one or more attributes associated with each of the one or more input tokens from a knowledge graph (402) to identify a search intent of the user (102);
retrieving, by the processor (202), one or more related tokens corresponding to each of the one or more input tokens from the knowledge graph (402) based on the identified search intent, the one or more related tokens being semantically related to the one or more input tokens;
generating, by the processor (202), one or more candidate queries by performing one or more edit operations on the search query based on the identified search intent; and
executing, by the processor (202), the one or more candidate queries on a search engine (410) to retrieve a search result having a set of search artifacts from a database (210).
9. The method as claimed in claim 8, comprising:
assigning, by the processor (202), a similarity value to each of the one or more candidate queries based on similarity between the one or more candidate queries and the search query; and
ranking, by the processor (202), each search artifact in the set of search artifacts retrieved in each search result corresponding to the one or more candidate queries based on a relevance value determined using the similarity value assigned to said one or more candidate queries.
10. The method as claimed in claim 8, wherein identifying the search intent from the search query comprises:
generating, by the processor (202), one or more n-grams of the one or more input tokens in the search query, wherein n is a predetermined integer indicative of a context size of each n-gram; and
disambiguating, by the processor (202), the one or more attributes corresponding to each of the one or more input tokens from a context provided in the one or more n-grams having said one or more input tokens.
11. The method as claimed in claim 8, wherein generating the one or more candidate queries by performing the one or more edit operations comprises any one or combination of:
adding, by the processor (202), one or more of the retrieved related tokens to the search query;
deleting, by the processor (202), one or more of the input tokens from the search query; and
substituting, by the processor (202), one or more of the input tokens in the search query with the corresponding one or more related tokens, wherein the one or more related tokens are either added or substituted based on the search intent.
12. The method as claimed in claim 8, wherein the one or more related tokens retrieved either have a hypernymous semantic relation or a hyponymous semantic relation with the corresponding one or more input tokens, wherein the one or more candidate queries generated using hypernymous related tokens are assigned lower restrictiveness value compared to the one or more candidate queries generated using hyponymous related tokens, the restrictiveness value being determined based on a distance value and a direction value between the retrieved one or more related tokens from the corresponding one or more input tokens in the knowledge graph (402), and wherein the method comprises selecting the one or more related tokens having either the hypernymous or hyponymous semantic relation with the corresponding one or more input tokens based on the search intent identified for the search query to generate the one or more candidate queries.
13. The method as claimed in claim 8, wherein when a number of search artifacts returned in the search result is less than a predetermined hits threshold, the method comprises:
reformulating, by the processor (202), the one or more candidate queries by either deleting one or more of the related tokens from said one or more candidate queries or substituting one or more of the related tokens with the corresponding one or more input tokens in the one or more candidate queries to decrease a restrictiveness value between the one or more candidate queries; and
executing, by the processor (202), the reformulated one or more candidate queries to obtain the corresponding search results.
14. The method as claimed in claim 8, wherein the one or more attributes of the one or more input tokens and the one or more related tokens in the knowledge graph (402) comprise any one or combination of: named entity tags, one or more associated word-senses, dependency tags, part-of-speech tags, word embeddings, morphological features, translation and transliteration in a plurality of languages, and semantic and linguistic relationships between each of the one or more input tokens and the corresponding one or more related tokens.
15. A user equipment (104), comprising:
one or more processors; and
a memory operatively coupled to the one or more processors, wherein the memory comprises processor-executable instructions, which on execution, cause the one or more processors to:
transmit a first set of data packets indicative of a search query having one or more input tokens to a system (108);
receive a second set of data packets from the system (108) indicative of one or more search artifacts; and
render the one or more search artifacts on a graphical user interface associated with the user equipment (104) based on a rank assigned to said one or more search artifacts.
16. The user equipment (104) as claimed in claim 15, wherein the one or more input tokens in the search query are indicative of any one or combination of: words, phrases, and linguistic elements expressed in a plurality of languages.
| # | Name | Date |
|---|---|---|
| 1 | 202221031239-STATEMENT OF UNDERTAKING (FORM 3) [31-05-2022(online)].pdf | 2022-05-31 |
| 2 | 202221031239-PROVISIONAL SPECIFICATION [31-05-2022(online)].pdf | 2022-05-31 |
| 3 | 202221031239-POWER OF AUTHORITY [31-05-2022(online)].pdf | 2022-05-31 |
| 4 | 202221031239-FORM 1 [31-05-2022(online)].pdf | 2022-05-31 |
| 5 | 202221031239-DRAWINGS [31-05-2022(online)].pdf | 2022-05-31 |
| 6 | 202221031239-DECLARATION OF INVENTORSHIP (FORM 5) [31-05-2022(online)].pdf | 2022-05-31 |
| 7 | 202221031239-ENDORSEMENT BY INVENTORS [29-05-2023(online)].pdf | 2023-05-29 |
| 8 | 202221031239-DRAWING [29-05-2023(online)].pdf | 2023-05-29 |
| 9 | 202221031239-CORRESPONDENCE-OTHERS [29-05-2023(online)].pdf | 2023-05-29 |
| 10 | 202221031239-COMPLETE SPECIFICATION [29-05-2023(online)].pdf | 2023-05-29 |
| 11 | 202221031239-FORM-8 [30-05-2023(online)].pdf | 2023-05-30 |
| 12 | 202221031239-FORM 18 [30-05-2023(online)].pdf | 2023-05-30 |
| 13 | Abstract1.jpg | 2023-10-28 |
| 14 | 202221031239-FER.pdf | 2025-04-08 |
| 15 | 202221031239-FER_SER_REPLY [06-10-2025(online)].pdf | 2025-10-06 |
| 16 | 202221031239-CLAIMS [06-10-2025(online)].pdf | 2025-10-06 |
| 1 | 202221031239E_18-03-2024.pdf | 2024-03-18 |