
A Method And System For Context Aware Query Expansion In A Computing Environment

Abstract: The present disclosure generally relates to the field of data processing systems, and more particularly to a method and a system for context-aware query expansion in a computing environment. The method includes receiving an input query from users and identifying an issue corresponding to low search recall performance and/or a short query. Upon identifying the issue, the method includes segmenting the input query into constituent phrases and analyzing a vocabulary gap in the constituent phrases with respect to pre-defined phrases corresponding to catalogs. Further, the method includes selecting a constituent phrase comprising the vocabulary gap, and retrieving contextual synonymous phrases corresponding to the selected constituent phrase, by identifying top-k similar phrases corresponding to the at least one constituent phrase from a set of mapping phrases with corresponding top-m historical contexts. Further, the method includes expanding the input query into a context-aware query by concatenating words in the contextual synonymous phrases with words in the input query.


Patent Information

Application #
Filing Date
21 November 2022
Publication Number
01/2023
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application
Patent Number
Legal Status
Grant Date
2024-08-23
Renewal Date

Applicants

Flipkart Internet Private Limited
Building Alyssa Begonia & Clover, Embassy Tech Village, Outer Ring Road, Devarabeesanahalli Village, Bengaluru - 560103, Karnataka, India.

Inventors

1. ROHAN KUMAR
Flipkart Internet Private Limited, Building Alyssa Begonia & Clover, Embassy Tech Village, Outer Ring Road, Devarabeesanahalli Village, Bengaluru - 560103, Karnataka, India.
2. SURENDER KUMAR
Flipkart Internet Private Limited, Building Alyssa Begonia & Clover, Embassy Tech Village, Outer Ring Road, Devarabeesanahalli Village, Bengaluru - 560103, Karnataka, India.
3. SAMIR SHAH
Flipkart Internet Private Limited, Building Alyssa Begonia & Clover, Embassy Tech Village, Outer Ring Road, Devarabeesanahalli Village, Bengaluru - 560103, Karnataka, India.

Specification

Description:
FIELD OF INVENTION
[0001] The embodiments of the present disclosure generally relate to a field of data processing systems. More particularly, the present disclosure relates to a method and a system for context-aware query expansion in a computing environment.

BACKGROUND
[0002] The following description of the related art is intended to provide background information pertaining to the field of the disclosure. This section may include certain aspects of the art that may be related to various features of the present disclosure. However, it should be appreciated that this section is to be used only to enhance the understanding of the reader with respect to the present disclosure, and not as an admission of prior art.
[0003] Generally, in electronic commerce (e-commerce), users with varying levels of literacy and articulation ability may articulate their queries with text different from what the sellers have uploaded in a product catalog. Both sellers and users can use colloquial and technical terms to describe a product. This results in a low-performing (low or no results) search problem arising from a phenomenon known as the vocabulary or articulation gap. For example, a query such as "avengers half-pant" refers to the product listed as "avengers shorts", and the query "pregnancy dress" may be listed as "maternity gown" in the product catalog by the sellers. The problem of the articulation gap may be more pronounced in the torso and tail query segments (the last two tertiles of the query set divided into three equal quantiles by query frequency) and in once-only queries. This problem has been addressed using query rewrite or expansion. One way is to generate an entirely new query, which may be known as a query rewrite or a replacement for the original query, for example, rewriting "sugar checking machine" to "blood glucose monitor" or "diabetes test strip". Issuing more than one query may place a heavier load on the search index owing to a separate pass over the index for each alternate query. Hence, only one rewritten query may be used in a user-path production setup. Another way is to expand the query by adding more keywords/tokens to the original query, optionally along with Boolean expressions such as "(sugar or diabetes) checking machine".
[0004] One conventional method replaces one or two phrases in the query, which may be sufficient to generate well-performing queries. Furthermore, this approach is more index-friendly, as it involves a single pass over the index. Another conventional method provides a pseudo-relevance feedback method to expand a query by first fetching the documents against a given query. This is followed by extracting top keywords from these documents to append to the original query and retrieve the final list of products. However, in the case of null searches (where no products are fetched in the first place), which form a significant volume of user searches, the pseudo-relevance feedback method may not be suitable. The pseudo-relevance feedback method may also be computationally expensive, since extracting related terms from retrieved documents relies on an initial retrieval, and is hence not practical in the user path. Further, a pseudo-relevance feedback method may operate in the catalog vocabulary space, and thus may not be able to capture the variations of the user vocabulary. This may also lead to query drift, as unrelated terms from the products/documents are added to the user query. Further, rewriting the original query to an alternate query may be performed by first identifying a set of well-performing head queries (based on individual query volume and click-through rates) and then mapping an articulation gap query (tail query) to the most similar query from this well-performing set. The coverage of a replacement head query for a tail articulation gap query drops drastically in the tail and once-only queries, while phrase substitutions cover the entire range of queries. For example, with the tail-query-to-head-query mapping, there may be 20,000 pairs generated offline from one month's data of tail-to-head queries, while there may be over 15 million unique queries received online in a day.
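The pseudo-relevance feedback flow described above (fetch documents for the query, extract top keywords from them, append those to the original query) can be sketched as follows; the toy document collection, the overlap-based first-pass retrieval, and the frequency-based keyword scoring are illustrative assumptions, not the method of any particular reference.

```python
from collections import Counter

def retrieve(query, docs, k=3):
    """Toy first-pass retrieval: rank documents by word overlap with the query."""
    q_terms = set(query.split())
    scored = sorted(docs, key=lambda d: len(q_terms & set(d.split())), reverse=True)
    return scored[:k]

def prf_expand(query, docs, n_keywords=2):
    """Pseudo-relevance feedback: append frequent terms from top documents."""
    top_docs = retrieve(query, docs)
    counts = Counter(w for d in top_docs for w in d.split())
    for t in query.split():          # do not re-add the query's own terms
        counts.pop(t, None)
    extra = [w for w, _ in counts.most_common(n_keywords)]
    return query + " " + " ".join(extra) if extra else query

docs = [
    "blood glucose monitor for diabetes patients",
    "glucose test strips for blood sugar checking",
    "digital blood pressure machine",
]
print(prf_expand("sugar checking machine", docs))
```

Note that the expansion fails exactly as the paragraph describes when the first-pass retrieval returns nothing (a null search): there are no documents from which to harvest keywords.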
[0005] Further, another conventional method demonstrated the limitations of a pseudo-relevance feedback-based query expansion method, which primarily operates in the document space. Yet another conventional method discloses generating entirely new queries by learning a query-to-query similarity deep neural model. However, as mentioned earlier, generating entirely new queries may have lower coverage. Another conventional method provides a statistical approach to overcome the above limitations by switching to the user query space and using query reformulations by the users. The phrase-based substitutions may have better coverage than the query-to-query substitution. Besides the co-occurring queries, other implicit user feedback such as Click-Through Rate (CTR) and co-click-based query graphs may have been used to rewrite queries. However, as mentioned above, articulation gap queries may be typically infrequent and low performing. This results in insufficient implicit user feedback, thus limiting the efficacy of these feedback-based methods. Some conventional methods rely on a statistical co-occurrence-based phrase-to-phrase similarity and not on a semantic deep learning Bidirectional Encoder Representations from Transformers (BERT) based phrase-to-phrase similarity model. Another conventional method provides a method for creating word embeddings using pseudo-relevance feedback. However, due to the pseudo relevance, the vocabulary space may be restricted to an electronic commerce (e-commerce) seller catalog, and hence does not capture the user vocabularies. Further, the embeddings learned may be context-free, due to which a term such as "bank" may have the same embedding for both the queries "bank of a river" and "nearest bank to deposit money". Further, being a two-stage retrieval, pseudo-relevance feedback in the user path may be computationally expensive and hence rarely used in practice.
[0006] Therefore, there is a need for a method and a system for solving the shortcomings of the prior arts, by providing a method and a system for context-aware query expansion in a computing environment.

SUMMARY
[0007] This section is provided to introduce certain objects and aspects of the present invention in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter. In order to overcome at least a few problems associated with the known solutions as provided in the previous section, an object of the present invention is to provide a technique for context-aware query expansion in a computing environment.
[0008] It is an object of the present disclosure to provide a method and a system for context-aware query expansion in a computing environment.
[0009] It is another object of the present disclosure to provide a method and a system for selectively expanding portions of the query with synonymous terms to improve the recall search performance of vocabulary gap queries.
[0010] It is another object of the present disclosure to use a Lucene-based search index to search for the matching objects against a given user query (and corresponding augmented variants).
[0011] It is another object of the present disclosure to provide a Bidirectional Encoder Representations from Transformers (BERT) based query phrase expansion.
[0012] It is another object of the present disclosure to provide a method and a system for mining n-gram phrases from user queries based on thresholds on co-location information, thereby improving the matching set of objects beyond the input query text.
[0013] It is another object of the present disclosure to reduce null searches and increase search query Click Through Rate (CTR).
[0014] It is another object of the present disclosure to overcome the two-stage retrieval of pseudo-relevance feedback in the user path by using a bi-directional context for a given phrase or term, thereby differentiating between queries, owing to the bi-directional nature of the BERT model, which builds a deeper pre-trained transformer variant using only encoders.
[0015] In an aspect, the present disclosure provides a method for context-aware query expansion in a computing environment. The method includes receiving an input query from one or more users associated with an electronic device. Further, the method includes identifying at least one issue corresponding to at least one of a low search recall performance and a short query, in response to the received input query. Further, the method includes segmenting the input query into one or more constituent phrases, upon identifying the at least one issue. Furthermore, the method includes analyzing a vocabulary gap in the one or more constituent phrases with respect to pre-defined phrases corresponding to one or more catalogs. Further, the method includes selecting at least one constituent phrase from the one or more constituent phrases comprising the vocabulary gap. Furthermore, the method includes retrieving one or more contextual synonymous phrases corresponding to the selected at least one constituent phrase, by identifying top-k similar phrases corresponding to the at least one constituent phrase, from a set of mapping phrases with corresponding top-m historical contexts. The top-k similar phrases are identified using at least one of a Bidirectional Encoder Representations from Transformers (BERT) technique, and Approximate Nearest Neighbor (ANN) technique. Further, the method includes expanding the input query into a context-aware query, by concatenating words in the one or more contextual synonymous phrases with words in the input query. The input query is expanded for high search recall performance in the computing environment.
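The retrieval and expansion steps of the method above can be sketched as follows, with small hand-made dense vectors standing in for BERT phrase embeddings and a brute-force cosine-similarity search standing in for an ANN index; the vectors, the mapping-phrase set, and the value of k are illustrative assumptions, not the disclosed implementation.

```python
import math

# Toy dense vectors standing in for BERT phrase embeddings (assumed values).
embeddings = {
    "half pant":      [0.9, 0.1, 0.0],
    "shorts":         [0.85, 0.15, 0.05],
    "bermuda shorts": [0.8, 0.2, 0.1],
    "maternity gown": [0.0, 0.9, 0.4],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_k_similar(phrase, k=2):
    """Brute-force nearest-neighbour search over the mapping-phrase embeddings
    (a production system would use an ANN index instead of scanning all phrases)."""
    q = embeddings[phrase]
    ranked = sorted((p for p in embeddings if p != phrase),
                    key=lambda p: cosine(embeddings[p], q), reverse=True)
    return ranked[:k]

def expand(query, gap_phrase, k=2):
    """Concatenate words of the contextual synonymous phrases with the query."""
    extra = [w for p in top_k_similar(gap_phrase, k) for w in p.split()]
    return query + " " + " ".join(dict.fromkeys(extra))  # de-duplicate, keep order

print(expand("avengers half pant", "half pant"))
```

The expanded query keeps all original tokens and appends the synonym words, so a single pass over the search index can match both the user's vocabulary and the catalog's.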
[0016] In an embodiment, segmenting the input query further includes measuring a co-location for each adjacent word pair in the input query. Further, the method includes generating n-gram phrases from the input query based on thresholds associated with the measured co-location. Furthermore, the method includes creating a set of the n-gram phrases. Further, the method includes generating possible query tokens corresponding to the set of the n-gram phrases. Further, the method includes analyzing the query tokens from left to right, and recursively obtaining the longest segment that exists in the set of the n-gram phrases, for segmenting the input query into the one or more constituent phrases.
[0017] In an embodiment, the co-location measurement comprises at least one of, a pointwise mutual information, log-likelihood ratio, token co-occurrence counts, and conditional probabilities.
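As a minimal sketch of the segmentation embodiment above — measuring pointwise mutual information (one of the listed co-location measures) for adjacent word pairs over historical queries, thresholding it to form the n-gram phrase set, and then greedily taking the longest known segment from the left — consider the following; the toy query corpus and the PMI threshold are assumptions for illustration.

```python
import math
from collections import Counter

# Toy corpus of historical user queries (assumed data).
corpus = [
    "avengers half pant for kids",
    "avengers half pant red",
    "half pant cotton",
    "pregnancy dress for women",
    "avengers t shirt",
]

# Count unigrams and adjacent word pairs across the historical queries.
uni, bi = Counter(), Counter()
for q in corpus:
    toks = q.split()
    uni.update(toks)
    bi.update(zip(toks, toks[1:]))
n_uni, n_bi = sum(uni.values()), sum(bi.values())

def pmi(w1, w2):
    """Pointwise mutual information of an adjacent word pair."""
    p_pair = bi[(w1, w2)] / n_bi
    return math.log(p_pair / ((uni[w1] / n_uni) * (uni[w2] / n_uni)))

# Keep word pairs whose PMI clears a threshold as known bigram phrases.
PMI_THRESHOLD = 2.0
phrases = {f"{a} {b}" for (a, b) in bi if pmi(a, b) > PMI_THRESHOLD}
phrases |= set(uni)  # every single word is a valid fallback segment

def segment(query):
    """Greedily take the longest segment (from the left) found in the phrase set."""
    toks, out, i = query.split(), [], 0
    while i < len(toks):
        for j in range(len(toks), i, -1):        # try the longest candidate first
            cand = " ".join(toks[i:j])
            if cand in phrases:
                out.append(cand)
                i = j
                break
        else:
            out.append(toks[i])
            i += 1
    return out

print(segment("avengers half pant"))
```

With this corpus, "half pant" clears the PMI threshold while "avengers half" does not, so the query segments into the constituent phrases ["avengers", "half pant"] rather than splitting the collocation.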
[0018] In an embodiment, analyzing the vocabulary gap in the one or more constituent phrases further includes creating a metrics set to store a number of objects in the one or more catalogs that are historically matched for the input query, and the historical Click-Through Rates (CTRs), upon segmenting the input query into the one or more constituent phrases. Further, the method includes calculating, for each of the segmented constituent phrases, a size of a set of matching objects with historical contexts in the one or more catalogs.
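A minimal sketch of the vocabulary gap analysis above, assuming a toy catalog and a match-count threshold: each constituent phrase is scored by the size of its matching-object set, and phrases falling below the threshold are flagged as gap candidates.

```python
# Toy seller catalog (assumed data).
catalog = [
    "avengers shorts for boys",
    "avengers printed t shirt",
    "maternity gown full length",
    "cotton shorts pack of 2",
]

def matching_set_size(phrase, catalog):
    """Number of catalog objects whose text contains every word of the phrase."""
    words = phrase.split()
    return sum(all(w in item.split() for w in words) for item in catalog)

def gap_phrases(segments, catalog, min_matches=1):
    """Constituent phrases whose catalog match count falls below the threshold."""
    metrics = {seg: matching_set_size(seg, catalog) for seg in segments}
    return [seg for seg, n in metrics.items() if n < min_matches], metrics

gaps, metrics = gap_phrases(["avengers", "half pant"], catalog)
print(metrics)   # per-phrase matching-object counts
print(gaps)      # phrases exhibiting a vocabulary gap
```

Here "avengers" matches catalog objects while "half pant" matches none, so only "half pant" is selected for synonym expansion.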
[0019] In an embodiment, the set of mapping phrases includes phrase-query pairs, which are well-performing phrases comprising high search recall performance.
[0020] In an embodiment, the one or more constituent phrases are common word combinations with meaning slightly different from that of individual words.
[0021] In an embodiment, the top-m historical contexts include at least one of, a machine-learned dense embedding/vector representation corresponding to the content of the query, a sub-category of the input query, a gender, and an age.
[0022] In an aspect, the present disclosure provides a system for context-aware query expansion in a computing environment. The system receives an input query from one or more users associated with an electronic device. Further, the system identifies at least one issue corresponding to at least one of a low search recall performance and a short query, in response to the received input query. Furthermore, the system segments the input query into one or more constituent phrases, upon identifying the at least one issue. Further, the system analyzes a vocabulary gap in the one or more constituent phrases with respect to pre-defined phrases corresponding to one or more catalogs. Furthermore, the system selects at least one constituent phrase from the one or more constituent phrases comprising the vocabulary gap. Further, the system retrieves one or more contextual synonymous phrases corresponding to the selected at least one constituent phrase, by identifying top-k similar phrases corresponding to the at least one constituent phrase, from a set of mapping phrases with corresponding top-m historical contexts. The top-k similar phrases are identified using at least one of a Bidirectional Encoder Representations from Transformers (BERT) technique and an Approximate Nearest Neighbor (ANN) technique. Additionally, the system expands the input query into a context-aware query, by concatenating words in the one or more contextual synonymous phrases with words in the input query, wherein the input query is expanded for high search recall performance in the computing environment.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
[0023] The accompanying drawings, which are incorporated herein, and constitute a part of this invention, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present invention. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry/sub-components of each component. It will be appreciated by those skilled in the art that the invention of such drawings includes the invention of electrical components, electronic components, or circuitry commonly used to implement such components.
[0024] FIG. 1 illustrates an exemplary block diagram representation of a network architecture implementing a proposed system for context-aware query expansion in a computing environment, according to embodiments of the present disclosure.
[0025] FIG. 2 illustrates an exemplary detailed block diagram representation of the proposed system, according to embodiments of the present disclosure.
[0026] FIG. 3 illustrates a flow chart depicting a method of context-aware query expansion in a computing environment, according to embodiments of the present disclosure.
[0027] FIG. 4 illustrates a hardware platform for the implementation of the disclosed system according to embodiments of the present disclosure.
[0028] The foregoing shall be more apparent from the following more detailed description of the invention.

DETAILED DESCRIPTION OF INVENTION
[0029] In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It will be apparent, however, that embodiments of the present disclosure may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. An individual feature may not address all of the problems discussed above or might address only some of the problems discussed above. Some of the problems discussed above might not be fully addressed by any of the features described herein.
[0030] The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that, various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth.
[0031] Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
[0032] Also, it is noted that individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0033] The word “exemplary” and/or “demonstrative” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” and/or “demonstrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used in either the detailed description or the claims, such terms are intended to be inclusive—in a manner similar to the term “comprising” as an open transition word—without precluding any additional or other elements.
[0034] As used herein, "connect", "configure", "couple" and its cognate terms, such as "connects", "connected", "configured", and "coupled" may include a physical connection (such as a wired/wireless connection), a logical connection (such as through logical gates of semiconducting device), other suitable connections, or a combination of such connections, as may be obvious to a skilled person.
[0035] As used herein, "send", "transfer", "transmit", and their cognate terms like "sending", "sent", "transferring", "transmitting", "transferred", "transmitted", etc. include sending or transporting data or information from one unit or component to another unit or component, wherein the content may or may not be modified before or after sending, transferring, transmitting.
[0036] Reference throughout this specification to “one embodiment” or “an embodiment” or “an instance” or “one instance” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
[0037] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
[0038] Various embodiments of the present disclosure provide a method and a system for context-aware query expansion in a computing environment. The present disclosure provides a method and a system for selectively expanding portions of the query with synonymous terms to improve the recall search performance of vocabulary gap queries. The present disclosure uses a Lucene-based search index to search for the matching objects against a given user query (and corresponding augmented variants). The present disclosure provides a Bidirectional Encoder Representations from Transformers (BERT) based query phrase expansion. The present disclosure provides a method and a system for mining n-gram phrases from user queries based on thresholds on co-location information, thereby improving the matching set of objects beyond the input query text. The present disclosure reduces null searches and increases search query Click Through Rate (CTR). The present disclosure overcomes the two-stage retrieval of pseudo-relevance feedback in the user path by using a bi-directional context for a given phrase or term, thereby differentiating between queries, owing to the bi-directional nature of the BERT model, which builds a deeper pre-trained transformer variant using only encoders.
[0039] FIG. 1 illustrates an exemplary block diagram representation of a network architecture 100 implementing a proposed system 110 for context-aware query expansion in a computing environment, according to embodiments of the present disclosure. The network architecture 100 may include the system 110, an electronic device 108, and a centralized server 118. The system 110 may be connected to the centralized server 118 via a communication network 106. The centralized server 118 may include, but is not limited to, a stand-alone server, a remote server, a cloud computing server, a dedicated server, a rack server, a server blade, a server rack, a bank of servers, a server farm, hardware supporting a part of a cloud service or system, a home server, hardware running a virtualized server, one or more processors executing code to function as a server, one or more machines performing server-side functionality as described herein, at least a portion of any of the above, some combination thereof, and the like. The communication network 106 may be a wired communication network or a wireless communication network. The wireless communication network may be any wireless communication network capable of transferring data between entities of that network such as, but is not limited to, a carrier network including a circuit-switched network, a public switched network, a Content Delivery Network (CDN) network, a Long-Term Evolution (LTE) network, a New Radio (NR), a Global System for Mobile Communications (GSM) network and a Universal Mobile Telecommunications System (UMTS) network, an Internet, intranets, Local Area Networks (LANs), Wide Area Networks (WANs), mobile communication networks, combinations thereof, and the like.
[0040] The system 110 may be implemented by way of a single device or a combination of multiple devices that may be operatively connected or networked together. For example, the system 110 may be implemented by way of a standalone device such as the centralized server 118, and the like, and may be communicatively coupled to the electronic device 108. In another example, the system 110 may be implemented in/associated with the electronic device 108. In yet another example, the system 110 may be implemented in/associated with a respective computing device 104-1, 104-2, ..., 104-N (individually referred to as the computing device 104, and collectively referred to as the computing devices 104), associated with one or more users 102-1, 102-2, ..., 102-N (individually referred to as the user 102, and collectively referred to as the users 102). In such a scenario, the system 110 may be replicated in each of the computing devices 104. The users 102 may be users of, but are not limited to, an electronic commerce (e-commerce) platform, a hyperlocal platform, a super-mart platform, a media platform, a service providing platform, a social networking platform, a messaging platform, a bot processing platform, a virtual assistance platform, an Artificial Intelligence (AI) based platform, and the like. In some instances, the user 102 may include an entity/administrator.
[0041] The electronic device 108 may be at least one of, an electrical, an electronic, an electromechanical, and a computing device. The electronic device 108 may include, but is not limited to, a mobile device, a smart-phone, a Personal Digital Assistant (PDA), a tablet computer, a phablet computer, a wearable device, a Virtual Reality/Augment Reality (VR/AR) device, a laptop, a desktop, a server, and the like. The system 110 may be implemented in hardware or a suitable combination of hardware and software. The system 110 or the centralized server 118 may be associated with entities (not shown). The entities may include, but are not limited to, an e-commerce company, a company, an outlet, a manufacturing unit, an enterprise, a facility, an organization, an educational institution, a secured facility, and the like.
[0042] Further, the system 110 may include a processor 112, an Input/Output (I/O) interface 114, and a memory 116. The Input/Output (I/O) interface 114 of the system 110 may be used to receive user inputs from one or more computing devices 104-1, 104-2, ..., 104-N (collectively referred to as the computing devices 104 and individually referred to as the computing device 104) associated with one or more users 102 (collectively referred to as the users 102 and individually referred to as the user 102).
[0043] Further, system 110 may also include other units such as a display unit, an input unit, an output unit, and the like, however the same are not shown in FIG. 1, for the purpose of clarity. Also, in FIG. 1 only a few units are shown, however, the system 110 or the network architecture 100 may include multiple such units or the system 110/ network architecture 100 may include any such numbers of the units, obvious to a person skilled in the art or as required to implement the features of the present disclosure. The system 110 may be a hardware device including the processor 112 executing machine-readable program instructions to perform context-aware query expansion in a computing environment.
[0044] Execution of the machine-readable program instructions by the processor 112 may enable the proposed system 110 to perform context-aware query expansion in a computing environment. The “hardware” may comprise a combination of discrete components, an integrated circuit, an application-specific integrated circuit, a field-programmable gate array, a digital signal processor, or other suitable hardware. The “software” may comprise one or more objects, agents, threads, lines of code, subroutines, separate software applications, two or more lines of code, or other suitable software structures operating in one or more software applications or on one or more processors. The processor 112 may include, for example, but is not limited to, microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits, and any devices that manipulate data or signals based on operational instructions, and the like. Among other capabilities, the processor 112 may fetch and execute computer-readable instructions in the memory 116 operationally coupled with the system 110 for performing tasks such as data processing, input/output processing, and/or any other functions. Any reference to a task in the present disclosure may refer to an operation being or that may be performed on data.
[0045] In the example that follows, assume that a user 102 of the system 110 desires to improve/add additional features for context-aware query expansion in a computing environment. In this instance, the user 102 may include an administrator of a website, an administrator of an e-commerce site, an administrator of a social media site, an administrator of an e-commerce application/ social media application/other applications, an administrator of media content (e.g., television content, video-on-demand content, online video content, graphical content, image content, augmented/virtual reality content, metaverse content), among other examples, and the like. The system 110 when associated with the electronic device 108 or the centralized server 118 may include, but is not limited to, a touch panel, a soft keypad, a hard keypad (including buttons), and the like.
[0046] In an embodiment, the system 110 may receive an input query from the one or more users 102 associated with a computing device 104. Further, the system 110 may identify at least one issue corresponding to at least one of a low search recall performance and a short query, in response to the received input query. Furthermore, the system 110 may segment the input query into one or more constituent phrases, upon identifying the at least one issue. In an embodiment, for segmenting the input query, the system 110 may measure a co-location for each adjacent word pair in the input query. In an embodiment, the co-location measurement comprises at least one of, a pointwise mutual information, log-likelihood ratio, token co-occurrence counts, and conditional probabilities. Further, the system may generate n-gram phrases from the input query based on thresholds associated with the measured co-location. Furthermore, the system 110 may create a set of the n-gram phrases.
[0047] Additionally, the system 110 may generate possible query tokens corresponding to the set of the n-gram phrases. Further, the system 110 may scan the query tokens from the left and recursively obtain the longest segment that exists in the set of the n-gram phrases, for segmenting the input query into one or more constituent phrases. In an embodiment, the one or more constituent phrases are common word combinations with meanings slightly different from those of the individual words. In an embodiment, the system 110 may analyze a vocabulary gap in the one or more constituent phrases with respect to pre-defined phrases corresponding to one or more catalogs.
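The segmentation described above can be sketched as follows. This is a minimal illustration rather than the actual implementation: it assumes bigram phrases, pointwise mutual information (PMI) as the co-location measure, and illustrative threshold values.

```python
import math
from collections import Counter

def mine_bigram_phrases(queries, pmi_threshold=3.0, min_count=2):
    """Mine bigram phrases from user queries using pointwise mutual
    information (PMI) with frequency and PMI thresholds."""
    unigrams, bigrams = Counter(), Counter()
    for q in queries:
        toks = q.split()
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    total = sum(unigrams.values())
    phrase_set = set()
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue
        # PMI = log( P(w1, w2) / (P(w1) * P(w2)) )
        pmi = math.log((c / total) / ((unigrams[w1] / total) * (unigrams[w2] / total)))
        if pmi >= pmi_threshold:
            phrase_set.add(f"{w1} {w2}")
    return phrase_set

def segment_query(query, phrase_set):
    """Scan the query tokens from the left, greedily taking the longest
    segment that exists in the phrase set 'P'."""
    toks, segments, i = query.split(), [], 0
    while i < len(toks):
        # try the longest span first, falling back to a single token
        for j in range(len(toks), i, -1):
            cand = " ".join(toks[i:j])
            if j - i == 1 or cand in phrase_set:
                segments.append(cand)
                i = j
                break
    return segments
```

For instance, with the phrase set {"back cover"}, the query "back cover red" segments into ["back cover", "red"].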
[0048] In an embodiment, for analyzing the vocabulary gap in the one or more constituent phrases, the system 110 may create a metrics set to store a number of objects in the one or more catalogs, that are historically matched for the input query and the historical Click-Through Rates (CTRs), upon segmenting the input query into the one or more constituent phrases. Further, the system 110 may calculate for each segmented one or more constituent phrases, a size of a set of matching objects with historical contexts in the one or more catalogs.
[0049] In an embodiment, the system 110 may select at least one constituent phrase from the one or more constituent phrases comprising the vocabulary gap. Additionally, the system 110 may retrieve one or more contextual synonymous phrases corresponding to the selected at least one constituent phrase, by identifying top-k similar phrases corresponding to the at least one constituent phrase, from a set of mapping phrases with corresponding top-m historical contexts. In an embodiment, the top-k similar phrases are identified using at least one of a Bidirectional Encoder Representations from Transformers (BERT) technique, and an Approximate Nearest Neighbor (ANN) technique. The set of mapping phrases includes phrase-query pairs, which are well-performing phrases comprising high search recall performance. In an embodiment, the top-m historical contexts include, but are not limited to, machine-learned dense embedding/vector representations corresponding to the content of the query, a sub-category of the input query, gender, age, and the like.
[0050] In an embodiment, the system 110 may expand the input query into a context-aware query, by concatenating words in the one or more contextual synonymous phrases with words in the input query. In an embodiment, the input query is expanded for high search recall performance in the computing environment.
[0051] FIG. 2 illustrates an exemplary detailed block diagram representation of the proposed system 110, according to embodiments of the present disclosure. The system 110 may include the processor 112, the Input/Output (I/O) interface 114, and the memory 116. In some implementations, the system 110 may include data 202, and modules 204. As an example, the data 202 may be stored in the memory 116 configured in the system 110 as shown in FIG. 2.
[0052] In an embodiment, the data 202 may include query data 206, issue data 208, low search recall data 210, short query data 212, constituent data 214, vocabulary gap data 216, contextual synonyms data 218, top-k similar phrases data 220, context-aware query data 222, concatenated words data 224, and other data 226. In an embodiment, the data 202 may be stored in the memory 116 in the form of various data structures. Additionally, the data 202 can be organized using data models, such as relational or hierarchical data models. The other data 226 may store data, including temporary data and temporary files, generated by the modules 204 for performing the various functions of the system 110.
[0053] In an embodiment, the modules 204, may include a receiving module 232, an identifying module 234, a segmenting module 236, an analyzing module 238, a selecting module 240, a retrieving module 242, an expanding module 244, and other modules 246.
[0054] In an embodiment, the data 202 stored in the memory 116 may be processed by the modules 204 of the system 110. The modules 204 may be stored within the memory 116. In an example, the modules 204 communicatively coupled to the processor 112 configured in the system 110, may also be present outside the memory 116, as shown in FIG. 2, and implemented as hardware. As used herein, the term modules refer to an Application-Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.
[0055] In an embodiment, the receiving module 232 may receive an input query from the one or more users 102 associated with a computing device 104. The received input query may be stored as the query data 206. Further, the identifying module 234 may identify at least one issue corresponding to at least one of a low search recall performance and a short query, in response to the received input query. The identified at least one issue may be stored as the issue data 208. The identified low search recall performance may be stored as the low search recall data 210. The identified short query may be stored as the short query data 212. Furthermore, the segmenting module 236 may segment the input query into one or more constituent phrases, upon identifying the at least one issue. The one or more constituent phrases may be stored as the constituent data 214. In an embodiment, for segmenting the input query, the system 110 may measure a co-location for each adjacent word pair in the input query.
[0056] In an embodiment, the co-location measurement includes, but is not limited to, pointwise mutual information, log-likelihood ratio, token co-occurrence counts, conditional probabilities, and the like. Further, the system 110 may generate n-gram phrases from the input query based on thresholds associated with the measured co-location. Furthermore, the system 110 may create a set of the n-gram phrases. Additionally, the system 110 may generate possible query tokens corresponding to the set of the n-gram phrases. Further, the analyzing module 238 may scan the query tokens from the left and recursively obtain the longest segment that exists in the set of the n-gram phrases, for segmenting the input query into one or more constituent phrases. In an embodiment, the one or more constituent phrases are common word combinations with meanings slightly different from those of the individual words.
[0057] For example, a query chunker associated with the segmenting module 236 may segment the query into "constituent phrases", some of which can then be chosen for expansion. Phrases are common word combinations with meanings at least slightly different from those of the individual words. These word combinations can have a semantic meaning/definition of their own, contrasting with the meanings of the individual words, for example, ‘back cover,’ ‘Steve Madden.’ Borrowing grammar-based linguistic rules or heuristics from the general Natural Language Processing (NLP) literature may not be helpful, since search queries typically may not be complete, well-formed sentences. Instead, mining n-gram phrases from user queries based on thresholds on co-location information may be helpful. The system 110 may denote the created phrase set with ‘𝑃’. To segment the query, the system 110 may scan the query tokens starting from the left and recursively take the longest segment that exists in ‘𝑃’.
[0058] In an embodiment, the analyzing module 238 may analyze a vocabulary gap in the one or more constituent phrases with respect to pre-defined phrases corresponding to one or more catalogs. The analyzed vocabulary gap may be stored as the vocabulary gap data 216. In an embodiment, for analyzing the vocabulary gap in the one or more constituent phrases, the system 110 may create a metrics set to store a number of objects in the one or more catalogs, that are historically matched for the input query and the historical Click-Through Rates (CTRs), upon segmenting the input query into the one or more constituent phrases. Further, the system 110 may calculate for each segmented one or more constituent phrases, a size of a set of matching objects with historical contexts in the one or more catalogs.
[0059] For example, upon segmenting the query into phrases, the system 110 may identify the phrase(s) that lead(s) to a vocabulary gap problem. Here the system 110 may assume a phrase which has poor recall may be the cause of vocabulary gap and choose a respective phrase for expansion. Additionally, the system 110 may create a metrics set ‘𝑁’ to store the number of products historically matched for phrases along with corresponding historical Click-Through Rates (CTRs). By expanding on phrases with a low or null matching set or low CTR, the system 110 may help in increasing recall in the expanded query.
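The phrase-selection step above can be sketched as follows, assuming the metrics set ‘𝑁’ is a dictionary mapping each phrase to its historical matching-set size and CTR; the threshold values and phrase names are purely illustrative.

```python
def select_gap_phrases(segments, metrics, recall_threshold=5, ctr_threshold=0.01):
    """Pick constituent phrases whose historical matching-set size or CTR
    falls below a threshold -- these are assumed to cause the vocabulary
    gap and are chosen for expansion."""
    gap_phrases = []
    for phrase in segments:
        # an unseen phrase is treated as having a null matching set
        matched, ctr = metrics.get(phrase, (0, 0.0))
        if matched < recall_threshold or ctr < ctr_threshold:
            gap_phrases.append(phrase)
    return gap_phrases
```

A phrase like "mobile skin" with only one historical match would be selected, while a well-performing phrase like "back cover" would be left unchanged.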
[0060] In an embodiment, the selecting module 240 may select at least one constituent phrase from the one or more constituent phrases comprising the vocabulary gap. Additionally, the retrieving module 242 may retrieve one or more contextual synonymous phrases corresponding to the selected at least one constituent phrase, by identifying top-k similar phrases corresponding to the at least one constituent phrase, from a set of mapping phrases with corresponding top-m historical contexts. The retrieved one or more contextual synonymous phrases may be stored as the contextual synonyms data 218. The identified top-k similar phrases may be stored as the top-k similar phrases data 220. In an embodiment, the top-k similar phrases may be identified using at least one of a Bidirectional Encoder Representations from Transformers (BERT) technique, and an Approximate Nearest Neighbor (ANN) technique. The set of mapping phrases includes phrase-query pairs, which are well-performing phrases comprising high search recall performance. In an embodiment, the top-m historical contexts include, but are not limited to, machine-learned dense embedding/vector representations corresponding to the content of the query, a sub-category of the input query, gender, age, and the like.
[0061] For example, upon identifying a phrase to be replaced in the input query, the system 110 may search for a corresponding contextual synonymous replacement phrase. To simplify, the system 110 may formulate the problem of finding expansions as a mapping problem (as opposed to a generation problem). Under this formulation, the system 110 may need to determine the nearest neighbors from a fixed set to a given input phrase. The system 110 may perform Maximum Inner Product Search (MIPS) on dense vector representations of these queries. Given the contextual nature of the synonyms that the system 110 may need to model, the system 110 may use the BERT-based model to identify the top-k most similar phrases to the input phrase, from a set of well-performing phrases called the mapping phrase set ‘𝑀’. The system 110 may obtain BERT embeddings for the input phrase, ‘𝑝’, and all phrases in ‘𝑀’ and choose the top-k based on dot product scores between ‘𝑝’ and each ‘𝑚𝑖 ∈ 𝑀’. Since BERT (and corresponding variants) typically operates at a word level, the system 110 may obtain the embedding of a whole phrase by concatenating the embeddings of its first and last words.
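The nearest-neighbor retrieval can be illustrated with an exhaustive Maximum Inner Product Search over a small mapping set. In practice the vectors would be BERT embeddings and the search would go through an ANN index; here the phrase names and two-dimensional vectors are purely illustrative.

```python
def top_k_similar(query_vec, mapping, k=3):
    """Exhaustive MIPS: rank candidate phrases by the inner (dot) product of
    their embedding against the input phrase embedding, keep the top-k."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scored = sorted(mapping.items(), key=lambda kv: dot(query_vec, kv[1]), reverse=True)
    return [phrase for phrase, _ in scored[:k]]
```

With an input embedding close to the "case/cover" direction, phrases such as "phone case" and "mobile cover" would rank above an unrelated phrase like "shoe rack".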
[0062] Additionally, a phrase similarity model such as a BERT model may be trained offline in two phases: unsupervised pre-training and supervised fine-tuning. The system 110 may first train a BERT model on in-domain query chains (queries appearing consecutively in the same user search session), since the general BERT fails to capture e-commerce concepts even after fine-tuning. In the pre-training phase, the BERT-base-uncased model checkpoint may be pre-trained on search queries, optimizing for the Masked Language Model (MLM) objective. In the fine-tuning phase, the BERT model may be trained on the sentence pair classification task using query pairs and a binary label. The system 110 may train on sentence (query) pairs instead of phrases to capture contextual synonyms. The resulting trained model may be called PSim-BERT (Phrase Similarity with BERT).
[0063] In an embodiment, the expanding module 244 may expand the input query into a context-aware query, by concatenating words in the one or more contextual synonymous phrases with words in the input query. The context-aware query may be stored as the context-aware query data 222. The concatenated words in the one or more contextual synonymous phrases may be stored as the concatenated words data 224. In an embodiment, the input query is expanded for high search recall performance in the computing environment.
Exemplary scenario:
[0064] Consider, data along with generation details of the sets Phrase Set ‘𝑃,’ Metrics Set ‘𝑁,’ and Mapping Set ‘𝑀.’ To create these three sets, the system 110 may use user queries along with the size of a corresponding matching product set from one month’s logs.
-Phrase Set ‘𝑃’ Creation: Consider phrases of bigrams. For all adjacent word pairs in a query, the system 110 may calculate co-location measures such as pointwise mutual information, log-likelihood ratio, token co-occurrence counts, and conditional probabilities, and apply thresholds through cross-validation to get the set ‘𝑃’ of one million bigram phrases. To evaluate the phrase set, the system 110 may draw a random sample of ∼1000 queries stratified on frequency volume buckets and chunk the queries. In one example, human raters manually evaluated the set of identified phrases (∼550) and found that ∼85% of the identified phrases were good/valid phrases for a phrase expansion task.
-Metrics Set ‘𝑁’ creation: To create the metrics set 𝑁, the system 110 may segment the queries. For each phrase segment, the system 110 may calculate the matching set size by taking an average of matching set sizes of all queries in which the phrase occurs.
-Mapping Set ‘𝑀’ creation: To create this set, the system 110 may chunk the queries according to the criteria such as scanning the query tokens starting from the left and recursively taking the longest segment that exists in ‘𝑃’. For each chunk, the system 110 may identify up to k (e.g., 5) queries (to provide different potential contexts of the phrase) in which it most commonly occurs. This set of phrase-query pairs may be called the mapping phrase set ‘𝑀’ consisting of ‘∼120k’ items.
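The creation of the metrics set ‘𝑁’ and mapping set ‘𝑀’ described above can be sketched together, assuming each logged query arrives already chunked into constituent phrases along with its matching-set size; the helper names and data below are illustrative.

```python
from collections import Counter, defaultdict

def build_metrics_set(chunked_logs):
    """Metrics set 'N': for each phrase segment, average the matching-set
    sizes of all queries in which the phrase occurs."""
    sizes = defaultdict(list)
    for _query, segments, match_size in chunked_logs:
        for phrase in segments:
            sizes[phrase].append(match_size)
    return {p: sum(v) / len(v) for p, v in sizes.items()}

def build_mapping_set(chunked_logs, k=5):
    """Mapping set 'M': for each phrase, keep up to k queries in which it
    most commonly occurs, giving the phrase its potential contexts."""
    contexts = defaultdict(Counter)
    for query, segments, _match_size in chunked_logs:
        for phrase in segments:
            contexts[phrase][query] += 1
    return {p: [q for q, _ in c.most_common(k)] for p, c in contexts.items()}
```

Each log entry here is a `(query, segments, matching_set_size)` triple; in the disclosure, the logs would span one month of user queries.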
[0065] For the PSim-BERT phrase similarity model pre-training phase, the queries may be obtained from a month’s search logs after filtering out the extreme tail, to create a set of ∼57M queries. For example, a model trained using both the MLM and next sentence prediction tasks (i.e., next query prediction in a user session from the same product category) may not improve recall@5, and hence the MLM-based model may be used. For the fine-tuning phase of the sentence pair classification task, data may be labeled by human labelers. For example, domain experts may identify the articulation/vocabulary gap queries from a random sample of low-performing queries and provide a well-performing ground truth replacement. Then the processor 112 may add a random sample of one negative example per positive pair to obtain a class-balanced labeled set with ∼69k examples.
-Model Inference Optimizations: In an example, to make the basic PSim-BERT model computationally feasible to run online, the model may be optimized. First, padding may be removed during inference, since the model may only receive a single query per batch in the user-path. Second, knowledge distillation may be applied, thereby reducing the number of layers by a factor of two: a pretrained version of the model tuned for the e-commerce domain may be distilled and then fine-tuned. Finally, the model inference computation may be simplified by applying dynamic quantization.
[0066] In an example, upon rewriting the phrases, the system 110 may need to retrieve products using these phrases for the input query fired by a user. Instead of firing separate queries to the index, the system 110 may construct a “Lucene query” with a disjunction (OR) of the rewrite phrases “𝑚𝑖”. For example, the query "𝑝1 𝑝2 𝑝3" may be expanded as "𝑝1 (𝑝2 OR 𝑚𝑖) 𝑝3", where 𝑚𝑖 may refer to the closest phrase from the mapping set. This leads to better compute efficiency on the index. For example, an overall P95 latency of the system 110 may be 38ms, which is within acceptable limits and allows the model to be deployed online (in the user-path).
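The construction of the expanded disjunctive query string can be sketched as follows. The phrase names mirror the "𝑝1 (𝑝2 OR 𝑚𝑖) 𝑝3" example above, and the output here is a plain query string rather than an actual Lucene query object.

```python
def expand_to_lucene(segments, rewrites):
    """Replace each gap phrase with a disjunction (OR) of itself and its
    rewrite phrases, e.g. 'p1 p2 p3' -> 'p1 (p2 OR m1 OR m2) p3'."""
    parts = []
    for seg in segments:
        if rewrites.get(seg):
            parts.append("(" + " OR ".join([seg] + rewrites[seg]) + ")")
        else:
            parts.append(seg)  # no rewrites: keep the segment as-is
    return " ".join(parts)
```

Because the rewrites are folded into a single disjunctive query, the index is hit once per user query instead of once per rewrite.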
[0067] In an example scenario, the system 110 may be evaluated using at least three techniques. Firstly, the system 110 may be evaluated using a phrase similarity evaluation: for a given phrase identified for expansion, the quality of the top 10 most similar phrases may be evaluated on a random sample of 500 queries on a Boolean scale of good or bad. The output may be a recall at various values of k for the similarity judgments. For latency reasons, expansion may be restricted to the top-3 phrases. Secondly, the system 110 may be evaluated using an offline evaluation, comparing against statistical query-to-query dictionaries as well as a query-to-query Manhattan Long Short-Term Memory (MaLSTM) model, which is superior to multiple complete query rewrite systems such as SentenceBERT. Query expansion is a recall-focused task; hence, for offline evaluation, users 102 (as administrators or entities) may measure Recall@30 to account for inefficiencies of the precision-focused relevance ranking systems that are invoked by the recall layer. There may be an improvement in Recall@30 of +11% over the existing systems. Furthermore, the system 110 may be evaluated using an online AB experiment. A PSim-BERT model may be used for an online AB experiment on 10% of user traffic sampled randomly, run for two weeks at a 5% significance level, with an engagement criterion of expanding only low-recall (number of matching products < 2) and short (number of words < 7) queries. The PSim-BERT may provide improvements with reduced null searches and increased search query CTR.
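The Recall@k metric used in the offline evaluation can be computed as in the following generic sketch; this is not the disclosure's evaluation harness, just the standard definition of the metric.

```python
def recall_at_k(retrieved, relevant, k=30):
    """Fraction of the relevant items that appear among the top-k retrieved
    items (Recall@k), the recall-focused metric used in offline evaluation."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / len(relevant)
```

A query-expansion change that surfaces more of the relevant products within the top 30 results directly raises this number.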
[0068] FIG. 3 illustrates a flow chart depicting a method 300 of context-aware query expansion in a computing environment, according to embodiments of the present disclosure.
[0069] At block 302, the method 300 includes, receiving, by the processor 112 associated with the system 110, an input query from one or more users 102 associated with a computing device 104.
[0070] At block 304, the method 300 includes identifying, by the processor 112, at least one issue corresponding to at least one of a low search recall performance and a short query, in response to the received input query.
[0071] At block 306, the method 300 includes segmenting, by the processor 112, the input query into one or more constituent phrases, upon identifying the at least one issue.
[0072] At block 308, the method 300 includes analyzing, by the processor 112, a vocabulary gap in the one or more constituent phrases with respect to pre-defined phrases corresponding to one or more catalogs.
[0073] At block 310, the method 300 includes selecting, by the processor 112, at least one constituent phrase from the one or more constituent phrases comprising the vocabulary gap.
[0074] At block 312, the method 300 includes retrieving, by the processor 112, one or more contextual synonymous phrases corresponding to the selected at least one constituent phrase, by identifying top-k similar phrases corresponding to the at least one constituent phrase, from a set of mapping phrases with corresponding top-m historical contexts. The top-k similar phrases are identified using at least one of a Bidirectional Encoder Representations from Transformers (BERT) technique, and Approximate Nearest Neighbor (ANN) technique.
[0075] At block 314, the method 300 includes expanding, by the processor 112, the input query into a context-aware query, by concatenating words in the one or more contextual synonymous phrases with words in the input query, wherein the input query is expanded for high search recall performance in the computing environment.
[0076] The order in which the method 300 is described is not intended to be construed as a limitation, and any number of the described method blocks may be combined or otherwise performed in any order to implement the method 300 or an alternate method. Additionally, individual blocks may be deleted from the method 300 without departing from the spirit and scope of the present disclosure described herein. Furthermore, the method 300 may be implemented in any suitable hardware, software, firmware, or a combination thereof, that exists in the related art or that is later developed. The method 300 describes, without limitation, the implementation of the system 110. A person of skill in the art will understand that method 300 may be modified appropriately for implementation in various manners without departing from the scope and spirit of the disclosure.
[0077] FIG. 4 illustrates a hardware platform 400 for implementation of the disclosed system 110, according to an example embodiment of the present disclosure. For the sake of brevity, the construction and operational features of the system 110, which are explained in detail above, are not explained in detail herein. Particularly, computing machines such as, but not limited to, internal/external server clusters, quantum computers, desktops, laptops, smartphones, tablets, and wearables may be used to execute the system 110 or may include the structure of the hardware platform 400. As illustrated, the hardware platform 400 may include additional components not shown, and some of the components described may be removed and/or modified. For example, a computer system with multiple GPUs may be located on external cloud platforms including Amazon® Web Services, internal corporate cloud computing clusters, organizational computing resources, etc.
[0078] The hardware platform 400 may be a computer system such as the system 110 that may be used with the embodiments described herein. The computer system may represent a computational platform that includes components that may be in a server or another computer system. The computer system may execute, by the processor 405 (e.g., a single or multiple processors) or other hardware processing circuit, the methods, functions, and other processes described herein. These methods, functions, and other processes may be embodied as machine-readable instructions stored on a computer-readable medium, which may be non-transitory, such as hardware storage devices (e.g., RAM (random access memory), ROM (read-only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, programmable ROM), hard drives, and flash memory). The computer system may include the processor 405 that executes software instructions or code stored on a non-transitory computer-readable storage medium 410 to perform methods of the present disclosure. The software code includes, for example, instructions to gather data and documents and analyze documents. In an example, the modules 204, may be software codes or components performing these steps.
[0079] The instructions on the computer-readable storage medium 410 may be read and stored in the storage 415 or in random access memory (RAM). The storage 415 may provide a space for keeping static data where at least some instructions could be stored for later execution. The stored instructions may be further compiled to generate other representations of the instructions and dynamically stored in RAM such as the RAM 420. The processor 405 may read instructions from the RAM 420 and perform actions as instructed.
[0080] The computer system may further include the output device 425 to provide at least some of the results of the execution as output including, but not limited to, visual information to users, such as external agents. The output device 425 may include a display on computing devices and virtual reality glasses. For example, the display may be a mobile phone screen or a laptop screen. GUIs and/or text may be presented as an output on the display screen. The computer system may further include an input device 430 to provide a user or another device with mechanisms for entering data and/or otherwise interacting with the computer system. The input device 430 may include, for example, a keyboard, a keypad, a mouse, or a touchscreen. Each of these output devices 425 and input device 430 may be joined by one or more additional peripherals. For example, the output device 425 may be used to display the results such as bot responses by the executable chatbot.
[0081] A network communicator 435 may be provided to connect the computer system to a network and in turn to other devices connected to the network including other clients, servers, data stores, and interfaces, for instance. A network communicator 435 may include, for example, a network adapter such as a LAN adapter or a wireless adapter. The computer system may include a data sources interface 440 to access the data source 445. The data source 445 may be an information resource. As an example, a database of exceptions and rules may be provided as the data source 445. Moreover, knowledge repositories and curated data may be other examples of the data source 445.
[0082] While considerable emphasis has been placed herein on the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be implemented merely as illustrative of the invention and not as a limitation.

ADVANTAGES OF THE PRESENT DISCLOSURE
[0083] The present disclosure provides a method and a system for context-aware query expansion in a computing environment.
[0084] The present disclosure provides a method and a system for selectively expanding portions of the query with synonymous terms to improve the recall search performance of vocabulary gap queries.
[0085] The present disclosure uses Lucene-based search index to search for the matching objects against a given user query (and corresponding augmented variants).
[0086] The present disclosure provides a Bidirectional Encoder Representations from Transformers (BERT) based query phrase expansion.
[0087] The present disclosure provides a method and a system for mining n-gram phrases from user queries based on thresholds on co-location information, thereby improving the matching set of objects beyond the input query text.
[0088] The present disclosure reduces null searches and increases search query Click Through Rate (CTR).
[0089] The present disclosure overcomes the two-stage retrieval of pseudo-relevance feedback in the user path by using bi-directional context for a given phrase or term, and hence differentiates between queries, owing to the bi-directional nature of the BERT model, which builds a deeper pre-trained transformer variant using only encoders.

Claims:
1. A method for context-aware query expansion in a computing environment, the method comprising:
receiving, by a processor (112) associated with a system (110), an input query from one or more users (102) associated with a computing device (104);
identifying, by the processor (112), at least one issue corresponding to at least one of a low search recall performance and a short query, in response to the received input query;
segmenting, by the processor (112), the input query into one or more constituent phrases, upon identifying the at least one issue;
analyzing, by the processor (112), a vocabulary gap in the one or more constituent phrases with respect to pre-defined phrases corresponding to one or more catalogs;
selecting, by the processor (112), at least one constituent phrase from the one or more constituent phrases comprising the vocabulary gap;
retrieving, by the processor (112), one or more contextual synonymous phrases corresponding to the selected at least one constituent phrase, by identifying top-k similar phrases corresponding to the at least one constituent phrase, from a set of mapping phrases with corresponding top-m historical contexts, wherein the top-k similar phrases are identified using at least one of a Bidirectional Encoder Representations from Transformers (BERT) technique, and Approximate Nearest Neighbor (ANN) technique; and
expanding, by the processor (112), the input query into a context-aware query, by concatenating words in the one or more contextual synonymous phrases with words in the input query, wherein the input query is expanded for high search recall performance in the computing environment.

2. The method as claimed in claim 1, wherein segmenting the input query further comprises:
measuring, by the processor (112), a co-location for each adjacent word pair in the input query;
generating, by the processor (112), n-gram phrases from the input query based on thresholds associated with the measured co-location;
creating, by the processor (112), a set of the n-gram phrases;
generating, by the processor (112), possible query tokens corresponding to the set of the n-gram phrases; and
analyzing, by the processor (112), the query tokens from left among the query tokens, and recursively obtaining a longest segment that exists in the set of the n-gram phrases, for segmenting the input query into one or more constituent phrases.

3. The method as claimed in claim 2, wherein the co-location measurement comprises at least one of, a pointwise mutual information, log-likelihood ratio, token co-occurrence counts, and conditional probabilities.

4. The method as claimed in claim 1, wherein analyzing the vocabulary gap in the one or more constituent phrases further comprises:
creating, by the processor (112), a metrics set to store a number of objects in the one or more catalogs, that are historically matched for the input query and the historical Click-Through Rates (CTRs), upon segmenting the input query into one or more constituent phrases; and
calculating, by the processor (112), for each segmented one or more constituent phrases, a size of a set of matching objects with historical contexts in the one or more catalogs.

5. The method as claimed in claim 1, wherein the set of mapping phrases comprises phrase-query pairs, which are well-performing phrases comprising high search recall performance.
6. The method as claimed in claim 1, wherein the one or more constituent phrases are common word combinations with meaning slightly different from that of individual words.

7. The method as claimed in claim 1, wherein the top-m historical context comprises at least one of, a machine-learned dense embeddings/vector representation corresponding to content of the query, a sub-category of the input query, gender, and age.

8. A system (110) for context-aware query expansion in a computing environment, the system (110) comprising:
a processor (112);
a memory (116) coupled to the processor (112), wherein the memory (116) comprises processor-executable instructions, which on execution, cause the processor (112) to:
receive an input query from one or more users (102) associated with a computing device (104);
identify at least one issue corresponding to at least one of a low search recall performance and a short query, in response to the received input query;
segment the input query into one or more constituent phrases, upon identifying the at least one issue;
analyze a vocabulary gap in the one or more constituent phrases with respect to pre-defined phrases corresponding to one or more catalogs;
select at least one constituent phrase from the one or more constituent phrases comprising the vocabulary gap;
retrieve one or more contextual synonymous phrases corresponding to the selected at least one constituent phrase, by identifying top-k similar phrases corresponding to the at least one constituent phrase, from a set of mapping phrases, wherein the top-k similar phrases with corresponding top-m historical contexts are identified using at least one of a Bidirectional Encoder Representations from Transformers (BERT) technique, and Approximate Nearest Neighbor (ANN) technique; and
expand the input query into a context-aware query, by concatenating words in the one or more contextual synonymous phrases with words in the input query, wherein the input query is expanded for high search recall performance in the computing environment.
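The retrieve-and-expand steps of claim 8 can be sketched as follows. This is a minimal illustration only: hand-made toy vectors and brute-force cosine similarity stand in for the BERT embeddings and ANN index named in the claim, and all phrases and numbers are hypothetical:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k_similar(gap_vec, mapping_phrases, k=2):
    """Rank the set of mapping phrases by similarity to the gap phrase."""
    ranked = sorted(mapping_phrases.items(),
                    key=lambda kv: cosine(gap_vec, kv[1]), reverse=True)
    return [phrase for phrase, _ in ranked[:k]]

def expand_query(input_query, synonymous_phrases):
    """Concatenate contextual synonym words onto the input query."""
    extra = [w for p in synonymous_phrases for w in p.split()
             if w not in input_query.split()]
    return " ".join(input_query.split() + extra)

mapping_phrases = {              # hypothetical phrase -> embedding pairs
    "running shoes": [0.9, 0.1, 0.0],
    "sports footwear": [0.8, 0.2, 0.1],
    "winter jacket": [0.0, 0.1, 0.9],
}
gap_vec = [0.85, 0.15, 0.05]     # toy embedding of the gap phrase "sneakers"
synonyms = top_k_similar(gap_vec, mapping_phrases, k=2)
expanded = expand_query("sneakers", synonyms)
```

A production version would replace the dictionary scan with an ANN library over BERT-derived vectors; the concatenation step at the end mirrors the expansion recited in the claim.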

9. The system (110) as claimed in claim 8, wherein for segmenting the input query the processor (112) is further configured to:
measure a co-location for each adjacent word pair in the input query;
generate n-gram phrases from the input query based on thresholds associated with the measured co-location;
create a set of the n-gram phrases;
generate possible query tokens corresponding to the set of the n-gram phrases; and
analyze the query tokens from left to right, and recursively obtain a longest segment that exists in the set of the n-gram phrases, for segmenting the input query into the one or more constituent phrases.
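Purely as an illustration of the left-to-right longest-match step in claim 9: scan from the left, greedily take the longest prefix that exists in the n-gram phrase set, and recurse on the remainder. The phrase set below is a hypothetical example:

```python
def segment(tokens, phrase_set):
    """Split tokens into constituent phrases by recursive longest match."""
    if not tokens:
        return []
    # Try the longest candidate prefix first; an unmatched single token
    # falls through and becomes its own segment.
    for end in range(len(tokens), 0, -1):
        candidate = " ".join(tokens[:end])
        if end == 1 or candidate in phrase_set:
            return [candidate] + segment(tokens[end:], phrase_set)
    return []

phrase_set = {"running shoes", "red running shoes", "for men"}
parts = segment("red running shoes for men".split(), phrase_set)
```

With this toy phrase set, the query "red running shoes for men" segments into the constituent phrases "red running shoes" and "for men".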

10. The system (110) as claimed in claim 9, wherein the co-location measurement comprises at least one of a pointwise mutual information, a log-likelihood ratio, token co-occurrence counts, and conditional probabilities.

11. The system (110) as claimed in claim 8, wherein for analyzing the vocabulary gap in the one or more constituent phrases, the processor (112) is further configured to:
create a metrics set to store a number of objects in the one or more catalogs, that are historically matched for the input query and the historical Click-Through Rates (CTRs), upon segmenting the input query into the one or more constituent phrases; and
calculate, for each of the one or more segmented constituent phrases, a size of a set of matching objects with historical contexts in the one or more catalogs.

12. The system (110) as claimed in claim 8, wherein the set of mapping phrases comprises phrase-query pairs, which are well-performing phrases comprising high search recall performance.

13. The system (110) as claimed in claim 8, wherein the one or more constituent phrases are common word combinations with meaning slightly different from that of individual words.

14. The system (110) as claimed in claim 8, wherein the top-m historical contexts comprise at least one of a machine-learned dense embedding/vector representation corresponding to content of the query, a sub-category of the input query, gender, and age.

Documents

Application Documents

# Name Date
1 202241066810-STATEMENT OF UNDERTAKING (FORM 3) [21-11-2022(online)].pdf 2022-11-21
2 202241066810-REQUEST FOR EXAMINATION (FORM-18) [21-11-2022(online)].pdf 2022-11-21
3 202241066810-REQUEST FOR EARLY PUBLICATION(FORM-9) [21-11-2022(online)].pdf 2022-11-21
4 202241066810-POWER OF AUTHORITY [21-11-2022(online)].pdf 2022-11-21
5 202241066810-FORM-9 [21-11-2022(online)].pdf 2022-11-21
6 202241066810-FORM 18 [21-11-2022(online)].pdf 2022-11-21
7 202241066810-FORM 1 [21-11-2022(online)].pdf 2022-11-21
8 202241066810-DRAWINGS [21-11-2022(online)].pdf 2022-11-21
9 202241066810-DECLARATION OF INVENTORSHIP (FORM 5) [21-11-2022(online)].pdf 2022-11-21
10 202241066810-COMPLETE SPECIFICATION [21-11-2022(online)].pdf 2022-11-21
11 202241066810-ENDORSEMENT BY INVENTORS [14-12-2022(online)].pdf 2022-12-14
12 202241066810-FER.pdf 2023-02-22
13 202241066810-FER_SER_REPLY [21-08-2023(online)].pdf 2023-08-21
14 202241066810-DRAWING [21-08-2023(online)].pdf 2023-08-21
15 202241066810-CORRESPONDENCE [21-08-2023(online)].pdf 2023-08-21
16 202241066810-COMPLETE SPECIFICATION [21-08-2023(online)].pdf 2023-08-21
17 202241066810-CLAIMS [21-08-2023(online)].pdf 2023-08-21
18 202241066810-ABSTRACT [21-08-2023(online)].pdf 2023-08-21
19 202241066810-US(14)-HearingNotice-(HearingDate-11-06-2024).pdf 2024-05-16
20 202241066810-Correspondence to notify the Controller [07-06-2024(online)].pdf 2024-06-07
21 202241066810-Written submissions and relevant documents [26-06-2024(online)].pdf 2024-06-26
22 202241066810-US(14)-ExtendedHearingNotice-(HearingDate-05-08-2024)-1030.pdf 2024-07-24
23 202241066810-Correspondence to notify the Controller [30-07-2024(online)].pdf 2024-07-30
24 202241066810-Written submissions and relevant documents [12-08-2024(online)].pdf 2024-08-12
25 202241066810-PatentCertificate23-08-2024.pdf 2024-08-23
26 202241066810-IntimationOfGrant23-08-2024.pdf 2024-08-23

Search Strategy

1 202241066810E_21-02-2023.pdf

ERegister / Renewals

3rd: 16 Oct 2024

From 21/11/2024 - To 21/11/2025

4th: 22 Sep 2025

From 21/11/2025 - To 21/11/2026