Abstract: This disclosure relates generally to complex question answering (CQA) on knowledge graphs (KG) using machine translation and multi-task learning. Complex KGQA involves mention of multiple entities in the question. Multiple sequences of relationships, combined in complex topologies, are required to answer such questions. The disclosed method and system provide a CQA-NMT model to answer such questions by using a multi-task BERT based Neural Machine Translation (NMT) model. The CQA-NMT model performs four tasks jointly using a single model, i.e., i) detection of mentioned entities, ii) prediction of entity types of answer nodes, iii) prediction of topology and relations involved, and iv) question type classification such as ‘Factoid’, ‘Count’, etc. CQA-NMT also helps the downstream tasks of mentioned entity disambiguation and subsequent answer retrieval from the KG. [To be published with FIG. 6]
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
COMPLEX QUESTION ANSWERING ON KNOWLEDGE GRAPHS USING MACHINE TRANSLATION AND MULTI-TASK LEARNING
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD [001] The disclosure herein generally relates to the field of question answering over a knowledge graph, and, more particularly, to complex question answering (CQA) on knowledge graphs using machine translation and multi-task learning.
BACKGROUND [002] Question answering (QA) over a knowledge graph (KG) is a task of answering a natural language (NL) query using the information stored in the KG. In a real-world industrial setting, question answering involves addressing multiple challenges including entity linking, multi-hop reasoning over KG, etc. Traditional approaches handle these challenges in a modularized sequential manner where errors in one module lead to the accumulation of errors in downstream modules. Often these challenges are inter-related and the solutions to them can reinforce each other when handled simultaneously in an end-to-end learning setup.
SUMMARY [003] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a processor-implemented method for complex question answering on knowledge graphs using machine translation and multi-task learning is provided. The method includes receiving a natural language query (NLQ), via one or more hardware processors. Further, the method includes determining one of a presence and an absence of one or more mentioned entities in the NLQ, via the one or more hardware processors. Also, the method includes upon determination of the presence of the one or more mentioned entities in the NLQ, extracting the one or more mentioned entities and the type of the one or more mentioned entities using a Bidirectional Encoder Representations from Transformers (BERT) encoder, via the one or more hardware processors. Moreover, the method includes linking each mentioned entity of the one or more mentioned entities to a node of a knowledge
graph (KG) by using an ensemble of string-matching algorithms and a graph-based model to capture structural bias in the KG, via the one or more hardware processors, wherein a type of the node of the KG and the type of each mentioned entity are the same. Further, the method includes generating a sequence of predicates for the NLQ using a Transformer-based decoder, via the one or more hardware processors, each predicate represented by an edge of the KG, the sequence of predicates being representative of a path to be traversed in the KG from the mentioned entity to the answer entity. Also, the method includes extracting an answer to the NLQ based on the sequence of predicates, the type of the NLQ (qtype), the answer entity type (AET) of the set of answer entities and the one or more mentioned entities, via the one or more hardware processors, using one or more predefined templates of a formal query, the formal query capable of being executed on the KG.
[004] In another aspect, a system for complex question answering on knowledge graphs using machine translation and multi-task learning is provided. The system includes a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to receive a natural language query (NLQ). Further, the one or more hardware processors are configured by the instructions to determine one of a presence and an absence of one or more mentioned entities in the NLQ. Also, upon determination of the presence of the one or more mentioned entities in the NLQ, the one or more hardware processors are configured by the instructions to extract the one or more mentioned entities and the type of the one or more mentioned entities using a Bidirectional Encoder Representations from Transformers (BERT) encoder. Moreover, the one or more hardware processors are configured by the instructions to link each mentioned entity of the one or more mentioned entities to a node of a knowledge graph (KG) by using an ensemble of string-matching algorithms and a graph-based model to capture structural bias in the KG, wherein a type of the node of the KG and the type of each mentioned entity are the same. Further, the one or more hardware processors are configured by the instructions to generate a sequence of
predicates for the NLQ using a Transformer-based decoder, each predicate represented by an edge of the KG, the sequence of predicates being representative of a path to be traversed in the KG from the mentioned entity to the answer entity. Also, the one or more hardware processors are configured by the instructions to extract an answer to the NLQ based on the sequence of predicates, the type of the NLQ (qtype), the answer entity type (AET) of the set of answer entities and the one or more mentioned entities, using one or more predefined templates of a formal query, the formal query capable of being executed on the KG.
[005] In yet another aspect, a non-transitory computer readable medium for a method for complex question answering on knowledge graphs using machine translation and multi-task learning is provided. The method includes receiving a natural language query (NLQ), via one or more hardware processors. Further, the method includes determining one of a presence and an absence of one or more mentioned entities in the NLQ, via the one or more hardware processors. Also, the method includes, upon determination of the presence of the one or more mentioned entities in the NLQ, extracting the one or more mentioned entities and the type of the one or more mentioned entities using a Bidirectional Encoder Representations from Transformers (BERT) encoder, via the one or more hardware processors. Moreover, the method includes linking each mentioned entity of the one or more mentioned entities to a node of a knowledge graph (KG) by using an ensemble of string-matching algorithms and a graph-based model to capture structural bias in the KG, via the one or more hardware processors, wherein a type of the node of the KG and the type of each mentioned entity are the same. Further, the method includes generating a sequence of predicates for the NLQ using a Transformer-based decoder, via the one or more hardware processors, each predicate represented by an edge of the KG, the sequence of predicates being representative of a path to be traversed in the KG from the mentioned entity to the answer entity. Also, the method includes extracting an answer to the NLQ based on the sequence of predicates, the type of the NLQ (qtype), the answer entity type (AET) of the set of answer entities and the one or more mentioned entities, via the one or more hardware processors, using one or more
predefined templates of a formal query, the formal query capable of being executed on the KG.
[006] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[007] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[008] FIG. 1 illustrates example natural language queries from a real-world dataset.
[009] FIG. 2 illustrates a schema of a knowledge graph for complex question answering, in accordance with an example embodiment.
[010] FIG. 3 illustrates an example network implementation of a system for complex question answering on knowledge graphs using machine translation and multi-task learning, in accordance with an example embodiment.
[011] FIG. 4 is a block diagram of the disclosed multi-task model for complex question answering - neural machine translation (CQA-NMT), according to some embodiments of the present disclosure.
[012] FIG. 5 is a block diagram of entity mention module, entity linking module and path prediction module of the system of FIG. 3, in accordance with some embodiments of the present disclosure.
[013] FIG. 6 is a flow diagram of a method for complex question answering on knowledge graphs using machine translation and multi-task learning, in accordance with an example embodiment.
[014] FIG. 7 is a block diagram of an exemplary computer system for implementing embodiments consistent with the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[015] Question answering on knowledge graphs (KGQA) has mainly been attempted on publicly available KGs such as FreebaseTM, DBPediaTM, YagoTM, and so on. Question answering has also been used on proprietary KGs created by large enterprises. For example, KGQA on a) a KG that contains information related to retail products can help customers choose the right product for their needs, or b) a KG containing document catalogs (best practices, white papers, research papers) can help a knowledge worker find a specific piece of information, or c) a KG that stores profiles of various companies can be used to do preliminary analysis before giving them a loan, etc. Sample questions from a known dataset are shown in FIG. 1. The schema of the corresponding KG is shown in FIG. 2.
[016] Answering such questions often requires a traversal of the KG along multiple relations which may not form a directed chain graph and may follow a more complex topology, as shown for questions 5, 7 and 8 in FIG. 1. As is also seen, most often the words of the natural language question (NLQ) and the corresponding relations have a weak correlation. Most of the known techniques for the KGQA task parse the NLQ, convert it into a structured query, and then execute the structured query on the KG to retrieve the factoid answers. Such conversion involves multiple sub-tasks: a) linking the mentioned entity with the corresponding entity-node in the KG, b) identification of the type of the answer entity, and c) identification of relations. These tasks are most often performed in a sequence or in parallel, which results in accumulation of errors. Further, most of the KGQA tasks are not as complex as LOCA. For example, a) all questions of SimpleQA can be answered using a single triple, b) NLQs for most of the datasets (e.g., SimpleQA, MetaQA) contain only one mentioned entity, and c) even if multiple relations are required for answer entity retrieval, they are organized in a sequence, i.e., a chain.
[017] There are specific types of questions that pose many challenges with respect to each of the aforementioned tasks. Moreover, some of the questions can only be answered via a model that attempts more than one sub-task together. For example, the first two questions of FIG. 1 mention the same words, i.e., “deep learning”, but they get associated with two different entity nodes of the KG.
Additionally, the known techniques could detect the set of relations when the schema sub-graph follows a specific topology; however, in the aforementioned example, most of the questions follow a different topology, as shown in FIG. 1.
[018] Some of the challenges to be addressed while performing the KGQA task include, for example, but are not limited to, incomplete entity mention, co-occurrence disambiguation, avoiding un-intended matches, duplicate KG entities, relation name mismatch, implicit relation indication, and so on. Each of these is explained below. For example, in the NLQ users often do not mention the complete name of the intended entity (Huang et al. (2019)), e.g., only the first name of a person, the short name of a group, etc., as in question 8 of FIG. 1. This is referred to as incomplete entity mention.
[019] Co-occurrence disambiguation: This refers to situations when a mentioned entity should be linked to a KG entity with the help of another mentioned entity in the question. For example, in question 7 of FIG. 1, there can be many people who have the same first name (‘Libby’), but only one of them works on NLP; the model needs to use this information to conclusively resolve the mentioned entities.
[020] Avoid un-intended match: Some of the words in a sentence coincidentally match with an entity name but are not an intended mention of an entity, e.g., the word ‘vision’ may get matched with ‘Computer Vision’, which is not intended in question 9 of FIG. 1.
[021] Duplicate KG Entity: The intended entity names may be different from the words used in the NLQ, and there can be more than one entity in the KG that has the same name, for example, “Life Sciences” is the name of a research area, as well as a keyword (see KG schema given in FIG. 2). The model needs to link the entity using other words, similar to how it is shown in questions 1 and 2 of FIG. 1.
[022] Relation names mismatch: Often the words of the KG relations and the words of the NLQ do not match, e.g., questions 2, 4, 6, etc. in FIG. 1.
[023] Implicit Relations Indication: Sometimes words of the NLQ do not even make any mention of the relations involved, however, they need to be inferred. For example, in question 4 of FIG. 1, some of the relations are not mentioned in the question.
[024] Typically, various techniques are used for the KGQA problem. For example, some approaches were proposed to perform KGQA by mapping a query to its logical form and then converting it to a formal query to extract answers. However, these were not joint learning approaches.
[025] Multi-Task based approaches: Various approaches rely on jointly learning multiple sub-tasks of the KGQA problem. However, all these approaches focus on single-hop relations only, and therefore such approaches cannot be taken as a baseline for the disclosed model. In a more complex setting, a known technique proposed a joint learning task for entity linking, path prediction (chain topology only), and question type. However, the model of said technique does not predict the answer entity type. Said technique focuses on the implicit mention of entities in previous sentences of a dialogue, and does not attempt to predict non-chain topology or the answer entity type.
[026] Non-Chain Multi-Hop Relations: An embedding-based approach was proposed for non-chain multi-hop relation prediction (for a fixed and small set of topologies). However, it performs only the single task of relationship prediction.
[027] In addition, various other techniques have been used for KGQA. For example, Transformers and Machine Translation have proved to be among the most promising approaches in NLP research. One approach proposed a joint-learning based multi-task model using a Transformer. However, it handles only 1-hop questions and considers relation prediction as a classification task. In its current form, it cannot be used to solve the variable-length path prediction problem, as required in the examples of FIG. 1. In another proposed technique, an attention-based seq2seq model is used to generate the logical form of an input utterance. However, it uses an LSTM model and not a Transformer.
[028] Another known class of techniques for the KGQA problem uses graph-based solutions. The approach extracts a subgraph related to the query and then performs reasoning over it to extract the final answer(s). KV-MEM, a memory network-based approach, was also proposed for single-relationship prediction.
[029] Embedding Based Approaches: Approaches for KGQA using KG embeddings (such as TransETM) were used when only one relation (i.e., one RDF triple) is involved. Certain one-hop KG modeling approaches were also proposed. An approach, namely EmbedKGQATM, for joint learning, uses KG embeddings in the context of multi-hop relations. However, this approach is not truly a joint model, as answer candidate selection is performed outside the model, i.e., the candidates are arrived at before executing the model.
[030] Various embodiments disclosed herein provide a method and system for complex question answering (QA) on knowledge graphs (KG) using machine translation and multi-task learning. In various embodiments, the KGQA problem is posed as a supervised learning problem: the labels are assumed to be available for every question in the training data and need to be predicted for every question in the test data. In an embodiment, for answering natural language questions (NLQ), it is assumed that the background knowledge is stored in a knowledge graph G, comprising a set of nodes V(G) and edges E(G). Here, nodes represent entities, and edges represent the relationship between a pair of entities or connect an entity to one of its properties. An NLQ (q) is a sequence of words wi of a natural language (e.g., English), i.e., q = {w1, w2, …, wN}.
[031] It is also assumed that the NLQs can mention zero, one, or more entities present in G and enquire about another entity of G, which is connected with the mentioned entity(s). The objective of the proposed method and system is to output 1) the mentioned entity(s) (si) in the query, 2) the answer entity type, 3) the path or set of predicates, and 4) the question type. The set Pp is a sequence of predicates such that, if traversed along these edges from the mentioned entity node(s), the answer entity node(s) can be arrived at. The final answer is then retrieved from the KG and is post-processed as per the outputs of the question type and the ‘answer entity type’ modules. It is assumed that there are N training samples available, each comprising an NLQ together with its annotations for the four outputs listed above.
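For illustration only, the formulation above may be represented programmatically as follows. This is a minimal Python sketch of one annotated training sample; the field names and example values are illustrative assumptions and not part of the disclosed embodiments.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TrainingSample:
    """One annotated NLQ under the formulation above (illustrative field names)."""
    question: List[str]                        # q = [w1, w2, ..., wN], the NLQ word sequence
    mentioned_entities: List[Tuple[str, str]]  # (entity-id in V(G), entity type) for each mention
    answer_entity_type: str                    # AET, drawn from the entity types of G
    paths: List[List[str]]                     # one predicate sequence (edges of E(G)) per mentioned entity
    question_type: str                         # e.g., 'Simple', 'Factoid', 'Count', 'Boolean'

# Hypothetical sample loosely based on the example queries discussed later in the description.
sample = TrainingSample(
    question="Who heads Deep Learning ?".split(),
    mentioned_entities=[("e_12", "research_area")],   # hypothetical entity-id
    answer_entity_type="head",
    paths=[["head"]],                                 # hypothetical single-hop path
    question_type="Factoid",
)
```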
[032] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever
convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope being indicated by the following claims.
[033] Glossary of terms used in embodiments
[034] Entity Linking Annotation: Some of the n-grams (ηi) in an NLQ refer to entity(s) of the KG. Such n-grams have been underlined in FIG. 1. The entity-id of the mentioned entity (as shown in the third column of FIG. 1) is also assumed to be available as part of the label annotation for every question.
[035] Answer Entity Type Annotation (AET), τ: It is assumed that every NLQ has an entity type (ti) for the answer entities. These are shown in the middle column of FIG. 1. Here, τ denotes the set of all entity types in the knowledge graph G.
[036] Relation Sequence and Topology Annotation (path): Sequences of relations connecting the linked entities to the answer entities can be considered paths (pathi), each of which can contain one or more relations. These paths are connected to form a topology, as shown in FIG. 1. This topology of the paths and relations is also assumed to be available for an NLQ in the training data. These paths need not be the shortest paths between the linked entities and the answer entities. For example, the last three columns of FIG. 1 indicate a) the set of paths separated by a semicolon (;), b) whether this is the shortest path, and c) the topology of the paths connecting the linked entities to the answer entities.
[037] Question Type Annotation (qtype): Some NLQs can be answered by a single triple of the knowledge graph (‘Simple’), while some of them require traversal along a more complex topology (‘Factoid’), some questions require an aggregate operation such as count (‘Count’, see question 6 in FIG. 1), and finally, some questions perform an existence check (‘Boolean’, see question 8 in FIG. 1). Such information is also assumed to be available for every NLQ in the training data.
[038] Referring now to the drawings, and more particularly to FIG. 3 through 7, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[039] FIG. 3 illustrates an example network implementation 300 of a system 302 for complex question answering on knowledge graphs using machine translation and multi-task learning, in accordance with an example embodiment. The embodiments are related to complex KGQA, which involves mention of multiple entities in the question. Multiple sequences of relationships, combined in complex topologies, are required to answer such questions. It is evident that such questions, while required to be answered in a real-world industrial setting, cannot be answered using conventional approaches. The disclosed system includes a complex question answering-neural machine translation (CQA-NMT) model to answer such questions.
[040] Although the present disclosure is explained considering that the system 302 is implemented on a server, it may be understood that the system 302 may also be implemented in a variety of computing systems 304, such as a laptop computer, a desktop computer, a notebook, a workstation, a cloud-based computing environment and the like. It will be understood that the system 302 may be accessed through one or more devices 306-1, 306-2... 306-N, collectively referred to as devices 306 hereinafter, or applications residing on the devices 306. Examples of the devices 306 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, a smartphone, a tablet computer, a workstation and the like. The devices 306 are communicatively coupled to the system 302 through a network 308.
[041] In an embodiment, the network 308 may be a wireless or a wired network, or a combination thereof. In an example, the network 308 can be implemented as a computer network, as one of the different types of networks, such as virtual private network (VPN), intranet, local area network (LAN), wide area network (WAN), the internet, and such. The network 308 may either be a dedicated
network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), and Wireless Application Protocol (WAP), to communicate with each other. Further, the network 308 may include a variety of network devices, including routers, bridges, servers, computing devices, and storage devices. The network devices within the network 308 may interact with the system 302 through communication links.
[042] As discussed above, the system 302 may be implemented in a computing device 304, such as a hand-held device, a laptop or other portable computer, a tablet computer, a mobile phone, a PDA, a smartphone, and a desktop computer. The system 302 may also be implemented in a workstation, a mainframe computer, a server, and a network server. In an embodiment, the system 302 may be coupled to a data repository, for example, a repository 312. The repository 312 may store data processed, received, and generated by the system 302. In an alternate embodiment, the system 302 may include the data repository 312.
[043] The network environment 300 supports various connectivity options such as BLUETOOTH®, USB, ZigBee, and cellular services. The network environment enables connection of devices 306, such as a smartphone, with the system 302, and accordingly with the database 312, using any communication link including the Internet, WAN, MAN, and so on. In an exemplary embodiment, the system 302 is implemented to operate as a stand-alone device. In another embodiment, the system 302 may be implemented to work as a loosely coupled device to a smart computing environment. Further details of the modules of the system 302 are defined further in FIG. 4 below.
[044] FIG. 4 illustrates an example block diagram of a system 400 for complex question answering on knowledge graphs using machine translation and multi-task learning, in accordance with an example embodiment. The system 400 is an example of the system 302 (of FIG. 3). The system 400 is a joint model (CQA-NMT), which is an encoder-decoder based model where Bidirectional Encoder Representations from Transformers (BERT) is used as an encoder and a Transformer as a decoder. Herein, BERT is used to generate paths (or inference chains), and to perform sequence labeling and classification jointly. Details of each module are described further in the description.
[045] 1. Entity Mention Detection Module: To extract the mentioned entity(s) from the NL query, the system 400 performs a sequence labeling task using BERT’s hidden states (FIG. 5). Sequence labeling is a seq2seq task that tags the input word sequence with an output label sequence. In the disclosed embodiments, CQA-NMT is augmented to jointly infer the type of the mentioned entity(s) along with its (their) ‘span’. The final hidden states of the tokens h2, h3, …, hT-1 are fed into a softmax layer to generate the output sequence. Herein, h1 and hT, i.e., the [CLS] and [SEP] tokens, may be ignored as they can never be part of an entity(s) and are only required as a preprocessing step of BERT. Since BERT uses WordPiece tokenization, the sub-tokens of a word are assigned the same label as its first sub-token. For example, the output of BERT’s WordPiece tokenizer is ‘Jim Hen ##son’ for the input ‘Jim Henson’. The labels for the tokenized output are assigned as ‘B-Per I-Per I-Per’, i.e., the second sub-word ‘##son’ is given the same label as the first sub-word ‘Hen’. The output of the softmax layer is of the form:
yi = softmax(W hi + b), where hi is the hidden state corresponding to the ith token, and W and b are trainable parameters of the softmax (classification) layer.
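Purely as an illustration, the sequence-labeling head described above may be sketched as follows, assuming the Hugging Face transformers library and PyTorch with an illustrative checkpoint and tag set; the disclosure itself reports a TensorFlow/Texar implementation, and the untrained linear head here only illustrates shapes, not trained behaviour.

```python
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")   # illustrative checkpoint
encoder = BertModel.from_pretrained("bert-base-uncased")
labels = ["O", "B-Per", "I-Per"]                                     # illustrative BIO tag set
classifier = torch.nn.Linear(encoder.config.hidden_size, len(labels))

enc = tokenizer("Jim Henson", return_tensors="pt")
hidden = encoder(**enc).last_hidden_state                  # h_1 ... h_T, incl. [CLS] and [SEP]
probs = torch.softmax(classifier(hidden), dim=-1)          # y_i = softmax(W h_i + b)
tags = [labels[i] for i in probs.argmax(-1)[0][1:-1]]      # drop [CLS] and [SEP] positions
# During training, WordPiece sub-tokens (e.g., '##son') receive the same gold label
# as their word's first sub-token ('Hen'), as described above.
print(list(zip(tokenizer.tokenize("Jim Henson"), tags)))
```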
[046] 2. Entity Linking: The output of the Entity Mention Detection Module is a sequence of tokens along with its type (ti) for a candidate entity. These mentioned entities still need to be linked to a KG node for traversal. The disclosed system relies on an ensemble of string-matching algorithms and a graph-based model that captures structural bias in the KG to break the ties between candidate entities. Examples of the graph-based model to capture structural bias in the KG may be the Indegree algorithm, the Outdegree algorithm, the PageRank algorithm, and so on. The Entity Mention Detection Module outputs as many entities as provided in a query and their associated type (ti). To link the mentioned entity in the NL query, the candidates are extracted from V(G) of type ti. Then three string-matching algorithms are applied, and majority voting is used to further break the ties. Finally, a graph-based model is applied to link the mentioned entity with a KG entity. In an example scenario, the usability of the graph-based model is understood by considering the notion of popularity. For example, if a user queries ‘Where was Obama born?’, the user here is more likely referring to the famous Barack Obama, compared to any other. A detailed description of the entity mention detection and the entity linking procedure is shown in FIG. 5.
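As an illustration of the graph-based tie-break, the sketch below applies PageRank over a toy KG using the networkx library; the nodes, edges, and candidate names are hypothetical and are not part of the disclosed KG.

```python
import networkx as nx

# Toy KG: the 'famous' node accumulates incoming edges, hence a higher PageRank.
kg = nx.DiGraph()
kg.add_edges_from([
    ("Dreams from My Father", "Barack Obama"),   # authored_by
    ("Nobel Peace Prize 2009", "Barack Obama"),  # awarded_to
    ("Michelle Obama", "Barack Obama"),          # spouse_of
    ("Obama, Fukui", "Japan"),                   # located_in
])
popularity = nx.pagerank(kg)

def break_tie(candidates):
    """Prefer the structurally more 'popular' KG node when string matching is tied."""
    return max(candidates, key=lambda node: popularity.get(node, 0.0))

print(break_tie(["Barack Obama", "Obama, Fukui"]))   # -> 'Barack Obama'
```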
[047] 3. Path Prediction Module: To generate the sequence of predicates for an input query, the architecture is augmented with a Transformer-based decoder (Vaswani et al. (2017)), which is often used in Neural Machine Translation (NMT) tasks. The path output is defined as ypath = {p1, p2, …, pN}, where each pi belongs to E(G). In this work, the number of predicates (multiple hops) required to extract the final answer is not constrained. Hence, an obvious choice was to use a decoder module which can stop generating the predicates once it has predicted the end-of-sentence ([EOS]) token (FIG. 5).
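A minimal sketch of such predicate decoding is shown below, assuming a generic PyTorch Transformer decoder, an illustrative predicate vocabulary, and an assumed BERT hidden size; it is only an illustrative analogue of the disclosed decoder, not its exact implementation.

```python
import torch

predicates = ["[BOS]", "[EOS]", ";", "key person", "has paper", "author"]   # illustrative vocabulary
d_model = 768                                             # assumed encoder hidden size
emb = torch.nn.Embedding(len(predicates), d_model)
decoder = torch.nn.TransformerDecoder(
    torch.nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=2,
)
out_proj = torch.nn.Linear(d_model, len(predicates))

def decode_path(encoder_memory, max_len=16):
    """Greedily emit predicates one at a time, stopping once [EOS] is produced."""
    ids = [predicates.index("[BOS]")]
    for _ in range(max_len):
        tgt = emb(torch.tensor([ids]))                     # (1, t, d_model)
        hid = decoder(tgt, encoder_memory)                 # cross-attends over the encoder states
        next_id = out_proj(hid[:, -1]).argmax(-1).item()   # greedy choice of the next predicate
        if predicates[next_id] == "[EOS]":
            break
        ids.append(next_id)
    return [predicates[i] for i in ids[1:]]

# encoder_memory stands in for the encoder's hidden states of the NLQ (random here).
print(decode_path(torch.randn(1, 12, d_model)))
```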
[048] 4. Question Type and Answer Entity Type Prediction Module: In this work, the task of determining the question type and the AET is formulated as a classification task, since there is a discrete set for both qtype and Answer Entity Types. Using the hidden states of the first special token from BERT, i.e., [CLS], the question type and the AET are predicted via separate softmax classification layers.
[049] To jointly model all the tasks using a single architecture, the training objective is defined over all four task outputs, i.e., the joint conditional probability of the mentioned entity tags, the sequence of predicates, the question type, and the AET, given the input query.
[051] For training, this conditional probability is maximized. The model is finetuned end-to-end by minimizing the cross-entropy loss.
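A minimal sketch of such a joint objective is given below; the unweighted sum of per-task cross-entropy terms, the tensor names, and the shapes are illustrative assumptions for the purpose of this description.

```python
import torch
import torch.nn.functional as F

def joint_loss(mention_logits, mention_tags,   # (T, n_tags), (T,)
               path_logits, path_ids,          # (L, n_predicates), (L,)
               qtype_logits, qtype_id,         # (n_qtypes,), scalar
               aet_logits, aet_id):            # (n_types,), scalar
    """Sum of per-task cross-entropy terms; minimizing it maximizes the joint likelihood."""
    return (F.cross_entropy(mention_logits, mention_tags)               # sequence labeling
            + F.cross_entropy(path_logits, path_ids)                    # predicate decoding
            + F.cross_entropy(qtype_logits.unsqueeze(0), qtype_id.unsqueeze(0))
            + F.cross_entropy(aet_logits.unsqueeze(0), aet_id.unsqueeze(0)))

# Illustrative call with random tensors:
loss = joint_loss(torch.randn(8, 5), torch.randint(0, 5, (8,)),
                  torch.randn(4, 42), torch.randint(0, 42, (4,)),
                  torch.randn(4), torch.tensor(1),
                  torch.randn(10), torch.tensor(3))
print(loss.item())
```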
[052] FIG. 6 illustrates an example flow chart of a method 600 for complex question answering on knowledge graphs using machine translation and multi-task learning, in accordance with an example embodiment of the present disclosure. The method 600 depicted in the flow chart may be executed by a system, for example, the system 302 of FIG. 3. In an example embodiment, the system 302 may be embodied in a computing device.
[053] Operations of the flowchart, and combinations of operation in the flowchart, may be implemented by various means, such as hardware, firmware, processor, circuitry and/or other device associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described in various embodiments may be embodied by computer program instructions. In an example embodiment, the computer program instructions, which embody the procedures, described in various embodiments may be stored by at least one memory device of a system and executed by at least one processor in the system. Any such computer program instructions may be loaded onto a computer or other programmable system (for example, hardware) to produce a machine, such that the resulting computer or other programmable system embody means for implementing the operations specified in the flowchart. It will be noted herein that the operations of the method 600 are described with help of system 302. However, the operations of the method 600 can be described and/or practiced by using any other system.
[054] At 602, the method 600 includes receiving a natural language query (NLQ). Examples of NLQ are presented in FIG. 1. In an embodiment, the NLQ may be received via one or more hardware processors of the system 302.
[055] At 604, the method 600 includes determining one of a presence and an absence of one or more mentioned entities in the NLQ. In certain example
scenarios, the NLQ may not have any mentioned entities. For example, for a query 'Show me research areas', there is no mentioned entity but there is an entity type. In some other example scenarios, the NLQ may have only one mentioned entity. For example, the NLQ at serial number 1 in FIG. 1 has one mentioned entity. In yet another example scenario, the NLQ may have more than one mentioned entity. For example, the NLQ at serial number 3 in FIG. 1 has more than one mentioned entity.
[056] At 606, the method 600 includes, upon determination of the presence of the one or more mentioned entities in the NLQ, extracting the one or more mentioned entities and the type of the one or more mentioned entities using a BERT encoder. In an embodiment, extracting the one or more mentioned entities includes performing sequence labeling of the one or more mentioned entities using BERT’s hidden states by tagging an input word sequence associated with the NLQ with an output label sequence. Extraction of the mentioned entities and the type thereof is described with reference to FIG. 5.
[057] At 608, the method 600 includes linking each mentioned entity of the one or more mentioned entities to a node of a knowledge graph (KG) by using an ensemble of string-matching algorithms and graph-based models. Herein, the type of the node of the KG and the type of each mentioned entity are the same. Using the string-matching algorithms, the possible candidates are extracted from the KG. For example, for an NLQ "Show me papers by Puneet", the candidate entities for the mentioned entity ‘Puneet’ may be ['R_1: Puneet Agarwal', 'R_2: Puneet Patwari', 'R_3: Puneet Khullar', 'R_4: Neet Singh']. To filter out irrelevant entities (such as ‘Neet’ above), three string-matching algorithms may be employed, where the candidate entities whose similarity score is less than a threshold value (e.g., THETA) are filtered out. After this step, only the three ‘Puneet’ candidates remain. Majority voting may then be applied: each string-matching algorithm votes for R_1, R_2, or R_3 above, and if one entity gets more than one vote out of three, that entity may be selected as the subject entity. In case more than one entity gets the same score, the graph-based model may be used to determine the subject entity, as in the sketch below.
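The threshold-and-vote procedure above may be sketched as follows; the three similarity measures and the threshold value are illustrative choices for this description, not necessarily the measures used in the disclosed embodiments.

```python
from difflib import SequenceMatcher

def ratio(a, b):                     # character-level similarity
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def jaccard(a, b):                   # word-overlap similarity
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def substring(a, b):                 # does the mention occur inside the candidate name?
    return 1.0 if a.lower() in b.lower() else 0.0

def link(mention, candidates, theta=0.6):
    measures = (ratio, jaccard, substring)
    # 1) drop candidates whose best similarity is below the threshold (THETA)
    kept = [c for c in candidates if max(m(mention, c) for m in measures) >= theta]
    # 2) each measure votes for its top-scoring candidate(s); majority wins
    votes = {c: sum(1 for m in measures if m(mention, c) == max(m(mention, k) for k in kept))
             for c in kept}
    best = max(votes.values())
    return [c for c in kept if votes[c] == best]   # ties are passed to the graph-based model

print(link("Puneet", ["Puneet Agarwal", "Puneet Patwari", "Puneet Khullar", "Neet Singh"]))
# -> 'Neet Singh' is filtered out; the three 'Puneet ...' candidates tie here,
#    so the graph-based model decides among them.
```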
[058] At 610, the method 600 includes generating a sequence of
predicates for the NLQ using a Transformer-based decoder. Each predicate of the sequence of predicates is represented by an edge of the KG. The sequence of predicates is representative of a path to be traversed in the KG from the mentioned entity to the node of the KG and further to the answer entity nodes, and thus enables extraction of the correct answer for the NLQ. In other words, the sequence of predicates is representative of a set of answer entities corresponding to the one or more entities of the NLQ.
[059] The disclosed method is capable of handling complex topologies, as explained with reference to FIG. 1, using the supervision of the sequence-of-predicates annotations. Using the annotations, multiple entities and multiple relations (or multi-path or IC) are handled to answer the complex queries. In an embodiment, a semi-colon ‘;’ may be used as a delimiter to separate the paths for each mentioned entity (for example, as shown in Query 5 in FIG. 1), as in the sketch below. In an embodiment, the Transformer-based decoder stops generating the predicates on predicting an end-of-sentence ([EOS]) token.
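For illustration, the recovery of one path per mentioned entity from the decoded predicate sequence using the ‘;’ delimiter may be sketched as follows; the decoded sequence shown is illustrative.

```python
# Illustrative decoder output for a two-entity query
decoded = ["key person", ";", "has paper", "author"]

paths, current = [], []
for token in decoded:
    if token == ";":          # delimiter closes the path of the current mentioned entity
        paths.append(current)
        current = []
    else:
        current.append(token)
paths.append(current)

print(paths)                  # [['key person'], ['has paper', 'author']] -> one path per entity
```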
[060] At 612, the method 600 includes extracting an answer to the NLQ based on the sequence of predicates, the type of the NLQ (qtype), the AET of the set of answer entities, and the one or more mentioned entities. The answer to the NLQ is extracted using one or more predefined templates of a formal query. The formal query is capable of being executed on the KG. Examples of a formal query include, but are not limited to, a SPARQL query. An example of extracting an answer to an NLQ is explained further in the description below.
[061] FIG. 7 is a block diagram of an exemplary computer system 701 for implementing embodiments consistent with the present disclosure. The computer system 701 may be implemented alone or in combination with components of the system 302 (FIG. 3). Variations of computer system 701 may be used for implementing the devices included in this disclosure. Computer system 701 may comprise a central processing unit (“CPU” or “hardware processor”) 702. The hardware processor 702 may comprise at least one data processor for executing program components for executing user- or system-generated requests. The processor may include specialized processing units such as integrated system (bus)
controllers, memory management control units, floating point units, graphics processing units, digital signal processing units, etc. The processor may include a microprocessor, such as AMD AthlonTM, DuronTM or OpteronTM, ARM’s application, embedded or secure processors, IBM PowerPCTM, Intel’s Core, ItaniumTM, XeonTM, CeleronTM or other line of processors, etc. The processor 702 may be implemented using mainframe, distributed processor, multi-core, parallel, grid, or other architectures. Some embodiments may utilize embedded technologies like application specific integrated circuits (ASICs), digital signal processors (DSPs), Field Programmable Gate Arrays (FPGAs), etc. The processor 702 may be a multi-core multi-threaded processor.
[062] Processor 702 may be disposed in communication with one or more input/output (I/O) devices via I/O interface 703. The I/O interface 703 may employ communication protocols/methods such as, without limitation, audio, analog, digital, monoaural, RCA, stereo, IEEE-1394, serial bus, universal serial bus (USB), infrared, PS/2, BNC, coaxial, component, composite, digital visual interface (DVI), high-definition multimedia interface (HDMI), RF antennas, S-Video, VGA, IEEE 802.11 a/b/g/n/x, Bluetooth, cellular (e.g., code-division multiple access (CDMA), high-speed packet access (HSPA+), global system for mobile communications (GSM), long-term evolution (LTE), WiMax, or the like), etc.
[063] Using the I/O interface 703, the computer system 701 may communicate with one or more I/O devices. For example, the input device 704 may be an antenna, keyboard, mouse, joystick, (infrared) remote control, camera, card reader, fax machine, dongle, biometric reader, microphone, touch screen, touchpad, trackball, sensor (e.g., accelerometer, light sensor, GPS, gyroscope, proximity sensor, or the like), stylus, scanner, storage device, transceiver, video device/source, visors, etc.
[064] Output device 705 may be a printer, fax machine, video display (e.g., cathode ray tube (CRT), liquid crystal display (LCD), light-emitting diode (LED), plasma, or the like), audio speaker, etc. In some embodiments, a transceiver 706 may be disposed in connection with the processor 702. The transceiver may facilitate various types of wireless transmission or reception. For example, the
transceiver may include an antenna operatively connected to a transceiver chip (e.g., Texas Instruments WiLink WL1283, Broadcom BCM4750IUB8, Infineon Technologies X-Gold 618-PMB9800, or the like), providing IEEE 802.11a/b/g/n, Bluetooth, FM, global positioning system (GPS), 2G/3G HSDPA/HSUPA communications, etc.
[065] In some embodiments, the processor 702 may be disposed in communication with a communication network 708 via a network interface 707. The network interface 707 may communicate with the communication network 708. The network interface may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/100/1000 Base T), transmission control protocol/internet protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc. The communication network 708 may include, without limitation, a direct interconnection, local area network (LAN), wide area network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, etc. Using the network interface 707 and the communication network 708, the computer system 701 may communicate with devices 709 and 710. These devices may include, without limitation, personal computer(s), server(s), fax machines, printers, scanners, various mobile devices such as cellular telephones, smartphones (e.g., Apple iPhone, Blackberry, Android-based phones, etc.), tablet computers, eBook readers (Amazon Kindle, Nook, etc.), laptop computers, notebooks, gaming consoles (Microsoft Xbox, Nintendo DS, Sony PlayStation, etc.), or the like. In some embodiments, the computer system 701 may itself embody one or more of these devices.
[066] In some embodiments, the processor 702 may be disposed in communication with one or more memory devices (e.g., RAM 713, ROM 714, etc.) via a storage interface 712. The storage interface may connect to memory devices including, without limitation, memory drives, removable disc drives, etc., employing connection protocols such as serial advanced technology attachment (SATA), integrated drive electronics (IDE), IEEE-1394, universal serial bus (USB), fiber channel, small computer systems interface (SCSI), etc. The memory drives may further include a drum, magnetic disc drive, magneto-optical drive, optical
drive, redundant array of independent discs (RAID), solid-state memory devices, solid-state drives, etc. Variations of memory devices may be used for implementing, for example, any databases utilized in this disclosure.
[067] The memory devices may store a collection of programs or database components, including, without limitation, an operating system 716, user interface application 717, user/application data 718 (e.g., any data variables or data records discussed in this disclosure), etc. The operating system 716 may facilitate resource management and operation of the computer system 701. Examples of operating systems include, without limitation, Apple Macintosh OS X, Unix, Unix-like system distributions (e.g., Berkeley Software Distribution (BSD), FreeBSD, NetBSD, OpenBSD, etc.), Linux distributions (e.g., Red Hat, Ubuntu, Kubuntu, etc.), IBM OS/2, Microsoft Windows (XP, Vista/7/8, etc.), Apple iOS, Google Android, Blackberry OS, or the like. User interface 717 may facilitate display, execution, interaction, manipulation, or operation of program components through textual or graphical facilities. For example, user interfaces may provide computer interaction interface elements on a display system operatively connected to the computer system 701, such as cursors, icons, check boxes, menus, scrollers, windows, widgets, etc. Graphical user interfaces (GUIs) may be employed, including, without limitation, Apple Macintosh operating systems’ Aqua, IBM OS/2, Microsoft Windows (e.g., Aero, Metro, etc.), Unix X-Windows, web interface libraries (e.g., ActiveX, Java, Javascript, AJAX, HTML, Adobe Flash, etc.), or the like.
[068] In some embodiments, computer system 701 may store user/application data 718, such as the data, variables, records, etc. as described in this disclosure. Such databases may be implemented as fault-tolerant, relational, scalable, secure databases such as Oracle or Sybase. Alternatively, such databases may be implemented using standardized data structures, such as an array, hash, linked list, structured text file (e.g., XML), table, or as object-oriented databases (e.g., using ObjectStore, Poet, Zope, etc.). Such databases may be consolidated or distributed, sometimes among various computer systems discussed above. It is to be understood that the structure and operation of any computer or database
component may be combined, consolidated, or distributed in any working combination.
[069] Additionally, in some embodiments, the server, messaging and instructions transmitted or received may emanate from hardware, including an operating system, and program code (i.e., application code) residing in a cloud implementation. Further, it should be noted that one or more of the systems and methods provided herein may be suitable for cloud-based implementation. For example, in some embodiments, some or all of the data used in the disclosed methods may be sourced from or stored on any cloud computing platform.
[070] For demonstrating the end-to-end processing of the disclosed embodiments, an experiment scenario is presented below.
[071] Example scenario
[072] In the example scenario, the LOCA dataset was used, which consists of 5010 entities, 42 unique predicates, and a total of 45,869 facts. The dataset has 3,275 one- or multi-hop questions that have 0, 1, or more entities mentioned in the questions. It contains multiple question types like count, factoid, and boolean. For the questions with multiple entities, an operator “;” was used as a delimiter to separate the paths corresponding to each entity (in FIG. 1, queries 5, 7, and 8). For the scope of experimentation, only those queries were considered which involved the intersection operator, which can be replaced with other operators like union, set-difference, etc. without loss of any generality. The operator “;” helps detect and predict the different topologies involved in an NL query.
[073] MetaQA: The dataset proposed in Zhang et al. (2018) consists of 3 different datasets, namely Vanilla, NTM, and Audio Data. All the datasets contain single and multi-hop (maximum 3-hop) questions from the movie domain. For the experiments, the Vanilla and the NTM versions of the datasets and the KB as provided in Zhang et al. (2018) were used. Since both versions of MetaQA do not consider the AET and question type, a default label was assigned to both the tasks.
[074] Metrics: Different metrics were used for different subtasks. Since a query can contain partially mentioned entities, F-score was used to evaluate the mention and mention-type detection module. For Inference Chain (or Path) prediction, question type, and answer entity type prediction, the accuracy measure was used. In Table 2, Hits@1 was used to evaluate the query-answer accuracy.
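For reference, a minimal sketch of the Hits@1 computation (the top-ranked answer counts as a hit if it is among the gold answers) is given below; the example values are illustrative only.

```python
def hits_at_1(top1_answers, gold_answer_sets):
    """Fraction of queries whose top-ranked answer is among the gold answers."""
    hits = sum(1 for top1, gold in zip(top1_answers, gold_answer_sets) if top1 in gold)
    return hits / len(top1_answers)

print(hits_at_1(["Honolulu", "Paris"], [{"Honolulu"}, {"London"}]))   # 0.5
```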
Table 2: Results of the baselines and CQA-NMT on MetaQA (vanilla and NTM) and the LOCA dataset
[075] Baselines: The baselines used were KV-mem, GraftNet, PullNet, VRN and EmbedKGQA.
[076] All the baselines and the proposed approach were trained on an NVIDIA DGX GPU (32 GB) using the TensorFlow and Texar libraries. For CQA-NMT, the small, uncased version of the pre-trained BERT model was used. The Adam optimizer (Kingma and Ba) was employed with a learning rate of 2e-5 for BERT and the default for the others. Each model was trained by minimizing the cross-entropy loss, and the best models were selected using the validation loss. Dropout values were set to 0.5 and further tuned. For BERT, 10% of the total training data was used for the warmup phase. Finally, the dataset was divided into train, test, and dev splits using a ratio of 80-10-10 for the LOCA dataset.
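A minimal sketch of such an optimizer set-up is shown below using PyTorch and the Hugging Face transformers scheduler. The disclosure reports a TensorFlow/Texar implementation, so this is only an illustrative analogue; the checkpoint, the stand-in head, and the step counts are assumptions.

```python
import torch
from transformers import BertModel, get_linear_schedule_with_warmup

encoder = BertModel.from_pretrained("bert-base-uncased")   # stand-in for the BERT encoder
heads = torch.nn.Linear(encoder.config.hidden_size, 4)     # stand-in for the task-specific heads

optimizer = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 2e-5},           # 2e-5 for BERT, as reported
    {"params": heads.parameters()},                         # library default for the rest
])
total_steps = 10_000                                        # illustrative; depends on data size
scheduler = get_linear_schedule_with_warmup(                # ~10% of steps used for warmup
    optimizer, num_warmup_steps=int(0.1 * total_steps), num_training_steps=total_steps)
```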
[077] The experimental results for the LOCA dataset are shown in the last row of Table 2. The results affirm that the proposed approach outperforms the baselines. It was observed that the baselines’ inability to handle Duplicate KG Entity limits their performance. Additionally, the ability of the NMT model to effectively handle complex and unknown topologies helped to retrieve answers with better accuracy for variable-hop (v-hop) queries.
[078] The experimental results for MetaQA are shown in Table 2. For Vanilla MetaQA, better answer accuracy was achieved in the 1-hop and 3-hop settings. However, in the 2-hop setting, results comparable to the state-of-the-art were achieved. An increment of about 2% and 4.9% Hits@1 can be seen in the 1-hop and 3-hop settings, respectively. To obtain the performance of each baseline on the v-hop (variable-hop) dataset, the existing models were re-used for 1-hop, 2-hop, and 3-hop, and it was assumed that there is an oracle which can redirect a query to the correct model. The estimated accuracy of the various approaches thus obtained is shown in the 4th row of Table 2, while the actual results on the v-hop dataset are shown in the 5th row. It is evident that CQA-NMT outperforms all the baselines on the MetaQA dataset in the variable-hop setting. To gauge the effectiveness and robustness of the model, the same models trained on the vanilla MetaQA dataset were used and their performance was evaluated on NTM MetaQA, i.e., in a zero-shot setting. Here, better results were achieved on 1-hop and 3-hop. The worse performance of CQA-NMT on MetaQA-NTM (2-hop) can be attributed to the zero-shot setting: as compared to VRN, CQA-NMT was not trained on the MetaQA-NTM dataset; it was trained on the MetaQA vanilla dataset only.
[079] Advantage of Transformers: The LSTM based implementation of mentioned entity detection could not detect different entity types for the same phrase “deep learning” in queries 1 and 2 of FIG. 1. However, the BERT-based approach was able to. It can therefore be inferred that this could be due to key features of BERT such as multi-head attention, WordPiece embeddings, positional embeddings, and/or segment embeddings. Moreover, in a different context, it was able to assign different types to entities with the same mentions (Queries 1 and 2 from FIG. 1).
[080] Effects of using fewer annotations: To study the importance of annotation in the disclosed approach, several components were removed from the disclosed approach and the effects were studied (Table 3).
Supervision setting        Mention Detection   AET     Inference Chains   Question Type   Answer Accuracy
Unsupervised               67.12               53.3    51.0               -               42.9
BERT (Mention)             83.3                53.22   51.9               -               47.22
BERT (AET only)            67.38               75.7    55.6               -               45.2
BERT (IC only)             67.8                50.9    80.1               -               40.1
BERT (Mention and AET)     87.1                76.3    71.6               -               55.33
BERT (AET and IC)          66.3                74.1    77.8               -               60.39
BERT (Mention and IC)      87.3                53.1    79.6               -               51.3
CQA-NMT                    93.66               79.89   81.95              97.65           71.01
Table 3: Effects of reducing the supervision in the disclosed approach. The numbers in italics are obtained without any supervision.
[081] First, CQA-NMT was studied after removing all the supervision and using heuristics-based approaches for AET and Mention Detection. The shortest path between the linked KG entity and the AET was then taken to retrieve the answers. This setting (row 1) results in the worst performance. In rows 2, 3, and 4 of Table 3, only one component of CQA-NMT was kept supervised and heuristics were applied for the others, as mentioned above. As evident from these rows, mention detection plays a crucial role in extracting the correct answer (a jump in the range of 2%-5% in answer accuracy).
[082] In summary, while testing each component of CQA-NMT, different subsets of components were supervised at a time and heuristics-based approaches were used for the remaining components. The heuristics are:
a) Shortest path algorithm for Path Prediction Task.
b) Candidate and Relation Pairing Score for Entity Linking.
c) LSTM based classifier for Answer Entity Type prediction (however, in Row 1 ‘Unsupervised’, the AET was identified if the name of the entity type is present in the NLQ).
[083] Benefits of joint training: From rows 5-7 of Table 3, it was inferred that joint training not only improves the scores of the individual components (in the range of 15%-20%) but also the overall answer accuracy. It was observed that the challenges of relation names mismatch and implicit relations indication were handled significantly better after jointly training CQA-NMT for AET and mention detection (row 5). It was found that the context used by a mentioned entity for AET was different in different queries. For example, in the queries ‘Who heads Deep Learning?’ and ‘papers in Deep Learning’, Deep Learning is a research area with AET ‘head’ in the former, and in the latter it is a keyword with AET ‘paper.title’. After jointly training the mention detection model with AET or IC, a jump of ~4% can be observed for the individual components and ~11-20% in answer accuracy.
[084] Errors with Joint Training: From Table 3, it seems that joint training helps. However, it was also found that an error made by a single module produces a cascade of errors. For example, if for a single mention span ‘deep learning and ai’ with the gold label ‘B-research area I-research area I-research area I-research area’, the entity detection predicts ‘B-research area I-research area O B-keyword’, the path prediction module generates two different paths, one for each entity, leading to a wrong answer.
[085] Similarly, if the AET prediction module makes an error, it gets propagated to all the other modules simultaneously. The frequency of such errors, however, was considerably small and in the range of 1-2% of the total training or test data.
[086] Robustness against variable hops: All the 3 vanilla MetaQA datasets were combined into a single dataset. The results on this new dataset, MetaQA v-hop (variable-hop), are shown in Table 2. Since the other approaches in the literature did not perform such an analysis, MetaQA v-hop is proposed as a new challenge.
[087] Motivation for PageRank: When there is more than one candidate entity for a mentioned entity, the one with higher popularity may be chosen. Further, when more than one entity is mentioned in an NLQ, there can be more than one candidate entity for each of them. The graph-based approach also helps in choosing the candidates that are well connected.
[088] Experiments were also performed using other measures such as the in-degree and out-degree of nodes. However, for the LOCA dataset, an increment of 22% was achieved using PageRank on the Entity Linking task, as compared to the in-degree and out-degree measures. PageRank also helped in reducing the challenges such as incomplete entity mention and co-occurrence disambiguation.
[089] Retrieval of answer(s) from KG
[090] The final objective of a KG-QA system is to retrieve the correct answer from the KG against a query q. To this end, the outputs of the different components of CQA-NMT were used to complete pre-written SPARQL sketches. A set of rules was defined for different question types, and simple mapping rules were used to map the queries to the sketches. For example, consider the query, q = “Who is working in automated regulatory compliance and has published a paper in NLP?”. The output of CQA-NMT contains all the information that is required to form a structured query such as SPARQL. The outputs of CQA-NMT are:
[091] 1. Linked Entities: {e5: automated regulatory compliance (sub-area), e6: NLP (keyword)}
[092] 2. Inference Chain: key person; has paper, author
[093] 3. Answer Entity Type (AET): researcher.name
[094] 4. Question Type (qtype): Factoid
[095] Using the qtype information, a sketch was selected and filled using the other outputs. The generated SPARQL query is:
where e5 and e6 are unique identifiers assigned to ‘automated regulatory compliance’ (of type sub-area) and NLP (of type keyword).
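Purely as an illustration of how such outputs can fill a question-type-specific sketch, the following Python snippet assembles a hypothetical SPARQL query; the predicate identifiers, variable names, and the template string are assumptions for this description and are not the disclosure's actual sketches.

```python
# Outputs of CQA-NMT for the example query above (predicate identifiers are hypothetical)
linked_entities = {"e5": "automated regulatory compliance", "e6": "NLP"}
inference_chain = [["key_person"], ["has_paper", "author"]]
qtype = "Factoid"

# Hypothetical Factoid sketch: e5 connects to the answer directly; e6 connects via a paper.
sketches = {
    "Factoid": ("SELECT DISTINCT ?answer WHERE {{ "
                "e5 {p1} ?answer . "
                "e6 {p2} ?paper . ?paper {p3} ?answer }}"),
}
sparql = sketches[qtype].format(p1=inference_chain[0][0],
                                p2=inference_chain[1][0],
                                p3=inference_chain[1][1])
print(sparql)
```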
[097] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[098] Various embodiments disclosed herein provide a method and system for complex question answering on knowledge graphs using machine translation and multi-task learning. The complex version of the KGQA problem involves mention of multiple entities in the question. Multiple sequences of relationships, combined in complex topologies, are required to answer such questions. It is evident that such questions, while required to be answered in a real-world industrial setting, cannot be answered using prior approaches. The disclosed method and system provide a CQA-NMT model to answer such questions.
[099] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software
means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[0100] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[0101] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[0102] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[0103] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
We Claim:
1. A processor-implemented method, comprising:
receiving a natural language query (NLQ), via one or more hardware processors;
determining one of a presence and an absence of one or more mentioned entities in the NLQ, via the one or more hardware processors;
upon determination of the presence of the one or more mentioned entities in the NLQ, extracting the one or more mentioned entities and the type of the one or more mentioned entities using a Bidirectional Encoder Representations from Transformers (BERT) encoder, via the one or more hardware processors; linking each mentioned entity of the one or more mentioned entities to a node of a knowledge graph (KG) by using an ensemble of string-matching algorithms and a graph-based model capable of capturing structural bias in the KG, via the one or more hardware processors, wherein a type of the node of the KG and the type of each mentioned entity are the same;
generating a sequence of predicates for the NLQ using a Transformer-based decoder, via the one or more hardware processors, each predicate represented by an edge of the KG, and the sequence of predicates is representative of a path to be traversed from the mentioned entity to the node of the KG and further to the answer entity; and
extracting an answer to the NLQ based on the sequence of predicates, the type of the NLQ (qtype), the answer entity type (AET) of the set of answer entities and the one or more mentioned entities, via the one or more hardware processors, using one or more predefined templates of a formal query, the formal query capable of being executed on the KG.
2. The method of claim 1, wherein extracting the one or more mentioned entities comprises performing sequence labeling of the one or more mentioned entities using BERT’s hidden states by tagging an input word sequence associated with the NLQ with an output label sequence.
3. The method of claim 1, wherein linking a mentioned entity from amongst the one or more mentioned entities to the node of the KG comprises:
extracting one or more candidates from V(G) of the same type as that of the mentioned entity,
applying three string-matching algorithms and taking a majority vote to break ties, and
applying the graph-based model to link the mentioned entity with a KG entity.
4. The method of claim 1, wherein the NLQ is associated with a plurality of topologies corresponding to a set of sequences of predicates, and wherein the set of sequences of predicates are separated by a delimiter.
5. The method of claim 1, wherein the Transformer-based decoder stops generating the predicates on predicting an end-of-sentence ([EOS]) token.
6. The method of claim 1, wherein the one or more types of NLQs comprise a Boolean type, a factoid type and a count type.
7. A system (701), comprising:
a memory (715) storing instructions;
one or more communication interfaces (702); and
one or more hardware processors (703) coupled to the memory (715) via the one or more communication interfaces (702), wherein the one or more hardware processors (703) are configured by the instructions to:
receive a natural language query (NLQ);
determine one of a presence and an absence of one or more mentioned entities in the NLQ;
upon determination of the presence of the one or more mentioned entities in the NLQ, extract the one or more mentioned entities and the type of the one or more mentioned entities using a Bidirectional Encoder Representations from Transformers (BERT) encoder;
link each mentioned entity of the one or more mentioned entities to a node of a knowledge graph (KG) by using an ensemble of string-matching algorithms and a graph-based model capable of capturing structural bias in the KG, wherein a type of the node of the KG and the type of each mentioned entity are the same; generate a sequence of predicates for the NLQ using a Transformer-based decoder, each predicate of the sequence of predicates represented by an edge of the KG, the sequence of predicates being representative of a path to be traversed in the KG from the mentioned entity to the node of the KG and further to the answer entity; and
extract an answer to the NLQ based on the sequence of predicates, the type of the NLQ (qtype), the answer entity type (AET) of the set of answer entities and the one or more mentioned entities, using one or more predefined templates of a formal query, the formal query capable of being executed on the KG.
8. The system of claim 7, wherein extracting the one or more mentioned entities comprises performing sequence labeling of the one or more mentioned entities using BERT’s hidden states by tagging an input word sequence associated with the NLQ with an output label sequence.
9. The system of claim 7, wherein linking a mentioned entity from amongst the one or more mentioned entities to the node of the KG comprises:
extracting one or more candidates from V(G) of the same type as that of the mentioned entity,
applying three string-matching algorithms and taking a majority vote to break ties, and
applying the graph-based model to link the mentioned entity with a KG entity.
10. The system of claim 7, wherein the NLQ is associated with a plurality of topologies corresponding to a set of sequences of predicates, and wherein the set of sequences of predicates are separated by a delimiter.
11. The system of claim 7, wherein the Transformer-based decoder stops generating the predicates on predicting an end-of-sentence ([EOS]) token.
12. The system of claim 7, wherein the one or more types of NLQs comprise a Boolean type, a factoid type and a count type.