System And Method To Translate A Natural Language Query To Retrieve


System And Method To Translate A Natural Language Query To Retrieve Domain Specific Information

Abstract: Existing systems have issues in identifying the correct table and column entities based on a domain-based natural language query. Embodiments provide a system and method to translate a natural language query using one or more metadata configurations to retrieve a domain specific report from a Relational Database Management System (RDBMS). Herein, the system is configured to receive and translate a query to generate answers from the RDBMS. The system pre-processes and parses the received query based on an NLP technique and metadata configuration files. At least one entity is identified from the pre-processed query. Further, the query is converted into a SQL format, wherein one or more SQL components are used in the SQL format of the query. Further, the system is configured to identify a database pertaining to the detected entity from the received query. A handler function is triggered with the SQL query to generate an answer to the user’s query. [To be published with FIG. 2]


Patent Information

Application #

Filing Date

28 June 2021

Publication Number

52/2022

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

kcopatents@khaitanco.com

Parent Application

Applicants

Tata Consultancy Services Limited

Nirmal Building, 9th Floor, Nariman Point Mumbai Maharashtra 400021 India

Inventors

1. SAHOO, Nihar Ranjan

Tata Consultancy Services Limited SDF V, Santacruz Electronic Export Processing Zone, Andheri (East), Mumbai Maharashtra 400096 India

2. KSHIRSAGAR, Mahesh

Tata Consultancy Services Limited SDF V, Santacruz Electronic Export Processing Zone, Andheri (East), Mumbai Maharashtra 400096 India

Specification

Claims: We Claim:
1. A processor-implemented method (400) for translating a natural language query into structured query language using one or more metadata configurations to retrieve a domain specific report from a database comprising:
receiving (402), via an input/output interface, the natural language query from a user, wherein the natural language query comprising of text and voice;
parsing (404), via one or more hardware processors, the received natural language query into a computer-interpretable semantic representation of the natural language query using a natural language parser, wherein the parsing comprises:
executing stopwords removal, lemmatization, tokenization, POS tagging, and chunking to determine relationship among words of the received natural language query;
generating a set of clauses based on relationships among words included in the natural language query using a hybrid approach; and
pruning the set of clauses to map syntactic patterns onto semantic relations correspond to the computer-interpretable semantic representation;
converting (406), via the one or more hardware processors, the computer-interpretable semantic representation into a computer-interpretable logical schema using a set of axioms;
classifying (408), via the one or more hardware processors, a plurality of terms of the computer interpretable semantic representation and the computer-interpretable logical schema to build a domain ontology;
determining (410), via the one or more hardware processors, a solution to the computer-interpretable logical schema using a pre-trained artificial intelligence (AI) model, the domain ontology and at least one data source; and
retrieving (412), via the one or more hardware processors, a domain specific report based on the determined solution and a heuristic search, wherein the heuristic search is a rule-based query optimization that involves equivalence rule on relational expressions.
2. The processor-implemented method (400) of claim 1, further comprising:
maintaining, via the one or more hardware processors, logs of the user interactions and associated queries;
creating, via the one or more hardware processors, the computer-interpretable semantic representation from the maintained logs on receiving the natural language query from the user; and
building, via the one or more hardware processors, a context to query the domain ontology managed in the at least one data source for an interactive report.
3. A system (100) for translating a natural language query into structured query language using one or more metadata configurations to retrieve a domain specific report from a database comprising:
an input/output interface (104) to receive the natural language query from a user, wherein the natural language query comprising of text and voice;
one or more hardware processors;
a memory in communication with the one or more hardware processors (108), wherein the one or more hardware processors (108) are configured to execute programmed instructions stored in the memory, to:
parse the received natural language query into a computer-interpretable semantic representation of the natural language query using a natural language parser;
convert the computer-interpretable semantic representation into a computer-interpretable logical schema using a set of axioms;
classify a plurality of terms of the computer interpretable semantic representation and the computer interpretable logical schema to build a domain ontology;
determine a solution to the computer-interpretable logical schema using a pre-trained artificial intelligence (AI) model, the domain ontology and at least one data source; and
retrieve a domain specific report based on the determined solution and a heuristic search, wherein the heuristic search is a rule-based query optimization that involves equivalence rule on relational expressions.
4. The system (100) of claim 3, wherein the parsing comprises:
executing stopwords removal, lemmatization, tokenization, POS tagging, and chunking to determine relationship among words of the natural language query;
generating a set of clauses based on relationships among words included in the natural language query using a hybrid approach; and
pruning the set of clauses to map the syntactic patterns onto semantic relations correspond to the computer-interpretable semantic representation.
5. The system (100) of claim 3, further comprising:
maintaining, via the one or more hardware processors, logs of the user interactions and associated queries;
creating, via the one or more hardware processors, the computer-interpretable semantic representation from the maintained logs on receiving the natural language query from the user; and
building, via the one or more hardware processors, a context to query the ontology managed in the at least one data source for an interactive report.
6. A non-transitory computer readable medium storing one or more instructions which when executed by one or more processors on a system, cause the one or more processors to perform method comprising:
receiving, via an input/output interface, the natural language query from a user, wherein the natural language query comprising of text and voice;
parsing, via one or more hardware processors, the received natural language query into a computer-interpretable semantic representation of the natural language query using a natural language parser, wherein the parsing comprises:
executing stopwords removal, lemmatization, tokenization, POS tagging, and chunking to determine relationship among words of the received natural language query;
generating a set of clauses based on relationships among words included in the natural language query using a hybrid approach; and
pruning the set of clauses to map syntactic patterns onto semantic relations correspond to the computer-interpretable semantic representation;
converting, via the one or more hardware processors, the computer-interpretable semantic representation into a computer-interpretable logical schema using a set of axioms;
classifying, via the one or more hardware processors, a plurality of terms of the computer interpretable semantic representation and the computer interpretable logical schema to build a domain ontology;
determining, via the one or more hardware processors, a solution to the computer-interpretable logical schema using a pre-trained artificial intelligence (AI) model, the domain ontology and at least one data source; and
retrieving, via the one or more hardware processors, a domain specific report based on the determined solution and a heuristic search, wherein the heuristic search is a rule-based query optimization that involves equivalence rule on relational expressions.
7. The non-transitory computer readable medium of claim 6, further comprising:
maintaining, via the one or more hardware processors, logs of the user interactions and associated queries;
creating, via the one or more hardware processors, the computer-interpretable semantic representation from the maintained logs on receiving the natural language query from the user; and
building, via the one or more hardware processors, a context to query the domain ontology managed in the at least one data source for an interactive report.
Dated this 28th Day of June 2021
Tata Consultancy Services Limited
By their Agent & Attorney

(Adheesh Nargolkar)
of Khaitan & Co
Reg No IN-PA-1086

Description:
TECHNICAL FIELD
[001] The disclosure herein generally relates to the field of natural language interfaces to databases and, more particularly, to a system and method to translate a natural language query using a metadata configuration approach to retrieve a domain specific report from a database.

BACKGROUND
[002] Natural language processing (NLP) systems play a central role in human interaction with computers and even among humans themselves. In particular, NLP is a multidisciplinary field concerned with how computers should be programmed to analyze large sets of human language data. A related class of systems is popularly known as natural language interfaces to databases (NLIDB).
[003] NLIDB systems let users retrieve information from a database by posing requests in a way similar to how humans interact with each other. NLIDB systems have seen dramatic breakthroughs, as the information retrieved through them makes business decisions easier. Typing queries in natural language to retrieve answers from databases has become an appropriate and convenient method for accessing databases. However, getting such information directly from databases demands knowledge of a query language such as structured query language (SQL). To overcome the complexity of writing SQL, a natural language query is used instead of such a technical language. Therefore, each query asked to the system in natural language is first processed using various NLP techniques, and the processed query is then converted into its SQL form.
[004] However, the existing NLIDB systems can deal only with a limited subset of natural language. The NLIDB systems have drawbacks such as limited linguistic capabilities, and most of them have no reasoning capabilities. Further, the pattern-based approach works on a set of rules to generate a SQL query, and it fails for any context that falls outside the given rules. Moreover, the existing systems have issues in identifying the correct table and column entities based on a domain-based query.

SUMMARY
[005] Embodiments of the present disclosure provide technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method and system to translate a natural language query using one or more metadata configurations to retrieve a domain specific report from a database is provided.
[006] In one aspect, a system is configured to translate a natural language query using one or more metadata configurations to retrieve a domain specific report from a database. The system comprises an input/output interface to receive a natural language query from a user, one or more hardware processors and a memory in communication with the one or more hardware processors to execute programmed instructions stored in the memory. The system is configured to receive at least one query from a user to generate an answer from a database. The system is configured to parse the received natural language query into a computer-interpretable semantic representation of the natural language query using a natural language parser, convert the computer-interpretable semantic representation into a computer-interpretable logical schema, and classify terms of one or more of the computer-interpretable semantic representation and the computer-interpretable logical schema to build a domain ontology. Further, the system is configured to determine a solution to the computer-interpretable logical schema using a pre-trained artificial intelligence (AI) model, the domain ontology and at least one data source. Finally, the system retrieves the domain specific report for the natural language query using the solution to the computer-interpretable logical schema and a heuristic search. It would be appreciated that the heuristic search is a rule-based query optimization that involves equivalence rules on relational expressions.
[007] Furthermore, the system is configured to maintain logs of the user interactions and associated queries, create the computer-interpretable semantic representation from the maintained logs on receiving a natural language query from a user, and build a context to query the domain ontology managed in the at least one data source for an interactive report.
[008] In another aspect, a processor-implemented method to translate a natural language query using one or more metadata configurations to retrieve a domain specific report from a database is provided. The processor-implemented method comprises one or more steps as follows. An input/output interface receives a natural language query from a user, wherein the natural language query comprises text and voice. The received natural language query is parsed into a computer-interpretable semantic representation of the natural language query using a natural language parser, and the computer-interpretable semantic representation is converted into a computer-interpretable logical schema. Terms of one or more of the computer-interpretable semantic representation and the computer-interpretable logical schema are classified to build a domain ontology. Further, a solution to the computer-interpretable logical schema is determined using a pre-trained artificial intelligence (AI) model, the domain ontology and at least one data source. Finally, a domain specific report for the natural language query is retrieved using the solution to the computer-interpretable logical schema and a heuristic search. It would be appreciated that the heuristic search is a rule-based query optimization that involves equivalence rules on relational expressions.
[009] Furthermore, the method comprises maintaining logs of the user interactions and associated queries, creating the computer-interpretable semantic representation from the maintained logs on receiving a natural language query from a user, and building a context to query the domain ontology managed in the at least one data source for an interactive report.
[010] In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions, which when executed by one or more hardware processors cause a method to translate a natural language query using one or more metadata configurations to retrieve a domain specific report from a database to be performed. The method comprises one or more steps as follows. An input/output interface receives a natural language query from a user, wherein the natural language query comprises text and voice. The received natural language query is parsed into a computer-interpretable semantic representation of the natural language query using a natural language parser, and the computer-interpretable semantic representation is converted into a computer-interpretable logical schema. Terms of one or more of the computer-interpretable semantic representation and the computer-interpretable logical schema are classified to build a domain ontology. Furthermore, a solution to the computer-interpretable logical schema is determined using a pre-trained artificial intelligence (AI) model, the domain ontology and at least one data source. Finally, a report for the natural language query is retrieved using the solution to the computer-interpretable logical schema and a heuristic search. It would be appreciated that the heuristic search is a rule-based query optimization that involves equivalence rules on relational expressions.
[011] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS
[012] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles.
[013] FIG. 1 illustrates an exemplary system for translating a natural language query using one or more metadata configurations to retrieve a domain specific report from a database, in accordance with some embodiments of the present disclosure.
[014] FIG. 2 is a schematic architecture of the system of FIG. 1 for translating a natural language query using one or more metadata configurations to retrieve a domain specific report from a database, in accordance with some embodiments of the present disclosure.
[015] FIG. 3 is a functional block diagram to illustrate SQL query generation, in accordance with some embodiments of the present disclosure.
[016] FIG. 4 is a flow diagram to illustrate a method to translate a natural language query using one or more metadata configurations to retrieve a domain specific report from a database, in accordance with some embodiments of the present disclosure.
[017] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems and devices embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes, which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

DETAILED DESCRIPTION OF EMBODIMENTS
[018] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[019] The embodiments herein provide a system and method to translate a natural language query using one or more metadata configurations to retrieve a domain specific report from a database. It would be appreciated that natural language processing (NLP) plays a central role in human interaction with computers; specifically, NLP is a multidisciplinary field concerned with how computers should be programmed to analyze large sets of human language data. Further, a natural language interface to a database is also an important arrangement through which users can retrieve information from the database by posing requests in natural language.
[020] Referring now to the drawings, and more particularly to FIG. 1 through FIG. 4, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method. The system is configured to receive and translate at least one query from a user to generate an answer from a database. The system is configured to pre-process the received query based on at least one NLP technique and one or more predefined metadata configuration files. At least one entity is identified from the pre-processed query. Further, the pre-processed query is converted into a SQL format, wherein one or more SQL components are used in the SQL format of the query. Further, the system is configured to identify at least one database pertaining to the detected at least one entity from the received query. A handler function is triggered with the SQL query to generate an answer to the user’s query.
[021] FIG. 1 illustrates an exemplary system (100) that translates a natural language query using one or more metadata configurations to retrieve a domain specific report from a database, in accordance with an example embodiment. Although the present disclosure is explained considering that the system (100) is implemented on a server, it may be understood that the system (100) may comprise one or more computing devices (102), such as a laptop computer, a desktop computer, a notebook, a workstation, a cloud-based computing environment and the like. It will be understood that the system (100) may be accessed through one or more input/output interfaces 104-1, 104-2... 104-N, collectively referred to as I/O interface (104). Examples of the I/O interface (104) may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, a smartphone, a tablet computer, a workstation, and the like. The I/O interface (104) is communicatively coupled to the system (100) through a communication network (106).
[022] In an embodiment, the communication network (106) may be a wireless or a wired network, or a combination thereof. In an example, the communication network (106) can be implemented as a computer network, as one of the different types of networks, such as virtual private network (VPN), intranet, local area network (LAN), wide area network (WAN), the internet, and such. The communication network (106) may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), and Wireless Application Protocol (WAP), to communicate with each other. Further, the communication network (106) may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices. The network devices within the communication network (106) may interact with the system (100) through communication links.
[023] The system (100) supports various connectivity options such as BLUETOOTH®, USB, ZigBee and other cellular services. The communication network (106) environment enables connection of various components of the system (100) using any communication link including Internet, WAN, MAN, and so on. In an exemplary embodiment, the system (100) is implemented to operate as a stand-alone device. In another embodiment, the system (100) may be implemented to work as a loosely coupled device to a smart computing environment. The components and functionalities of the system (100) are described further in detail.
[024] Referring to FIG. 2, a schematic architecture (200) of the system (100) to translate a natural language query using one or more metadata configurations to retrieve a domain specific report from a database is illustrated, in accordance with an example embodiment. The system (100) comprises at least one memory with a plurality of instructions, one or more databases (112), one or more input/output (I/O) interfaces (104) and one or more hardware processors (108) which are communicatively coupled with the at least one memory to execute a plurality of modules therein.
[025] The one or more I/O interfaces (104) are configured to receive input data including a text string, one or more images, one or more videos and one or more audio recordings, wherein the received images, videos and audio recordings are converted into a text string using a transcription model as shown in FIG. 3. The one or more I/O interfaces (104) are configured to recommend at least one subject, based on the score assigned to each of the one or more mapped subjects, from the system (100) back to the user via a web interface layer (106).
[026] In one embodiment, the system (100) is configured to translate a natural language query using one or more metadata configurations to retrieve a domain specific report from a database. Herein, the metadata comprises the key information of the enterprise and is inbuilt in the system (100). The metadata is extracted from a concept model that lists the key concepts and mentions along with weights. The metadata is finalized based on the higher-frequency concepts, as vetted by a domain expert.
[027] It is to be noted that database contents are often prone to both internal and external threats such as privilege abuse, unauthorized privilege elevation, etc. Therefore, the system is configured to bring these threat vectors to zero. Further, the system (100) is configured to handle structured data, which is highly organized and formatted so that searching it in a relational database management system (RDBMS) is an easy process. The structured data covers domains such as telecom, retail, insurance and FMCG, and sources of data such as open source and sensor data. Further, the system (100) is configured to manage formats such as .csv and .xls, and files having single or multiple tables.
[028] In another embodiment, the system (100) comprises at least one memory (102) with a plurality of instructions and one or more hardware processors (108) which are communicatively coupled with the at least one memory (102) to execute modules therein. The one or more hardware processors (108) may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors (108) are configured to fetch and execute computer-readable instructions stored in the memory (102).
[029] In the preferred embodiment of the disclosure, the system (100) is configured to receive at least one natural language query from the user, wherein the natural language query may comprise text and voice.
[030] In another embodiment, the system (100) is configured to pre-process and parse the received natural language query into a computer-interpretable semantic representation of the natural language query using a natural language parser based on at least one NLP technique and one or more predefined metadata configuration files. It is to be noted that the predefined metadata configuration files help the system (100) understand what is required for the received natural language query to be expressed in a SQL format. The predefined metadata configuration files cover columns, synonyms, priority, primary-foreign key mapping, aggregation, and stopwords. These files are stored in CSV format.
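By way of illustration only, the loading of such configuration files may be sketched as follows. The file layouts, header names (`term`, `synonym`, `stopword`) and sample rows are assumptions made for this example; the disclosure states only that the files are stored in CSV format.

```python
import csv
import io

# Hypothetical contents of two metadata configuration files. The header
# names and rows are assumptions; only the CSV format comes from the text.
SYNONYMS_CSV = """term,synonym
revenue,sales
revenue,turnover
customer,client
"""
STOPWORDS_CSV = """stopword
the
of
show
me
"""

def load_synonyms(text):
    """Map each synonym to its canonical column term."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["synonym"]: row["term"] for row in reader}

def load_stopwords(text):
    """Return the set of stopwords listed one per row."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["stopword"] for row in reader}

synonyms = load_synonyms(SYNONYMS_CSV)
stopwords = load_stopwords(STOPWORDS_CSV)
```

In a deployment the strings would be replaced by files on disk; in-memory text is used here only to keep the sketch self-contained.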
[031] The parsing of the received natural language query includes date parsing, removal of stopwords, lemmatization, mapping of synonyms, tokenization, POS tagging, and chunking. The date parsing checks whether any date is present in the received query. If a date is present, the query is formatted accordingly. The date parsing supports various forms such as last year, last month, last week, today, tomorrow, yesterday, and month only. In addition, different date formats are also handled by this function. Further, the removal of stopwords and the mapping of synonyms take place.
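A minimal sketch of the relative date handling described above is given below. The function name `parse_relative_date`, the returned (start, end) pair, and the reading of "last week" as the previous Monday-to-Sunday week are assumptions for illustration; the month-only form and absolute date formats are not covered.

```python
from datetime import date, timedelta

def parse_relative_date(query, today):
    """Return a (start, end) date range for a relative phrase, or None."""
    q = query.lower()
    if "yesterday" in q:
        d = today - timedelta(days=1)
        return d, d
    if "tomorrow" in q:
        d = today + timedelta(days=1)
        return d, d
    if "today" in q:
        return today, today
    if "last week" in q:
        # Assumption: "last week" means the previous Monday-to-Sunday week.
        start = today - timedelta(days=today.weekday() + 7)
        return start, start + timedelta(days=6)
    if "last month" in q:
        # Last day of the previous month, then back to its first day.
        end = today.replace(day=1) - timedelta(days=1)
        return end.replace(day=1), end
    if "last year" in q:
        year = today.year - 1
        return date(year, 1, 1), date(year, 12, 31)
    return None

# Reference date chosen for the example only.
reference = date(2021, 6, 28)
last_month = parse_relative_date("sales for last month", reference)
```

The resulting range can then be substituted into the query before the remaining pre-processing steps run.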
[032] Further, the received natural language query is tokenized, whereby the query is split up using whitespace as a delimiter. The tokens obtained in this process are passed to a POS tagger, which assigns a tag to each of these tokens. The tags relate to parts of speech and indicate whether a particular token is an adjective, a noun, a pronoun, a number, etc. The chunking is used to transform the query sentence into a tree whose leaves represent all the POS tags. Further, the system (100) is configured to generate a set of clauses based on relationships among words included in the natural language query using a hybrid approach, and to prune the clauses to map the syntactic patterns onto semantic relations that correspond to the computer-interpretable semantic representation.
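The tokenization, tagging and chunking steps may be illustrated with the simplified sketch below. The small lexicon stands in for a trained POS tagger, and the chunker produces flat noun-phrase groups rather than the tree described above; both are simplifying assumptions made for this example.

```python
# Toy lexicon standing in for a trained POS tagger (an assumption).
LEXICON = {"total": "JJ", "sales": "NN", "by": "IN", "region": "NN",
           "for": "IN", "2020": "CD"}

def tokenize(query):
    """Split the query using whitespace as the delimiter."""
    return query.split()

def pos_tag(tokens):
    """Assign a tag to each token; unknown words default to noun."""
    return [(t, LEXICON.get(t.lower(), "NN")) for t in tokens]

def chunk(tagged):
    """Group runs of adjectives, nouns and numbers into flat chunks."""
    chunks, current = [], []
    for word, tag in tagged:
        if tag in ("JJ", "NN", "CD"):
            current.append(word)
        elif current:
            # A preposition or other tag closes the current chunk.
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

tagged = pos_tag(tokenize("total sales by region for 2020"))
chunks = chunk(tagged)
```

The chunks produced here are the units that later entity detection would inspect.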
[033] In another embodiment, the system (100) is configured to convert the computer-interpretable semantic representation into a computer-interpretable logical schema using a set of axioms.
[034] In one aspect, the system (100) is configured to identify columns in the input query. If there is more than one column, the system (100) identifies whether those columns belong to the same table or to different tables. In case of different tables, the system (100) may use a priority configuration file and join the tables to which these columns belong accordingly. In a different scenario, more than one column is present and the columns belong to different tables that are not linked to each other directly, i.e. there is no foreign key between them. Here, the system (100) makes use of the priority configuration file and the primary-foreign key mapping configuration files. Using these two, the system (100) analyzes whether, in the absence of a direct link, there is a common table to which both of these tables have a link, and that common table is used to join the two tables asked for by the user. Hence, instead of one join there would be two joins. Similarly, the system (100) can extend this case to more than two columns. As of now, this approach has been used to join four tables in the database. As the number of tables increases, a greater number of combinations arises for joining any two of them, which adds complexity.
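One possible way to locate a common linking table is sketched below. A breadth-first search over the primary-foreign key links is used here as a stand-in technique; the disclosure does not specify the search procedure, and the priority configuration file is not modelled. Table names and the `FK_LINKS` list are hypothetical.

```python
from collections import deque

# Hypothetical primary-foreign key mapping: each pair is a direct link.
FK_LINKS = [("orders", "customers"), ("orders", "products"),
            ("products", "suppliers")]

def build_graph(links):
    """Build an undirected adjacency map from the key links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def join_path(graph, start, goal):
    """Breadth-first search for the shortest chain of linked tables."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the tables cannot be joined through known links

graph = build_graph(FK_LINKS)
path = join_path(graph, "customers", "products")
```

Here `customers` and `products` have no direct link, so the returned path passes through the common table `orders`, which corresponds to two joins instead of one.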
[035] Further, the system (100) is configured to detect at least one entity from the pre-processed query, wherein the at least one entity includes a column entity, a table entity, and a value entity. In the column entity detection, the one or more chunks are checked against a column configuration file. If there is a match between a chunk and a column mentioned in this configuration file, the chunk is picked out and stored as a column entity. Similarly, the one or more chunks are checked against the table-column mapping configuration file. If there is a match between a chunk and a table mentioned in the table-column mapping configuration file, the chunk is picked out and stored as a table entity. Further, once the column and table entities have been identified from the chunks, whatever is left is considered a value entity. It would be appreciated that an aggregation component is used when the user wants to aggregate a column.
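The entity detection described above may be sketched as follows. The contents of `COLUMNS` and `TABLE_COLUMN_MAP` are hypothetical stand-ins for the column configuration file and the table-column mapping configuration file, and exact case-insensitive matching is an assumption.

```python
# Hypothetical configuration contents (assumptions for illustration).
COLUMNS = {"sales", "region", "customer name"}
TABLE_COLUMN_MAP = {"orders": ["sales", "region"],
                    "customers": ["customer name"]}

def detect_entities(chunks):
    """Classify each chunk as a column, table, or value entity."""
    entities = {"columns": [], "tables": [], "values": []}
    for c in chunks:
        key = c.lower()
        if key in COLUMNS:
            entities["columns"].append(key)
        elif key in TABLE_COLUMN_MAP:
            entities["tables"].append(key)
        else:
            # Whatever matches neither file is treated as a value entity.
            entities["values"].append(c)
    return entities

entities = detect_entities(["sales", "orders", "Asia"])
```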
[036] Referring to FIG. 3, a functional block diagram (300) illustrates SQL query generation, in accordance with some embodiments of the present disclosure. Herein, the system (100) is configured to convert the pre-processed query into a SQL format, wherein one or more SQL components are used in the SQL format of the query. It is to be noted that, based on the detected entities in the pre-processed query, one or more SQL components are identified to arrive at a SQL format of the natural language query. Further, the one or more SQL components include select, join, group by, and where clause. The select component is an integral part of the SQL query, as it selects which columns and which tables are to be called. There are two sub-components within the select component: including all the columns mentioned in the input query, and adding the names of the tables to which these columns belong. All this information comes from the table-column mapping configuration file.
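As a hedged illustration, the select, where and group by components may be assembled as sketched below. The function `build_sql`, its parameters, and the `COLUMN_TABLE` lookup are hypothetical; the join component is left out, so the sketch accepts columns from a single table only.

```python
# Hypothetical column -> table lookup derived from the table-column mapping file.
COLUMN_TABLE = {"region": "orders", "sales": "orders"}

def build_sql(columns, aggregate=None, where=None, group_by=None):
    """Assemble select, where and group by components for one table.

    aggregate is a (function, column) pair; where is a (column, value) pair.
    Joins are left out of this sketch, so all columns must share a table.
    """
    tables = sorted({COLUMN_TABLE[c] for c in columns})
    if len(tables) != 1:
        raise ValueError("this sketch handles a single table only")
    select_items = []
    for c in columns:
        item = f"{COLUMN_TABLE[c]}.{c}"
        if aggregate and aggregate[1] == c:
            item = f"{aggregate[0].upper()}({item})"
        select_items.append(item)
    sql = f"SELECT {', '.join(select_items)} FROM {tables[0]}"
    params = []
    if where:
        col, value = where
        sql += f" WHERE {COLUMN_TABLE[col]}.{col} = ?"
        params.append(value)  # the value entity is bound, not interpolated
    if group_by:
        sql += f" GROUP BY {COLUMN_TABLE[group_by]}.{group_by}"
    return sql, params

sql, params = build_sql(["region", "sales"], aggregate=("sum", "sales"),
                        where=("region", "Asia"), group_by="region")
```

Binding the value entity as a parameter rather than concatenating it into the SQL text is a design choice of this sketch, made to avoid injecting user input into the query string.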
[037] In one embodiment, the system (100) is configured to classify terms of the computer interpretable semantic representation and the computer interpretable logical schema to build a domain ontology. The domain ontology provides a general framework to represent the knowledge base that constitutes entities. Herein, enterprise documents in varied forms are converted into a common format without disturbing the semantics of the contents. Basic processing and cleansing include removal of junk characters or words, abbreviations not covered in the concept model and metadata, and specific stop words. Cleansing also covers the identification and correction of incorrect or incomplete data elements. The cleansed business process documents, along with the metadata and concept model, are input into the solution to build the entities and a hierarchical knowledge base.
[038] In another embodiment, the system (100) is configured to build a concept dictionary. Herein, identifying the enterprise business terminology used in queries requires creating a dictionary of all keywords that are useful for identifying the right context in the queries. A dictionary of such keywords is created using a domain ontology builder tool called Concept Model Generator (CMG). The domain ontology tool identifies all keywords used as subjects or objects in the sentences, along with their weights. Outgoing edges of the keyword graph represent use of a keyword as a subject, whereas incoming edges represent use of the keyword as an object in the sentence. Keywords with more incoming and outgoing edges have higher weights. However, domain experts can change the weights according to the importance of a keyword or concept. As additional information about the enterprise becomes available, it is pre-processed and added to the repository, and the concepts, mentions, and weights are updated in the dictionary.
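The weighting of keywords may, for instance, be sketched as follows. The present description does not state how the Concept Model Generator computes weights; using the total count of incoming and outgoing edges as the weight is an assumption made only for illustration, as are the function and parameter names.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def keyword_weights(
    edges: Iterable[Tuple[str, str]],
    expert_overrides: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Weight each keyword by its number of incoming plus outgoing edges.

    Each edge is a (subject, object) pair taken from one sentence.
    """
    weights: Counter = Counter()
    for subject, obj in edges:
        weights[subject] += 1  # outgoing edge: keyword used as subject
        weights[obj] += 1      # incoming edge: keyword used as object
    result = dict(weights)
    # Domain experts may replace a computed weight with their own value.
    result.update(expert_overrides or {})
    return result
```

With the edges `[("customer", "order"), ("order", "invoice"), ("customer", "invoice")]`, every keyword receives a weight of 2 in this sketch, and an override such as `{"invoice": 5}` replaces the computed value for that keyword.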
[039] In yet another embodiment, the system (100) is configured to determine a solution to the SQL query based on a pre-trained artificial intelligence (AI) model, and at least one data source. In one instance, the system (100) triggers a handler function with the SQL query, wherein the SQL query is connected to the database through a predefined server to generate an answer to the user’s query.
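A minimal sketch of such a handler function is given below. An in-memory SQLite database from the Python standard library stands in for the RDBMS and the predefined server, and the table, its rows, and the function names are hypothetical.

```python
import sqlite3
from typing import Any, List, Sequence, Tuple


def run_query_handler(
    conn: sqlite3.Connection, sql: str, params: Sequence[Any] = ()
) -> List[Tuple[Any, ...]]:
    """Execute the generated SQL query and return all result rows."""
    cursor = conn.execute(sql, params)
    return cursor.fetchall()


def demo_connection() -> sqlite3.Connection:
    """Create a small in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (sales_amount REAL, order_year INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(100.0, 2018), (50.0, 2018), (70.0, 2019)],
    )
    return conn


conn = demo_connection()
answer = run_query_handler(
    conn,
    "SELECT SUM(sales_amount) FROM orders WHERE order_year = ?",
    (2018,),
)
```

Passing the value as a bound parameter rather than concatenating it into the SQL text is a design choice of this sketch; it keeps user-supplied values out of the query string.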
[040] Further, the system (100) is configured to determine a report using a heuristic search. The heuristic search is a rule-based query optimization that applies equivalence rules on relational expressions, reducing the number of query combinations to consider. By providing an alternative way of writing and evaluating the query, these equivalence rules give a better path to evaluate the query. The optimization works by performing all selection operations as early as possible in the query. Performing the selection early reduces the number of records involved in the query, rather than carrying the whole tables throughout the query.
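The effect of performing the selection early can be illustrated with the following sketch, which models tables as lists of dictionaries. The sample rows and the helper names `join` and `select` are hypothetical; the sketch only demonstrates that, when the selection condition refers to a single table, filtering before the join gives the same result as filtering after it while joining fewer records.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Nested-loop equi-join of two row lists on a shared key."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def select(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


orders = [
    {"cust_id": 1, "year": 2018, "amount": 100},
    {"cust_id": 2, "year": 2019, "amount": 70},
    {"cust_id": 1, "year": 2019, "amount": 30},
]
customers = [{"cust_id": 1, "name": "A"}, {"cust_id": 2, "name": "B"}]


def is_2018(row: Row) -> bool:
    return row["year"] == 2018


# Selection after the join: all three orders take part in the join.
late = select(join(orders, customers, "cust_id"), is_2018)

# Selection before the join: only one order takes part in the join.
early = join(select(orders, is_2018), customers, "cust_id")
```

Both `late` and `early` contain the same single row, but the second form passes only the filtered orders to the join.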
[041] Furthermore, the system (100) identifies domain specific named entities using a pattern-based approach, and named entity resolution is used to resolve each named entity into different categories or entity types. For named entity resolution, the named entity is searched against the database columns to map it to the respective database column. Hence, the system (100) retrieves a report for the SQL query.
[042] In another aspect, the system (100) is configured to maintain logs of the user interactions and associated queries for context retention. The next time a similar interaction is witnessed, the system (100) reuses its existing SQL query rather than repeating the entire process of creating the SQL query statement from scratch. The system (100) builds a context to query the domain ontology managed in the at least one data source for an interactive report. It would be appreciated that the conversations from the user logs act as training data for the system (100), making it more responsive. Further, the system (100) creates the computer-interpretable semantic representation from the maintained logs on receiving a natural language query from a user.
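Reuse of a previously generated SQL query may be sketched as a simple lookup over the maintained log. How a "similar" interaction is recognised is not detailed herein; matching on the lower-cased, whitespace-normalised query text is an assumption of this sketch, and the class and method names are hypothetical.

```python
from typing import Callable, Dict


class QueryLog:
    """Remember the SQL generated for each query and reuse it on repeats."""

    def __init__(self, translate: Callable[[str], str]) -> None:
        self._translate = translate       # full NL-to-SQL pipeline
        self._log: Dict[str, str] = {}    # normalised query -> SQL
        self.translations = 0             # how often the pipeline ran

    @staticmethod
    def _normalise(query: str) -> str:
        # Assumed similarity test: same words ignoring case and spacing.
        return " ".join(query.lower().split())

    def sql_for(self, query: str) -> str:
        key = self._normalise(query)
        if key not in self._log:
            self.translations += 1
            self._log[key] = self._translate(query)
        return self._log[key]
```

A caller would construct `QueryLog` with the translation function and call `sql_for` for each incoming query; a repeated query is then answered from the log without invoking the translation again.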
[043] Referring to FIG. 4, a processor-implemented method (400) to translate a natural language query using one or more metadata configurations to retrieve a domain specific report from a database is illustrated. Herein, a query is received from a user at the input/output (I/O) interface of the system. The received query is pre-processed based on at least one NLP technique and one or more predefined configuration files. At least one entity is detected from the pre-processed query, wherein the at least one entity includes a column entity, a table entity, and a value entity. The pre-processed query is converted into a SQL format, wherein one or more SQL components are used in the SQL format of the query. At least one database is identified based on the detected entity, and a handler function is triggered with the SQL query to connect with the identified database through a predefined server. The handler function collects the answer to the user’s query from the identified database. The processor-implemented method comprises one or more steps as follows.
[044] Initially, at the step (402), receiving a natural language query from a user, wherein the natural language query comprises text and voice.
[045] At the next step (404), parsing the received natural language query into a computer-interpretable semantic representation of the natural language query using a natural language parser. The parsing comprises executing stopword removal, lemmatization, tokenization, POS tagging, and chunking to determine relationships among words. A set of clauses is generated based on the relationships among words included in the natural language query using a hybrid approach, and the clauses are pruned to map the syntactic patterns onto semantic relations that correspond to the computer-interpretable semantic representation.
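A deliberately simplified sketch of part of this pre-processing is given below. It covers only tokenization, stopword removal, and a naive form of chunking in which consecutive non-stopword tokens are grouped. Lemmatization, POS tagging, and clause generation are omitted, and the stopword list and function names are hypothetical.

```python
import re
from typing import List

# Hypothetical, very small stopword list for illustration only.
STOPWORDS = {"show", "me", "the", "for", "of", "in", "a", "an"}


def tokenize(query: str) -> List[str]:
    """Lower-case the query and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def chunk(query: str) -> List[str]:
    """Group consecutive non-stopword tokens into chunks."""
    chunks: List[str] = []
    current: List[str] = []
    for token in tokenize(query):
        if token in STOPWORDS:
            # A stopword closes the chunk collected so far.
            if current:
                chunks.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For the query "Show me the total sales for 2018", this sketch produces the chunks `total sales` and `2018`, which could then be passed on to entity detection.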
[046] At the next step (406), converting the computer-interpretable semantic representation into a computer-interpretable logical schema using a set of axioms. It would be appreciated that one or more SQL components are used while converting the natural language query into the SQL format. The computer-interpretable logical schema includes a column entity, a table entity, and a value entity. Once the list of chunks is obtained, the chunks are passed to a table-column mapping configuration file. If there is a match between a chunk and a column mentioned in the configuration file, that chunk is selected and stored as a column entity. Similarly, each chunk is passed to the table-column mapping configuration file. If there is a match between a chunk and a table mentioned in the configuration file, that chunk is selected and stored as a table entity. After the column and table entities have been identified out of the chunks, whatever is left is considered a value entity. Two things are mainly done with the value entity in this context. For example, in a received query “show me sales for 2018”, 2018 is the value entity.
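How a value entity such as 2018 might be turned into a where component is sketched below. The sketch assumes the value is matched against known column values, in line with the resolution of named entities against database columns; the index contents, column names, and the name `build_where` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical index of known values per column; in practice this could be
# obtained by searching the value against the database columns.
VALUE_INDEX: Dict[str, Set[str]] = {
    "order_year": {"2017", "2018", "2019"},
    "region_name": {"north", "south"},
}


def build_where(values: List[str]) -> Tuple[str, List[str]]:
    """Map each value entity to a column and build a parameterised WHERE."""
    conditions: List[str] = []
    params: List[str] = []
    for value in values:
        column = next(
            (c for c, known in VALUE_INDEX.items() if value.lower() in known),
            None,
        )
        if column is None:
            continue  # unresolved value: left out of the filter in this sketch
        conditions.append(f"{column} = ?")
        params.append(value)
    if not conditions:
        return "", []
    return "WHERE " + " AND ".join(conditions), params
```

For the value entity from "show me sales for 2018", `build_where(["2018"])` returns the clause `WHERE order_year = ?` together with the parameter list `["2018"]`.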
[047] At the next step (408), classifying terms of the computer interpretable semantic representation and the computer interpretable logical schema to build a domain ontology.
[048] At the next step (410), determining a solution to the computer-interpretable logical schema based on a pre-trained artificial intelligence (AI) model, the domain ontology and at least one data source.
[049] At the last step (412), retrieving a domain specific report based on the determined solution and a heuristic search, wherein the heuristic search is a rule-based query optimization that involves equivalence rules on relational expressions.
[050] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[051] The embodiments of the present disclosure herein address the unresolved problem associated with existing NLIDB systems, which can deal only with a limited subset of natural language. Existing NLIDB systems have drawbacks in handling linguistic capabilities, and most of them have no reasoning capabilities. Further, the pattern-based approach works on a set of rules and generates a SQL query, but it fails for any context that falls outside the given rules. Moreover, existing systems have issues in identifying the correct table and column entities based on a domain-specific query.
[052] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed, including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[053] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purpose of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[054] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development would change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[055] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[056] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.

Documents

Application Documents

#	Name	Date
1	202121028877-STATEMENT OF UNDERTAKING (FORM 3) [28-06-2021(online)].pdf	2021-06-28
2	202121028877-REQUEST FOR EXAMINATION (FORM-18) [28-06-2021(online)].pdf	2021-06-28
3	202121028877-FORM 18 [28-06-2021(online)].pdf	2021-06-28
4	202121028877-FORM 1 [28-06-2021(online)].pdf	2021-06-28
5	202121028877-DRAWINGS [28-06-2021(online)].pdf	2021-06-28
6	202121028877-DECLARATION OF INVENTORSHIP (FORM 5) [28-06-2021(online)].pdf	2021-06-28
7	202121028877-COMPLETE SPECIFICATION [28-06-2021(online)].pdf	2021-06-28
8	202121028877-Proof of Right [20-07-2021(online)].pdf	2021-07-20
9	202121028877-FORM-26 [22-10-2021(online)].pdf	2021-10-22
10	Abstract1..jpg	2021-12-10
11	202121028877-FER.pdf	2023-02-14
12	202121028877-FER_SER_REPLY [28-06-2023(online)].pdf	2023-06-28
13	202121028877-DRAWING [28-06-2023(online)].pdf	2023-06-28
14	202121028877-CLAIMS [28-06-2023(online)].pdf	2023-06-28
15	202121028877-ABSTRACT [28-06-2023(online)].pdf	2023-06-28

Search Strategy

1	9february3202121028877E_09-02-2023.pdf