Abstract: The present invention relates to an Enterprise Data Insights Retrieval system and method utilizing graph databases and large language models for generating insights from structured data. Leveraging a graph-based approach, the system models data as nodes and edges within a Neo4j knowledge graph database, enabling more flexible and accurate querying compared to traditional SQL-based methods. The system employs advanced technologies including large language models (LLMs) for semantic retrieval and Cypher query generation, dynamic few-shot learning for improved query accuracy, and Retrieval-Augmented Generation (RAG) for high-precision node and edge identification. The process begins with a user inputting a natural language query, which is processed through an API call to a backend service hosted on AWS ECS. A microservice handles the query, utilizing LLMs and dynamic few-shot learning to generate Cypher code tailored to the graph database. The Cypher code is validated and executed, retrieving relevant data and generating a final response. The system also integrates with data analysis and visualization tools such as Snowflake, Power BI, and Streamlit, allowing users to further analyze and visualize the generated insights. By optimizing data querying, retrieval, and processing, the system enhances both the accuracy of insights and the efficiency of data communication, making it a valuable tool for enterprise environments with high data processing and transfer demands. Figure 1
Description: FIELD OF THE INVENTION
The present invention pertains to the fields of data analytics, natural language processing, and network communication. It specifically addresses the development of a system and method for generating insights and information from structured data stored in databases using natural language queries. The invention leverages a graph-based approach to model data, enabling more flexible and accurate querying compared to traditional SQL-based methods. Additionally, the invention focuses on optimizing data processing at the application layer to improve data transfer and communication efficiency at the TCP/IP layer by reducing redundant data exchanges and enabling more efficient queries. This solution is particularly suitable for enterprise environments with high data processing and transfer demands.
BACKGROUND OF THE INVENTION AND PRIOR ART
As data analytics evolves, there is a growing demand for natural language interfaces that allow users to interact with databases more intuitively. While advancements in Generative AI and Large Language Models (LLMs) have simplified extracting insights from unstructured data, generating insights from structured data remains complex. Structured data, often stored in relational databases or data warehouses, typically requires SQL queries, a process that can be cumbersome and error-prone when mediated through Natural Language to SQL (NL2SQL) systems.
NL2SQL solutions are often constrained by their reliance on pre-trained models, which struggle with complex enterprise schema structures. These systems exhibit low accuracy and require significant resources to train on specific datasets. As a result, deploying reliable natural language interfaces in enterprise environments remains difficult.
Several studies and benchmarks have evaluated the performance of NL2SQL systems, highlighting their limitations in terms of accuracy. Below are some key findings:
1. Spider Benchmark:
The Spider dataset is a large-scale benchmark for evaluating the performance of NL2SQL systems. It includes complex and cross-domain SQL queries. According to the results published in the Spider leaderboard:
• The best-performing NL2SQL model achieved an accuracy of approximately 70% on the Spider dataset.
• Many models, especially those not fine-tuned for specific domains, exhibit significantly lower accuracy, often below 50%.
2. Academic Research:
A study titled "A Comprehensive Study on Natural Language to SQL Systems" published in the Journal of Data Science evaluated several state-of-the-art NL2SQL systems. The study found:
• On average, the accuracy of NL2SQL systems across various datasets was around 60%.
• For complex queries involving multiple joins and nested subqueries, the accuracy dropped to approximately 40%.
3. Industry Benchmarks:
In a benchmark conducted by a leading data analytics company, the performance of NL2SQL systems was evaluated in an enterprise environment with complex schema structures. The results indicated:
• The overall accuracy of NL2SQL systems was around 55%.
• For queries involving advanced SQL features such as window functions and CTEs (Common Table Expressions), the accuracy was below 35%.
These findings underscore the challenges faced by existing NL2SQL systems in achieving high accuracy, particularly in complex and enterprise-specific scenarios. The limitations in generalizing to diverse schema structures and handling intricate SQL queries contribute to their low accuracy, making them less reliable for enterprise use cases.
In parallel, the growing need to efficiently transfer data across systems and networks has introduced challenges in optimizing network communication, especially as more data is processed and queried within enterprise environments. The TCP/IP layer, responsible for reliable data transfer, often faces strain from high data volumes and inefficient communication patterns between hardware and software.
Problem in the Prior Art
Existing NL2SQL systems and traditional data querying methods face several significant challenges:
(a) Low Accuracy: Pre-trained models used in NL2SQL systems often fail to generalize well to complex enterprise schema structures, resulting in low accuracy in query generation.
(b) High Resource Requirements: These systems require extensive training on specific datasets, making them resource-intensive and difficult to deploy effectively in enterprise environments.
(c) Inefficient Data Transfer: Traditional SQL-based querying methods often result in large volumes of data being transferred across networks, leading to network congestion and strain on the TCP/IP layer. For example, a single query retrieving a full table with millions of rows can transfer gigabytes of data over the network.
(d) Redundant Data Exchanges: Inefficient communication patterns between hardware and software result in redundant data exchanges, further exacerbating network congestion and reducing overall data communication efficiency. For instance, in enterprise environments, it is estimated that up to 30% of network traffic can be attributed to redundant or inefficient data transfers.
Disadvantages of the Prior Art
The disadvantages of existing solutions include:
(a) Complexity and Error-Prone Processes: SQL queries required for structured data are cumbersome and prone to errors, especially when mediated through NL2SQL systems.
(b) Scalability Issues: The reliance on pre-trained models and extensive training requirements makes it difficult to scale NL2SQL systems for large enterprise environments.
(c) Network Strain: High data volumes and inefficient communication patterns lead to network congestion and strain on the TCP/IP layer, resulting in slower and less reliable data transfers.
(d) Resource Intensive: The need for significant computational resources to train and deploy NL2SQL systems makes them costly and less accessible for many enterprises. Further, the need to retrain an NL2SQL model after even a small change in the schema or data model makes such systems almost impractical for enterprise use.
Estimating the cost of training a medium-sized Natural Language to SQL (NL2SQL) model involves several factors, including computational resources, data preparation, and time:
1. Computational Resources:
o Cloud Computing Costs: Training a medium-sized NL2SQL model typically requires powerful GPUs or TPUs. For instance, using cloud services like AWS, Google Cloud, or Azure, the cost of a single NVIDIA V100 GPU can range from $2.50 to $3.00 per hour.
o Training Duration: Training a medium-sized NL2SQL model can take anywhere from a few days to several weeks, depending on the complexity and size of the dataset. Assuming an average training time of 2 weeks (336 hours), the cost for a single GPU would be:
336 hours × $2.75 per hour = $924 per GPU
o Multiple GPUs: To speed up training, multiple GPUs are often used. Using 4 GPUs, the cost would be:
4 GPUs × $924 = $3,696
2. Data Preparation:
o Data Collection and Cleaning: Preparing the dataset for training involves collecting, cleaning, and annotating data. This process can be labor-intensive and may require hiring data scientists or annotators. The cost can vary widely, but a rough estimate for a medium-sized dataset might be around $5,000 to $10,000.
3. Storage Costs:
o Data Storage: Storing large datasets and model checkpoints during training incurs additional costs. Cloud storage services typically charge around $0.02 per GB per month. Assuming a dataset size of 500 GB and model checkpoints of 100 GB, the monthly storage cost would be:
600 GB × $0.02 per GB = $12 per month
4. Personnel Costs:
o Data Scientists and Engineers: The cost of hiring data scientists and machine learning engineers to design, implement, and monitor the training process can be significant. Assuming an average salary of $100,000 per year, the cost for a 2-week period would be:
($100,000 / 52 weeks) × 2 weeks ≈ $3,846
5. Miscellaneous Costs:
o Software Licenses and Tools: Additional costs may include software licenses, tools, and other resources required for training. This can add another $500 to $1,000.
Total Estimated Cost: Combining all these factors, the total estimated cost for training a medium-sized NL2SQL model would be:
$3,696 (GPU costs) + $7,500 (data preparation) + $12 (storage) + $3,846 (personnel) + $750 (miscellaneous) = $15,804
Therefore, the estimated cost for training a medium-sized NL2SQL model is approximately $15,804. This estimate can vary based on specific requirements, resource availability, and other factors.
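The cost estimate above can be reproduced with a short calculation. The figures below are the illustrative values stated in this section (midpoints of the quoted ranges), not measured prices:

```python
# Reproduces the illustrative training-cost estimate from this section.
GPU_RATE = 2.75          # $ per GPU-hour (midpoint of the $2.50-$3.00 range)
HOURS = 336              # 2 weeks of training
NUM_GPUS = 4

gpu_cost = NUM_GPUS * HOURS * GPU_RATE   # $3,696
data_prep = 7_500                        # midpoint of $5,000-$10,000
storage = 600 * 0.02                     # 600 GB at $0.02/GB per month -> $12
personnel = round(100_000 / 52 * 2)      # 2 weeks of a $100,000/yr salary -> $3,846
misc = 750                               # midpoint of $500-$1,000

total = gpu_cost + data_prep + storage + personnel + misc
print(f"${total:,.0f}")  # $15,804
```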
Technical Solution of the Present Invention
The present invention, the Enterprise Data Insights Retrieval System utilizing graph databases and Large Language Models (hereinafter also referred to as "GBIRS"), addresses these problems by introducing a novel method for generating insights from structured data using natural language queries. The technical solutions provided by GBIRS include:
(a) Graph-Based Data Modelling: GBIRS stores entity relationships as nodes and edges in a graph database, moving away from the traditional tabular approach. This graph-based model allows for more efficient representation and querying of complex relationships, significantly reducing the volume of data that needs to be retrieved and transferred over the network.
(b) Insight Generation System: A Python-based application within GBIRS transforms raw data into structured insights. The system captures relevant business logic and insight semantics, storing this information within the graph database. By processing data locally and more efficiently, GBIRS minimizes the need to transfer large datasets between systems, optimizing data communication across the network.
(c) Cypher Query Generation Using LLMs: GBIRS utilizes LLMs to generate Cypher queries tailored to the graph database. This method takes advantage of LLMs' natural language understanding capabilities and allows for more targeted data queries. The result is fewer requests for large data transfers, which ultimately reduces the burden on the network and improves data communication efficiency, especially at the TCP/IP layer.
(d) Semantic Node and Edge Identification via RAG: GBIRS employs Retrieval-Augmented Generation (RAG) to ensure high accuracy in identifying nodes and edges before generating Cypher queries. This process ensures that only the most relevant data is retrieved, significantly reducing the volume of unnecessary data sent across the network, optimizing data transfer at the application level, and indirectly improving communication at the TCP/IP layer.
(e) Cypher Query Validation and Execution: After generating Cypher queries, GBIRS includes a validation step to verify accuracy before execution. By ensuring that only validated queries are sent, the system further reduces the risk of data duplication or unnecessary network communication, helping to streamline data transfers.
By integrating these advanced technologies, GBIRS not only enhances the accuracy and flexibility of insights retrieval from structured data but also reduces the volume of data transferred over networks. This dual impact makes GBIRS a valuable tool for enterprise environments, where both accurate data insights and efficient communication are critical.
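The graph-based data modelling of item (a) may be illustrated with a minimal in-memory sketch (the node identifiers, labels, and relationship types below are illustrative examples, not the production schema; the actual system stores this structure in Neo4j):

```python
# Minimal in-memory illustration of graph-based data modelling:
# entities become nodes, relationships become typed edges.
nodes = {
    "cust_1": {"label": "Customer", "name": "Acme Corp"},
    "ord_1":  {"label": "Order",    "total": 1200},
    "ord_2":  {"label": "Order",    "total": 300},
    "prod_1": {"label": "Product",  "name": "Widget"},
}
edges = [
    ("cust_1", "PLACED",   "ord_1"),
    ("cust_1", "PLACED",   "ord_2"),
    ("ord_1",  "CONTAINS", "prod_1"),
]

def neighbors(node_id, rel_type):
    """Follow only the edges relevant to the query, rather than scanning full tables."""
    return [dst for src, rel, dst in edges if src == node_id and rel == rel_type]

# Targeted query: orders placed by one customer -- only that customer's
# outgoing PLACED edges are traversed, so the data volume retrieved stays small.
orders = neighbors("cust_1", "PLACED")
print(orders)  # ['ord_1', 'ord_2']
```

The targeted traversal is what reduces transfer volume: only the neighborhood relevant to the query is touched, instead of a full-table retrieval.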
OBJECT OF THE INVENTION
The primary object of the present invention is to provide a system and method for generating insights from structured data using natural language queries, which overcomes the limitations of existing NL2SQL systems and traditional data querying methods. The specific objectives of the invention include:
(a) To develop a graph-based data modelling approach that stores entity relationships as nodes and edges in a graph database, allowing for more efficient representation and querying of complex relationships.
(b) To create an insight generation system that transforms raw data into structured insights and stores these insights within the graph database, thereby minimizing the need for large data transfers between systems.
(c) To utilize large language models (LLMs) for generating Cypher queries tailored to the graph database, enabling more targeted data queries and reducing the volume of data transferred over networks.
(d) To include a query validation step to verify the accuracy of Cypher queries before execution, reducing the risk of data duplication or unnecessary network communication.
(e) To improve data communication efficiency, particularly at the TCP/IP layer, by reducing network congestion and optimizing data transfer volumes through more precise and targeted queries.
(f) To enhance the overall performance and reliability of data communication in enterprise environments by minimizing network latency and instances of dropped or fragmented packets.
(g) To provide a scalable and resource-efficient solution for querying structured data in enterprise environments, addressing the challenges of complex schema structures and high data processing demands.
By achieving these objectives, the invention aims to provide a robust and efficient system for generating insights from structured data, improving both data retrieval accuracy and network communication efficiency in enterprise environments.
SUMMARY OF THE INVENTION
The summary of the invention provided herein is intended to offer an overview of the key features and advantages of the invention. It is not intended to limit the scope of the invention. Various modifications, changes, and variations can be made without departing from the scope and spirit of the invention as defined by the appended claims. The detailed description and figures are to be considered as illustrative and not restrictive, and the invention is not limited to the specific details and configurations shown and described. The scope of the invention is defined by the claims and their equivalents.
The present invention, the Enterprise Data Insights Retrieval System Utilizing Graph Databases and Large Language Models, introduces a novel method for generating insights from structured data using natural language queries. The system leverages a graph-based approach to model data, enabling more flexible and accurate querying compared to traditional SQL-based methods. Additionally, the system improves data communication efficiency by reducing redundant data retrieval and minimizing unnecessary data transfers across networks.
The key components and aspects of the system of the present invention include:
(a) Graph-Based Data Modelling: The system stores entity relationships as nodes and edges in a graph database, moving away from the traditional tabular approach. This graph-based model allows for more efficient representation and querying of complex relationships, significantly reducing the volume of data that needs to be retrieved and transferred over the network.
(b) Insight Generation System: A Python-based application within the system transforms raw data into structured insights. The system captures relevant business logic and insight semantics, storing this information within the graph database. By processing data locally and more efficiently, the Enterprise Data Insights Retrieval System minimizes the need to transfer large datasets between systems, optimizing data communication across the network.
(c) Cypher Query Generation Using LLMs: The system utilizes large language models (LLMs) to generate Cypher queries tailored to the graph database. This method takes advantage of LLMs' natural language understanding capabilities and allows for more targeted data queries. The result is fewer requests for large data transfers, which ultimately reduces the burden on the network and improves data communication efficiency, especially at the TCP/IP layer.
(d) Semantic Node and Edge Identification via RAG: The system employs Retrieval-Augmented Generation (RAG) to ensure high accuracy in identifying nodes and edges before generating Cypher queries. This process ensures that only the most relevant data is retrieved, significantly reducing the volume of unnecessary data sent across the network, optimizing data transfer at the application level, and indirectly improving communication at the TCP/IP layer.
(e) Cypher Query Validation and Execution: After generating Cypher queries, the system incorporates a validation step to ensure their accuracy before execution. Given that Large Language Models (LLMs) are not always 100% accurate in generating Cypher queries, this validation step is crucial. By validating the queries before they are sent, the system minimizes the risk of data duplication or unnecessary network communication, thereby streamlining data transfers.
(f) Impact on Data Communication and TCP/IP Layer: The system enhances data communication efficiency, particularly at the TCP/IP layer, by:
1. Reduced Data Transfer: Enabling more precise and targeted queries to minimize the amount of data transferred between databases and applications, reducing network congestion and the overall data load handled by the TCP/IP layer. By focusing on the specific nodes and edges relevant to a query, the system significantly reduces the volume of data that needs to be retrieved and transferred over the network.
2. Optimized Data Processing at the Application Layer: Handling a significant amount of data processing and insights generation within the graph database itself, reducing the need for data transfers between systems and optimizing bandwidth utilization.
3. Efficient Data Queries: Leveraging Cypher queries and RAG techniques to reduce extensive data exchanges, enhancing application performance and reducing network traffic.
4. Impact on Network Latency and Reliability: Reducing data transfers to lower network latency and instances of dropped or fragmented packets, contributing to more reliable network communication.
(g) Enterprise Data Focus: The system is specifically designed to address the challenges of querying enterprise private data, improving both data retrieval and network communication in large-scale environments.
The accompanying figure illustrates the architecture and workflow of the Enterprise Data Insights Retrieval System Utilizing Graph Databases and Large Language Models. A user inputs a query, which is processed through an API call to AWS ECS. A microservice handles semantic retrieval using LLMs and dynamic few-shot learning. The query is then processed by the Neo4j knowledge graph database, where context retrieval and Cypher code generation occur. The Cypher code is validated and executed, and the final response is generated and sent back to the user. Logs are saved, and the system integrates with tools like Snowflake, Power BI, and Streamlit for further data analysis and visualization.
Overall, the system represents a significant advancement in both data insights retrieval and network communication. By optimizing how data is queried, retrieved, and processed, the system not only enhances the accuracy of insights but also improves data communication efficiency, particularly at the TCP/IP layer. This dual impact makes the Enterprise Data Insights Retrieval System Utilizing Graph Databases and Large Language Models a valuable tool for enterprise environments, where both accurate data insights and efficient communication are critical.
Accordingly, in an aspect the present invention provides an Enterprise Data Insights Retrieval System Utilizing Graph Databases and Large Language Models, comprising:
a user interface configured to receive natural language queries from a user;
a microservice hosted on a container orchestration platform, configured to handle API calls from the user interface;
a large language model (LLM) configured to perform semantic retrieval and understand the intent behind the user's query;
a dynamic few-shot learning module configured to improve the accuracy of the query generation process;
a knowledge graph database for storing and retrieving contextual information;
a Cypher code generation module configured to generate Cypher code tailored to the graph database using the LLM;
a query validation module configured to validate the generated Cypher code;
a query execution module configured to execute the validated Cypher code against the knowledge graph database and retrieve relevant data;
a question-answering system configured to convert the structured data into a human-readable natural language answer; and
an asynchronous logging and analytics module configured to monitor system performance and user interactions.
Accordingly, in another aspect the present invention provides a method for generating insights from structured data using natural language queries, comprising:
receiving a natural language query from a user;
processing the query through an API call to a backend service hosted on a container orchestration platform;
performing semantic retrieval using a large language model (LLM) to understand the intent behind the query;
employing dynamic few-shot learning to improve the accuracy of the query generation process;
retrieving relevant contextual information from a knowledge graph database;
generating Cypher code tailored to the graph database using the LLM;
validating the generated Cypher code to ensure accuracy;
executing the validated Cypher code against the knowledge graph database to retrieve relevant data;
generating a final response based on the retrieved data;
sending the final response back to the user; and
recording and storing logs of all activities, queries, and responses for monitoring, debugging, and auditing purposes.
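The method steps above may be sketched as a pipeline. The LLM call, validator, and database executor below are stand-in stubs for illustration only (the hard-coded Cypher string and return value are hypothetical), not the production components:

```python
def generate_cypher(question, examples, schema):
    """Stand-in for the LLM call: in the described method this prompts an LLM
    with dynamic few-shot examples and the graph schema."""
    return "MATCH (c:Customer {name: 'Acme Corp'})-[:PLACED]->(o:Order) RETURN o"

def validate_cypher(query):
    """Simplified validation step: accept only read-only MATCH queries."""
    forbidden = ("CREATE", "DELETE", "MERGE", "SET", "DROP")
    upper = query.strip().upper()
    return upper.startswith("MATCH") and not any(kw in upper for kw in forbidden)

def execute(query):
    """Stand-in for executing the query against the Neo4j knowledge graph."""
    return [{"o": {"total": 1200}}]

def answer(question, examples=(), schema=""):
    cypher = generate_cypher(question, examples, schema)   # steps: retrieval + generation
    if not validate_cypher(cypher):                        # step: validation
        raise ValueError("generated Cypher failed validation")
    rows = execute(cypher)                                 # step: execution
    # Final response generation would hand `rows` back to the LLM;
    # here the result is formatted directly.
    return f"Found {len(rows)} matching record(s)."

print(answer("Which orders did Acme Corp place?"))  # Found 1 matching record(s).
```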
Accordingly, yet in another aspect the present invention provides a device for generating insights from structured data using natural language queries, comprising:
a processor configured to execute machine learning algorithms;
a memory storing machine learning models, including a large language model (LLM) and a dynamic few-shot learning model;
a knowledge graph database for storing and retrieving contextual information;
a Cypher code generation module configured to generate Cypher code tailored to the graph database using the LLM;
a query validation module configured to validate the generated Cypher code;
a query execution module configured to execute the validated Cypher code against the knowledge graph database and retrieve relevant data;
a question-answering system configured to convert the structured data into a human-readable natural language answer; and
an asynchronous logging and analytics module configured to monitor system performance and user interactions.
Accordingly, in a further aspect the present invention provides an apparatus for generating insights from structured data using natural language queries, comprising:
a processor configured to execute machine learning algorithms;
a memory storing machine learning models, including a large language model (LLM) and a dynamic few-shot learning model;
a user interface configured to receive natural language queries from a user;
a microservice hosted on a container orchestration platform, configured to handle API calls from the user interface;
a knowledge graph database for storing and retrieving contextual information;
a Cypher code generation module configured to generate Cypher code tailored to the graph database using the LLM;
a query validation module configured to validate the generated Cypher code;
a query execution module configured to execute the validated Cypher code against the knowledge graph database and retrieve relevant data;
a question-answering system configured to convert the structured data into a human-readable natural language answer; and
an asynchronous logging and analytics module configured to monitor system performance and user interactions.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The above and other aspects, features and advantages of the embodiments of the present disclosure will be more apparent in the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 depicts a system architecture of the Enterprise Data Insights Retrieval System Utilizing Graph Databases and Large Language Models, illustrating the flow of data and interactions between components.
The accompanying drawings illustrate various embodiments of the present invention and are provided for the purpose of explaining the principles of the invention. These drawings are not intended to limit the scope of the invention. The components and arrangements shown in the drawings may be modified or substituted with equivalent elements to achieve the same or similar results. The invention is not limited to the specific details and configurations shown and described in the drawings.
DETAILED DESCRIPTION OF THE INVENTION
The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of various embodiments of the present disclosure as defined by the claims and their equivalents. It includes various specific details to assist in that understanding, but these are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the various embodiments described herein can be made without departing from the scope and spirit of the present disclosure. In addition, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments belong. Further, the meaning of terms or words used in the specification and the claims should not be limited to the literal or commonly employed sense but should be construed in accordance with the spirit of the disclosure to most properly describe the present disclosure.
The terminology used herein is for the purpose of describing particular various embodiments only and is not intended to be limiting of various embodiments. As used herein, the singular forms "a," "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising" used herein specify the presence of stated features, integers, steps, operations, members, components, and/or groups thereof, but do not preclude the presence or addition of one or more other features, integers, steps, operations, members, components, and/or groups thereof.
The present disclosure will now be described more fully with reference to the accompanying drawings, in which various embodiments of the present disclosure are shown.
Definition of some technical terms:
API Call: An application programming interface (API) call is a request made by a software application to access data or services from another application or service.
AWS ECS: Amazon Web Services Elastic Container Service (AWS ECS) is a fully managed container orchestration service that allows users to run and manage Docker containers on a cluster of Amazon EC2 instances.
Microservice: A microservice is a small, independent service that performs a specific function within a larger system. Microservices are designed to be loosely coupled and can be developed, deployed, and scaled independently.
Semantic Retrieval: Semantic retrieval is the process of retrieving information based on the meaning and context of the query, rather than merely matching keywords. This approach relies on a concept called "embedding," which represents words as numerical vectors. By understanding these numerical representations, the system can grasp the intent behind the query and provide more accurate results.
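Semantic retrieval over embeddings may be illustrated with toy vectors (the hand-made three-dimensional vectors and phrases below are illustrative; a real system would use learned embeddings of much higher dimension):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: phrases with similar meaning map to nearby vectors.
embeddings = {
    "monthly revenue":    [0.9, 0.1, 0.0],
    "income per month":   [0.8, 0.2, 0.1],
    "employee headcount": [0.1, 0.9, 0.3],
}

# Hypothetical embedding of the query "how much did we earn last month?"
query = [0.85, 0.15, 0.05]
best = max(embeddings, key=lambda k: cosine(query, embeddings[k]))
print(best)  # monthly revenue
```

Retrieval by vector similarity, rather than keyword matching, is what lets the system find "monthly revenue" for a query that never uses the word "revenue".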
LLM (Large Language Model): A large language model (LLM) is an advanced artificial intelligence model trained on vast amounts of text data to understand and generate human-like language. LLMs are used for natural language processing tasks such as text generation, translation, and question answering.
The LLM is responsible for several critical tasks:
Understanding the Query: Initially, the LLM reformulates the user’s natural language query into a structured format using dynamic few-shot learning.
Generating the Database Query: In a second interaction, the LLM again uses few-shot examples to create a Cypher query that retrieves the required data from the Neo4j knowledge graph.
Generating the Final Response: Once the data is retrieved, the LLM combines it with the structured query from the initial interaction to generate a detailed, context-rich response for the user. This response leverages both the reformulated user query and the data pulled from the knowledge graph, ensuring a precise and complete answer.
Dynamic Few-Shot Learning: Dynamic few-shot learning is a technique in machine learning where the model makes predictions or generates outputs based on a small number of examples (few shots) that are dynamically selected based on the context of the query.
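Dynamic few-shot selection may be sketched as picking the k stored examples most similar to the incoming query. Word-overlap (Jaccard) similarity stands in here for the embedding similarity a real system would use, and the example bank below is hypothetical:

```python
def similarity(a, b):
    """Crude stand-in for embedding similarity: Jaccard word overlap."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

# Hypothetical bank of (question, Cypher) example pairs.
EXAMPLE_BANK = [
    ("total sales by region",
     "MATCH (r:Region)<-[:IN]-(s:Sale) RETURN r.name, sum(s.amount)"),
    ("top customers by revenue",
     "MATCH (c:Customer)-[:PLACED]->(o:Order) RETURN c.name, sum(o.total)"),
    ("list all product categories",
     "MATCH (p:Product) RETURN DISTINCT p.category"),
]

def select_examples(query, k=2):
    """Dynamically pick the k most relevant examples for the prompt."""
    ranked = sorted(EXAMPLE_BANK, key=lambda ex: similarity(query, ex[0]), reverse=True)
    return ranked[:k]

chosen = select_examples("show sales by region", k=1)
print(chosen[0][0])  # total sales by region
```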
Neo4j: Neo4j is a graph database management system that uses graph structures with nodes, edges, and properties to represent and store data. It is designed for efficiently managing and querying complex relationships within data.
Knowledge Graph Database: A knowledge graph database is a type of database that uses graph structures to represent and store interconnected data, enabling more efficient querying and retrieval of complex relationships.
Neo4j stores all the relevant data for answering user queries. The LLM-generated Cypher query is executed on the database, and the resulting data is returned to the LLM for final response generation.
The microservice, hosted on AWS ECS, orchestrates everything through RESTful API calls, facilitating interaction between the user and the backend system. Once the API calls reach the backend from the frontend, the entire backend operates as a cohesive entity, with subcomponents performing specific tasks. The microservice coordinates between the LLM and Neo4j, ensuring smooth data flow and efficient response generation. The system is designed to scale easily and handle multiple queries simultaneously, providing fast and reliable performance.
Context Retrieval: Context retrieval is the process of matching the correct node and edge to the user query using semantic search.
The process begins with the conversion of a natural language query into Cypher code, a query language specifically designed for graph databases like Neo4j. This conversion is crucial as it translates the user's intent into a format that can be executed against the graph database. The Cypher code undergoes validation checks to ensure its accuracy and correctness, minimizing the risk of errors and ensuring that the query aligns with the user's intent.
Once validated, the Cypher query is executed to extract relevant information from the graph database. This database, structured with nodes and edges, encapsulates complex relationships and provides a rich context for data retrieval. The extracted information is not isolated; it is combined with dynamic few-shot examples and the graph schema. Dynamic few-shot learning allows the system to adapt to new queries by selecting relevant examples based on the query's context, enhancing the accuracy of the response.
This comprehensive input, which includes the extracted data, dynamic examples, and the graph schema, is fed into a Large Language Model (LLM). The LLM leverages this rich context to generate accurate responses. Effective prompting techniques are employed to guide the LLM in formulating its answers, ensuring that the responses are both relevant and reliable.
By utilizing this amalgamation of data, the approach significantly minimizes the occurrence of hallucinations—instances where the LLM generates information that is not grounded in the provided data. This ensures that the generated responses are not only contextually appropriate but also factually accurate, making the system a robust tool for generating insights from structured data.
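The assembly of the comprehensive LLM input described above (extracted data, dynamic few-shot examples, and the graph schema) can be sketched as a simple prompt builder. The prompt format, field names, and sample schema below are illustrative assumptions, not the system's actual prompt template.

```python
# Illustrative sketch: combining extracted data, dynamic few-shots, and
# the graph schema into a single grounded prompt for the LLM.

def build_prompt(user_query, schema, few_shots, extracted_rows):
    shot_text = "\n".join(
        f"Q: {s['question']}\nCypher: {s['cypher']}" for s in few_shots
    )
    return (
        "Graph schema:\n" + schema + "\n\n"
        "Relevant examples:\n" + shot_text + "\n\n"
        "Retrieved data:\n" + repr(extracted_rows) + "\n\n"
        "User question: " + user_query + "\n"
        "Answer strictly from the retrieved data."
    )

prompt = build_prompt(
    "Which region had the highest sales?",
    "(:Region)-[:HAD_SALES]->(:Sales {amount})",
    [{"question": "Total sales per region?",
      "cypher": "MATCH (r:Region)-[:HAD_SALES]->(s:Sales) "
                "RETURN r.name, sum(s.amount)"}],
    [{"r.name": "EMEA", "sum": 1200000}],
)
```

Constraining the LLM to answer "strictly from the retrieved data" is one prompting technique for minimizing hallucinations, as discussed above.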
Cypher Code: Cypher is a query language for graph databases, such as Neo4j, used to describe patterns in graphs and retrieve data based on those patterns. Cypher code refers to the specific queries written in the Cypher language.
Question Classification & Improvement: Question classification and improvement involve categorizing the query into specific types and refining it to ensure that it is accurately understood and processed by the system.
Final Response: The final response is the output generated by the system in response to the user's query, providing the requested insights or information.
Logs Saved: Logs saved refer to the process of recording and storing system activities, queries, and responses for monitoring, debugging, and auditing purposes.
Snowflake: Snowflake is a cloud-based data warehousing platform that allows organizations to store, manage, and analyze large volumes of data in a scalable and efficient manner.
Power BI: Power BI is a business analytics service by Microsoft that provides interactive visualizations and business intelligence capabilities with an interface for end users to create their own reports and dashboards.
Streamlit: Streamlit is an open-source Python framework for building interactive data applications; in the present system, it is primarily used to visualise overall system performance in real time.
The present invention, the Graph Based Insights Retrieval System (GBIRS), is designed to generate insights from structured data using natural language queries. The system leverages a graph-based approach to model data, enabling more flexible and accurate querying compared to traditional SQL-based methods. Additionally, GBIRS improves data communication efficiency by reducing redundant data retrieval and minimizing unnecessary data transfers across networks.
The invention is illustrated in FIG. 1, which depicts the architecture and workflow of GBIRS. In an embodiment of the present invention, the system comprises several key components and processes, each of which is described in detail below.
1. User Inputs Query
The process begins when a user inputs a natural language query (101). This query is intended to retrieve insights from structured data stored in the system's data warehouse, specifically Snowflake, which houses all the OLAP (Online Analytical Processing) data for analytical work.
In the context of the present invention, the process begins when a user inputs a natural language query into the system. This query is intended to retrieve insights from structured data stored in the system's databases. The user interface for inputting the query can be a web application, a mobile app, or any other platform that supports natural language input.
The natural language query is a key aspect of the invention, as it allows users to interact with the system in an intuitive and user-friendly manner. Unlike traditional SQL-based systems that require users to write complex and precise queries, the present invention enables users to simply type or speak their questions in natural language. This significantly lowers the barrier to accessing and retrieving valuable insights from structured data, making the system accessible to a broader range of users, including those without technical expertise in database querying.
Once the user inputs the query, the system processes it through an API call to the backend, which is hosted on AWS ECS (Amazon Web Services Elastic Container Service). The API call is handled by a microservice within the system, which initiates the semantic retrieval process. The microservice leverages advanced natural language processing (NLP) techniques and large language models (LLMs) to understand the intent behind the user's query and retrieve relevant information based on the meaning and context of the query.
The system's ability to process natural language queries is powered by several key components:
(a) Semantic Retrieval: The system performs semantic retrieval using LLMs, which are trained on vast amounts of text data to understand and generate human-like language. When a user query is received, the semantic retrieval process is triggered to find the right match using "Dynamic Few Shots." This means that instead of passing every few shots to the LLM for each query, the system maintains a master list of few shots. For each new query, the system dynamically selects the best matching few shots from this master list using semantic retrieval or search. Each selected few shot includes instructions on how the Cypher query can be generated. The LLM then uses these selected few shots to generate the actual Cypher query for the incoming user query. Additionally, another semantic search occurs to identify the appropriate nodes and edges on which the Cypher query can be executed, ensuring accurate and relevant information retrieval from the graph database.
In the field of natural language processing (NLP), few-shot learning has emerged as a significant technique for enhancing the performance of large language models (LLMs). Traditionally, few-shot learning involves providing a set of examples or "few-shots" alongside the relevant context, enabling the LLM to generate responses that adhere to specific rules or guidelines derived from those examples. However, static few-shot approaches face substantial limitations, particularly when the application encompasses a broad range of use cases.
Limitations of Static Few-Shots
The reliance on a static set of few-shots can lead to several challenges:
1. Diverse Use Cases: Applications may require dozens of distinct few-shots to accommodate various rules and question-and-answer formats. This diversity can complicate the selection of appropriate examples.
2. Context Window Constraints: LLMs need fine-tuned and specific instructions to operate effectively. Incorporating an excessive number of static few-shots can overwhelm the LLM's limited context window, resulting in reduced accuracy and comprehension.
3. Increased Hallucinations: Passing all few-shots as instructions to the LLM can introduce noise and ambiguity, leading to hallucinations—erroneous or nonsensical outputs generated by the model. Finding the best-matched few-shots and passing only those to the LLM improves the accuracy of the generated Cypher query.
4. Higher Operational Costs: The computational resources required to process numerous static few-shots can elevate operational costs, diminishing the feasibility of the approach in resource-constrained environments. By dynamically selecting the most relevant few-shots, the system can optimize resource usage and maintain high accuracy.
Advantages of Dynamic Few-Shots
To mitigate the aforementioned limitations, the present invention employs a dynamic few-shot learning approach. This innovative technique delivers a tailored subset of few-shots that are directly relevant to each user query. The advantages of this approach include:
1. Enhanced Accuracy: By providing a limited number of contextually appropriate few-shots, the model can better adhere to the user’s intent, leading to more accurate responses.
2. Cost Efficiency: Reducing the number of few-shots processed in response to each query decreases the computational burden, resulting in lower operational costs.
3. Minimized Hallucinations: The focus on relevant few-shots decreases the likelihood of generating extraneous or nonsensical outputs, thereby improving the overall reliability of the model.
Methodology
In an embodiment of the present invention, the implementation of the dynamic few-shot approach involves the following steps:
1. Vector Embedding Generation: An open-source NLP model is utilized to generate vector embeddings based on the semantic meaning of each sentence. This model effectively captures the underlying semantics, enabling a more nuanced understanding of language.
2. Database Construction: A database is created that comprises a collection of sentences, associated few-shots, and their respective embeddings. This database is maintained using PostgreSQL as the database management system (DBMS).
3. Query Processing: Upon receiving a user query, the system converts the query into a vector embedding. A similarity search is conducted within the database to identify the most relevant few-shots. This process utilizes a defined threshold cutoff to exclude queries that fall below an acceptable level of semantic similarity, thus preventing absurd or irrelevant responses.
4. Response Generation: The identified relevant few-shots are then provided to the LLM, guiding its response generation in a manner that aligns closely with the user’s query.
Further, the method for dynamic few-shot learning offers a robust solution to the limitations associated with static few-shot approaches in NLP. By leveraging vector embeddings and a targeted retrieval mechanism, this technique enhances accuracy, reduces operational costs, and minimizes the occurrence of hallucinations in generated outputs. This methodology represents a significant advancement in the field of NLP, facilitating more efficient and effective interactions between users and language models.
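The four methodology steps above can be sketched as follows. Toy hand-written vectors stand in for the NLP model's embeddings, and an in-memory list stands in for the PostgreSQL store; the example sentences and few-shots are invented for illustration.

```python
# Sketch of dynamic few-shot selection: embed, search by cosine
# similarity, apply a threshold cutoff, return the top matches.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Master list: sentence, its few-shot instruction, and a toy embedding.
MASTER = [
    {"sentence": "total sales by region",
     "few_shot": "MATCH (r:Region)... RETURN sum(...)",
     "vec": [0.9, 0.1, 0.0]},
    {"sentence": "list employees in a department",
     "few_shot": "MATCH (e:Employee)-[:IN]->(d:Department)...",
     "vec": [0.0, 0.2, 0.9]},
]

def select_few_shots(query_vec, top_k=1, threshold=0.5):
    scored = [(cosine(query_vec, m["vec"]), m) for m in MASTER]
    scored.sort(key=lambda t: t[0], reverse=True)
    # Threshold cutoff: drop examples below acceptable similarity,
    # preventing irrelevant few-shots from reaching the LLM.
    return [m for score, m in scored[:top_k] if score >= threshold]

# A query embedding close to the "sales by region" example.
chosen = select_few_shots([0.8, 0.2, 0.1])
```

Only the selected few-shots are passed to the LLM, which is what keeps the context window small and the operational cost low.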
2. API Call
The user's query is sent as an API call (102) to the system's backend, which is hosted on AWS ECS (103). AWS ECS (Amazon Web Services Elastic Container Service) is a fully managed container orchestration service that allows the system to run and manage Docker containers on a cluster of Amazon EC2 instances.
In the context of the present invention, an API (Application Programming Interface) call is a crucial step in the process of generating insights from structured data using natural language queries. When a user inputs a natural language query into the system, this query is sent as an API call to the backend infrastructure of the Graph Based Insights Retrieval System (GBIRS).
The API call serves as a bridge between the user interface and the backend services, enabling seamless communication and data exchange.
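The shape of such an API call can be sketched as below. The field names (`query`, `session_id`) are assumptions for illustration, not the system's actual contract.

```python
# Illustrative sketch of the query API call's request body: the frontend
# serialises the natural language query, and the backend validates it.
import json

def make_request_body(user_query: str, session_id: str) -> str:
    """Frontend side: serialise the query into the JSON body of the call."""
    return json.dumps({"query": user_query, "session_id": session_id})

def parse_request_body(body: str) -> dict:
    """Backend side: validate the incoming payload before processing."""
    payload = json.loads(body)
    if not payload.get("query", "").strip():
        raise ValueError("empty query")
    return payload

body = make_request_body("Top five products by revenue", "sess-42")
payload = parse_request_body(body)
```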
3. Microservice
The API call is handled by a microservice (104) within the system. Microservices are small, independent services that perform specific functions within the larger system. In this case, the microservice is responsible for processing the user's query and initiating the semantic retrieval process. The microservice acts as the main hub, handling incoming API requests, interacting with the LLM to process queries, and communicating with the Neo4j database to retrieve data. It also logs all queries and responses for future reference. When a query comes in, the microservice first calls the LLM to understand the request, then makes a second LLM call to generate a Cypher query to fetch data from Neo4j. After getting the data, the microservice coordinates the generation of the final response.
In the context of the present invention, a microservice is a small, independent service that performs a specific function within the larger architecture of the Graph Based Insights Retrieval System (GBIRS). The microservice architecture is a key design principle that enables the system to be scalable, resilient, and maintainable. Here is a detailed explanation of the role and functionality of microservices within the present invention:
(a) Handling API Calls: When a user inputs a natural language query, it is sent as an API call to the backend services hosted on AWS ECS (Amazon Web Services Elastic Container Service). The API call is received by a specific microservice responsible for processing the user's query. This microservice acts as the entry point for the query processing workflow.
In summary, microservices play a critical role in the present invention by enabling efficient query processing, semantic retrieval, context retrieval, and Cypher code generation. The microservice architecture ensures that the system is scalable, resilient, and maintainable, making it a robust solution for generating insights from structured data using natural language queries.
4. Semantic Retrieval
The microservice performs semantic retrieval (105) using a large language model (LLM) (106). Semantic retrieval involves understanding the intent behind the user's query and retrieving information based on the meaning and context of the query, rather than just matching keywords.
Semantic retrieval refers to the process of understanding and extracting meaningful information from data based on the context and relationships between different pieces of data. In the context of the present system, the Enterprise Data Insights Retrieval System Utilizing Graph Databases and Large Language Models employs semantic retrieval to enhance the accuracy and efficiency of querying structured data stored in databases.
The system uses a graph-based data model where data entities and their relationships are represented as nodes and edges in a graph database. This model allows the system to capture complex relationships and context more effectively than traditional tabular data models. By understanding these relationships, the system can generate more accurate and relevant insights from the data.
To achieve semantic retrieval, the system utilizes several advanced techniques:
1. Graph-Based Data Modelling: This technique involves storing data as nodes and edges in a graph database. By representing data in this way, the system can more easily identify and understand the relationships between different data points.
2. Retrieval-Augmented Generation (RAG):
RAG plays a crucial role in ensuring high accuracy and relevance in identifying nodes and edges before generating Cypher queries. Here is how RAG is utilized within the Enterprise Data Insights Retrieval System:
• Semantic Node and Edge Identification: The system employs Retrieval-Augmented Generation (RAG) to accurately identify the nodes and edges in the graph database that are most relevant to the user's query. This process is crucial for ensuring that the system retrieves only the most pertinent data points, thereby optimizing data communication and reducing unnecessary data transfers. Here is a detailed explanation of how this process works:
1. User Query Input: The process begins when a user inputs a natural language query into the system. This query is sent as an API call to the backend services hosted on AWS ECS. The API call is handled by a microservice, which initiates the semantic retrieval process using a large language model (LLM).
2. Semantic Retrieval: The LLM performs semantic retrieval to understand the intent behind the user's query. This involves interpreting the meaning and context of the query, rather than just matching keywords. The system uses dynamic few-shot learning to improve the accuracy of the query generation process. This means that instead of passing every few shot to the LLM for each query, the system maintains a master list of few shots and dynamically selects the best matching few shots based on the context of the query.
3. Context Retrieval: The system interacts with the Neo4j Knowledge Graph Database to retrieve relevant contextual information. The graph database uses nodes and edges to represent data entities and their relationships, allowing the system to capture complex relationships and context within the data. This step ensures that the system has the necessary context to generate accurate and meaningful responses.
4. RAG for Node and Edge Identification: The system employs RAG to accurately identify the nodes and edges in the graph database that are most relevant to the user's query. RAG works by augmenting the retrieval process with generation capabilities, allowing the system to dynamically select the most relevant few shots from the master list. Each selected few shot includes instructions on how the Cypher query can be generated. This dynamic selection process ensures that the system retrieves only the most pertinent data points, reducing unnecessary data transfers and optimizing data communication.
5. Cypher Code Generation: Based on the retrieved contextual information and the selected few shots, the system generates Cypher code tailored to the graph database using the LLM. Cypher is a query language designed for querying graph databases like Neo4j. The generated Cypher code describes patterns in the graph and retrieves data based on those patterns. This step involves translating the natural language query into a precise and executable Cypher query.
6. Validation of Cypher Code: Before executing the generated Cypher code, the system includes a validation step to verify the accuracy and correctness of the query. This validation process ensures that the Cypher code accurately represents the user's intent and retrieves the correct data from the graph database. It also helps to reduce the risk of data duplication or unnecessary network communication.
7. Execution of Cypher Code: Once validated, the Cypher code is executed against the Neo4j Knowledge Graph Database. The execution process involves running the Cypher query on the graph database to retrieve the relevant data. The graph database efficiently processes the query by traversing the nodes and edges to extract the required information.
8. Answer Generation: After executing the Cypher code, the system uses the retrieved data to generate the final response. The LLM assists in formulating the response based on the data extracted from the graph database. This response is then sent back to the user, providing the requested insights or information. The answer generation process involves synthesizing the retrieved data into a coherent and contextually relevant response that addresses the user's query.
9. Logs and Integration: Throughout the entire process, the system records detailed logs of all activities, queries, and responses. These logs include information such as the user's query, the generated Cypher code, the validation results, the execution details, and the final response. The logs are stored in a secure and accessible manner, allowing for easy retrieval and analysis. Additionally, the system integrates with various data analysis and visualization tools, such as Snowflake, Power BI, and Streamlit, allowing users to further analyze and visualize the data and insights generated by the system.
In summary, the Enterprise Data Insights Retrieval System employs RAG to accurately identify the nodes and edges in the graph database that are most relevant to the user's query. By retrieving the most pertinent data points, the system ensures that only the necessary information is used to generate Cypher queries, reducing unnecessary data transfers and optimizing data communication. This process enhances the accuracy and efficiency of the system, providing users with precise and contextually relevant insights.
• Contextual Relevance: RAG helps the system maintain high contextual relevance in the generated responses. By combining retrieval-based accuracy with generation-based fluency, the system can produce responses that are both accurate and contextually appropriate, enhancing the overall user experience.
• Efficiency: By retrieving only the most relevant information and generating targeted responses, RAG helps the system minimize the volume of data that needs to be transferred over the network. This optimization improves data communication efficiency and reduces network congestion.
In summary, Retrieval-Augmented Generation (RAG) is a powerful technique that combines the strengths of retrieval-based and generation-based models to improve the accuracy and relevance of generated responses. Within the Enterprise Data Insights Retrieval System Utilizing Graph Databases and Large Language Models, RAG is used to ensure high accuracy in identifying relevant nodes and edges, maintain contextual relevance, and optimize data communication efficiency, making it a valuable component of the system.
3. Cypher Query Generation Using Large Language Models (LLMs): The system leverages LLMs to generate Cypher queries, which are tailored to the graph database. These queries are designed to be more precise and context-aware, further improving the accuracy of data retrieval.
In summary, semantic retrieval in the context of the system involves using a graph-based data model, RAG, and LLMs to accurately and efficiently retrieve meaningful information from structured data. This approach not only enhances the accuracy of insights but also improves data communication efficiency by reducing unnecessary data transfers.
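The semantic node-and-edge identification step described above can be sketched as follows. Simple keyword and synonym matching stands in for the embedding-based semantic search that RAG would use in production, and the schema and synonym table are invented for illustration.

```python
# Sketch of mapping a user query onto candidate graph nodes and edges;
# a production system would use embedding similarity rather than this
# keyword/synonym lookup.

SCHEMA = {
    "nodes": ["Customer", "Order", "Product"],
    "edges": ["PLACED", "CONTAINS"],
}

SYNONYMS = {
    "customer": "Customer", "client": "Customer",
    "order": "Order", "purchase": "Order",
    "product": "Product", "item": "Product",
    "placed": "PLACED", "contains": "CONTAINS",
}

def identify_elements(user_query: str) -> dict:
    words = user_query.lower().replace("?", "").split()
    hits = {SYNONYMS[w] for w in words if w in SYNONYMS}
    return {
        "nodes": sorted(h for h in hits if h in SCHEMA["nodes"]),
        "edges": sorted(h for h in hits if h in SCHEMA["edges"]),
    }

elements = identify_elements("Which product did each client order?")
```

Restricting Cypher generation to the identified nodes and edges is what limits the query to the most pertinent data points, reducing unnecessary data transfer.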
5. Dynamic Few Shots
The system employs dynamic few-shot learning (107) to improve the accuracy of the query generation process. Dynamic few-shot learning is a technique in machine learning where the model is trained to make predictions or generate outputs based on a small number of examples (few shots) that are dynamically selected based on the context of the query.
In the context of the present invention, "Dynamic Few Shots" refers to a machine learning technique that enhances the system's ability to generate accurate and relevant responses to natural language queries based on a limited number of examples. This technique is particularly useful in scenarios where the system needs to adapt to new queries or contexts with minimal training data.
Here is a detailed explanation of the role and functionality of Dynamic Few Shots within the Enterprise Data Insights Retrieval System:
(a) Few-Shot Learning: Few-shot learning is a subfield of machine learning where models are trained to make predictions or generate outputs based on a small number of examples (few shots). This is in contrast to traditional machine learning models that require large amounts of training data to achieve high accuracy. Few-shot learning is particularly valuable in situations where obtaining extensive labelled data is impractical or costly.
(b) Dynamic Adaptation: The "dynamic" aspect of Dynamic Few Shots refers to the system's ability to adapt to new queries or contexts on-the-fly. Instead of relying on a static set of examples, the system dynamically selects the most relevant examples based on the context of the user's query. This allows the system to provide accurate and contextually appropriate responses even when faced with novel or previously unseen queries.
(c) Integration with Large Language Models (LLMs): The system leverages large language models (LLMs) to implement Dynamic Few Shots. LLMs are advanced AI models trained on vast amounts of text data, enabling them to understand and generate human-like language. By integrating few-shot learning capabilities into LLMs, the system can generate Cypher queries tailored to the graph database with high accuracy, even with limited training data.
(d) Contextual Relevance: Dynamic Few Shots enhances the system's ability to understand the context and intent behind the user's query. By dynamically selecting relevant examples, the system can generate more precise and context-aware Cypher queries. This improves the overall accuracy and relevance of the insights retrieved from the graph database.
(e) Efficiency and Scalability: The use of Dynamic Few Shots allows the Enterprise Data Insights Retrieval System to efficiently handle a wide range of queries without the need for extensive retraining. This makes the system more scalable and adaptable to different enterprise environments and data structures. The ability to dynamically adapt to new queries ensures that the system remains effective and responsive, even as the underlying data and user requirements evolve.
In summary, Dynamic Few Shots is a powerful technique that enhances the system’s ability to generate accurate and contextually relevant responses to natural language queries. By dynamically selecting relevant examples and integrating few-shot learning capabilities into LLMs, the system can provide high-quality insights with minimal training data, making it a valuable tool for enterprise environments with diverse and evolving data needs.
6. Neo4j Knowledge Graph Database
The core of the system is the Neo4j knowledge graph database (108, 109). Neo4j is a graph database management system that uses graph structures with nodes, edges, and properties to represent and store data. The knowledge graph database allows for efficient representation and querying of complex relationships within the data.
In the context of the present invention, the Neo4j Knowledge Graph Database plays a central role in storing, managing, and querying structured data. Neo4j is a graph database management system that uses graph structures with nodes, edges, and properties to represent and store data. This graph-based approach is particularly well-suited for capturing complex relationships and context within the data, which is essential for generating accurate and meaningful insights from natural language queries.
Here is a detailed explanation of the role and functionality of the Neo4j Knowledge Graph Database within the Enterprise Data Insights Retrieval System:
(a) Graph-Based Data Modelling: Unlike traditional relational databases that use tables to store data, Neo4j uses graph structures to represent data entities and their relationships. In Neo4j, data entities are represented as nodes, and the relationships between them are represented as edges. This graph-based data model allows for more efficient representation and querying of complex relationships, making it easier to capture the context and semantics of the data.
(b) Efficient Querying with Cypher: Neo4j uses a query language called Cypher, which is specifically designed for querying graph databases. Cypher allows users to describe patterns in graphs and retrieve data based on those patterns. In the context of GBIRS, Cypher queries are generated using large language models (LLMs) to accurately retrieve relevant data from the graph database. The use of Cypher enables precise and context-aware querying, which is essential for generating accurate insights from structured data.
(c) Context Retrieval: One of the key advantages of using a graph database like Neo4j is its ability to efficiently retrieve contextual information. When a user inputs a natural language query, the system leverages the graph database to retrieve relevant contextual information that helps in understanding the query's intent and generating accurate responses. The graph structure allows the system to traverse relationships and retrieve interconnected data points, providing a comprehensive view of the data.
(d) Semantic Node and Edge Identification: The system employs Retrieval-Augmented Generation (RAG) techniques to ensure high accuracy in identifying relevant nodes and edges before generating Cypher queries. This process ensures that only the most pertinent data is retrieved, reducing unnecessary data transfers and optimizing data communication. The graph database's ability to represent and query complex relationships is crucial for this semantic identification process.
(e) Scalability and Flexibility: Neo4j's graph-based approach provides scalability and flexibility, making it suitable for enterprise environments with large and complex datasets. The system can efficiently handle a high volume of queries and data, ensuring that it remains responsive and effective even as the data grows and evolves. The flexibility of the graph model allows the system to adapt to different data structures and requirements, making it a versatile solution for various use cases.
(f) Integration with Other Components: The Neo4j Knowledge Graph Database is seamlessly integrated with other components of the system, such as the insight generation system, query generation module, and query validation module. This integration ensures that the system can efficiently process natural language queries, generate accurate Cypher queries, validate them, and retrieve relevant insights from the graph database.
In summary, the Neo4j Knowledge Graph Database is a fundamental component of the present invention, enabling efficient storage, management, and querying of structured data. Its graph-based data model, efficient querying with Cypher, context retrieval capabilities, and scalability make it an ideal choice for generating accurate and meaningful insights from natural language queries in enterprise environments.
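The node-and-edge model and relationship traversal described above can be illustrated with a toy in-memory graph. Neo4j stores and traverses such structures natively; the dict-based graph and two-hop query below are only a sketch of the concept.

```python
# Toy illustration of graph-based data modelling: entities as nodes,
# relationships as typed edges, and insight retrieval as traversal.

graph = {
    "nodes": {
        1: {"label": "Customer", "name": "Acme"},
        2: {"label": "Order", "id": "O-77"},
        3: {"label": "Product", "name": "Widget"},
    },
    "edges": [
        (1, "PLACED", 2),
        (2, "CONTAINS", 3),
    ],
}

def neighbours(node_id, rel=None):
    """Traverse outgoing edges, optionally filtered by relationship type."""
    return [dst for src, r, dst in graph["edges"]
            if src == node_id and (rel is None or r == rel)]

# Which products are reachable from customer 1 via PLACED then CONTAINS?
products = [graph["nodes"][p]["name"]
            for o in neighbours(1, "PLACED")
            for p in neighbours(o, "CONTAINS")]
```

In Cypher, the equivalent traversal would be expressed declaratively as a pattern such as `MATCH (c:Customer)-[:PLACED]->(:Order)-[:CONTAINS]->(p:Product) RETURN p.name`.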
7. Context Retrieval
The system retrieves relevant contextual information from the knowledge graph database (110) to provide more accurate and meaningful responses to the user's query.
In the context of the present invention, "Context Retrieval" refers to the process of extracting relevant contextual information from the Neo4j Knowledge Graph Database to provide accurate and meaningful responses to user queries. This step is crucial for understanding the intent behind the user's natural language query and generating precise Cypher queries that retrieve the most pertinent data from the graph database.
The diagram illustrates the role of context retrieval within the Enterprise Data Insights Retrieval System.
8. Cypher Code Generation and Execution
The system generates Cypher code (111) tailored to the graph database using the LLM. Cypher is a query language for graph databases, such as Neo4j, used to describe patterns in graphs and retrieve data based on those patterns. The generated Cypher code is then sent to the database for execution.
In the context of the present invention, the Neo4j Graph Database is utilized, which employs the Cypher query language specifically designed for operations within Graph Database Management Systems (DBMS). The process commences with a Large Language Model (LLM) that is equipped with dynamic few-shot learning capabilities, as described above. This configuration enables the LLM to produce relevant Cypher code examples based on the user's query, augmented by contextual insights from the existing graph schema.
The diagram illustrates the role of Cypher code generation and execution within the Enterprise Data Insights Retrieval System.
Upon the generation of Cypher code by the LLM, an initial verification step is executed to ascertain the semantic correctness of the generated code. Following this validation, the Cypher code is executed against the Neo4j database to retrieve contextual data.
If the resulting context is empty, the system proactively responds to the user with a static suggestion to rephrase their query for improved clarity or relevance. Conversely, if the execution yields pertinent context from the database, this data is subsequently fed back to the LLM, along with dynamic few-shot examples, to facilitate the generation of a comprehensive answer.
This methodology outlines a systematic approach for the generation and validation of Cypher code, ensuring both accuracy in query execution and relevance in response formulation.
In summary, Cypher code generation and execution in the context of the present invention involve translating natural language queries into precise Cypher queries, validating them, and executing them against the Neo4j Knowledge Graph Database. This process enables the system to accurately retrieve relevant data and provide meaningful insights to the user.
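The generation-validation-execution-fallback flow summarized above can be sketched as follows. This is a minimal illustration under stated assumptions: the LLM call and the Neo4j driver session are replaced with trivial stand-ins so the control flow is runnable, and every function name is hypothetical rather than part of the described system.

```python
# Illustrative sketch: stand-ins replace the LLM and the Neo4j driver so the
# validate -> execute -> fallback-on-empty-context flow can be shown end to end.

def generate_cypher(question: str, few_shot_examples: list) -> str:
    """Stand-in for the LLM: a real implementation would prompt the model
    with the graph schema and the dynamically selected few-shot examples."""
    return "MATCH (c:Customer)-[:PLACED]->(o:Order) RETURN c.name, count(o) AS orders"

def validate_cypher(cypher: str) -> bool:
    """Minimal sanity check; a real validator would verify the query against
    the graph schema (for example via EXPLAIN) before execution."""
    return cypher.strip().upper().startswith(("MATCH", "CALL", "WITH"))

def execute_cypher(cypher: str) -> list:
    """Stand-in for session.run(cypher) on a Neo4j driver session."""
    return [{"c.name": "Acme Corp", "orders": 12}]

def answer_question(question: str) -> str:
    cypher = generate_cypher(question, few_shot_examples=[])
    if not validate_cypher(cypher):
        return "Generated query failed validation; please rephrase your question."
    context = execute_cypher(cypher)
    if not context:
        # Empty context: static suggestion to rephrase, as described above.
        return "No matching data was found; please rephrase your query."
    # In the described system, context plus few-shot examples would be fed
    # back to the LLM here to generate the comprehensive final answer.
    return f"Top result: {context[0]}"

print(answer_question("Which customers placed the most orders?"))
```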
9. Cypher Code & Answer Generation
The LLM also assists in generating the final response (112) based on the executed Cypher code. This response is then sent back to the user.
In the context of the present invention, "Cypher Code & Answer Generation" refers to the process of creating Cypher queries to retrieve relevant data from the Neo4j Knowledge Graph Database and generating a meaningful response based on the retrieved data. This process is essential for transforming natural language queries into precise database queries and providing accurate and contextually relevant answers to the user.
The diagram illustrates the role of Cypher code and answer generation within the system.
10. Question Classification & Improvement
The system includes a module for question classification and improvement (113). This module categorizes the query into specific types and refines it to ensure that it is accurately understood and processed by the system.
In the context of the present invention, "Question Classification & Improvement" refers to the process of categorizing and refining user queries to ensure they are accurately understood and effectively processed by the system. This step is crucial for enhancing the accuracy and relevance of the responses generated by the system.
The diagram illustrates the role of question classification and improvement within the system.
In summary, question classification and improvement in the context of the present invention involve categorizing and refining user queries to ensure they are accurately understood and effectively processed. This process enhances the accuracy and relevance of the responses generated by the system, enabling it to provide meaningful insights to the user.
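The classification and refinement step summarized above can be sketched with a simple rule-based stand-in. The category names, keyword lists, and refinement rules below are illustrative assumptions only; the described system would typically delegate both tasks to the LLM.

```python
# Hedged sketch: a keyword-based classifier and a trivial query-improvement
# step standing in for the question classification and improvement module.

CATEGORIES = {
    "aggregation": ("how many", "count", "total", "average"),
    "lookup":      ("what is", "who is", "show me"),
    "comparison":  ("compare", "versus", "vs", "difference between"),
}

def classify_question(question: str) -> str:
    """Return the first category whose keywords appear in the query."""
    q = question.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in q for k in keywords):
            return category
    return "general"

def improve_question(question: str) -> str:
    """Trivial refinement: normalize whitespace and ensure a question mark.
    A production module would use the LLM to rewrite ambiguous queries."""
    q = " ".join(question.split())
    return q if q.endswith("?") else q + "?"

print(classify_question("How many orders were placed last month"))
```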
11. Final Response
The final response (114) is generated and sent back to the user, providing the requested insights or information.
In the context of the present invention, the "Final Response" refers to the output generated by the Enterprise Data Insights Retrieval System in response to a user's natural language query. This response is the culmination of a series of processes that transform the user's query into a precise and contextually relevant answer, leveraging advanced technologies such as large language models (LLMs), semantic retrieval, and graph databases.
The diagram illustrates the role of the final response within the system.
In summary, the final response in the context of the present invention is the output generated by the system in response to a user's natural language query. This response is the result of a series of processes, including semantic retrieval, context retrieval, Cypher code generation, validation, execution, and answer generation. The final response provides accurate and contextually relevant insights to the user, leveraging advanced technologies and the graph-based data model of the Neo4j Knowledge Graph Database.
12. Logs Saved
The system records and stores logs (115) of all activities, queries, and responses for monitoring, debugging, and auditing purposes.
In the context of the present invention, "Logs Saved" refers to the process of recording and storing detailed logs of all activities, queries, and responses within the Enterprise Data Insights Retrieval System. These logs serve multiple purposes, including monitoring system performance, debugging issues, auditing user interactions, and ensuring compliance with data governance policies.
The diagram illustrates the role of logs saved within the system.
(a) Logs Saved: Throughout the entire process, the system records detailed logs of all activities, queries, and responses. These logs include information such as the user's query, the generated Cypher code, the validation results, the execution details, and the final response. The logs are stored in a secure and accessible manner, allowing for easy retrieval and analysis.
(b) Monitoring and Debugging: The saved logs are used for monitoring the system's performance and identifying any issues or bottlenecks. By analyzing the logs, system administrators can detect anomalies, diagnose problems, and implement corrective actions to ensure the system operates smoothly and efficiently.
(c) Auditing and Compliance: The logs also serve as an audit trail, providing a record of all user interactions and system activities. This is important for compliance with data governance policies and regulatory requirements. The logs can be reviewed to ensure that the system is being used appropriately and that data privacy and security standards are being maintained.
(d) Data Analysis and Insights: The saved logs can be analyzed to gain insights into user behavior, query patterns, and system performance. This information can be used to improve the system's functionality, enhance the user experience, and optimize the underlying algorithms and processes.
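The kind of structured log record enumerated in (a) above can be sketched as follows. The field names and the in-memory "sink" are illustrative assumptions chosen to match the information listed; the actual system may use a different schema and an asynchronous write to a database such as PostgreSQL.

```python
# Illustrative sketch of a structured query-log record and a stand-in for
# an asynchronous write to the log store. Field names are assumptions.

import json
import time
from dataclasses import dataclass, asdict

@dataclass
class QueryLog:
    user_query: str
    generated_cypher: str
    validation_passed: bool
    execution_ms: float
    final_response: str
    timestamp: float

def save_log(record: QueryLog, sink: list) -> None:
    """Stand-in for an asynchronous write to the log store."""
    sink.append(json.dumps(asdict(record)))

log_store = []
save_log(QueryLog(
    user_query="Which region had the highest sales?",
    generated_cypher="MATCH (r:Region)-[:HAD]->(s:Sale) RETURN r.name, sum(s.amount)",
    validation_passed=True,
    execution_ms=42.7,
    final_response="The West region had the highest sales.",
    timestamp=time.time(),
), log_store)
```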
Example 1: User Behavior Analysis
By analyzing the saved logs, the system can identify patterns in user behavior, such as the most frequently asked questions, common query structures, and peak usage times. For instance, if the logs reveal that a significant number of users frequently ask about sales performance metrics near the end of each quarter, the system can be optimized to pre-fetch and cache relevant data during these periods. This proactive approach can reduce query response times and enhance the user experience by providing faster and more efficient access to the requested information.
Example 2: Query Pattern Analysis
The saved logs can also be used to analyze query patterns, such as the types of queries that result in errors or require additional clarification. For example, if the logs show that users often struggle with queries related to complex joins or nested data structures, the system can be improved by providing better guidance or examples for these types of queries. Additionally, the underlying algorithms can be optimized to handle these complex queries more effectively, reducing the likelihood of errors and improving overall system performance.
Example 3: System Performance Optimization
By examining the logs for system performance metrics, such as query processing times, resource utilization, and error rates, the system can identify bottlenecks and areas for improvement. For instance, if the logs indicate that certain types of queries consistently take longer to process, the system can be optimized by refining the query execution plans or upgrading the hardware resources allocated to those queries. This continuous monitoring and optimization process ensures that the system remains efficient and responsive, providing a better experience for users.
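Two of the analyses above, frequent-question counting (Example 1) and slow-query detection (Example 3), can be sketched over saved log records. The record layout and the 500 ms threshold are illustrative assumptions, not values from the described system.

```python
# Hedged sketch of log analysis: most-frequent-query counting and
# slow-query detection over a small set of illustrative log records.

from collections import Counter

logs = [
    {"query": "sales performance last quarter", "processing_ms": 120},
    {"query": "sales performance last quarter", "processing_ms": 980},
    {"query": "top customers by revenue",       "processing_ms": 95},
    {"query": "sales performance last quarter", "processing_ms": 870},
]

def most_frequent_queries(records, top_n=3):
    """Example 1: rank queries by how often users ask them."""
    return Counter(r["query"] for r in records).most_common(top_n)

def slow_queries(records, threshold_ms=500):
    """Example 3: flag records whose processing time exceeds the threshold."""
    return [r for r in records if r["processing_ms"] > threshold_ms]

print(most_frequent_queries(logs))   # most-asked question first
print(len(slow_queries(logs)))       # number of slow queries
```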
In summary, "Logs Saved" in the context of the present invention refers to the comprehensive recording and storage of all activities, queries, and responses within the Enterprise Data Insights Retrieval System. These logs are essential for monitoring, debugging, auditing, compliance, and data analysis, ensuring the system operates effectively and meets the needs of its users.
13. Integration with Data Analysis Tools
In the context of the present invention, "Integration with Data Analysis Tools" refers to the capability of the Enterprise Data Insights Retrieval System to seamlessly connect and work with various data analysis and visualization platforms. This integration enhances the system's functionality by allowing users to further analyze, visualize, and derive insights from the data retrieved and processed by the system.
The diagram illustrates the role of integration with data analysis tools within the Enterprise Data Insights Retrieval System.
The system integrates with various data analysis and visualization tools, such as Snowflake, Power BI, and Streamlit. This integration allows users to further analyze and visualize the data and insights generated by the Enterprise Data Insights Retrieval System. Each tool contributes to the overall functionality as follows:
1. Snowflake: Snowflake is a cloud-based data warehousing platform that allows organizations to store, manage, and analyze large volumes of data in a scalable and efficient manner. By integrating with Snowflake, the Enterprise Data Insights Retrieval System can leverage its powerful data storage and processing capabilities to handle large datasets and perform complex analyses.
2. Power BI: Power BI is a business analytics service by Microsoft that provides interactive visualizations and business intelligence capabilities with an interface for end users to create their own reports and dashboards. Integration with Power BI enables users to create custom visualizations and dashboards based on the data and insights generated by the Enterprise Data Insights Retrieval System, facilitating better decision-making and data-driven strategies. This dashboard sources its data from a Snowflake table, which is populated with logs of user queries and responses from the LLM. These logs are transferred daily from a PostgreSQL table using an ETL process. Although the data in Power BI is not real-time, it allows users to analyze comprehensive usage metrics with a historical perspective.
3. Streamlit: Streamlit is an open-source app framework for creating and sharing data science and machine learning web applications quickly and easily. By integrating with Streamlit, the system allows users to build interactive web applications that showcase the insights and analyses generated by the system. This enhances the accessibility and usability of the data for a wider audience.
In summary, integration with data analysis tools in the context of the present invention enhances the functionality of the Enterprise Data Insights Retrieval System by allowing users to further analyze, visualize, and derive insights from the data. This integration with platforms like Snowflake, Power BI, and Streamlit ensures that users can leverage the full potential of the data and insights generated by the system, facilitating better decision-making and data-driven strategies.
The system integrates with various data analysis and visualization tools, such as Snowflake (116), Power BI (117), and Streamlit (118). These tools allow users to further analyze and visualize the data and insights generated by the system.
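The daily log ETL described above (PostgreSQL to Snowflake) can be sketched as an extract-transform-load flow. In this minimal sketch, both databases are replaced with in-memory lists so the flow is self-contained and runnable; the table layout, column names, and date filter are illustrative assumptions, not the actual schema.

```python
# Hedged sketch of the daily ETL from the PostgreSQL log table into the
# Snowflake table that feeds the Power BI dashboard. In-memory lists stand
# in for both databases; all names are illustrative assumptions.

from datetime import date

def extract_daily_logs(postgres_rows, day):
    """Stand-in for SELECT ... FROM query_logs WHERE log_date = <day>."""
    return [r for r in postgres_rows if r["log_date"] == day]

def transform(rows):
    """Reshape each log row into the form expected by the Snowflake table."""
    return [
        {"QUERY": r["query"],
         "RESPONSE": r["response"],
         "LOG_DATE": r["log_date"].isoformat()}
        for r in rows
    ]

def load_to_snowflake(snowflake_table, rows):
    """Stand-in for a bulk INSERT / COPY INTO against Snowflake."""
    snowflake_table.extend(rows)

postgres = [
    {"query": "q1", "response": "a1", "log_date": date(2024, 1, 2)},
    {"query": "q2", "response": "a2", "log_date": date(2024, 1, 3)},
]
snowflake = []
load_to_snowflake(snowflake,
                  transform(extract_daily_logs(postgres, date(2024, 1, 3))))
```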
Overall, the Enterprise Data Insights Retrieval System Utilizing Graph Databases and Large Language Models represents a significant advancement in both data insights retrieval and network communication. By optimizing how data is queried, retrieved, and processed, the system not only enhances the accuracy of insights but also improves data communication efficiency, particularly at the TCP/IP layer. This dual impact makes the system of the present invention a valuable tool for enterprise environments, where both accurate data insights and efficient communication are critical.
In another embodiment of the present invention, the method involves a series of steps for generating insights from structured data using natural language queries. This method leverages advanced technologies such as large language models (LLMs), semantic retrieval, and graph databases to provide accurate and contextually relevant responses to user queries. The following is a detailed description of the method embodiment:
(a) Receiving a Natural Language Query: The method begins with receiving a natural language query from a user. The user inputs the query through a user interface, which can be a web application, mobile app, or any other platform that supports natural language input. The query is then sent as an API call to the backend services hosted on AWS ECS (Amazon Web Services Elastic Container Service).
(b) Saving Logs: Throughout the entire process, the system records detailed logs of all activities, queries, and responses. These logs include information such as the user's query, the generated Cypher code, the validation results, the execution details, and the final response. The logs are stored in a secure and accessible manner, allowing for easy retrieval and analysis. The saved logs are used for monitoring the system's performance, debugging issues, auditing user interactions, and ensuring compliance with data governance policies.
(c) Integrating with Data Analysis Tools: The system integrates with various data analysis and visualization tools, such as Snowflake, Power BI, and Streamlit. This integration allows users to further analyze and visualize the data and insights generated by the system. By leveraging these tools, users can create custom visualizations, dashboards, and interactive web applications that showcase the insights and analyses generated by the system.
In summary, the method embodiment of the present invention involves a series of steps for generating insights from structured data using natural language queries. This method leverages advanced technologies such as LLMs, semantic retrieval, and graph databases to provide accurate and contextually relevant responses to user queries. The method includes receiving and processing the query, performing semantic retrieval, retrieving contextual information, generating and validating Cypher code, executing the query, generating the final response, saving logs, and integrating with data analysis tools.
In another embodiment of the present invention, the apparatus refers to the physical and logical components that make up the Enterprise Data Insights Retrieval System. This apparatus is designed to generate insights from structured data using natural language queries, leveraging advanced technologies such as large language models (LLMs), semantic retrieval, and graph databases. The physical and logical components are already described herein above under the system embodiment. The apparatus includes components for receiving and processing user queries, generating and validating Cypher code, executing queries against a knowledge graph database, and converting structured data into human-readable responses. The apparatus also includes modules for logging, analytics, and user feedback, ensuring continuous improvement and optimal performance.
Another embodiment of the present invention relates to a computer program for generating insights from structured data using natural language queries. The computer program comprises instructions that, when executed by a processor, cause the system to perform the steps of:
(a) receiving a natural language query from a user;
(b) processing the query through an API call to a backend service hosted on a container orchestration platform;
(c) performing semantic retrieval using a large language model (LLM) to understand the intent behind the query;
(d) employing dynamic few-shot learning to improve the accuracy of the query generation process;
(e) retrieving relevant contextual information from a knowledge graph database;
(f) generating Cypher code tailored to the graph database using the LLM;
(g) validating the generated Cypher code to ensure accuracy;
(h) executing the validated Cypher code against the knowledge graph database to retrieve relevant data;
(i) generating a final response based on the retrieved data; and
(j) sending the final response back to the user.
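Steps (a) through (j) above can be sketched as a simple pipeline of stages passing a shared state forward. Every stage here is a trivial stand-in with hypothetical names and hard-coded results; in the described system the stages would call the LLM, the dynamic few-shot selector, and the Neo4j database.

```python
# Compact sketch mapping steps (a)-(j) onto a chain of pipeline stages.
# All stage bodies are stand-ins; only the control flow is illustrative.

def receive_query(state):           # steps (a)-(b): receive and route the query
    state["query"] = state["raw_input"].strip()
    return state

def semantic_retrieval(state):      # steps (c)-(e): intent, few-shots, context
    state["intent"] = "aggregation"
    state["context"] = {"schema": "(:Customer)-[:PLACED]->(:Order)"}
    return state

def generate_and_validate(state):   # steps (f)-(g): Cypher generation, validation
    state["cypher"] = "MATCH (c:Customer)-[:PLACED]->(o:Order) RETURN count(o)"
    state["valid"] = state["cypher"].startswith("MATCH")
    return state

def execute_and_answer(state):      # steps (h)-(j): execute, answer, return
    state["response"] = "42 orders" if state["valid"] else "Please rephrase."
    return state

PIPELINE = [receive_query, semantic_retrieval, generate_and_validate,
            execute_and_answer]

def run(raw_input):
    state = {"raw_input": raw_input}
    for stage in PIPELINE:
        state = stage(state)
    return state["response"]

print(run("  How many orders were placed?  "))
```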
A further embodiment of the present invention relates to a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the system to perform the steps of the method for generating insights from structured data using natural language queries as described above.
Another embodiment of the present invention relates to a computer program product comprising a computer-readable storage medium having computer-readable program code embodied therein. The computer-readable program code comprises instructions that, when executed by a processor, cause the system to perform the steps of the method for generating insights from structured data using natural language queries as described above.
Another embodiment of the present invention relates to an artificial intelligence (AI) system for generating insights from structured data using natural language queries. The AI system comprises:
(a) a large language model (LLM) configured to perform semantic retrieval and understand the intent behind user queries;
(b) a dynamic few-shot learning module configured to improve the accuracy of the query generation process;
(c) a knowledge graph database for storing and retrieving contextual information;
(d) a Cypher code generation module configured to generate Cypher code tailored to the graph database using the LLM;
(e) a query validation module configured to validate the generated Cypher code; and
(f) a query execution module configured to execute the validated Cypher code against the knowledge graph database and generate a final response based on the retrieved data.
An embodiment of the present invention relates to a machine learning system for generating insights from structured data using natural language queries. The machine learning system comprises:
(a) a processor configured to execute machine learning algorithms;
(b) a memory storing machine learning models, including a large language model (LLM) and a dynamic few-shot learning model;
(c) a knowledge graph database for storing and retrieving contextual information;
(d) a Cypher code generation module configured to generate Cypher code tailored to the graph database using the LLM;
(e) a query validation module configured to validate the generated Cypher code; and
(f) a query execution module configured to execute the validated Cypher code against the knowledge graph database and generate a final response based on the retrieved data.
Advantages of the Present Invention:
1. Enhanced Query Accuracy:
The use of large language models (LLMs) for semantic retrieval and Cypher query generation ensures a high level of accuracy in understanding and processing natural language queries. This leads to more precise and relevant insights from structured data.
2. Efficient Data Representation:
The graph-based data modeling approach, utilizing Neo4j knowledge graph database, allows for efficient representation and querying of complex relationships within the data. This enhances the system's ability to capture and utilize contextual information.
3. Reduced Data Transfer:
By processing data locally and minimizing the need for large data transfers between systems, the invention optimizes data communication efficiency. This reduces network congestion and strain on the TCP/IP layer, leading to faster and more reliable data transfers.
4. Scalability and Flexibility:
The microservice architecture and dynamic few-shot learning capabilities enable the system to scale efficiently and adapt to different enterprise environments and data structures. This ensures that the system remains responsive and effective even as data volumes and user requirements evolve.
5. Improved User Experience:
The ability to input queries in natural language makes the system accessible to a broader range of users, including those without technical expertise in database querying. This significantly lowers the barrier to accessing and retrieving valuable insights from structured data.
6. Integration with Data Analysis Tools:
The system's integration with data analysis and visualization tools such as Snowflake, Power BI, and Streamlit allows users to further analyze and visualize the data and insights generated by the system. This enhances the overall utility and value of the insights provided.
7. Robust Logging and Monitoring:
The system records and stores detailed logs of all activities, queries, and responses, enabling effective monitoring, debugging, and auditing. This ensures system reliability and compliance with data governance policies.
8. Real-Time Insights:
The system's ability to process and analyze data in real-time allows for timely detection of trends, anomalies, and patterns. This enables proactive decision-making and timely interventions in enterprise environments.
9. Enhanced Data Privacy and Security:
Advanced privacy-preserving techniques and secure data handling mechanisms ensure that sensitive information is protected while still enabling valuable insights to be generated. This addresses growing concerns around data privacy and security.
10. Versatility Across Domains:
The principles and technologies underlying the invention can be applied to various domains beyond enterprise data analytics, including healthcare, finance, and smart cities. This versatility makes the system a valuable tool for a wide range of applications.
11. Continuous Learning and Adaptation:
The system's ability to continuously learn from new data, user interactions, and feedback ensures that it evolves and improves over time. This leads to ongoing enhancements in query accuracy, system performance, and user satisfaction.
In summary, the present invention offers numerous advantages, including enhanced query accuracy, efficient data representation, reduced data transfer, scalability, improved user experience, integration with data analysis tools, robust logging and monitoring, real-time insights, enhanced data privacy and security, versatility across domains, and continuous learning and adaptation. These advantages make the Enterprise Data Insights Retrieval System a powerful and valuable tool for generating insights from structured data using natural language queries.
The present invention has been described in terms of specific embodiments and applications. However, it is to be understood that the scope and spirit of the invention are not limited to the particular forms or methods disclosed. Various modifications, adaptations, and variations may be made without departing from the broader scope and spirit of the invention as defined by the appended claims and their equivalents. The invention encompasses all such modifications, adaptations, and variations that fall within the scope of the claims. The descriptions provided herein are intended to illustrate the principles of the invention and its practical applications, and are not intended to be exhaustive or to limit the invention to the precise forms disclosed. The scope of the invention is to be determined by the claims appended hereto and their equivalents, rather than by the specific examples provided.
The present invention, the Enterprise Data Insights Retrieval System Utilizing Graph Databases and Large Language Models, holds significant potential for future advancements and applications. As data analytics and natural language processing technologies continue to evolve, several future possibilities can be envisioned for the invention:
1. Enhanced Integration with Emerging Technologies: The Enterprise Data Insights Retrieval System can be further integrated with emerging technologies such as quantum computing, blockchain, and edge computing. Quantum computing could enhance the processing speed and efficiency of complex queries, while blockchain could provide secure and transparent data transactions. Edge computing could enable real-time data processing and insights generation at the edge of the network, reducing latency and improving responsiveness.
2. Expansion to Multimodal Data: The system can be extended to handle multimodal data, including text, images, audio, and video. By incorporating advanced machine learning models capable of processing and analyzing different data types, the Enterprise Data Insights Retrieval System could provide more comprehensive and enriched insights, catering to a wider range of applications and industries.
3. Personalized Insights and Recommendations: Future iterations of the Enterprise Data Insights Retrieval System could incorporate user profiling and personalization techniques to deliver tailored insights and recommendations. By understanding user preferences, behavior, and historical interactions, the system could provide more relevant and actionable insights, enhancing user experience and decision-making.
4. Advanced Natural Language Understanding: As natural language processing (NLP) technologies advance, the Enterprise Data Insights Retrieval System could leverage more sophisticated NLP models to improve the accuracy and depth of understanding of user queries. This could include better handling of context, sentiment analysis, and the ability to process more complex and nuanced queries.
5. Real-Time Data Insights: The system could be enhanced to provide real-time data insights and alerts. By continuously monitoring data streams and applying real-time analytics, the system could detect anomalies, trends, and patterns as they occur, enabling proactive decision-making and timely interventions.
6. Cross-Domain Applications: The principles and technologies underlying Enterprise Data Insights Retrieval System can be applied to various domains beyond enterprise data analytics. Potential applications include healthcare (e.g., patient data analysis and personalized treatment recommendations), finance (e.g., fraud detection and risk assessment), and smart cities (e.g., urban planning and resource management).
7. Collaborative and Interactive Features: Future versions of the system could incorporate collaborative and interactive features, allowing multiple users to interact with the system simultaneously. This could include collaborative query building, shared dashboards, and real-time collaboration on data analysis and insights generation.
8. Enhanced Data Privacy and Security: As data privacy and security concerns continue to grow, the system could incorporate advanced privacy-preserving techniques such as differential privacy, federated learning, and secure multi-party computation. These techniques would ensure that sensitive data is protected while still enabling valuable insights to be generated.
9. Integration with IoT Devices: The system could be integrated with Internet of Things (IoT) devices to collect and analyze data from a wide range of sensors and connected devices. This would enable the system to provide insights and recommendations based on real-time data from physical environments, enhancing applications in areas such as industrial automation, smart homes, and environmental monitoring.
10. Continuous Learning and Adaptation: Future iterations of the system could incorporate continuous learning and adaptation mechanisms, allowing the system to evolve and improve over time. By continuously learning from new data, user interactions, and feedback, the system could refine its models, improve query accuracy, and provide increasingly valuable insights.
In conclusion, the system presents a robust foundation for generating insights from structured data using natural language queries. The future possibilities outlined above highlight the potential for the system to evolve and expand, addressing new challenges and opportunities in data analytics, natural language processing, and network communication. The invention's adaptability and scalability ensure that it can remain at the forefront of technological advancements, providing valuable insights and driving innovation across various domains.
The information provided in this patent specification is intended to describe the invention and its potential applications. It is not intended to limit the scope of the invention, which is defined by the claims. The embodiments and examples provided are illustrative and not exhaustive. Variations and modifications may be made without departing from the spirit and scope of the invention. The inventor and assignees are not responsible for any misuse or unintended application of the invention. Users are advised to comply with all applicable regulations and guidelines when using the invention for specific purposes.
Claims:
1. An Enterprise Data Insights Retrieval System Utilizing Graph Databases and Large Language Models, comprising:
a user interface configured to receive natural language queries from a user;
a microservice hosted on a container orchestration platform, configured to handle API calls from the user interface;
a large language model (LLM) configured to perform semantic retrieval and understand the intent behind the user's query;
a dynamic few-shot learning module configured to improve the accuracy of the query generation process;
a knowledge graph database for storing and retrieving contextual information;
a Cypher code generation module configured to generate Cypher code tailored to the graph database using the LLM;
a query validation module configured to validate the generated Cypher code;
a query execution module configured to execute the validated Cypher code against the knowledge graph database and retrieve relevant data;
a question-answering system configured to convert the structured data into a human-readable natural language answer; and
an asynchronous logging and analytics module configured to monitor system performance and user interactions.
2. The system of claim 1, wherein the user interface is a web application.
3. The system of claim 1, wherein the user interface is a mobile application.
4. The system of claim 1, wherein the container orchestration platform is AWS ECS.
5. The system of claim 1, wherein the large language model (LLM) is trained on vast amounts of text data to understand and generate human-like language.
6. The system of claim 1, wherein the dynamic few-shot learning module dynamically selects the best-matching few-shot examples from a master list using semantic retrieval.
7. The system of claim 1, wherein the knowledge graph database is Neo4j.
8. The system of claim 1, wherein the Cypher code generation module generates Cypher queries based on the structure of nodes and relationships within the graph database.
9. The system of claim 1, wherein the query validation module verifies the accuracy and correctness of the generated Cypher code.
10. The system of claim 1, wherein the query execution module retrieves data in a structured, machine-readable format.
11. The system of claim 1, wherein the question-answering system formats the structured data into a generative natural language response.
12. The system of claim 1, wherein the asynchronous logging and analytics module provides real-time dashboards for monitoring usage, cost, and data availability.
13. The system of claim 1, wherein the asynchronous logging and analytics module captures user feedback to improve system performance.
14. The system of claim 1, further comprising an integration module for connecting with data analysis and visualization tools.
15. The system of claim 14, wherein the data analysis and visualization tools include Snowflake, Power BI, and Streamlit.
16. The system of claim 1, wherein the system is designed to scale easily and handle multiple queries simultaneously.
17. The system of claim 1, wherein the system employs advanced privacy-preserving techniques to ensure data security.
18. An Enterprise Data Insights Retrieval method Utilizing Graph Databases and Large Language Models, comprising:
receiving a natural language query from a user;
processing the query through an API call to a backend service hosted on a container orchestration platform;
performing semantic retrieval using a large language model (LLM) to understand the intent behind the query;
employing dynamic few-shot learning to improve the accuracy of the query generation process;
retrieving relevant contextual information from a knowledge graph database;
generating Cypher code tailored to the graph database using the LLM;
validating the generated Cypher code to ensure accuracy;
executing the validated Cypher code against the knowledge graph database to retrieve relevant data;
generating a final response based on the retrieved data;
sending the final response back to the user;
recording and storing logs of all activities, queries, and responses for monitoring, debugging, and auditing purposes.
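The steps of the method above can be sketched as a single orchestration function. Every helper here is a hypothetical stand-in: the real system calls an LLM for semantic retrieval and Cypher generation and runs the query against a live Neo4j instance, which are replaced below by canned behaviour so the control flow is runnable end to end.

```python
import logging

log = logging.getLogger("insights")

def semantic_retrieval(query):
    # Stand-in for LLM-based intent understanding.
    return {"intent": "aggregate", "entity": "Sale"}

def generate_cypher(query, intent, few_shots):
    # Stand-in for LLM Cypher generation conditioned on few-shot examples.
    return f"MATCH (s:{intent['entity']}) RETURN count(s) AS n"

def validate_cypher(cypher):
    # Minimal rule-based check: non-empty and read-only (cf. claim 40).
    forbidden = ("DELETE", "DETACH", "CREATE", "MERGE", "SET", "REMOVE")
    return bool(cypher.strip()) and not any(w in cypher.upper() for w in forbidden)

def execute_cypher(cypher):
    # Stand-in for a Neo4j driver call; returns structured records.
    return [{"n": 42}]

def answer(query):
    intent = semantic_retrieval(query)
    cypher = generate_cypher(query, intent, few_shots=[])
    if not validate_cypher(cypher):
        raise ValueError("generated Cypher failed validation")
    records = execute_cypher(cypher)
    response = f"There are {records[0]['n']} sales on record."
    # Log query, generated code, and response for monitoring and auditing.
    log.info("query=%r cypher=%r response=%r", query, cypher, response)
    return response
```

The point of the sketch is the ordering: intent understanding, few-shot-conditioned generation, validation, execution, response generation, and logging each sit behind a seam that can be swapped for the production component.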
19. The method of claim 18, wherein the natural language query is received through a web application.
20. The method of claim 18, wherein the natural language query is received through a mobile application.
21. The method of claim 18, wherein the backend service is hosted on AWS ECS.
22. The method of claim 18, wherein the semantic retrieval is performed using a large language model (LLM) trained on vast amounts of text data.
23. The method of claim 18, wherein the dynamic few-shot learning involves selecting the best-matching few-shot examples from a master list using semantic retrieval.
24. The method of claim 18, wherein the knowledge graph database is Neo4j.
25. The method of claim 18, wherein the Cypher code is generated based on the structure of nodes and relationships within the graph database.
26. The method of claim 18, wherein the validation of the Cypher code includes verifying its accuracy and correctness.
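Claims 26, 40, and 60 describe rule-based validation of the generated Cypher. A fuller sketch of such a rule engine is shown below; the specific rules (read-only clause check, required RETURN, balanced brackets) are illustrative assumptions, not an exhaustive validator.

```python
import re

# Clauses that would mutate the graph; generated analytics queries should be read-only.
READ_ONLY_FORBIDDEN = re.compile(r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b", re.I)
PAIRS = {"(": ")", "[": "]", "{": "}"}

def validate_cypher(cypher: str) -> list:
    """Return a list of problems found; an empty list means the query passed."""
    problems = []
    if READ_ONLY_FORBIDDEN.search(cypher):
        problems.append("write clause in a read-only query")
    if not re.search(r"\bRETURN\b", cypher, re.I):
        problems.append("missing RETURN clause")
    stack = []
    for ch in cypher:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in PAIRS.values():
            if not stack or stack.pop() != ch:
                problems.append("unbalanced brackets")
                break
    if stack and "unbalanced brackets" not in problems:
        problems.append("unbalanced brackets")
    return problems
```

As a complementary check, Cypher itself supports prefixing a query with `EXPLAIN`, which has the Neo4j server compile a query plan without executing it, catching syntax and schema errors that static rules miss.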
27. The method of claim 18, wherein the execution of the Cypher code retrieves data in a structured, machine-readable format.
28. The method of claim 18, wherein the final response is formatted into a generative natural language response.
29. The method of claim 18, wherein the logs include information such as query, source, response, cost, time, and errors.
30. The method of claim 18, wherein the logs are used for real-time monitoring and analytics.
31. The method of claim 18, wherein the method includes capturing user feedback to improve system performance.
32. The method of claim 18, wherein the method includes integrating with data analysis and visualization tools.
33. The method of claim 18, wherein the data analysis and visualization tools include Snowflake, Power BI, and Streamlit.
34. The method of claim 18, wherein the method employs advanced privacy-preserving techniques to ensure data security.
35. An Enterprise Data Insights Retrieval device utilizing graph databases and large language models, comprising:
a processor configured to execute machine learning algorithms;
a memory storing machine learning models, including a large language model (LLM) and a dynamic few-shot learning model;
a knowledge graph database for storing and retrieving contextual information;
a Cypher code generation module configured to generate Cypher code tailored to the graph database using the LLM;
a query validation module configured to validate the generated Cypher code;
a query execution module configured to execute the validated Cypher code against the knowledge graph database and retrieve relevant data;
a question-answering system configured to convert the structured data into a human-readable natural language answer;
an asynchronous logging and analytics module configured to monitor system performance and user interactions.
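The asynchronous logging and analytics module recited above can be sketched as a queue drained by a background worker, so that logging never blocks the query-handling path. The in-memory sink below is a stand-in for a real analytics store; everything in this sketch is illustrative.

```python
import queue
import threading

class AsyncLogger:
    def __init__(self):
        self._q = queue.Queue()
        self.sink = []  # stand-in for a persistent analytics store
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def log(self, **record):
        # Non-blocking from the caller's perspective: just enqueue.
        self._q.put(record)

    def _drain(self):
        # Background worker: move records from the queue to the sink.
        while True:
            record = self._q.get()
            self.sink.append(record)
            self._q.task_done()

    def flush(self):
        # Block until every enqueued record has been written.
        self._q.join()

logger = AsyncLogger()
logger.log(query="total sales?", cost=0.002, time_ms=340, errors=None)
logger.flush()
```

Decoupling the write path from the request path is what makes the module "asynchronous" in the claimed sense: request latency is unaffected by how slow the analytics store is, at the cost of logs that lag slightly behind real time.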
36. The device of claim 35, wherein the processor is configured to execute machine learning algorithms in parallel.
37. The device of claim 35, wherein the memory stores pre-trained large language models (LLMs).
38. The device of claim 35, wherein the knowledge graph database is stored on a cloud-based platform.
39. The device of claim 35, wherein the Cypher code generation module is implemented as a software component.
40. The device of claim 35, wherein the query validation module includes a rule-based engine for verifying Cypher code accuracy.
41. The device of claim 35, wherein the query execution module is optimized for low-latency data retrieval.
42. The device of claim 35, wherein the question-answering system includes a natural language generation (NLG) component.
43. The device of claim 35, wherein the asynchronous logging and analytics module includes a data visualization dashboard.
44. The device of claim 35, wherein the device is configured to integrate with external data sources.
45. The device of claim 35, wherein the device supports real-time data processing.
46. The device of claim 35, wherein the device includes a user feedback mechanism for continuous improvement.
47. The device of claim 35, wherein the device employs encryption techniques for secure data storage.
48. The device of claim 35, wherein the device supports multi-user access with role-based permissions.
49. The device of claim 35, wherein the device includes a monitoring system for tracking system performance.
50. The device of claim 35, wherein the device is designed to be energy-efficient.
51. The device of claim 35, wherein the device includes a backup and recovery system for data protection.
52. An Enterprise Data Insights Retrieval apparatus utilizing graph databases and large language models, comprising:
a processor configured to execute machine learning algorithms;
a memory storing machine learning models, including a large language model (LLM) and a dynamic few-shot learning model;
a user interface configured to receive natural language queries from a user;
a microservice hosted on a container orchestration platform, configured to handle API calls from the user interface;
a knowledge graph database for storing and retrieving contextual information;
a Cypher code generation module configured to generate Cypher code tailored to the graph database using the LLM;
a query validation module configured to validate the generated Cypher code;
a query execution module configured to execute the validated Cypher code against the knowledge graph database and retrieve relevant data;
a question-answering system configured to convert the structured data into a human-readable natural language answer;
an asynchronous logging and analytics module configured to monitor system performance and user interactions.
53. The apparatus of claim 52, wherein the processor is configured to execute machine learning algorithms in parallel.
54. The apparatus of claim 52, wherein the memory stores pre-trained large language models (LLMs).
55. The apparatus of claim 52, wherein the user interface is a web application.
56. The apparatus of claim 52, wherein the user interface is a mobile application.
57. The apparatus of claim 52, wherein the container orchestration platform is AWS ECS.
58. The apparatus of claim 52, wherein the knowledge graph database is stored on a cloud-based platform.
59. The apparatus of claim 52, wherein the Cypher code generation module is implemented as a software component.
60. The apparatus of claim 52, wherein the query validation module includes a rule-based engine for verifying Cypher code accuracy.
61. The apparatus of claim 52, wherein the query execution module is optimized for low-latency data retrieval.
62. The apparatus of claim 52, wherein the question-answering system includes a natural language generation (NLG) component.
63. The apparatus of claim 52, wherein the asynchronous logging and analytics module includes a data visualization dashboard.
64. The apparatus of claim 52, wherein the apparatus is configured to integrate with external data sources.
65. The apparatus of claim 52, wherein the apparatus supports real-time data processing.
66. The apparatus of claim 52, wherein the apparatus includes a user feedback mechanism for continuous improvement.
67. The apparatus of claim 52, wherein the apparatus employs encryption techniques for secure data storage.
68. The apparatus of claim 52, wherein the apparatus supports multi-user access with role-based permissions.
69. The apparatus of claim 52, wherein the apparatus includes a monitoring system for tracking system performance.
70. The apparatus of claim 52, wherein the apparatus is designed to be energy-efficient.
71. The apparatus of claim 52, wherein the apparatus includes a backup and recovery system for data protection.
72. A computer program for generating insights from structured data using natural language queries, the computer program comprising instructions that, when executed by a processor, cause a computer system to:
receive a natural language query from a user;
process the query through an API call to a backend service hosted on a container orchestration platform;
perform semantic retrieval using a large language model (LLM) to understand the intent behind the query;
employ dynamic few-shot learning to improve the accuracy of the query generation process;
retrieve relevant contextual information from a knowledge graph database;
generate Cypher code tailored to the graph database using the LLM;
validate the generated Cypher code to ensure accuracy;
execute the validated Cypher code against the knowledge graph database to retrieve relevant data;
generate a final response based on the retrieved data;
send the final response back to the user;
record and store logs of all activities, queries, and responses for monitoring, debugging, and auditing purposes.
73. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause a computer system to:
receive a natural language query from a user;
process the query through an API call to a backend service hosted on a container orchestration platform;
perform semantic retrieval using a large language model (LLM) to understand the intent behind the query;
employ dynamic few-shot learning to improve the accuracy of the query generation process;
retrieve relevant contextual information from a knowledge graph database;
generate Cypher code tailored to the graph database using the LLM;
validate the generated Cypher code to ensure accuracy;
execute the validated Cypher code against the knowledge graph database to retrieve relevant data;
generate a final response based on the retrieved data;
send the final response back to the user;
record and store logs of all activities, queries, and responses for monitoring, debugging, and auditing purposes.
74. A computer program product comprising a computer-readable storage medium having computer-readable program code embodied therein, the computer-readable program code comprising instructions that, when executed by a processor, cause a computer system to:
receive a natural language query from a user;
process the query through an API call to a backend service hosted on a container orchestration platform;
perform semantic retrieval using a large language model (LLM) to understand the intent behind the query;
employ dynamic few-shot learning to improve the accuracy of the query generation process;
retrieve relevant contextual information from a knowledge graph database;
generate Cypher code tailored to the graph database using the LLM;
validate the generated Cypher code to ensure accuracy;
execute the validated Cypher code against the knowledge graph database to retrieve relevant data;
generate a final response based on the retrieved data;
send the final response back to the user;
record and store logs of all activities, queries, and responses for monitoring, debugging, and auditing purposes.
| # | Document | Date Filed |
|---|---|---|
| 1 | 202521005092-STATEMENT OF UNDERTAKING (FORM 3) [22-01-2025(online)].pdf | 2025-01-22 |
| 2 | 202521005092-REQUEST FOR EXAMINATION (FORM-18) [22-01-2025(online)].pdf | 2025-01-22 |
| 3 | 202521005092-REQUEST FOR EARLY PUBLICATION(FORM-9) [22-01-2025(online)].pdf | 2025-01-22 |
| 4 | 202521005092-FORM-9 [22-01-2025(online)].pdf | 2025-01-22 |
| 5 | 202521005092-FORM 18 [22-01-2025(online)].pdf | 2025-01-22 |
| 6 | 202521005092-FORM 1 [22-01-2025(online)].pdf | 2025-01-22 |
| 7 | 202521005092-DRAWINGS [22-01-2025(online)].pdf | 2025-01-22 |
| 8 | 202521005092-COMPLETE SPECIFICATION [22-01-2025(online)].pdf | 2025-01-22 |
| 9 | Abstract.jpg | 2025-02-11 |
| 10 | 202521005092-RELEVANT DOCUMENTS [09-04-2025(online)].pdf | 2025-04-09 |
| 11 | 202521005092-FORM 13 [09-04-2025(online)].pdf | 2025-04-09 |
| 12 | 202521005092-AMENDED DOCUMENTS [09-04-2025(online)].pdf | 2025-04-09 |
| 13 | 202521005092-FORM-26 [14-04-2025(online)].pdf | 2025-04-14 |
| 14 | 202521005092-Response to office action [22-04-2025(online)].pdf | 2025-04-22 |
| 15 | 202521005092-Proof of Right [21-07-2025(online)].pdf | 2025-07-21 |