A System And Method For Unified Querying Of Heterogenous Enterprise Data Sources Using Polyglot Query Graph

Abstract: A system and method for unified querying of heterogeneous enterprise data sources using a polyglot query graph; the system comprising an input unit (10), a processing unit (20) with a query API and planner module (21), a subquery executors module (22), a fusion and alignment module (23), a ranking module (24), a materializer module (25), and a governance, security and feedback loop module (26), and an output unit (30). The query API and planner receives high-level queries in GraphQL, REST, or natural language, normalizes them into an intermediate representation, and decomposes them into subqueries, which are executed in parallel and merged by a fusion and alignment layer performing entity resolution, schema alignment, deduplication, and provenance tracking. A ranking framework combines BM25 text relevance, semantic similarity, and graph centrality signals to produce a unified graph view. A materializer exposes the results in formats such as JSON, RDF, or property graphs for use by large language models, analytics dashboards, or enterprise services. Governance and security enforce role-based access control, masking, and audit logging, while caching and feedback loops improve efficiency and refine ranking models.


Patent Information

Application #

Filing Date

02 September 2025

Publication Number

40/2025

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

Persistent Systems

Bhageerath, 402, Senapati Bapat Rd, Shivaji Cooperative Housing Society, Gokhale Nagar, Pune - 411016, Maharashtra, India.

Inventors

1. Mr. Nitish Shrivastava

10764 Farallone Dr, Cupertino, CA 95014-4453, United States

2. Mr. Pradeep Kumar Sharma

20200 Lucille Ave Apt 62 Cupertino CA 95014, United States.

Specification

Description:
FIELD OF THE INVENTION
The present invention relates to database systems and information retrieval. More specifically, it relates to the systems and methods that enable unified access to enterprise data stored in different formats.

BACKGROUND OF THE INVENTION
In most organizations today, data is stored in many different places and formats. Some of it is stored neatly in relational databases, some is organized as graphs showing relationships, some exists as vectors used for similarity searches, and a lot of it sits in documents or text repositories. Each of these systems speaks its own “language,” and traditional query tools usually work with only one of them at a time. This means that when someone needs to search across all of these sources, the results are often incomplete, fragmented, or inconsistent.
For example, relational queries are excellent at retrieving structured information with precision, but they cannot capture semantic meaning or contextual similarity the way vector searches can. Graph databases reveal connections between entities, but they do not handle full-text queries well. As a result, organizations that rely on separate systems end up stitching results together manually or through ad-hoc integration, which is slow, inefficient, and error-prone.
With the growing demand for deeper insights and connected knowledge, there is a clear need for a smarter way to query across all these data sources at once. Therefore, there is a need for a unified query framework that can take a single high-level request, break it down into the right kind of queries for each data engine, merge the results, resolve duplicates and overlaps, and present a single, coherent knowledge graph. This would make it far easier to extract meaningful insights from complex enterprise data without being limited by the constraints of any one system.
Prior Art:
For instance, WO2023040499A1, along with its US counterpart US20240144032A1, discloses systems for generating and fusing knowledge graphs derived from heterogeneous documents. The disclosure emphasizes entity linking, schema alignment, and provenance tracking to produce a consolidated knowledge base. While this approach is valuable for combining graph fragments, it is primarily concerned with graph fusion after data extraction. It does not propose a structured orchestration layer that can receive a high-level query and decompose it into relational, graph, vector, and semantic subqueries across different engines. Nor does it disclose mechanisms for parallel query execution, hybrid scoring of results using text, semantic, and graph-based signals, or materialization of a unified query graph view for application and LLM consumption.
US2022/0075948A1 also addresses the problem of integrating knowledge graphs through entity resolution and schema alignment. However, the focus remains on cross-graph integration and document-derived graph construction. The system does not provide a polyglot query graph framework capable of unifying multiple query paradigms into one representation. It lacks the disclosure of a ranking and re-ranking pipeline that fuses heterogeneous results, as well as an integrated governance, caching, and feedback loop to enforce policies, improve performance, and refine results over time.
Although these prior art references contribute methods for graph fusion, they do not offer a complete end-to-end framework for unified querying across relational, graph, vector, and semantic paradigms. The present invention fills this gap by introducing the Polyglot Query Graph, which combines query orchestration, subquery execution, fusion and alignment, hybrid ranking, and materialization into a single scalable system designed to support enterprise-scale applications and artificial intelligence grounding.

DEFINITIONS
The expression “system” used hereinafter in this specification refers to an ecosystem comprising, but not limited to, a user, input and output devices, a processing unit, a plurality of mobile devices, a mobile device-based application, a visualization platform, and an output; and extends to computing systems such as mobiles, laptops, computers, PCs, etc.
The expression “input unit” used hereinafter in this specification refers to, but is not limited to, mobile, laptops, computers, PCs, keyboards, mouse, pen drives or drives.
The expression “output unit” used hereinafter in this specification refers to, but is not limited to, an onboard output device, a user interface (UI), a display kit, a local display, a screen, a dashboard, or a visualization platform enabling the user to visualize, observe or analyse any data or scores provided by the system.
The expression “processing unit” refers to, but is not limited to, a processor of at least one computing device that optimizes the system.
The expression “polyglot query graph (PQG)” used hereinafter in this specification refers to a unified query representation that integrates results from heterogeneous query paradigms including relational, graph, vector, and semantic queries into a single coherent graph view.
The expression “query orchestration layer” refers to the system component that receives a high-level query, normalizes it into an intermediate representation, decomposes it into subqueries for different engines, and applies policy checks before dispatch.
The expression “subquery executors” refers to modules that translate subqueries into engine-specific dialects (such as SQL, Cypher, vector similarity search, or full-text search) and execute them in parallel across the respective databases.
The expression “fusion and alignment layer” refers to the component responsible for merging heterogeneous results by performing entity resolution, schema alignment, deduplication, and provenance tracking, thereby producing a unified graph.
The expression “ranking module” refers to the system component that applies hybrid scoring models, including textual relevance (BM25), semantic similarity (embeddings), and graph-based measures (such as centrality), to order and prioritize results.
The expression “materializer” refers to the component that produces the unified Polyglot Query Graph in consumable formats such as JSON, RDF, or property graphs, enabling its use by applications, analytics dashboards, or external systems.
The expression “governance” refers to mechanisms that enforce compliance and security at query time, including role-based access control (RBAC), masking of personally identifiable information (PII), and audit logging of queries and results.
The expression “caching mechanism” refers to the component of the system that stores partial query results for a defined time-to-live (TTL) to reduce latency and improve system performance for repeated queries.
The expression “feedback loop” refers to the mechanism by which client relevance feedback, whether explicit or implicit, is used to refine vector indexes, update ranking weights, and continuously improve query result accuracy.

OBJECTS OF THE INVENTION
The primary object of the present invention is to provide a system and method for unified querying of heterogeneous enterprise data sources using a polyglot query graph.
Another object of the invention is to provide a system and method that make it possible to run a single query across different types of enterprise data sources through a polyglot query graph.
Another object of the invention is to break down high-level query requests into smaller subqueries for relational, graph, vector, and semantic engines, and to execute them in parallel so that results can be obtained quickly and efficiently.
Yet another object of the invention is to bring together the results from these different engines by performing entity resolution, schema alignment, deduplication, and provenance tracking, ensuring that the final output is consistent and accurate.
A further object of the invention is to apply a hybrid ranking process that combines text relevance, semantic similarity, and graph centrality, so that the most meaningful and useful results are presented first.
An additional object of the invention is to generate a unified query graph view that can be directly consumed by enterprise applications, analytics dashboards, or even large language models.
Another object of the invention is to build in governance and security features such as role-based access control, masking of sensitive information, and audit logging, in order to maintain compliance and trust.
A still further object of the invention is to use caching and a feedback loop that both speed up repeated queries and learn from user interactions, thereby continuously improving the performance and quality of results.

SUMMARY
Before the present invention is described, it is to be understood that the present invention is not limited to specific methodologies and materials described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention.
The present invention describes a system and method for unified querying of heterogeneous enterprise data sources using a polyglot query graph. The system comprises an input unit, a processing unit, and an output unit, wherein the processing unit includes a query API and planner module, a subquery executor module, a fusion and alignment module, a ranking module, a materializer module, and a governance, security and feedback loop module. Together, these modules enable orchestration of heterogeneous queries and deliver results in a unified graph representation.
According to an aspect of the present invention, the method for unified querying of heterogeneous enterprise data sources using a Polyglot Query Graph comprises the steps of receiving, by a processing system, a query request via a query API in GraphQL, REST, or natural language format; normalizing the query into an intermediate representation; decomposing the intermediate representation into subqueries targeting relational, graph, vector, and semantic engines; and executing the subqueries in parallel across the underlying engines to ensure efficiency and low latency.
According to an aspect of the present invention, the method further includes merging the subquery results into a unified query graph by performing entity resolution, schema alignment, deduplication, and provenance tracking. The merged results are then ranked using a hybrid scoring model that combines BM25 for text relevance, semantic similarity from embeddings, and graph centrality for structural importance. The ranked results are materialized into a unified graph representation in formats such as JSON, RDF, or property graph, making them accessible for downstream consumption.
According to an aspect of the present invention, the method further includes outputting the unified query graph for use by applications such as large language models, analytics dashboards, or enterprise services. The system applies governance and security policies at query time, including role-based access control and masking of sensitive information. A caching mechanism stores partial results for repeated queries to improve efficiency, while audit logs of all queries and results are maintained together with provenance to ensure compliance. A continuous feedback loop captures client relevance signals and uses them to update the vector index and ranking models, thereby refining results over time.

BRIEF DESCRIPTION OF DRAWINGS
A complete understanding of the present invention may be had by reference to the following detailed description, which is to be taken in conjunction with the accompanying drawings. The accompanying drawings, which are incorporated into and constitute a part of the specification, illustrate one or more embodiments of the present invention and, together with the detailed description, serve to explain the principles and implementations of the invention.
FIG. 1. illustrates the structural and functional components of the system.
FIG. 2. illustrates the high-level system architecture showing the query API and planner, subquery executors, fusion and alignment, ranking, materializer, and governance modules.
FIG. 3. illustrates the query decomposition and execution pipeline showing normalization into an intermediate representation and generation of subqueries for relational, graph, vector, and semantic engines.
FIG. 4. illustrates the fusion and alignment process performing entity resolution, schema alignment, deduplication, and provenance tracking.
FIG. 5. illustrates the ranking framework combining BM25, semantic similarity, and graph centrality with configurable weights.
FIG. 6. illustrates the materialized unified graph for consumption by downstream applications in JSON, RDF, or property graph formats.
FIG. 7. illustrates the governance, caching, and feedback loop showing access control, masking, audit logging, caching, and model updates from client feedback.

DETAILED DESCRIPTION OF INVENTION:
Before the present invention is described, it is to be understood that this invention is not limited to methodologies described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention. Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the invention to achieve one or more of the desired objects or results. Various embodiments of the present invention are described below. It is, however, noted that the present invention is not limited to these embodiments, but rather the intention is that modifications that are apparent are also included.
The present invention describes a system and method for unified querying of heterogeneous enterprise data sources using a polyglot query graph. The invention unifies heterogeneous enterprise data queries across relational, graph, vector, and semantic paradigms into a single query graph representation, referred to as the Polyglot Query Graph (PQG). The system provides an orchestration layer, query decomposition, subquery execution, fusion and alignment, ranking, materialization, and governance mechanisms. In doing so, the system enables improved query expressivity, accuracy, and scalability across enterprise datasets of large size and diverse structure.
According to the embodiment of the present invention, as illustrated in FIG. 1, the system (100) comprises an input unit (10), a processing unit (20), and an output unit (30), wherein the processing unit (20) further comprises a query API and planner module (21), a subquery executors module (22), a fusion and alignment module (23), a ranking module (24), a materializer module (25), and a governance, security and feedback loop module (26). The orchestration provided by these modules enables unified access to heterogeneous enterprise data sources and delivers results in a unified query graph.
According to the embodiment of the present invention, as illustrated in FIG. 2, the query API and planner module (21) accepts high-level queries in formats such as GraphQL, REST, or natural language. All received queries are normalized into an internal intermediate representation (IR), thereby converting diverse query inputs into a standard form. The planner decomposes the intermediate representation into subqueries for relational, graph, vector, and semantic engines, while applying policy checks through governance before dispatching subqueries. This module therefore ensures query normalization, secure decomposition, and controlled execution planning.
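By way of a non-limiting illustration, the normalization and decomposition performed by the planner may be sketched in Python as follows. The names IntermediateQuery, SubQuery, ALLOWED_ENGINES, normalize_rest and decompose, the shape of the payloads, and the role table are hypothetical and are introduced only for this example; the specification does not prescribe a particular intermediate representation.

```python
from dataclasses import dataclass, field


@dataclass
class IntermediateQuery:
    """Hypothetical IR: target entity, free-text terms, filters, and caller role."""
    entity: str
    text: str = ""
    filters: dict = field(default_factory=dict)
    role: str = "analyst"


@dataclass
class SubQuery:
    engine: str   # "relational", "graph", "vector", or "semantic"
    payload: dict


# Assumed policy table: which engines each role may query.
ALLOWED_ENGINES = {
    "analyst": {"relational", "graph", "vector", "semantic"},
    "guest": {"semantic"},
}


def normalize_rest(params: dict, role: str) -> IntermediateQuery:
    """Normalize REST-style parameters into the IR (one of several possible front ends)."""
    reserved = {"entity", "q"}
    filters = {k: v for k, v in params.items() if k not in reserved}
    return IntermediateQuery(params["entity"], params.get("q", ""), filters, role)


def decompose(ir: IntermediateQuery) -> list[SubQuery]:
    """Split the IR into per-engine subqueries, dropping engines the role may not use."""
    candidates = [
        SubQuery("relational", {"table": ir.entity, "where": ir.filters}),
        SubQuery("graph", {"label": ir.entity, "match": ir.filters}),
    ]
    if ir.text:
        # Text-driven engines are only targeted when the query carries free text.
        candidates.append(SubQuery("vector", {"embed": ir.text, "k": 10}))
        candidates.append(SubQuery("semantic", {"terms": ir.text}))
    allowed = ALLOWED_ENGINES.get(ir.role, set())
    return [sq for sq in candidates if sq.engine in allowed]
```

In this sketch the policy check is applied before dispatch, so a subquery that the caller's role is not entitled to run is never sent to its engine.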
According to the embodiment of the present invention, the subquery executor module (22) is responsible for translating subqueries into engine-specific dialects. Relational subqueries are expressed in SQL for relational databases such as Postgres or Oracle. Graph subqueries are expressed in Cypher or PGQL for property graph stores. Vector subqueries are executed as approximate nearest neighbor searches over index structures such as HNSW or IVF. Semantic subqueries use full-text or neural retrieval methods such as BM25 or transformer-based retrievers. As illustrated in FIG. 2, these subqueries are executed in parallel across the underlying engines, thereby ensuring efficiency, scalability, and low latency in query responses.
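As a minimal sketch of the parallel dispatch described above, the following Python example submits one subquery per engine to a thread pool and gathers the results. The four run_* functions are stubs standing in for real engine clients, and their return values are invented for illustration; only the standard-library concurrent.futures calls are real.

```python
from concurrent.futures import ThreadPoolExecutor


# Stub executors standing in for real engine clients (all hypothetical).
def run_relational(payload):
    return [{"source": "relational", "id": "cust_id=123"}]


def run_graph(payload):
    return [{"source": "graph", "id": "Customer:123"}]


def run_vector(payload):
    return [{"source": "vector", "id": "doc-7", "score": 0.82}]


def run_semantic(payload):
    return [{"source": "semantic", "id": "doc-7", "score": 11.4}]


EXECUTORS = {
    "relational": run_relational,
    "graph": run_graph,
    "vector": run_vector,
    "semantic": run_semantic,
}


def execute_parallel(subqueries):
    """Run (engine, payload) pairs concurrently; return results keyed by engine.

    Assumes at most one subquery per engine, since results are keyed by engine name.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(subqueries))) as pool:
        futures = {
            engine: pool.submit(EXECUTORS[engine], payload)
            for engine, payload in subqueries
        }
        # result() blocks until each engine responds; an engine error re-raises here.
        return {engine: fut.result() for engine, fut in futures.items()}
```

A thread pool is used here because engine calls are typically I/O-bound; overall latency then approaches that of the slowest engine rather than the sum of all engines.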
According to the embodiment of the present invention, as illustrated in FIG. 4, the fusion and alignment module (23) merges heterogeneous results into a single query graph. This process includes entity resolution, schema alignment, deduplication, and provenance tracking. Entity resolution merges duplicate entities across engines, for example mapping “cust_id=123” in SQL to “Customer:123” in a graph or to a vector neighbor. Schema alignment maps attributes from different sources into a unified ontology. Deduplication eliminates overlaps and ensures unique representations across systems. Provenance tracking maintains links back to the source subquery results, thereby enabling auditability and traceability of the unified query graph.
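The fusion steps above may be illustrated, without limitation, by the following Python sketch, which resolves the “cust_id=123” and “Customer:123” identifiers to one node, aligns attribute names, and records provenance. The canonical key format, the two identifier patterns, and the ATTRIBUTE_MAP entries are assumptions made for this example and are not a general entity resolver.

```python
import re


def canonical_key(raw_id: str) -> str:
    """Map engine-specific identifiers to one key, e.g. 'cust_id=123' and 'Customer:123'.

    The two patterns below are illustrative assumptions, not a general resolver.
    """
    m = re.fullmatch(r"cust_id=(\d+)", raw_id)
    if m:
        return f"customer/{m.group(1)}"
    m = re.fullmatch(r"Customer:(\d+)", raw_id)
    if m:
        return f"customer/{m.group(1)}"
    return raw_id


# Assumed schema-alignment map: source attribute name -> unified attribute name.
ATTRIBUTE_MAP = {"cust_name": "name", "fullName": "name", "cntry": "country"}


def fuse(results_by_engine: dict) -> dict:
    """Merge per-engine records into unified nodes keyed by canonical entity id."""
    nodes = {}
    for engine, records in results_by_engine.items():
        for rec in records:
            key = canonical_key(rec["id"])
            # setdefault deduplicates: a second record for the same entity reuses the node.
            node = nodes.setdefault(key, {"attrs": {}, "provenance": []})
            for attr, value in rec.get("attrs", {}).items():
                # First value seen wins; a real system would need a conflict policy.
                node["attrs"].setdefault(ATTRIBUTE_MAP.get(attr, attr), value)
            # Provenance keeps a link back to the originating engine and identifier.
            node["provenance"].append({"engine": engine, "source_id": rec["id"]})
    return nodes
```

Keeping the original identifier in the provenance list means every attribute of a fused node can be traced back to the subquery result that supplied it.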
According to the embodiment of the present invention, as illustrated in FIG. 5, the ranking module (24) applies a hybrid scoring model including signals such as BM25 for text relevance, semantic similarity derived from embeddings, and graph centrality for structural importance. These signals are combined using configurable fusion weights to produce a ranked unified graph view. By combining these heterogeneous scoring methods, the system ensures that the most relevant, semantically aligned, and structurally significant results are prioritized for output.
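One possible way to combine the three signals with configurable fusion weights is a weighted sum over min-max normalized scores, sketched below in Python. The normalization scheme, the function names, and the default weights (0.4, 0.4, 0.2) are assumptions chosen for illustration; the specification does not fix a particular fusion formula or weight values.

```python
def min_max(values):
    """Rescale a list of scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(candidates, weights=(0.4, 0.4, 0.2)):
    """Rank candidates by a weighted sum of normalized BM25, semantic, and centrality scores.

    Each candidate is a dict with keys 'id', 'bm25', 'semantic', 'centrality'.
    The default weights are placeholders, not values taken from the specification.
    """
    if not candidates:
        return []
    w_text, w_sem, w_graph = weights
    bm25 = min_max([c["bm25"] for c in candidates])
    sem = min_max([c["semantic"] for c in candidates])
    cen = min_max([c["centrality"] for c in candidates])
    scored = [
        (w_text * b + w_sem * s + w_graph * g, c["id"])
        for c, b, s, g in zip(candidates, bm25, sem, cen)
    ]
    # Highest fused score first; ties broken by id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(cid, round(score, 4)) for score, cid in scored]
```

Normalizing each signal before weighting matters because raw BM25 scores are unbounded while cosine similarity and most centrality measures lie in a fixed range; without it the text signal would tend to dominate the sum.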
According to the embodiment of the present invention, as illustrated in FIG. 6, the materializer module (25) exposes the unified query graph for consumption by downstream applications. The materializer module (25) supports multiple formats, including JSON, RDF, or property graph representations. The unified graph can be consumed by large language models (LLMs) for retrieval-augmented generation, by enterprise analytics dashboards, or by application services such as recommendation systems and fraud detection. This module therefore enables the integration of relational, semantic, and contextual knowledge into a single consumable graph representation.
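As a minimal sketch of materialization into JSON, the following Python function serializes a set of nodes and edges into a document with "nodes" and "edges" arrays. The function name and the document layout are assumptions for this example and do not correspond to any standard property-graph interchange format; RDF output would require a separate serializer.

```python
import json


def to_property_graph_json(nodes: dict, edges: list) -> str:
    """Serialize nodes and edges into a JSON document with 'nodes' and 'edges' arrays.

    nodes maps a node id to {'attrs': {...}, 'provenance': [...]};
    edges is a list of (source_id, label, target_id) tuples.
    The layout is an assumed, minimal property-graph shape, not a standard format.
    """
    doc = {
        "nodes": [
            {
                "id": node_id,
                "properties": props.get("attrs", {}),
                "provenance": props.get("provenance", []),
            }
            for node_id, props in sorted(nodes.items())
        ],
        "edges": [
            {"source": src, "target": dst, "label": label}
            for src, label, dst in edges
        ],
    }
    # sort_keys gives stable output, which helps caching and diffing downstream.
    return json.dumps(doc, indent=2, sort_keys=True)
```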
According to the embodiment of the present invention, as illustrated in FIG. 7, the governance, security, and feedback loop module (26) provides integrated compliance and optimization features; wherein security includes role-based access control, row/column-level policies, and masking of personally identifiable information; a cache mechanism stores partial results with a time-to-live (TTL) to reduce latency for repeated queries; a feedback loop captures client relevance signals, both implicit and explicit, and feeds them into vector re-indexing and ranker fine-tuning. Provenance and audit logs are maintained for all queries and results, thereby ensuring compliance, accountability, and governance across enterprise environments.
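Two of the mechanisms above, the time-to-live cache and role-aware masking, may be sketched in Python as follows. The TTLCache class, the PII_FIELDS set, the UNMASKED_ROLES set, and the "***" mask token are hypothetical and serve only to illustrate the behaviour; a deployed system would more likely rely on an external cache and a policy engine.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire ttl seconds after being stored."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock      # injectable so expiry can be tested without sleeping
        self._store = {}

    def put(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]    # lazily evict stale entries on read
            return None
        return value


# Assumed policy: fields treated as PII, and roles allowed to see them unmasked.
PII_FIELDS = {"email", "phone"}
UNMASKED_ROLES = {"compliance_officer"}


def mask_record(record: dict, role: str) -> dict:
    """Return a copy of record with PII fields replaced unless the role is exempt."""
    if role in UNMASKED_ROLES:
        return dict(record)
    return {k: ("***" if k in PII_FIELDS else v) for k, v in record.items()}
```

If masking is applied after the cache lookup rather than before storage, one cached partial result can serve callers with different roles without leaking unmasked fields.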
According to a preferred embodiment of the present invention, the method for unified querying of heterogeneous enterprise data sources using a polyglot query graph comprises the following steps:
● Receiving a query request by a processing system via a query API in GraphQL, REST, or natural language format;
● normalizing the query into an internal intermediate representation (IR), thereby standardizing diverse query inputs;
● decomposing the intermediate representation into subqueries targeting relational, graph, vector, and semantic engines;
● executing the subqueries in parallel across the underlying engines to ensure efficiency and low latency;
● merging the subquery results into a unified query graph by performing entity resolution, schema alignment, deduplication, and provenance tracking;
● ranking the fused results using a hybrid scoring model combining BM25 for text relevance, semantic similarity, and graph centrality signals;
● materializing the ranked results into a unified graph representation in formats such as JSON, RDF, or property graph;
● outputting the unified query graph for consumption by downstream applications, including large language models, analytics dashboards, or application services;
● applying governance and security policies at query time, including role-based access control and masking of sensitive information;
● caching partial results for repeated queries to improve efficiency and responsiveness;
● maintaining audit logs of all queries and results together with provenance to ensure compliance; and
● updating the vector index and ranking models through a continuous feedback loop based on client relevance signals.
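The final step, updating ranking models from client relevance signals, could take many forms. The Python sketch below shows one simple possibility: a pairwise heuristic that nudges the fusion weights toward the signals that favoured a clicked result over a skipped one. The function update_weights, the learning rate, and the update rule are assumptions for illustration only; the specification does not disclose a specific learning method, and vector re-indexing is not shown.

```python
def update_weights(weights, clicked, skipped, lr=0.1):
    """Nudge fusion weights toward the signals that favored the clicked result.

    weights, clicked, and skipped are equal-length sequences of per-signal values
    (for example normalized BM25, semantic similarity, and centrality). This is a
    simple pairwise heuristic chosen for illustration, not a disclosed method.
    """
    raw = [w + lr * (c - s) for w, c, s in zip(weights, clicked, skipped)]
    clipped = [max(0.0, r) for r in raw]     # keep weights non-negative
    total = sum(clipped)
    if total == 0:
        return list(weights)                 # degenerate update: keep old weights
    return [c / total for c in clipped]      # renormalize so weights sum to 1
```

For instance, if users repeatedly click results whose semantic score is high while skipping results whose BM25 score is high, the semantic weight grows and the BM25 weight shrinks over successive updates.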
According to yet another embodiment, the system and method of the present invention offer significant advantages such as:
● Unified Context: Combines relational precision, graph connectivity, and semantic similarity in a single query result, ensuring that diverse enterprise data can be accessed coherently without fragmented retrieval.
● Improved Accuracy: Reduces hallucinations for large language model grounding by applying multi-paradigm fusion, thereby providing more reliable and contextually relevant results.
● Flexibility: Operates seamlessly with heterogeneous enterprise stacks without requiring schema homogenization, enabling organizations to integrate data across multiple existing systems.
● Scalability: Supports very large database workloads through parallel subquery execution and caching of partial results with time-to-live (TTL), thereby reducing latency and improving responsiveness.
● Governance-First Compliance: Enforces security policies such as role-based access control, row/column-level restrictions, and masking of sensitive information, while maintaining provenance for all results.
While considerable emphasis has been placed herein on the specific elements of the preferred embodiment, it will be appreciated that many alterations and modifications can be made in the preferred embodiment without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
Claims:
We claim,
1. A system and method for unified querying of heterogenous enterprise data sources using polyglot query graph;
wherein the system (100) comprises an input unit (10), a processing unit (20), and an output unit (30), wherein the processing unit (20) further comprises a query API and planner module (21), a subquery executors module (22), a fusion and alignment module (23), a ranking module (24), a materializer module (25), and a governance, security and feedback loop module (26);
characterized in that:
the method for unified querying of heterogeneous enterprise data sources using a polyglot query graph comprises the steps of:
● receiving a query request by a processing system via a query API in GraphQL, REST, or natural language format;
● normalizing the query into an internal intermediate representation (IR), thereby standardizing diverse query inputs;
● decomposing the intermediate representation into subqueries targeting relational, graph, vector, and semantic engines;
● executing the subqueries in parallel across the underlying engines to ensure efficiency and low latency;
● merging the subquery results into a unified query graph by performing entity resolution, schema alignment, deduplication, and provenance tracking;
● ranking the fused results using a hybrid scoring model combining BM25 for text relevance, semantic similarity, and graph centrality signals;
● materializing the ranked results into a unified graph representation in formats such as JSON, RDF, or property graph;
● outputting the unified query graph for consumption by downstream applications, including large language models, analytics dashboards, or application services;
● applying governance and security policies at query time, including role-based access control and masking of sensitive information;
● caching partial results for repeated queries to improve efficiency and responsiveness;
● maintaining audit logs of all queries and results together with provenance to ensure compliance; and
● updating the vector index and ranking models through a continuous feedback loop based on client relevance signals.

2. The system and method as claimed in claim 1, wherein the query API and planner module (21) accepts high-level queries in formats such as GraphQL, REST, or natural language, and normalizes the received queries into an internal intermediate representation (IR), thereby converting diverse query inputs into a standard form; and the planner decomposes the intermediate representation into subqueries for relational, graph, vector, and semantic engines, while applying policy checks through governance before dispatching subqueries.

3. The system and method as claimed in claim 1, wherein the subquery executor module (22) translates subqueries into engine-specific dialects; such that the relational subqueries are expressed in SQL for relational databases; the graph subqueries are expressed in Cypher or PGQL for property graph stores; the vector subqueries are executed as approximate nearest neighbor queries; and the semantic subqueries include full-text or neural retrieval methods such as BM25 or transformer-based retrievers; all executed in parallel across the underlying engines.

4. The system and method as claimed in claim 1, wherein the fusion and alignment module (23) merges heterogeneous results into a single query graph by entity resolution, schema alignment, deduplication, and provenance tracking; wherein the entity resolution merges duplicates across engines, and the schema alignment maps attributes from different sources into a unified ontology such that the deduplication eliminates overlaps and ensures unique representations across systems.

5. The system and method as claimed in claim 1, wherein the ranking module (24) applies a hybrid scoring model including signals such as BM25 for text relevance, semantic similarity derived from embeddings, and graph centrality for structural importance; such that the signals are combined using configurable fusion weights to produce a ranked unified graph view.

6. The system and method as claimed in claim 1, wherein the materializer module (25) exposes the unified query graph for consumption by downstream applications and supports multiple formats, including JSON, RDF, or property graph representations; such that the unified graph can be consumed by large language models (LLMs) for retrieval-augmented generation, by enterprise analytics dashboards, or by application services such as recommendation systems and fraud detection.

7. The system and method as claimed in claim 1, wherein the governance, security, and feedback loop module (26) provides integrated compliance and optimization features; wherein security includes role-based access control, row/column-level policies, and masking of personally identifiable information; a cache mechanism stores partial results with a time-to-live (TTL) to reduce latency for repeated queries; a feedback loop captures client relevance signals, both implicit and explicit, and feeds them into vector re-indexing and ranker fine-tuning.

Dated this 2nd September 2025

Documents

Application Documents

#	Name	Date
1	202521083404-STATEMENT OF UNDERTAKING (FORM 3) [02-09-2025(online)].pdf	2025-09-02
2	202521083404-POWER OF AUTHORITY [02-09-2025(online)].pdf	2025-09-02
3	202521083404-FORM 1 [02-09-2025(online)].pdf	2025-09-02
4	202521083404-FIGURE OF ABSTRACT [02-09-2025(online)].pdf	2025-09-02
5	202521083404-DRAWINGS [02-09-2025(online)].pdf	2025-09-02
6	202521083404-DECLARATION OF INVENTORSHIP (FORM 5) [02-09-2025(online)].pdf	2025-09-02
7	202521083404-COMPLETE SPECIFICATION [02-09-2025(online)].pdf	2025-09-02
8	Abstract.jpg	2025-09-26
9	202521083404-FORM-9 [26-09-2025(online)].pdf	2025-09-26
10	202521083404-FORM 18 [01-10-2025(online)].pdf	2025-10-01