Abstract: The invention describes a system and method for policy-aware dynamic data pipeline orchestration using a knowledge graph. The system comprises a request processor; a policy-aware knowledge graph module; a dynamic pipeline assembly module; a visualization and auditing module; a learning and optimization module; and a workflow execution engine. The method comprises the steps of receiving a data request specifying user, sources, and purpose; traversing the graph to validate access permissions; extracting required transformations; dynamically assembling a data pipeline; and executing said pipeline via a workflow engine. The system dynamically constructs and executes data pipelines that strictly adhere to an organization’s data access policies. The knowledge graph serves as the core intelligence layer, containing nodes for users, roles, policies, data sources, schemas, and transformations.
Description:
FIELD OF INVENTION
The present invention relates to software development. More specifically, it relates to a system and method for orchestrating data pipelines based on organizational data access policies using a knowledge graph.
BACKGROUND
Organizations often operate with multiple data sources: different departments use separate systems, legacy systems persist, and data must be accessed from external partners or APIs. Together these sources allow for a more comprehensive and accurate analysis of information. Conventional data pipeline systems are often not equipped to handle this level of complexity effectively. They typically rely on manually constructed workflows, which are inflexible and time-consuming to maintain or adapt to new requirements.
Furthermore, they lack dynamic policy enforcement capabilities, making it difficult to consistently apply data governance rules and ensure compliance with internal policies or regulatory standards. Another major limitation is their inability to support real-time orchestration of data flows. As organizational roles, user permissions, and business priorities evolve, static pipelines fail to adapt accordingly. This rigidity hampers the agility of modern enterprises that require data systems to be responsive and context-aware. As a result, there is a growing need for intelligent, policy-driven data orchestration platforms that can dynamically manage data access and processing in real time, aligning with changing organizational needs and ensuring secure, efficient, and compliant data operations.
The present invention addresses this gap by introducing a dynamic orchestration engine, requiring minimal human intervention, powered by a knowledge graph that represents users, roles, policies, data sources, transformations, and workflows. The invention discloses how data source integration, access control enforcement, and transformation logic can be dynamically constructed, executed, and monitored via intelligent traversal and reasoning over a graph-based data policy model.
PRIOR ART
US patent document US20230094742A1 describes systems and methods for integrated orchestration of machine learning operations. Both inventions utilize knowledge graphs to manage metadata and dynamically construct pipelines; however, the referenced invention optimizes AI pipelines by selecting pre-trained models based on task requirements, whereas the present invention uses the knowledge graph to enforce organizational policies for data governance and compliance.
US patent document US11178182B2 describes automated access control management for computing systems; it relates to computer security and, specifically, to the automation, verification and management of access control mechanisms for computer infrastructure, including distributed computing infrastructure. The similarity between the two inventions is that both utilize knowledge graphs for policy representation and reasoning. In contrast, the present invention introduces a novel method for dynamically constructing and executing data pipelines that strictly adhere to an organization’s data access policies, with a knowledge graph serving as the core intelligence layer encoding entities such as users, roles, policies, data sources, schemas, and required transformations.
CN patent document CN112565193B describes a network security policy conflict decomposition method, system, storage medium and device in the technical field of software-defined network security. Both inventions utilize knowledge graphs and structured policy representations to address conflicts in dynamic systems. In contrast, the present invention is directed to dynamically constructing and executing data pipelines that strictly adhere to an organization’s data access policies, with a knowledge graph serving as the core intelligence layer encoding entities such as users, roles, policies, data sources, schemas, and required transformations.
DEFINITIONS:
The expression “system” used hereinafter in this specification refers to an ecosystem comprising, but not limited to, a user, input and output devices, a processing unit, a plurality of mobile devices, a mobile device-based application to identify dependencies and relationships between diverse businesses, a visualization platform, and output; and extends to computing systems such as mobiles, laptops, computers, PCs, etc.
The expression “input unit” used hereinafter in this specification refers to, but is not limited to, mobiles, laptops, computers, PCs, keyboards, mice, pen drives or other drives.
The expression “output unit” used hereinafter in this specification refers to, but is not limited to, an onboard output device, a user interface (UI), a display kit, a local display, a screen, a dashboard, or a visualization platform enabling the user to visualize, observe or analyse any data or scores provided by the system.
The expression “processing unit” refers to, but is not limited to, a processor of at least one computing device that optimizes the system.
The expression “knowledge graph” used hereinafter in this specification refers to a knowledge base that uses a graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the free-form semantics or relationships underlying these entities.
OBJECTS OF THE INVENTION
The primary object of the invention is to provide a system and method for policy-aware dynamic data pipeline orchestration using a knowledge graph.
Further object of the invention is to provide a system and method that discloses how data source integration, access control enforcement, and transformation logic can be dynamically constructed, executed, and monitored via intelligent traversal and reasoning over a graph-based data policy model.
Another object of the invention is to provide a system and method that introduces a dynamic and automated orchestration engine powered by a Knowledge Graph (KG) that represents users, roles, policies, data sources, transformations, and workflows.
SUMMARY
Before the present invention is described, it is to be understood that the present invention is not limited to specific methodologies and materials described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention.
The present invention describes a system and method for policy-aware dynamic data pipeline orchestration, in which data pipelines are orchestrated dynamically based on organizational data access policies using a knowledge graph. The invention discloses how data source integration, access control enforcement, and transformation logic can be dynamically constructed, executed, and monitored via intelligent traversal and reasoning over a graph-based data policy model.
According to an aspect of the present invention, the system comprises an input unit; a processing unit configured to execute instructions; a memory storing invoice data, rate cards, and usage logs; and an output unit, wherein the processing unit further comprises a request processor, a policy-aware knowledge graph module, a dynamic pipeline assembly module, a visualization and auditing module, a learning and optimization module, and a workflow execution engine.
According to an aspect of the present invention, when a data request is received, the system traverses the knowledge graph to identify the user’s role and associated permissions; validates whether the user can access the requested data sources; identifies transformation obligations from the graph; dynamically builds a pipeline incorporating the necessary extraction, transformation, and loading steps; and executes and logs the workflow while ensuring compliance with organizational rules.
BRIEF DESCRIPTION OF DRAWINGS
A complete understanding of the present invention may be gained by reference to the following detailed description, which is to be taken in conjunction with the accompanying drawing. The accompanying drawing, which is incorporated into and constitutes a part of the specification, illustrates one or more embodiments of the present invention and, together with the detailed description, serves to explain the principles and implementations of the invention.
FIG. 1 illustrates the workflow of the method of the present invention;
FIG. 2 illustrates the graph substructure of the present invention; and
FIG. 3 illustrates a system architecture flow diagram showing key components as described in the present invention.
DETAILED DESCRIPTION OF INVENTION:
Before the present invention is described, it is to be understood that this invention is not limited to methodologies described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention. Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the invention to achieve one or more of the desired objects or results. Various embodiments of the present invention are described below. It is, however, noted that the present invention is not limited to these embodiments, but rather the intention is that modifications that are apparent are also included.
The present invention describes a system and method for policy-aware dynamic data pipeline orchestration, in which data pipelines are orchestrated dynamically based on organizational data access policies using a knowledge graph. The invention discloses how data source integration, access control enforcement, and transformation logic can be dynamically constructed, executed, and monitored via intelligent traversal and reasoning over a graph-based data policy model. The invention introduces a novel method for dynamically constructing and executing data pipelines that strictly adhere to an organization’s data access policies, with a knowledge graph serving as the core intelligence layer encoding entities such as users, roles, policies, data sources, schemas, and required transformations.
According to the embodiment of the present invention, as described in FIG. 3, the system comprises an input unit; a processing unit configured to execute instructions; a memory storing invoice data, rate cards, and usage logs; and an output unit, wherein the processing unit further comprises a request processor, a policy-aware knowledge graph module, a dynamic pipeline assembly module, a visualization and auditing module, a learning and optimization module, and a workflow execution engine.
According to the embodiment of the present invention, as described in FIG. 1, when a data request is received, the system performs the following steps (a minimal illustrative sketch follows the list):
1. Traverses the knowledge graph to identify the user’s role and associated permissions.
2. Validates whether the user can access the requested data sources.
3. Identifies transformation obligations (e.g., masking sensitive data) from the graph.
4. Dynamically builds a pipeline incorporating necessary extraction, transformation, and loading steps.
5. Executes and logs the workflow while ensuring compliance with organizational rules.
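By way of illustration only, the five steps above may be sketched as a minimal, self-contained Python routine; the dictionaries, step strings, and the function name handle_request are illustrative assumptions and not part of the claimed subject matter.
# Minimal sketch of the five-step flow using plain Python dictionaries.
# All names and sample data are illustrative assumptions.
ROLE_OF = {"Alice": "Data Scientist"}                                   # HAS_ROLE
ACCESS = {"Data Scientist": {"Salesforce", "BillingDB"}}                # CAN_ACCESS
OBLIGATIONS = {"Salesforce": ["Mask(CustomerEmail)"], "BillingDB": []}  # REQUIRES_TRANSFORMATION

def handle_request(request):
    role = ROLE_OF[request["requester"]]                      # 1. resolve the user's role
    if not set(request["required_sources"]) <= ACCESS[role]:  # 2. validate access
        raise PermissionError("request violates access policy")
    steps = []
    for source in request["required_sources"]:                # 3. and 4. interleave extraction
        steps.append(f"Extract({source})")                    #    and obligatory transformations
        steps.extend(OBLIGATIONS[source])
    steps.append("Load(FeatureStore)")
    print("Executing and logging:", steps)                    # 5. execute and log the workflow
    return steps

handle_request({"requester": "Alice", "purpose": "Churn Prediction",
                "required_sources": ["Salesforce", "BillingDB"]})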
According to the embodiment of the present invention, the Knowledge Graph module of the system consists of nodes (entities) and edges (relationships). The knowledge graph contains nodes for data sources, users, roles, schemas, policies, and transformations. The transformation requirements are inferred from knowledge graph edges. The nodes are:
• User: Entity representing request initiator.
• Role: Defines access tier (e.g., Analyst, Engineer).
• Data Source: Structured or unstructured repository (e.g., PostgreSQL, API).
• Data Schema: Columns, fields, tables.
• Policy Rule: Access conditions and transformation requirements.
• Transformation: ETL actions (masking, join, filter).
• Workflow: Historical or reusable pipeline object.
The edges (relationships) are listed below; an illustrative sketch of this graph substructure follows the list:
• HAS_ROLE(User, Role)
• CAN_ACCESS(Role, Data Source)
• CONTAINS(Data Source, Data Schema)
• REQUIRES_TRANSFORMATION(Data Schema, Transformation)
• USES(Workflow, Data Source)
• CONTAINS_POLICY(Data Source, Policy Rule)
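By way of illustration only, and assuming Python with the networkx library (an implementation choice not mandated by the invention), the nodes and edges listed above may be encoded as follows:
# Illustrative encoding of the knowledge graph substructure; node names, the
# 'type' attribute, and the 'relation' attribute are assumed conventions.
import networkx as nx

G = nx.MultiDiGraph()
G.add_node("Alice", type="User")
G.add_node("Data Scientist", type="Role")
G.add_node("Salesforce", type="DataSource")
G.add_node("CustomerEmail", type="DataSchema")
G.add_node("Masking", type="Transformation")

G.add_edge("Alice", "Data Scientist", relation="HAS_ROLE")
G.add_edge("Data Scientist", "Salesforce", relation="CAN_ACCESS")
G.add_edge("Salesforce", "CustomerEmail", relation="CONTAINS")
G.add_edge("CustomerEmail", "Masking", relation="REQUIRES_TRANSFORMATION")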
According to the embodiment of the present invention, the Dynamic Pipeline Assembly functions in the following manner:
• Extract data from cloud storage (Salesforce): CustomerEmail, CustomerChurn
• Apply transformation: Mask(CustomerEmail)
• Extract data from Billing database (CustomerSpend)
• Join datasets on CustomerID
• Load into FeatureStore
Pipeline Representation:
pipeline = [
Extract(source='Salesforce', fields=['CustomerEmail', 'CustomerChurn']),
Mask(field='CustomerEmail'),
Extract(source='BillingDB', fields=['CustomerSpend']),
Join(on='CustomerID'),
Load(target='FeatureStore')
]
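By way of illustration only, the step objects used in the pipeline representation above may be realized as lightweight Python dataclasses; these definitions are an assumed encoding and not part of the claimed subject matter.
# Assumed step classes so that the pipeline representation above is directly usable
# as Python objects; real deployments may use any equivalent step abstraction.
from dataclasses import dataclass
from typing import List

@dataclass
class Extract:
    source: str
    fields: List[str]

@dataclass
class Mask:
    field: str

@dataclass
class Join:
    on: str

@dataclass
class Load:
    target: str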
According to the embodiment of the present invention, the visualization and auditing module generates audit logs from the traversed graph, demonstrates compliance with data privacy policies, and tags and versions the workflow for future reuse. FIG. 2 shows an example subgraph for audit, expressed below in Mermaid notation (an illustrative generation sketch follows the subgraph):
graph TD
U[User: Alice] -->|HAS_ROLE| R[Role: Data Scientist]
R -->|CAN_ACCESS| S[Salesforce]
R -->|CAN_ACCESS| B[BillingDB]
CE[CustomerEmail] -->|REQUIRES_TRANSFORMATION| M[Masking]
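By way of illustration only, an audit subgraph such as the one above may be generated from traversal records; the (subject, relation, object) triple format and the to_mermaid helper are illustrative assumptions.
# Sketch: render an audit subgraph in Mermaid syntax from recorded traversal triples.
def to_mermaid(triples):
    lines = ["graph TD"]
    for subject, relation, obj in triples:
        lines.append(f'{subject.replace(" ", "_")} -->|{relation}| {obj.replace(" ", "_")}')
    return "\n".join(lines)

audit_triples = [
    ("Alice", "HAS_ROLE", "Data Scientist"),
    ("Data Scientist", "CAN_ACCESS", "Salesforce"),
    ("CustomerEmail", "REQUIRES_TRANSFORMATION", "Masking"),
]
print(to_mermaid(audit_triples))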
According to the embodiment of the present invention, the learning and optimization module embeds nodes using Node2Vec to find similar workflows, caches frequently requested pipelines in the knowledge graph, and uses what-if simulation to test the impact of new policy rules.
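By way of illustration only, and assuming the open-source node2vec Python package together with networkx (implementation choices not specified by the invention), workflow-similarity embeddings may be computed as follows:
# Sketch of embedding knowledge-graph nodes to recommend similar workflows.
# Graph contents and hyperparameters are illustrative assumptions.
import networkx as nx
from node2vec import Node2Vec

G = nx.Graph()
G.add_edges_from([
    ("Workflow_1", "Salesforce"), ("Workflow_1", "Masking"),
    ("Workflow_2", "Salesforce"), ("Workflow_2", "BillingDB"),
    ("Workflow_3", "BillingDB"),
])

embedder = Node2Vec(G, dimensions=16, walk_length=10, num_walks=50, workers=1)
model = embedder.fit(window=5, min_count=1)
# Workflows whose embeddings lie close to Workflow_1 are candidates for reuse.
print(model.wv.most_similar("Workflow_1"))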
According to the embodiment of the present invention, the system also comprises a mechanism for logging and versioning generated workflows in the knowledge graph, and the system uses graph-based embeddings to recommend reusable pipeline components.
According to the embodiment of the present invention, the system and method for policy-aware dynamic data pipeline orchestration using a knowledge graph has the following novel features:
• Graph-Driven Policy Enforcement: A knowledge graph is used as a live enforcement layer to determine access and transformations, enabling real-time compliance.
• Transformation Inference via Graph Traversal: Required transformations (e.g., anonymization) are dynamically identified by traversing edges like REQUIRES_TRANSFORMATION in the graph.
• Dynamic Pipeline Assembly Per User Context: Pipelines are not static; they are constructed on-demand based on user roles, purposes, and policy edges in the graph.
• Integrated Access Validation During Pipeline Construction: Unlike traditional systems, access checks happen in line with pipeline generation, allowing seamless policy enforcement.
• Reusable Workflows Embedded in Knowledge Graph: Every pipeline execution is recorded as a reusable Workflow node, enabling traceability and optimization.
• Embedding-Based Recommendations: Use of graph embedding techniques (e.g., Node2Vec) to recommend similar pipelines or reusable ETL components.
• Automated Audit Graphs and Visual Explanations: The graph structure enables auto-generation of audit trails and human-readable subgraphs explaining why specific transformations were applied.
• What-If Policy Simulation: Policy simulation through temporary graph traversal allows organizations to preview the effect of new policy rules without deploying them.
• No-Code Policy Modification via Graph UI: Policies can be changed or extended by adding/editing graph nodes and edges, removing the need for code changes.
• Full Graph-Based Lineage Tracking: Every data movement and transformation is traceable through graph edges, enabling full lineage tracking.
According to the embodiment of the present invention, the method for policy-aware dynamic data pipeline orchestration using a knowledge graph comprises the steps of:
• receiving a data request specifying user, sources, and purpose;
• traversing the graph to validate access permissions;
• extracting required transformations;
• dynamically assembling a data pipeline;
• executing said pipeline via a workflow engine.
According to the embodiment of the present invention, the algorithm for policy-aware pipeline orchestration using a knowledge graph comprises:
Inputs:
• Request = { requester_id, purpose, required_sources[] }
• Knowledge Graph = Graph containing nodes (Users, Roles, DataSources, Schemas, Policies, Transformations, Workflows)
Outputs:
• Executable Pipeline composed of ETL (Extract, Transform, Load) steps
Step 1: Parse Request: Identify requester (e.g., Alice) and required data sources (e.g., Salesforce, BillingDB).
Step 2: Access Validation
• Traverse HAS_ROLE edge from requester to Role node.
• Traverse CAN_ACCESS edges from Role to DataSources.
• If any required data source lacks a CAN_ACCESS edge, deny access and abort.
Example:
• Alice → HAS_ROLE → Data Scientist
• Data Scientist → CAN_ACCESS → Salesforce, BillingDB → access granted, proceed
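By way of illustration only, the access validation of Step 2 may be implemented as a traversal over HAS_ROLE and CAN_ACCESS edges; the networkx layout and the helper functions below are illustrative assumptions.
# Sketch of Step 2: deny and abort if any required source lacks a CAN_ACCESS path.
import networkx as nx

G = nx.MultiDiGraph()
G.add_edge("Alice", "Data Scientist", relation="HAS_ROLE")
G.add_edge("Data Scientist", "Salesforce", relation="CAN_ACCESS")
G.add_edge("Data Scientist", "BillingDB", relation="CAN_ACCESS")

def roles_of(graph, user):
    return [v for _, v, d in graph.out_edges(user, data=True) if d["relation"] == "HAS_ROLE"]

def can_access(graph, user, source):
    return any(d["relation"] == "CAN_ACCESS" and v == source
               for role in roles_of(graph, user)
               for _, v, d in graph.out_edges(role, data=True))

required = ["Salesforce", "BillingDB"]
if not all(can_access(G, "Alice", s) for s in required):
    raise PermissionError("access denied; aborting pipeline construction")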
Step 3: Identify Schema and Transformations
• For each DataSource, find its CONTAINS → Schema nodes.
• Traverse REQUIRES_TRANSFORMATION edges to identify policies for each schema element.
Example:
• Salesforce → CONTAINS → CustomerEmail → REQUIRES_TRANSFORMATION → Masking
• BillingDB → CONTAINS → CustomerSpend → No transformation required
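By way of illustration only, the schema and transformation lookup of Step 3 may be expressed as a walk over CONTAINS and REQUIRES_TRANSFORMATION edges; the graph contents and the obligations helper are illustrative assumptions.
# Sketch of Step 3: collect per-field transformation obligations for a data source.
import networkx as nx

G = nx.MultiDiGraph()
G.add_edge("Salesforce", "CustomerEmail", relation="CONTAINS")
G.add_edge("Salesforce", "CustomerChurn", relation="CONTAINS")
G.add_edge("BillingDB", "CustomerSpend", relation="CONTAINS")
G.add_edge("CustomerEmail", "Masking", relation="REQUIRES_TRANSFORMATION")

def obligations(graph, source):
    result = {}
    for _, schema, d in graph.out_edges(source, data=True):
        if d["relation"] != "CONTAINS":
            continue
        result[schema] = [t for _, t, e in graph.out_edges(schema, data=True)
                          if e["relation"] == "REQUIRES_TRANSFORMATION"]
    return result

print(obligations(G, "Salesforce"))  # {'CustomerEmail': ['Masking'], 'CustomerChurn': []}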
Step 4: Build Pipeline Steps
1. Create Extract nodes for each accessible source.
2. For any field requiring transformation, insert appropriate ETL node (e.g., Mask).
3. Use Join, Filter, or Aggregate nodes based on request purpose.
4. Define Load destination (e.g., Feature Store, Dashboard).
Example Pipeline (in sequence):
1. Extract(CustomerEmail, CustomerChurn from Salesforce)
2. Mask(CustomerEmail)
3. Extract(CustomerSpend from BillingDB)
4. Join(on CustomerID)
5. Load(to FeatureStore)
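By way of illustration only, the assembly of Step 4 may be sketched as a function that orders extraction, transformation, join, and load steps; the tuple encoding of steps is an illustrative assumption.
# Sketch of Step 4: assemble an ordered pipeline from accessible sources and obligations.
def build_pipeline(sources_fields, field_obligations, join_key, target):
    steps = []
    for source, fields in sources_fields.items():
        steps.append(("Extract", source, fields))
        for f in fields:
            for transform in field_obligations.get(f, []):
                steps.append((transform, f))        # e.g. ("Masking", "CustomerEmail")
    steps.append(("Join", join_key))
    steps.append(("Load", target))
    return steps

pipeline = build_pipeline(
    {"Salesforce": ["CustomerEmail", "CustomerChurn"], "BillingDB": ["CustomerSpend"]},
    {"CustomerEmail": ["Masking"]},
    join_key="CustomerID",
    target="FeatureStore",
)
print(pipeline)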
Step 5: Execute and Monitor Pipeline
• Deploy pipeline using workflow engine (e.g., Prefect).
• Log transformation steps for auditing.
• Record entire pipeline as a new Workflow node in KG with backreferences to all involved policies.
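By way of illustration only, and assuming Prefect 2.x as the workflow engine named in the example above, a minimal execution sketch is shown below; the task bodies are placeholders rather than the actual extraction and masking logic.
# Sketch of Step 5: deploy the assembled steps as a logged Prefect flow.
from prefect import flow, task

@task
def extract(source, fields):
    return {f: f"<{source}:{f}>" for f in fields}  # placeholder extraction

@task
def mask(record, field):
    record[field] = "***MASKED***"                 # placeholder masking transformation
    return record

@flow(name="policy-aware-pipeline")
def churn_pipeline():
    sf = extract("Salesforce", ["CustomerEmail", "CustomerChurn"])
    sf = mask(sf, "CustomerEmail")
    billing = extract("BillingDB", ["CustomerSpend"])
    # Join and Load steps would follow; each task run is logged for auditing.
    return sf, billing

churn_pipeline()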
Step 6: Audit and Reuse
• Tag workflow with metadata (user, time, sources, purpose).
• Enable queryable lineage and compliance verification.
• Use embeddings to recommend reusable pipeline fragments for similar future requests.
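By way of illustration only, the audit-and-reuse bookkeeping of Step 6 may be sketched as recording a Workflow node with metadata and USES edges; node names, attributes, and the record_workflow helper are illustrative assumptions.
# Sketch of Step 6: persist the executed pipeline as a reusable, queryable Workflow node.
import datetime
import networkx as nx

G = nx.MultiDiGraph()

def record_workflow(graph, workflow_id, user, purpose, sources):
    graph.add_node(workflow_id, type="Workflow", user=user, purpose=purpose,
                   timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat())
    for source in sources:
        graph.add_edge(workflow_id, source, relation="USES")
    return workflow_id

record_workflow(G, "Workflow_42", user="Alice", purpose="Churn Prediction",
                sources=["Salesforce", "BillingDB"])
print(list(G.edges("Workflow_42", data=True)))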
According to the embodiment of the present invention, the advantages of the present system and method are:
• Zero manual configuration for access validation.
• Policy compliance baked into pipeline construction.
• Traceability and repeatability of workflows.
Example:
Scenario: Alice, a data scientist, wants to access Salesforce and BillingDB data for churn prediction. The organizational policy mandates masking of customer email addresses.
Input Request:
{
"requester": "Alice",
"purpose": "Churn Prediction",
"required_sources": ["Salesforce", "BillingDB"]
}
Graph Traversal:
• Alice → Role → Data Source
• Check CAN_ACCESS for each required source
• Identify required transformations
Resulting Pipeline:
pipeline = [
Extract(source='Salesforce', fields=['CustomerEmail', 'CustomerChurn']),
Mask(field='CustomerEmail'),
Extract(source='BillingDB', fields=['CustomerSpend']),
Join(on='CustomerID'),
Load(target='FeatureStore')
]
Output: A fully executable, compliant pipeline instance that fulfils the request within the policy framework: Alice is authorized, and CustomerEmail must be masked before use.
While considerable emphasis has been placed herein on the specific elements of the preferred embodiment, it will be appreciated that many alterations can be made and that many modifications can be made in the preferred embodiment without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
Claims:
We claim,
1. A system and method for policy-aware dynamic data pipeline orchestration using a knowledge graph
characterized in that
the system comprises an input unit; a processing unit configured to execute instructions; a memory storing invoice data, rate cards, and usage logs; and an output unit, wherein the processing unit further comprises a request processor, a policy-aware knowledge graph module, a dynamic pipeline assembly module, a visualization and auditing module, a learning and optimization module, and a workflow execution engine;
the method for policy-aware dynamic data pipeline orchestration using a knowledge graph, comprising the steps of:
• receiving a data request specifying user, sources, and purpose;
• traversing the graph to validate access permissions;
• extracting required transformations;
• dynamically assembling a data pipeline;
• executing said pipeline via a workflow engine.
2. The system and method as claimed in claim 1, wherein, when a data request is received, the system traverses the knowledge graph to identify the user’s role and associated permissions; validates whether the user can access the requested data sources; identifies transformation obligations from the graph; dynamically builds a pipeline incorporating the necessary extraction, transformation, and loading steps; and executes and logs the workflow while ensuring compliance with organizational rules.
3. The system and method as claimed in claim 1, wherein the knowledge graph module of the system consists of nodes for entities and edges for relationships; such that the nodes are for data sources, users, roles, schemas, policies, and transformations and the transformation requirements are inferred from knowledge graph edges.
4. The system and method as claimed in claim 1, wherein the dynamic pipeline assembly module extracts data from cloud storage, applies transformations such as data masking, extracts data from a billing database, joins datasets on customer ID, and loads the result into a feature store.
5. The system and method as claimed in claim 1, wherein the visualization and auditing module generates audit logs from traversed graph, shows compliance with data privacy policies and tags and versions the workflow for future reuse.
6. The system and method as claimed in claim 1, wherein the learning and optimization module embeds nodes to find similar workflows, caches frequently requested pipelines in the knowledge graph, and uses what-if simulation to test the impact of new policy rules.
7. The system and method as claimed in claim 1, wherein the system comprises a mechanism for logging and versioning generated workflows in the knowledge graph, and the system uses graph-based embeddings to recommend reusable pipeline components.
8. The system and method as claimed in claim 1, wherein the graph structure enables auto-generation of audit trails and human-readable subgraphs explaining why specific transformations were applied.
9. The system and method as claimed in claim 1, wherein policy simulation through temporary graph traversal allows organizations to preview the effect of new policy rules without deploying them.
10. The system and method as claimed in claim 1, wherein every data movement and transformation is traceable through graph edges, enabling full lineage tracking.
| # | Name | Date |
|---|---|---|
| 1 | 202521036848-STATEMENT OF UNDERTAKING (FORM 3) [16-04-2025(online)].pdf | 2025-04-16 |
| 2 | 202521036848-POWER OF AUTHORITY [16-04-2025(online)].pdf | 2025-04-16 |
| 3 | 202521036848-FORM 1 [16-04-2025(online)].pdf | 2025-04-16 |
| 4 | 202521036848-FIGURE OF ABSTRACT [16-04-2025(online)].pdf | 2025-04-16 |
| 5 | 202521036848-DRAWINGS [16-04-2025(online)].pdf | 2025-04-16 |
| 6 | 202521036848-DECLARATION OF INVENTORSHIP (FORM 5) [16-04-2025(online)].pdf | 2025-04-16 |
| 7 | 202521036848-COMPLETE SPECIFICATION [16-04-2025(online)].pdf | 2025-04-16 |
| 8 | 202521036848-FORM-9 [26-09-2025(online)].pdf | 2025-09-26 |
| 9 | 202521036848-FORM 18 [01-10-2025(online)].pdf | 2025-10-01 |
| 10 | Abstract.jpg | 2025-10-08 |