Abstract: The invention describes a system and method for constructing a cross-project knowledge graph for multi-domain artifact integration. The domain agnostic system and method for constructing a persistent cross project knowledge graph integrates requirements, code, design modules, tickets, documentation, telemetry, and past releases across multiple projects. The system introduces Adaptive Embedding Fusion, entropic optimal transport for cross domain ontology alignment, causal graph analytics, risk aware survival modelling, and bandit/RL delivery policies. A multi objective score combines similarity, quality, risk, cost, and compliance to rank reuse candidates. Portfolio optimization selects high ROI recommendations under constraints, while generative models synthesize artifacts and workflows subject to safety and compliance. The invention reduces redundancy, accelerates delivery, and improves reliability across software, manufacturing, and other industries.
Description:
FIELD OF INVENTION
The present invention relates to information technology, industrial process management and enterprise knowledge management. More specifically, it relates to a system and method for constructing a cross-project knowledge graph for multi-domain artifact integration and reuse recommendation.
BACKGROUND
A cross-project knowledge graph unifies and connects siloed data from multiple projects, offering benefits like enhanced data integration, improved search and discovery, deeper contextual understanding for artificial intelligence, faster data curation, and seamless collaboration across teams. By representing data as interconnected entities, these graphs provide a holistic, accessible, and explainable view of organizational knowledge, supporting better decision-making and driving innovation.
Organizations frequently operate multiple projects within the same domain or across domains such as programming, industrial manufacturing or enterprise operations. Despite similarities in requirements, code components, design patterns and documentation, reuse across projects is minimal due to siloed information, lack of cross-project visibility and the absence of a unified representation of knowledge. Existing solutions focus narrowly on within-project artifact management (e.g., application lifecycle management tools, version control systems) or on specific domains (e.g., product lifecycle management in manufacturing).
PRIOR ARTS:
US9461876B2 discloses a system and method for providing ttx-based categorization services and a categorized commonplace of shared information. Currency of the contents is improved by a process called conjuring/concretizing wherein users' thoughts are rapidly infused into the Map. As a new idea is sought, a goal is created for a search. After the goal idea is found, a ttx is concretized and categorized. The needs met by such a Map are prior art searching, competitive environmental scanning, competitive analysis study repository management and reuse, innovation gap analysis indication, novelty checking, technology value prediction, investment area indication and planning, and product technology comparison and feature planning.
CN112541132B discloses a method based on multi-view knowledge representation. The method comprises the following steps: according to similar attributes of all items in different fields, integrating different items in the form of a heterogeneous graph to form a plurality of views, taking each view as input of a graph attention network, and obtaining an initial knowledge representation of the items under each view through the graph attention network; taking the initial knowledge representation of the item under each view as input of a multi-head attention network respectively, and obtaining and integrating item representation vectors with user preferences under different views through the multi-head attention network to obtain final item representations with the user preferences; and recommending corresponding items in the target domain to the user according to the final representation of the items with the user preference and the information of the target domain.
Although the prior arts discuss knowledge graphs, fuzzy concepts, voting, ontology crowdsourcing, and cross-domain recommendation based on multi-view knowledge representation, they do not provide a persistent, evolving knowledge graph that spans projects; statistical and artificial intelligence driven recommendations for reuse; a generative artificial intelligence capability for suggesting novel integrations, patterns, or optimizations; or workflow orchestration across projects to accelerate decision-making. The absence of these capabilities leads to wasted effort, redundant development, increased costs, and delayed delivery timelines.
DEFINITIONS
The expression “system” used hereinafter in this specification refers to an ecosystem comprising, but not limited to, system for automatically defining post-deployment success metrics with input and output devices, processing unit, plurality of mobile devices, a mobile device-based application. It is extended to computing systems like mobile phones, laptops, computers, PCs, and other digital computing devices.
The term “input unit” used hereinafter in this specification refers to, but is not limited to, mobile, laptops, computers, PCs, keyboards, mouse, pen drives or drives.
The term “processing unit” refers to the computational hardware or software that performs the database analysis, generation of graphs, detection of dead code, processing, removal of dead code, and the like. It includes servers, CPUs, GPUs, or cloud-based systems that handle intensive computations.
The term “output unit” used hereinafter in this specification refers to hardware or digital tools that present processed information to users including, but not limited to computer monitors, mobile screens, printers, or online dashboards.
The term “Artifact Integration” used hereinafter in this specification refers to the process of connecting different components, systems, or services by incorporating standardized software artifacts, such as pre-built code, configuration templates, or data models, to enable seamless data exchange and automated workflows.
The term “ontology alignment” used hereinafter in this specification refers to the process of establishing semantic correspondences between entities, such as concepts and relationships, in two or more different ontologies to identify common meanings and create a consistent knowledge base for tasks like data integration and query response.
The term “knowledge graph” used hereinafter in this specification refers to a semantic network that organizes information about real-world entities (like people, places, or concepts) as interconnected nodes and edges.
The term “orchestration” used hereinafter in this specification refers to the automated management and coordination of multiple complex tasks and systems to execute a larger, end-to-end workflow.
The term “Agnostic Operation” used hereinafter in this specification refers to a process, program, or function that is designed to be platform-agnostic, vendor-agnostic, or system-agnostic, meaning it can operate and integrate with different software environments, hardware platforms, data formats, or underlying technologies without needing specific adaptations.
OBJECTS OF THE INVENTION
The primary object of the present invention is to provide a system and method for constructing a cross-project knowledge graph for multi-domain artifact integration and reuse recommendation.
Another object of the invention is to provide knowledge graph construction in which automated ingestion pipelines extract structured and unstructured artifacts, normalize metadata and generate semantic embeddings.
Yet another object of the invention is to provide cross-project linkages that are established across domains using statistical similarity, graph algorithms and ontology alignment.
Yet another object of the invention is to provide artificial intelligence recommendations in which reuse candidates are identified using machine learning, probabilistic modelling and generative artificial intelligence synthesis.
Yet another object of the present invention is to provide workflow orchestration that automatically surfaces reuse recommendations during design, coding and release activities.
Yet another object of the invention is to provide persistence and evolution of the graph, wherein the graph dynamically evolves with new projects, release cycles and organizational knowledge.
Yet another object of the invention is to provide domain-agnostic operation in which interfaces support heterogeneous artifact formats to extend beyond software to industries like manufacturing.
SUMMARY
Before the present invention is described, it is to be understood that the present invention is not limited to specific methodologies and materials described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention.
The present invention describes a system and method for constructing a cross-project knowledge graph for multi-domain artifact integration. The invention pertains to a persistent, domain agnostic knowledge graph that integrates artifacts across multiple projects including requirements, code, design modules, tickets, documentation, test results, telemetry, past releases, and other records, to drive intelligent reuse recommendations using algorithms, statistical modeling, workflow orchestration, and generative artificial intelligence.
According to an aspect of the present invention, the system introduces adaptive embedding fusion, entropic optimal transport for cross domain ontology alignment, causal graph analytics, risk aware survival modeling, and bandit/RL delivery policies. A multi objective score combines similarity, quality, risk, cost, and compliance to rank reuse candidates. Portfolio optimization selects high ROI recommendations under constraints, while generative models synthesize artifacts and workflows subject to safety and compliance. The invention reduces redundancy, accelerates delivery, and improves reliability across software, manufacturing, and other industries.
According to an aspect of the present invention, the present invention describes the Adaptive Multi-Project AI Graph for Reuse and Innovation. The Cross Project Knowledge Graph (CPKG) persistently integrates and maintains relationships among artifacts across projects and domains. The novel features of the system include: (a) a mathematically grounded multi signal similarity and risk aware reuse scoring function, (b) Adaptive Embedding Fusion (AEF) combining neural, structural, and symbolic features, (c) Cross Domain Ontology Alignment by Entropic Optimal Transport (OT), (d) Causal Graph Analytics for estimating reusability effects, (e) Bandit and RL driven delivery policies for recommendations in developer and PLM workflows, and (f) Generative Workflow Augmentation that synthesizes code, documents, and process templates subject to safety, quality, and compliance constraints.
BRIEF DESCRIPTION OF DRAWINGS
A complete understanding of the present invention may be made by reference to the following detailed description, which is to be taken in conjunction with the accompanying drawings. The accompanying drawings, which are incorporated into and constitute a part of the specification, illustrate one or more embodiments of the present invention and, together with the detailed description, serve to explain the principles and implementations of the invention.
FIG. 1 is a block diagram of the CPKG architecture.
FIG. 2 is a workflow sequence for ingestion, scoring, and recommendation delivery.
FIG. 3 shows Adaptive Embedding Fusion and multi signal similarity.
FIG. 4 illustrates ontology alignment via entropic optimal transport.
FIG. 5 depicts causal graph analytics and effect estimation.
FIG. 6 shows bandit/RL policy for recommendation timing and target selection.
FIG. 7 plots ROI modeling and portfolio level selection under constraints.
DETAILED DESCRIPTION OF INVENTION:
Before the present invention is described, it is to be understood that this invention is not limited to methodologies described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention. Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the invention to achieve one or more of the desired objects or results. Various embodiments of the present invention are described below. It is, however, noted that the present invention is not limited to these embodiments, but rather the intention is that modifications that are apparent are also included.
The present invention describes a system and method for constructing a cross-project knowledge graph for multi-domain artifact integration. The invention pertains to a persistent, domain agnostic knowledge graph that integrates artifacts across multiple projects including requirements, code, design modules, tickets, documentation, test results, telemetry, past releases, and other records, to drive intelligent reuse recommendations using algorithms, statistical modeling, workflow orchestration, and generative artificial intelligence. The invention describes a domain agnostic system and method for constructing a persistent cross project knowledge graph that integrates requirements, code, design modules, tickets, documentation, telemetry, and past releases across multiple projects. The system introduces adaptive embedding fusion, entropic optimal transport for cross domain ontology alignment, causal graph analytics, risk aware survival modeling, and bandit/RL delivery policies. A multi objective score combines similarity, quality, risk, cost, and compliance to rank reuse candidates. Portfolio optimization selects high ROI recommendations under constraints, while generative models synthesize artifacts and workflows subject to safety and compliance. The invention reduces redundancy, accelerates delivery, and improves reliability across software, manufacturing, and other industries.
According to the embodiment of the present invention, as described in FIG. 1, the system architecture comprises an input unit, a processing unit and an output unit, wherein the processing unit further comprises an ingestion layer module, a normalization and featurization module, a knowledge graph store module, an analytics and recommendation engine module, a policy and orchestration module, and a safety and compliance module.
According to the embodiment of the present invention, in the Ingestion Layer module, connectors interface with heterogeneous systems including Version Control Systems (VCS) and Application Lifecycle Management (ALM) platforms like Git, SVN, Jira, PLM/CAD/BOM (STEP, IGES), automated test benches, CI/CD pipelines, Manufacturing Execution Systems (MES) and Supervisory Control and Data Acquisition (SCADA) systems, issue trackers, documentation repositories, wiki platforms, telemetry/IoT streams, and release/traceability logs. Data streams are processed in micro-batches to balance latency and throughput. Late or out-of-order arrivals are resolved through watermarking, thereby preserving event sequence integrity.
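The watermarking behaviour described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class names `Event` and `WatermarkBuffer`, the `allowed_lateness` parameter, and the policy of dropping events that arrive behind the watermark are all assumptions introduced for illustration.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(order=True)
class Event:
    event_time: float                    # time the event occurred at the source
    payload: Any = field(compare=False)  # hypothetical artifact record


class WatermarkBuffer:
    """Holds events and releases them in event-time order once the
    watermark (largest event time seen minus allowed lateness) passes them."""

    def __init__(self, allowed_lateness: float) -> None:
        self.allowed_lateness = allowed_lateness
        self.max_event_time = float("-inf")
        self._heap: List[Event] = []
        self.dropped: List[Event] = []   # events that arrived behind the watermark

    @property
    def watermark(self) -> float:
        return self.max_event_time - self.allowed_lateness

    def add(self, event: Event) -> None:
        # Assumed policy: an event older than the watermark is too late to reorder.
        if event.event_time < self.watermark:
            self.dropped.append(event)
            return
        heapq.heappush(self._heap, event)
        self.max_event_time = max(self.max_event_time, event.event_time)

    def drain(self) -> List[Event]:
        """Return one micro-batch: every buffered event at or below the watermark."""
        batch: List[Event] = []
        while self._heap and self._heap[0].event_time <= self.watermark:
            batch.append(heapq.heappop(self._heap))
        return batch
```

A larger `allowed_lateness` tolerates more disorder at the cost of higher latency, which is the latency/throughput balance the micro-batch design is intended to manage.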
In the Normalization and Featurization module, parsers convert raw inputs into structured fields; natural language processing (NLP) encoders transform unstructured text into embeddings; static/dynamic program analysis produces Abstract Syntax Trees (AST) and Control Flow Graphs (CFG) based features; CAD/BOM inputs are mapped into attributed graphs; telemetry and sensor data are summarized via statistical time-series descriptors and embedding models. The result is a harmonized, multi-modal feature space suitable for unified processing.
According to the embodiment of the present invention, in the Knowledge Graph Store module, the processed artifacts are persisted in a heterogeneous, temporal, and attributed graph G=(V,E,T,X), where nodes, edges, and intervals carry semantic attributes and lineage metadata. The store supports versioning, provenance queries, and temporal reasoning across evolving datasets. In the Analytics and Recommendation Engine module, the analytical modules implement similarity/risk scoring, Adaptive Embedding Fusion (AEF)-based matching, optimal transport alignment, causal inference estimators, and graph algorithms such as centrality, clustering, and community detection. Outputs are refined through ranking and optimization routines, yielding actionable recommendations across lifecycle tasks.
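A minimal Python sketch of a temporal, attributed graph in the spirit of G=(V,E,T,X) is given below. The names `Edge`, `TemporalGraph`, and `neighbors_at`, and the half-open validity interval on each edge, are illustrative assumptions and not the storage model of any particular graph database.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    relation: str
    valid_from: float
    valid_to: float = math.inf  # open-ended interval by default


class TemporalGraph:
    """Nodes carry attribute dictionaries (X); edges carry validity intervals (T)."""

    def __init__(self) -> None:
        self.node_attrs: Dict[str, Dict[str, object]] = {}
        self.edges: List[Edge] = []

    def add_node(self, node_id: str, **attrs: object) -> None:
        self.node_attrs.setdefault(node_id, {}).update(attrs)

    def add_edge(self, src: str, dst: str, relation: str,
                 valid_from: float, valid_to: float = math.inf) -> None:
        for node_id in (src, dst):
            self.node_attrs.setdefault(node_id, {})
        self.edges.append(Edge(src, dst, relation, valid_from, valid_to))

    def neighbors_at(self, node_id: str, t: float,
                     relation: Optional[str] = None) -> List[str]:
        """As-of query: targets of edges leaving node_id that are valid at time t."""
        return sorted(
            e.dst for e in self.edges
            if e.src == node_id
            and e.valid_from <= t < e.valid_to
            and (relation is None or e.relation == relation)
        )
```

Keeping closed intervals instead of deleting superseded edges is what allows versioning and provenance questions such as "what did this module depend on at release time" to be answered from the same structure.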
According to the embodiment of the present invention, in the Policy and Orchestration module, bandit algorithms and reinforcement-learning policies govern when, where, and how recommendations are surfaced. Integration hooks allow recommendations to be embedded in Integrated Development Environments (IDEs), code review systems, PLM approval gates, MES work instructions, and release management checklists, ensuring seamless adoption into existing workflows. In the Safety & Compliance module, ingestion and feedback are filtered for Personally Identifiable Information (PII) and Protected Health Information (PHI) signals, software license scanning is enforced using SPDX-based mechanisms, and differential privacy is applied to user feedback channels to protect sensitive usage data.
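One simple way a bandit policy could decide where a recommendation is surfaced is Thompson sampling over delivery channels, sketched below in Python. The specification does not name a specific bandit algorithm, so the choice of Thompson sampling, the Beta(1, 1) prior, the class name `BetaBandit`, and the channel names used as arms are all assumptions.

```python
import random
from typing import Dict, List


class BetaBandit:
    """Thompson sampling with a Beta posterior per delivery channel (arm)."""

    def __init__(self, arms: List[str], seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # Beta(1, 1) prior: one pseudo-acceptance and one pseudo-rejection per arm.
        self.successes: Dict[str, float] = {arm: 1.0 for arm in arms}
        self.failures: Dict[str, float] = {arm: 1.0 for arm in arms}

    def choose(self) -> str:
        """Sample an acceptance rate for each arm and pick the largest sample."""
        samples = {
            arm: self.rng.betavariate(self.successes[arm], self.failures[arm])
            for arm in self.successes
        }
        return max(samples, key=samples.get)

    def update(self, arm: str, accepted: bool) -> None:
        """Record whether the user accepted the recommendation shown via `arm`."""
        if accepted:
            self.successes[arm] += 1.0
        else:
            self.failures[arm] += 1.0

    def posterior_mean(self, arm: str) -> float:
        s, f = self.successes[arm], self.failures[arm]
        return s / (s + f)
```

In use, `choose()` would be called when a recommendation is ready and `update()` when the user accepts or rejects it, so that channels with higher acceptance are selected more often while the others are still explored occasionally.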
According to the embodiment of the present invention, the method for constructing a cross-project knowledge graph for multi-domain artifact integration and reuse recommendation comprises the steps of:
• ingesting heterogeneous data sources and aligning ontologies across domains to establish a coherent semantic representation;
• building or refreshing a heterogeneous, temporal, and attributed graph that encodes artifacts, dependencies, and versioning information;
• computing embeddings and features by applying natural language encoders to text, program analysis to code, graph embeddings to CAD/BOM structures, and statistical or learned models to telemetry, wherein Adaptive Embedding Fusion (AEF) is employed for cross-domain alignment;
• scoring and ranking candidate items based on similarity, risk, and utility measures, wherein outputs are calibrated and causally adjusted to improve reliability;
• performing portfolio selection under budgetary, resource, and risk constraints using optimization techniques;
• delivering recommendations through contextual bandit or reinforcement learning policies into user environments including integrated development environments, code review portals, product lifecycle management gates, and manufacturing execution instructions;
• executing generative augmentation to propose alternative solutions or variants while preserving domain-specific constraints;
• verifying selected or augmented items through automated tests, simulations, and compliance checks;
• capturing user feedback from acceptance, rejection, or modification of recommendations and operational outcomes; and
• updating policy parameters and model weights in an online manner, wherein gate coefficients regulate feature contributions and weights adjust scoring and ranking models.
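The scoring and portfolio selection steps above can be sketched in Python as follows. The linear weighting in `reuse_score`, the treatment of compliance as a hard gate, the weight values, and the use of an exact 0/1 knapsack over integer costs are all assumptions chosen to keep the example small; the specification leaves the exact scoring function and optimizer open.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    similarity: float  # in [0, 1]
    quality: float     # in [0, 1]
    risk: float        # in [0, 1], higher is worse
    cost: int          # integration effort in whole units (hypothetical)
    compliant: bool    # result of license/compliance checks


# Hypothetical weights; in the described system these would be learned online.
WEIGHTS: Dict[str, float] = {"similarity": 0.5, "quality": 0.3, "risk": 0.2}


def reuse_score(c: Candidate) -> float:
    """Weighted multi-objective score; non-compliant candidates score zero."""
    if not c.compliant:
        return 0.0
    raw = (WEIGHTS["similarity"] * c.similarity
           + WEIGHTS["quality"] * c.quality
           - WEIGHTS["risk"] * c.risk)
    return max(raw, 0.0)


def select_portfolio(cands: List[Candidate], budget: int) -> Tuple[float, List[str]]:
    """Exact 0/1 knapsack: maximize total score subject to total cost <= budget."""
    # best[b] = (best total score, chosen names) using at most b cost units.
    # Entries are replaced, never mutated, so sharing the initial tuple is safe.
    best: List[Tuple[float, List[str]]] = [(0.0, [])] * (budget + 1)
    for c in cands:
        value = reuse_score(c)
        for b in range(budget, c.cost - 1, -1):
            total = best[b - c.cost][0] + value
            if total > best[b][0]:
                best[b] = (total, best[b - c.cost][1] + [c.name])
    return best[budget]
```

Additional resource or risk constraints would turn this into a multi-dimensional knapsack or an integer program, for which a general-purpose solver is the more practical choice.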
According to the embodiment of the present invention, the present invention describes an Adaptive Multi-Project AI Graph for Reuse and Innovation. The Cross Project Knowledge Graph (CPKG) persistently integrates and maintains relationships among artifacts across projects and domains. The novel features of the system include: (a) a mathematically grounded multi signal similarity and risk aware reuse scoring function, (b) Adaptive Embedding Fusion (AEF) combining neural, structural, and symbolic features, (c) Cross Domain Ontology Alignment by Entropic Optimal Transport (OT), (d) Causal Graph Analytics for estimating reusability effects, (e) Bandit and RL driven delivery policies for recommendations in developer and PLM workflows, and (f) Generative Workflow Augmentation that synthesizes code, documents, and process templates subject to safety, quality, and compliance constraints. The key aspects include:
• Knowledge Graph Construction: Automated ingestion pipelines extract structured and unstructured artifacts, normalize metadata, and generate semantic embeddings.
• Cross-Project Linking: Relationships are established across domains using statistical similarity, graph algorithms, and ontology alignment.
• AI-Driven Recommendations: Reuse candidates (e.g., code modules, design templates, workflow patterns) are identified using machine learning, probabilistic modeling, and generative AI synthesis.
• Workflow Orchestration: Intelligent workflows automatically surface reuse recommendations during design, coding, and release activities.
• Persistence and Evolution: The graph dynamically evolves with new projects, release cycles, and organizational knowledge.
• Domain-Agnostic Operation: Interfaces support heterogeneous artifact formats (source code, CAD files, tickets, compliance reports) to extend beyond software to industries like manufacturing.
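Adaptive Embedding Fusion, as named among the novel features above, can be sketched in Python as a gated combination of per-view embeddings followed by a cosine similarity. The sketch assumes that the neural, structural, and symbolic views have already been projected to a common dimension and that gate coefficients are obtained by a softmax over learnable logits; both are assumptions, since the specification does not fix the fusion formula.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(views: Dict[str, Sequence[float]],
         gate_logits: Dict[str, float]) -> List[float]:
    """Gate-weighted sum of unit-normalized view embeddings of equal dimension."""
    names = sorted(views)
    gates = softmax([gate_logits[name] for name in names])
    dim = len(views[names[0]])
    fused = [0.0] * dim
    for gate, name in zip(gates, names):
        unit = l2_normalize(views[name])
        for i in range(dim):
            fused[i] += gate * unit[i]
    return fused


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))
```

Raising the logit of one view shifts the fused vector toward that view, which is one way gate coefficients could regulate feature contributions as feedback accumulates.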
According to the embodiment of the present invention, the generative workflow augmentation of the present invention comprises multiple modes. Constrained generation is implemented as retrieval-augmented generation (RAG) over a CPKG subgraph, with decoding subject to constraints including license and safety, and with a penalty applied in the objective function if a constraint is violated. Program synthesis operates by suggesting refactoring diffs, verifying the suggested changes via tests, and deriving confidence levels from test pass probabilities. Process templates (manufacturing) are generated in the form of work instructions, validated against tolerance and specification nodes, and further simulated to estimate takt time improvements.
According to an embodiment of the present invention, the domain agnostic extensions include software artifacts such as modules, APIs, tests, and CI logs; manufacturing artifacts including CAD parts, BOM, routings, work instructions, SPC charts, and failure modes (FMEA), with reuse of fixtures and process steps; and healthcare/finance artifacts comprising templates, controls, and models, with risk and compliance nodes (HIPAA/SOX) integrated into scoring. Complexity and deployment considerations include AEF fusion, Sinkhorn OT per iteration with GPU batching, graph operations executed via sparse kernels, and streaming updates that amortize computational costs. The system is deployed as microservices with transactional graph snapshots to enable reproducible decisions.
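The Sinkhorn iteration referred to above, used for entropic optimal transport between two ontologies, can be sketched in plain Python as follows. The function name `sinkhorn`, the regularization value, and the fixed iteration count are assumptions; a deployed version would more likely run batched on a GPU, as the paragraph indicates, and would work in the log domain when costs are large relative to the regularization.

```python
import math
from typing import List, Sequence


def sinkhorn(cost: Sequence[Sequence[float]],
             a: Sequence[float],
             b: Sequence[float],
             epsilon: float = 0.1,
             iterations: int = 200) -> List[List[float]]:
    """Entropic optimal transport plan between source weights a and target weights b.

    cost[i][j] is the dissimilarity between source concept i and target concept j.
    """
    n, m = len(a), len(b)
    # Gibbs kernel; very large cost/epsilon ratios can underflow to zero.
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
```

Each entry of the returned plan can be read as a soft correspondence weight between a concept in one ontology and a concept in the other; a smaller `epsilon` yields sharper, closer to one-to-one alignments.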
Claims:
We claim,
1. A system and method for constructing a cross-project knowledge graph for multi-domain artifact integration
characterized in that
the agnostic knowledge graph integrates artifacts across multiple projects including requirements, code, design modules, tickets, documentation, test results, telemetry, past releases, to drive intelligent reuse recommendations using algorithms, statistical modeling, workflow orchestration, and generative artificial intelligence;
the system architecture comprises an input unit, a processing unit and an output unit, wherein the processing unit comprises an ingestion layer module, a normalization and featurization module, a knowledge graph store module, an analytics and recommendation engine module, a policy and orchestration module, and a safety and compliance module;
and the method for constructing a cross-project knowledge graph for multi-domain artifact integration and reuse recommendation comprises the steps of:
• ingesting heterogeneous data sources and aligning ontologies across domains to establish a coherent semantic representation;
• building or refreshing a heterogeneous, temporal, and attributed graph that encodes artifacts, dependencies, and versioning information;
• computing embeddings and features by applying natural language encoders to text, program analysis to code, graph embeddings to CAD/BOM structures, and statistical or learned models to telemetry, wherein Adaptive Embedding Fusion (AEF) is employed for cross-domain alignment;
• scoring and ranking candidate items based on similarity, risk, and utility measures, wherein outputs are calibrated and causally adjusted to improve reliability;
• performing portfolio selection under budgetary, resource, and risk constraints using optimization techniques;
• delivering recommendations through contextual bandit or reinforcement learning policies into user environments including integrated development environments, code review portals, product lifecycle management gates, and manufacturing execution instructions;
• executing generative augmentation to propose alternative solutions or variants while preserving domain-specific constraints;
• verifying selected or augmented items through automated tests, simulations, and compliance checks;
• capturing user feedback from acceptance, rejection, or modification of recommendations and operational outcomes; and
• updating policy parameters and model weights in an online manner, wherein gate coefficients regulate feature contributions and weights adjust scoring and ranking models.
2. The system and method as claimed in claim 1, wherein in the Ingestion Layer module, connectors interface with heterogeneous systems including Version Control Systems and Application Lifecycle Management platforms like Git, SVN, Jira, PLM/CAD/BOM, automated test benches, CI/CD pipelines, Manufacturing Execution Systems and Supervisory Control and Data Acquisition systems, issue trackers, documentation repositories, wiki platforms, telemetry/IoT streams, and release/traceability logs.
3. The system and method as claimed in claim 1, wherein in the Normalization and Featurization module, parsers convert raw inputs into structured fields; natural language processing encoders transform unstructured text into embeddings; static/dynamic program analysis produces Abstract Syntax Trees and Control Flow Graphs based features; CAD/BOM inputs are mapped into attributed graphs; telemetry and sensor data are summarized via statistical time-series descriptors and embedding models to obtain a harmonized, multi-modal feature space suitable for unified processing.
4. The system and method as claimed in claim 1, wherein in the Knowledge Graph Store module, the processed artifacts are persisted in a heterogeneous, temporal, and attributed graph, where nodes, edges, and intervals carry semantic attributes and lineage metadata and the store supports versioning, provenance queries, and temporal reasoning across evolving datasets.
5. The system and method as claimed in claim 1, wherein in the Analytics and Recommendation Engine module, the analytical modules implement similarity/risk scoring, Adaptive Embedding Fusion based matching, optimal transport alignment, causal inference estimators, and graph algorithms such as centrality, clustering, and community detection, and outputs are refined through ranking and optimization routines, yielding actionable recommendations across lifecycle tasks.
6. The system and method as claimed in claim 1, wherein in the Policy and Orchestration module, bandit algorithms and reinforcement-learning policies govern when, where, and how recommendations are surfaced. Integration hooks allow recommendations to be embedded in Integrated Development Environments, code review systems, PLM approval gates, MES work instructions, and release management checklists, ensuring seamless adoption into existing workflows.
7. The system and method as claimed in claim 1, wherein in the Safety & Compliance module, ingestion and feedback are filtered for Personally Identifiable Information and Protected Health Information signals, software license scanning is enforced using SPDX-based mechanisms, and differential privacy is applied to user feedback channels to protect sensitive usage data.
8. The system and method as claimed in claim 1, wherein the system includes a mathematically grounded multi signal similarity and risk aware reuse scoring function, Adaptive Embedding Fusion combining neural, structural, and symbolic features, Cross Domain Ontology Alignment by Entropic Optimal Transport, Causal Graph Analytics for estimating reusability effects, Bandit and RL driven delivery policies for recommendations in developer and PLM workflows, and Generative Workflow Augmentation that synthesizes code, documents, and process templates subject to safety, quality, and compliance constraints.
9. The system and method as claimed in claim 1, wherein the system reduces redundancy, accelerates delivery, and improves reliability across software, manufacturing, and other industries.
| # | Name | Date |
|---|---|---|
| 1 | 202521083409-STATEMENT OF UNDERTAKING (FORM 3) [02-09-2025(online)].pdf | 2025-09-02 |
| 2 | 202521083409-POWER OF AUTHORITY [02-09-2025(online)].pdf | 2025-09-02 |
| 3 | 202521083409-FORM 1 [02-09-2025(online)].pdf | 2025-09-02 |
| 4 | 202521083409-FIGURE OF ABSTRACT [02-09-2025(online)].pdf | 2025-09-02 |
| 5 | 202521083409-DRAWINGS [02-09-2025(online)].pdf | 2025-09-02 |
| 6 | 202521083409-DECLARATION OF INVENTORSHIP (FORM 5) [02-09-2025(online)].pdf | 2025-09-02 |
| 7 | 202521083409-COMPLETE SPECIFICATION [02-09-2025(online)].pdf | 2025-09-02 |
| 8 | 202521083409-FORM-9 [26-09-2025(online)].pdf | 2025-09-26 |
| 9 | 202521083409-FORM 18 [01-10-2025(online)].pdf | 2025-10-01 |
| 10 | Abstract.jpg | 2025-10-08 |