Abstract: A CONTEXT-SENSITIVE REQUIREMENT ENGINE SYSTEM AND METHOD WITH GRAPH-BASED BIDIRECTIONAL TRACEABILITY. Disclosed is a context-sensitive requirement engine system (100) for automated extraction, normalization, and traceability of requirements using hybrid artificial intelligence. The system (100) comprises source connectors (110) configured for receiving raw artifacts from diverse sources such as emails, tickets, chats, documents, and version-controlled code; an ingestion engine (120) for performing content normalization, OCR, neural machine translation, and metadata enrichment; and an extraction module (130) incorporating a statistical NLP layer (131) and a generative AI layer (132) to generate structured requirement objects with fused confidence scores. A normalization and canonicalization engine (140) clusters duplicate spans, maps canonical requirements to ontology concepts, and standardizes metrics. A graph persistence layer (150) stores requirements, source artifacts, and code entities in a property graph with weighted edges. Change detection agents (160), a user interaction module (170), and a reinforcement learning loop (180) enable adaptive updates, user feedback integration, and continuous retraining, ensuring self-healing, context-aware requirement traceability.
Description: FIELD OF INVENTION:
The present invention relates to requirements engineering and the computer-based application development lifecycle. More specifically, it relates to a context-sensitive requirement engine system and method with graph-based bidirectional traceability using generative AI, statistical natural language processing (NLP), and adaptive normalization.
BACKGROUND OF THE INVENTION:
In modern software development, requirements engineering remains one of the most critical yet error-prone activities. It is the systematic and disciplined process of eliciting and managing the needs of software and other complex systems, yet conventional methods rely heavily on manual entry of requirements into repositories or management tools.
Existing tools rely on manual entry of requirements, which is error-prone, time-consuming, and fails to keep pace with rapid code changes. Traditional, fragmented traceability matrices, such as static spreadsheets or unidirectional links, rapidly become outdated as the codebase evolves. Furthermore, conventional natural language processing (NLP) pipelines often use static rule-based parsers that cannot handle the variability of free-form text across emails, chats, and legacy PDFs, thereby limiting natural language understanding. Prior systems fail to learn adaptively from user corrections, codebase modifications, or domain-specific vocabularies, resulting in low precision and recall over time. Consequently, organizations face high maintenance costs, poor compliance, and increased defect rates, all of which diminish the return on investment (ROI) of software development.
PRIOR ART:
US20160188299A1 discloses a system, methods, and software product that automatically extract a software design from a requirements document, after which a requirements hierarchical decomposition table is generated that defines a plurality of decomposition levels. Though it is similar to the present invention in terms of system architecture requirements, it lacks multi-source artifact ingestion, real-time synchronization with an evolving codebase, AI-driven continuous learning, and property-graph persistence.
WO2009033953A2 discloses a process for refining and structuring requirement documents originally written in natural language. The natural language requirement specification document (RSD) is automatically transformed into a structured requirement document (SRD) that uses a template-based format with clearly defined attributes. Though this prior art comprises automatic extraction of structured requirements from a natural language document, it lacks handling of heterogeneous inputs such as emails, chats, and tickets; semantic clustering with ontology alignment; generative AI augmentation; and dynamic graph-based traceability.
US12288192B2 discloses a traceability tool for managing and monitoring the software development lifecycle using machine learning and multi-source data integration. Though it provides machine-learning-driven software development lifecycle traceability with progress analytics, it lacks automated requirement extraction from free-form artifacts, ontology-based normalization, a reinforcement-learning feedback loop, and a continuously evolving property-graph model.
Although the existing systems provide software design structures derived from requirements, automatic extraction of structured requirements from natural language, and machine-learning-driven software development lifecycle traceability with progress analytics, they fail to provide multi-source artifact ingestion, real-time synchronization with an evolving codebase, AI-driven continuous learning, property-graph persistence, handling of heterogeneous inputs such as emails, chats, and tickets, semantic clustering with ontology alignment, generative AI augmentation, and dynamic graph-based traceability. Thus, there is a need for an optimized system that overcomes the drawbacks of the existing systems.
The present invention thus provides a self-evolving, context-sensitive system that can automatically extract, normalize, and continuously synchronize requirements with the evolving codebase while reducing manual effort and ensuring reliable end-to-end traceability. The present invention thereby uses a context-sensitive requirement engine that not only automates requirement extraction but also implements reinforcement learning.
DEFINITIONS
The expression “system” used hereinafter in this specification refers to an ecosystem comprising, but not limited to, a system with a user, input and output devices, a processing unit, a plurality of mobile devices, a mobile device-based application to identify dependencies and relationships between diverse businesses, a visualization platform, and output; and extends to computing systems such as mobiles, laptops, computers, PCs, etc.
The expression “input unit” used hereinafter in this specification refers to, but is not limited to, mobile, laptops, computers, PCs, keyboards, mouse, pen drives or drives.
The expression “output unit” used hereinafter in this specification refers to, but is not limited to, an onboard output device, a user interface (UI), a display kit, a local display, a screen, a dashboard, or a visualization platform enabling the user to visualize, observe or analyse any data or scores provided by the system.
The expression “processing unit” used hereinafter in this specification refers to, but is not limited to, a processor of at least one computing device that optimizes the system.
The expression “large language model (LLM)” used hereinafter in this specification refers to a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.
The expression “requirements engineering (RE)” used hereinafter in this specification refers to the process of defining, documenting, and managing the needs and expectations of stakeholders for a system or software.
The expression “natural language processing (NLP)” used hereinafter in this specification refers to the processing of natural language information by a computer. NLP is related to information retrieval, knowledge representation, computational linguistics, and more broadly with linguistics.
The expression “systems development life cycle (SDLC)” used hereinafter in this specification refers to the typical phases and progression between phases during the development of a computer-based system; from inception to retirement.
OBJECT OF THE INVENTION:
The primary object of the present invention is to provide a context-sensitive requirement engine system and method with graph-based bidirectional traceability.
Another object of the invention is to provide a context-sensitive requirement engine that automatically extracts functional and non-functional requirements from heterogeneous free-form artifacts and persists them as a property graph synchronized with an evolving codebase.
Another object of the invention is to enable bidirectional traceability between natural language artifacts and source code through edges encoding derivation, implementation, and temporal validity.
A further object of the invention is to apply a hybrid pipeline combining statistical natural language processing with a fine-tuned generative AI model, along with semantic clustering, ontology alignment, and unit standardization, to generate canonical requirement nodes.
An additional object of the invention is to provide change-detection agents that monitor version-control events, recompute edge weights, and raise drift alerts.
Yet another object of the invention is to deliver an interactive user interface that visualizes the traceability graph and incorporates user feedback into a reinforcement-learning loop to continuously improve extraction accuracy.
Yet a further object of the invention is to capture user feedback and use it in a reinforcement-learning loop to continuously improve extraction accuracy.
Yet another object of the invention is to enable real-time impact analysis, reduce manual effort, and deliver measurable Return on Investment (ROI) through early defect detection and automated compliance traceability.
SUMMARY:
Before the present invention is described, it is to be understood that the present invention is not limited to specific methodologies and materials described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention.
The present invention discloses a context-sensitive requirement engine system with graph-based bidirectional traceability using generative AI, statistical natural language processing (NLP), and adaptive normalization. The system comprises source connectors, an ingestion engine, an extraction module, a normalization and canonicalization engine, a graph persistence layer, change detection agents, a user interaction and feedback loop, a learning loop, and an analytics and ROI dashboard. The extraction module further incorporates a hybrid pipeline with statistical NLP, generative AI, and confidence scoring, which together enable the system to construct and maintain a self-healing, context-aware requirement knowledge graph that remains synchronized with evolving codebases and source artifacts.
The components of the system operate in a coordinated manner. The source connectors collect raw artifacts from diverse platforms such as emails, tickets, chats, documents, and code repositories, streaming them to the ingestion engine. The ingestion engine normalizes content, performs OCR and translation, and enriches artifacts with metadata before storage. The extraction module applies statistical NLP and generative AI to transform unstructured text into structured requirement objects, with confidence scoring ensuring reliability. The normalization and canonicalization engine clusters duplicates, maps requirements to ontologies, and standardizes metrics for consistency. The graph persistence layer stores requirements, artifacts, code entities, and edges in a property graph with weighted relationships. Change detection agents monitor repositories for updates and trigger incremental re-extractions, while the user interface allows visualization, feedback, and corrections. The feedback loop and learning engine fine-tune models continuously, and the analytics dashboard computes metrics on impact, drift, and ROI.
The method comprises receiving and normalizing raw artifacts, extracting requirement spans through statistical NLP, enriching and structuring them with generative AI, and computing confidence scores through a fusion mechanism. High-confidence requirements proceed through normalization and canonicalization, where semantic clustering, ontology alignment, and metric standardization are applied. The graph persistence layer then stores the refined requirements and creates traceability edges with confidence-weighted attributes. Change detection agents dynamically re-extract requirements upon repository updates, flagging drifts and updating edge weights. The user interaction module enables visualization, feedback capture, and reinforcement learning integration, while nightly retraining enhances statistical and generative models. Over time, this process yields an adaptive, continuously improving traceability graph that ensures synchronized alignment of requirements, artifacts, and code.
BRIEF DESCRIPTION OF THE DRAWINGS:
A complete understanding of the present invention may be obtained by reference to the following detailed description, which is to be taken in conjunction with the accompanying drawings. The accompanying drawings, which are incorporated into and constitute a part of the specification, illustrate one or more embodiments of the present invention and, together with the detailed description, serve to explain the principles and implementations of the invention.
FIG. 1 illustrates the schematic of the system architecture.
FIG. 2 illustrates the high-level system architecture of the context-sensitive requirement engine (CS-RE).
FIG. 3 illustrates the method or workflow employed by the context-sensitive requirement engine (CS-RE).
DETAILED DESCRIPTION OF THE INVENTION:
Before the present invention is described, it is to be understood that this invention is not limited to methodologies described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention. Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the invention to achieve one or more of the desired objects or results. Various embodiments of the present invention are described below. It is, however, noted that the present invention is not limited to these embodiments, but rather the intention is that modifications that are apparent are also included.
The present invention discloses a context-sensitive requirement engine system and method with graph-based bidirectional traceability using generative AI, statistical natural language processing (NLP), and adaptive normalization. The said context-sensitive requirement engine system (100) comprises one or more source connectors (110), an ingestion engine (120), an extraction module (130), a normalization and canonicalization engine (140), a graph persistence layer (150), one or more change detection agents (160), a user interaction and feedback loop (170), a learning loop (180), and an analytics and ROI dashboard (190). The extraction module (130) further comprises a hybrid pipeline including a statistical NLP layer (131), a generative AI layer (132), and a confidence scoring (133); wherein the interaction among the components yields a self-healing, context-aware requirement knowledge graph that stays synchronized with the evolving codebase and source artifacts.
In an embodiment of the invention, the source connectors (110) include adapters for email servers (IMAP/SMTP), ticketing systems (Jira, ServiceNow), chat platforms (Slack, MS Teams), document repositories (SharePoint, Confluence), and code repositories (Git) such that each connector streams raw artifacts to the ingestion engine (120); where the ingestion engine (120) normalizes MIME types, extracts embedded text via OCR (Tesseract or deep learning based OCR), performs language detection, and stores the raw text in a staging store (125).
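As a non-limiting illustration, a source connector can be modeled as a small adapter that streams raw artifacts, tagged with their source type, toward the ingestion engine (120). The class, field, and method names below (including `enqueue`) are hypothetical and merely sketch the adapter contract described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Iterator, Protocol

@dataclass
class RawArtifact:
    source_type: str          # e.g. "email", "jira_ticket", "slack_message"
    payload: bytes            # raw MIME body, ticket JSON, chat export, etc.
    metadata: dict = field(default_factory=dict)
    received_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class SourceConnector(Protocol):
    """Adapter contract shared by email (IMAP), Jira, Slack, SharePoint, and Git connectors."""
    source_type: str
    def stream(self) -> Iterator[RawArtifact]: ...

def forward_to_ingestion(connector: SourceConnector, ingestion_engine) -> None:
    # `ingestion_engine` stands in for module (120); `enqueue` is a hypothetical method.
    for artifact in connector.stream():
        ingestion_engine.enqueue(artifact)
```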
In an embodiment of the invention, the ingestion and pre-processing module comprises a set of coordinated sub-modules that prepare heterogeneous artifacts for downstream requirement normalization; the sub-modules, each connected to the staging store (125), include the following (a minimal sketch of this pipeline follows the list):
a. a source identification layer configured for source identification; wherein each incoming artifact passes through a connector registry that assigns a source type identifier (e.g., “email”, “jira ticket”, “slack message”), thereby preserving source context and enabling domain-specific parsing rules;
b. a content extraction module, configured for content extraction wherein direct parsing is applied for digital text sources; and for PDFs and scanned images, an OCR sub-module executes a CNN-based text detector (EAST) followed by a Transformer recognizer (TrOCR), enabling robust transcription across varied layouts and noisy inputs;
c. a language normalization unit; wherein the system applies language detection (fastText), such that if the detected language is non-English, the artifact is passed through a neural machine translation (NMT) service, ensuring all artifacts are harmonized into the system’s working language (e.g. English); and
d. a metadata enrichment service; wherein the artifact is augmented before storage with timestamps, author identifiers, thread IDs, and provenance hashes, ensuring traceability and audit compliance; whereafter the enriched artifact is then persisted in the staging store (125).
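A minimal sketch of the ingestion flow described in items (a) to (d) above, assuming the publicly available fastText language-identification model; the `ocr`, `translate`, and `staging_store` collaborators and the model file path are illustrative placeholders, not part of the claimed system.

```python
import hashlib
import fasttext  # fastText language identification, per the embodiment

lid_model = fasttext.load_model("lid.176.bin")  # pre-trained language-ID model (path is illustrative)

def ingest(artifact, ocr, translate, staging_store):
    """ocr, translate, and staging_store are hypothetical services standing in for
    the OCR sub-module, the NMT service, and the staging store (125)."""
    # (a) source identification is carried in artifact.source_type by the connector registry
    # (b) content extraction: OCR for PDFs/scanned images, direct parsing otherwise
    if artifact.source_type in {"pdf", "scanned_image"}:
        text = ocr(artifact.payload)                       # EAST text detector + TrOCR recognizer
    else:
        text = artifact.payload.decode("utf-8", errors="ignore")
    # (c) language normalization into the working language (English)
    label, _prob = lid_model.predict(text.replace("\n", " "))
    if label[0] != "__label__en":
        text = translate(text, target="en")                # neural machine translation service
    # (d) metadata enrichment with a provenance hash before persistence
    artifact.metadata["provenance_hash"] = hashlib.sha256(artifact.payload).hexdigest()
    staging_store.save(artifact, normalized_text=text)
```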
In a further embodiment of the invention, the extraction module (130) provides a multi-layer hybrid pipeline that integrates statistical natural language processing (NLP) with generative AI techniques to reliably extract and structure requirements from unstructured textual sources. The module consists of three major sub-layers:
a. A statistical NLP layer (131) comprising a WordPiece tokenizer configured with a 30 k sub-token vocabulary, which performs foundational linguistic analysis to segment and annotate the text, beginning with tokenization that breaks raw text into individual tokens (words, punctuation, or symbols); a feature engineering unit that generates linguistic and statistical features for each token, including the part-of-speech (POS) tag (using spaCy), dependency label (syntactic relation), character n-grams, and term frequency–inverse document frequency (TF-IDF) of the surrounding 5-token context; and a conditional random field (CRF) sequence labeling model trained on a 150 k-statement corpus, or alternatively a BiLSTM-CRF neural model trained on a labeled requirement corpus, to identify and tag requirement spans categorized into domain-specific classes such as Functional Requirements (FUNC ≈ 30 %), Non-Functional Requirements (NFR ≈ 40 %), and Constraints and Risks (CONSTRAINT, RISK ≈ 30 %). This step establishes the initial structured boundaries for candidate requirements in the text. The statistical NLP layer outputs BIO tags (begin–inside–outside format) indicating requirement spans, and each token is associated with a posterior probability (pₜ) representing the statistical confidence of belonging to a requirement span.
b. A generative AI layer (132) comprising a prompt construction module that generates a structured prompt template for each span identified by the statistical layer; the outputs from the statistical layer (i.e., requirement spans and their surrounding context windows) are fed into a fine-tuned Large Language Model (LLM) decoding engine (e.g., GPT-4 Turbo or domain-specialized variants of LLaMA) adapted with LoRA fine-tuning (Δ = 2 M parameters) on the same labeled corpus, where the inference uses top-p sampling (p = 0.9) and low temperature (0.2) for deterministic yet context-aware outputs. The LLM uses these annotated spans to generate a structured requirement object expressed in JSON-LD format via a structured output formatter, where each requirement object typically includes fields such as:
- id: a unique identifier for the requirement;
- type: the class of the requirement (FUNC, NFR, etc.);
- description: a natural language summary of the requirement;
- rationale: a justification or reasoning behind the requirement;
- acceptance criteria: explicit conditions that define when the requirement is satisfied; and/or
- source ref: traceability information pointing back to the original document or section; such that this layer adds semantic richness and contextual interpretation, turning statistically extracted spans into machine-readable, semantically grounded requirement objects.
c. Confidence scoring (133) or confidence fusion; where
a statistical confidence calculator computes c₁ = mean(pₜ) across all tokens in a span;
an LLM confidence calculator derives c₂ = 1 − (average token log-probability normalized to [0, 1]), reflecting the model's certainty in the generated output; and
a Bayesian fusion unit combines c₁ and c₂ using weighted averaging, c = (α·c₁ + β·c₂) / (α + β), where α = 0.6 and β = 0.4; such that the module ensures the robustness and trustworthiness of extracted requirements. A confidence metric is computed for each generated object by combining three complementary signals: the CRF posterior probabilities, i.e., the statistical likelihoods assigned to the identified spans; the LLM token-level log-probabilities, i.e., the generative model's internal measure of certainty over the produced text; and a calibrated Bayesian estimator, a probabilistic calibration layer that integrates both statistical and generative signals into a normalized confidence score. The result is a per-requirement confidence metric c ∈ [0, 1] indicating the system's certainty about the correctness and completeness of the extracted requirement (a minimal code sketch of this fusion follows the list).
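As a minimal, non-limiting sketch of the confidence fusion above: the helper names and the example span values are hypothetical, the constants α = 0.6 and β = 0.4 follow the embodiment, and the exact way the average log-probability is mapped into [0, 1] is an assumption since the embodiment does not fix it.

```python
import math
from typing import List

ALPHA, BETA = 0.6, 0.4  # fusion weights alpha and beta from the embodiment

def statistical_confidence(posteriors: List[float]) -> float:
    """c1 = mean CRF posterior probability p_t across all tokens in the span."""
    return sum(posteriors) / len(posteriors)

def llm_confidence(token_logprobs: List[float]) -> float:
    """c2 = 1 - (average token log-probability normalized to [0, 1]).
    Here the average log-probability is mapped to [0, 1] via exp(), so a highly
    certain generation (log-probabilities near 0) yields a high c2; this
    normalization choice is an assumption."""
    avg_logprob = sum(token_logprobs) / len(token_logprobs)  # <= 0 for real log-probs
    normalized_uncertainty = 1.0 - math.exp(avg_logprob)     # in [0, 1)
    return 1.0 - normalized_uncertainty

def fused_confidence(c1: float, c2: float, alpha: float = ALPHA, beta: float = BETA) -> float:
    """Weighted (Bayesian-style) fusion: c = (alpha*c1 + beta*c2) / (alpha + beta)."""
    return (alpha * c1 + beta * c2) / (alpha + beta)

# Example: a four-token span with its CRF posteriors and LLM token log-probabilities.
c1 = statistical_confidence([0.91, 0.88, 0.95, 0.90])
c2 = llm_confidence([-0.05, -0.10, -0.02, -0.20])
c = fused_confidence(c1, c2)          # per-requirement confidence in [0, 1]
accepted = c >= 0.7                   # 0.7 is a hypothetical minimum threshold
```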
In yet another embodiment of the invention, the normalization and canonicalization engine (140) is responsible for harmonizing diverse, redundant, and heterogeneous requirement expressions into a consistent and standardized representation, ensuring that requirements originating from multiple authors, sources, or formats can be compared, queried, and reasoned over without ambiguity. The engine (140) operates as a multi-stage pipeline using sub-modules such as:
a. semantic clustering (141); that identifies semantically equivalent or near-duplicate requirements expressed in varied natural language by using sentence-BERT embeddings to capture semantic meaning at the sentence level and applying HDBSCAN (hierarchical density-based spatial clustering of applications with noise) to group semantically duplicate requirements;
b. ontology mapping (142); wherein the clustered requirements are aligned to a domain-specific ontology to enforce terminological consistency and ensure compatibility with established frameworks such as ISO/IEC/IEEE 42010, ITIL, or OWL-based taxonomies; such that the alignment is achieved using graph-based semantic similarity techniques, combining Jaccard similarity (shared term overlap) with edge weight propagation (ontology structure traversal) to map free-text requirement terms onto ontology concepts; and
c. unit & metric standardization (143); that ensures uniform representation by employing a unit conversion microservice (e.g., Pint) to normalize quantitative attributes like heterogeneous units into a canonical form; enabling direct comparison, automated validation, and consistency across requirement sets.
For example, requirements like “The system must respond within 200 ms” and “Response time shall not exceed 0.2 seconds” are clustered together as duplicates. This reduces redundancy and provides a consolidated view of requirement intent.
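A minimal sketch of the duplicate-detection and unit-standardization steps, assuming the publicly available sentence-transformers, hdbscan, and Pint packages named in the embodiment; the model name and min_cluster_size value are illustrative choices, not prescribed.

```python
import hdbscan
import pint
from sentence_transformers import SentenceTransformer

requirements = [
    "The system must respond within 200 ms",
    "Response time shall not exceed 0.2 seconds",
    "All data at rest shall be encrypted with AES-256",
]

# Semantic clustering (141): sentence-BERT embeddings grouped with HDBSCAN.
encoder = SentenceTransformer("all-MiniLM-L6-v2")     # illustrative sentence-BERT variant
embeddings = encoder.encode(requirements)
clusterer = hdbscan.HDBSCAN(min_cluster_size=2, metric="euclidean")
labels = clusterer.fit_predict(embeddings)
# On realistic corpora, near-duplicates share a cluster label; on tiny toy
# inputs HDBSCAN may label everything as noise (-1).

# Unit & metric standardization (143): Pint normalizes "0.2 seconds" to milliseconds.
ureg = pint.UnitRegistry()
print(ureg.Quantity(0.2, "second").to("millisecond"))  # ~200.0 millisecond
```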
In yet another embodiment of the invention, the graph persistence layer (150) stores the requirement nodes (R), artifact nodes (A), code nodes (C), and edge types (E) (e.g., DERIVED_FROM, IMPLEMENTS, TESTED_BY, IMPACTS); such that each edge carries a weight w = c × decay_factor (temporal decay based on code change timestamps). The graph is persisted in a property graph DB that supports ACID transactions and graph analytics (such as PageRank and community detection).
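One possible realization of the weighted-edge upsert, sketched against Neo4j and its official Python driver as one example of a property-graph DB; the node labels, property names, and the exponential decay constant are illustrative assumptions, not part of the claimed method.

```python
import math
import time
from typing import Optional
from neo4j import GraphDatabase  # property-graph DB with ACID transactions

DECAY_LAMBDA = 0.05  # per-day decay rate; an illustrative constant, not prescribed

def edge_weight(confidence: float, last_code_change_ts: float, now: Optional[float] = None) -> float:
    """w = c x decay_factor, with the decay driven by the age of the last code change."""
    now = now if now is not None else time.time()
    age_days = (now - last_code_change_ts) / 86_400
    return confidence * math.exp(-DECAY_LAMBDA * age_days)

UPSERT_IMPLEMENTS = """
MERGE (r:Requirement {id: $req_id})
MERGE (c:CodeEntity {symbol: $symbol})
MERGE (r)-[e:IMPLEMENTS]->(c)
SET e.weight = $weight, e.updated_at = timestamp()
"""

def upsert_implements(driver, req_id: str, symbol: str, weight: float) -> None:
    """Upsert a confidence-weighted IMPLEMENTS edge inside a single transaction."""
    with driver.session() as session:
        session.execute_write(
            lambda tx: tx.run(UPSERT_IMPLEMENTS, req_id=req_id, symbol=symbol, weight=weight)
        )

# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
# upsert_implements(driver, "REQ-42", "PaymentService.authorize",
#                   edge_weight(confidence=0.92, last_code_change_ts=1_725_000_000))
```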
In yet another embodiment of the invention, the change detection agents (160) subscribe to Git webhooks; on each push, they perform an AST diff (using tree-sitter) to identify added, removed, or modified symbols, run incremental re-extraction only on affected artifacts, update edge weights, and flag drift alerts when w falls below a pre-defined threshold (e.g., 0.4).
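The per-push drift check can be sketched as follows, reusing the edge_weight helper from the preceding sketch; the graph, reextract, and notify collaborators are hypothetical stand-ins for modules (150), (130), and the UI/Slack notifier, while the 0.4 threshold follows the embodiment.

```python
from typing import Callable, Iterable

DRIFT_THRESHOLD = 0.4  # pre-defined drift threshold stated in the embodiment

def handle_push(changed_symbols: Iterable[str], graph, reextract: Callable, notify: Callable) -> None:
    """Re-score IMPLEMENTS edges touching changed symbols and raise drift alerts."""
    for symbol in changed_symbols:                        # symbols reported by the AST diff
        for edge in graph.implements_edges(symbol):       # hypothetical graph-layer accessor
            confidence = reextract(edge.requirement_id, symbol)   # incremental re-extraction
            edge.weight = edge_weight(confidence, edge.last_code_change_ts)
            graph.update_edge(edge)
            if edge.weight < DRIFT_THRESHOLD:
                graph.create_drift_alert(edge)            # adds a DRIFT_ALERT edge
                notify(f"Requirement {edge.requirement_id} may have drifted from {symbol}")
```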
In yet another embodiment of the invention, the user interaction and feedback UI (170) provides a graph visualizer, such as D3.js or Cytoscape, that renders nodes and edges with confidence heat maps; a correction panel then allows users to accept, reject, or edit generated requirements. The edits are stored as feedback records (FR) and fed to a reinforcement learning (RL) trainer/loop (180). The said learning loop (180) enables supervised fine-tuning, wherein the aggregated feedback records (FRs) form a continuously expanding labeled dataset and the LLM and CRF models are fine-tuned nightly using parameter-efficient transfer learning (e.g., LoRA adapters); it further enables policy-gradient RL, wherein the system treats the confidence c as a reward signal, such that a policy network learns to adjust prompt engineering and sampling temperature to maximize downstream traceability accuracy.
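A minimal sketch of how user feedback could be mapped to rewards for the learning loop (180); the mapping of accept/edit/reject to the +1/+0.5/-1 values is one plausible reading of the method steps, and the toy temperature update below is a simplified stand-in for the PPO agent described in the embodiment.

```python
REWARDS = {"accept": 1.0, "edit": 0.5, "reject": -1.0}  # assumed mapping of the +1/+0.5/-1 values

class SamplingPolicy:
    """Toy policy over the LLM sampling temperature; the embodiment's PPO agent
    would additionally update prompt parameters, LoRA adapter weights, and CRF
    feature weights, which is omitted here."""
    def __init__(self, temperature: float = 0.2, lr: float = 0.01):
        self.temperature = temperature
        self.lr = lr

    def update(self, reward: float) -> None:
        # Lower the temperature on positive reward (exploit), raise it on negative (explore).
        self.temperature = min(1.0, max(0.05, self.temperature - self.lr * reward))

def feedback_to_reward(action: str) -> float:
    return REWARDS[action]

policy = SamplingPolicy()
for action in ("accept", "edit", "reject"):   # feedback records streamed from the UI (170)
    policy.update(feedback_to_reward(action))
```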
In yet another embodiment of the invention, the analytics & ROI dashboard (190) computes impact analysis metrics including the number of code entities linked per requirement, average confidence, and drift rate; it further estimates cost avoidance using industry benchmarks (e.g., $X per defect found early) and time-to-market reduction (hours saved per automated traceability query).
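As a hedged illustration of the dashboard computations, the record shapes, the use of edge weight as a per-link confidence proxy, and the cost inputs below are placeholders rather than prescribed values.

```python
from statistics import mean

def dashboard_metrics(requirements, edges, defects_found_early, cost_per_defect):
    """requirements and edges are hypothetical query results from the graph layer (150);
    edge weight is used here as a proxy for per-link confidence."""
    linked = {r["id"]: 0 for r in requirements}
    for e in edges:
        if e["type"] == "IMPLEMENTS":
            linked[e["req_id"]] += 1
    return {
        "code_entities_per_requirement": mean(linked.values()) if linked else 0.0,
        "average_confidence": mean(e["weight"] for e in edges) if edges else 0.0,
        "drift_rate": sum(e["weight"] < 0.4 for e in edges) / len(edges) if edges else 0.0,
        "estimated_cost_avoidance": defects_found_early * cost_per_defect,
    }
```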
In yet another preferred embodiment, the context-sensitive requirement engine system (100) employs a method with graph-based bidirectional traceability comprising the steps of (an illustrative orchestration sketch in code follows the listed steps):
- receiving raw artifact by the system, provided by the user from any source using at least one input device;
- identifying source type (email, ticket, chat, doc, VCS) by the source connectors (110);
- pushing artifact to the ingestion module (120) by source connectors (110);
- normalizing MIME/encoding by the ingestion module (120);
- extracting embedded text using OCR and NMT for non-English by the ingestion module (120);
- storing normalized text in the staging repository by the ingestion module (120);
- identifying requirement spans using statistical NLP (CRF) by the extraction module (130);
- passing spans to Generative AI (LLM + LoRA) by the extraction module (130);
- generating structured requirement JSON by the LLM in the extraction module (130);
- computing confidence using the posterior probability from the CRF, token log-probabilities from the LLM, and Bayesian fusion by the extraction module (130);
- checking whether confidence score is greater than minimum threshold using the extraction module (130);
- proceeding to normalization and canonicalization engine (140) if above threshold is reached or discarding low-confidence extraction by the extraction module (130);
- computing sentence-BERT embeddings by the normalization and canonicalization engine (140);
- clustering embeddings using HDBSCAN by the normalization and canonicalization engine (140);
- selecting canonical requirement node by the normalization and canonicalization engine (140);
- creating ALIAS_OF edges for duplicates by the normalization and canonicalization engine (140);
- mapping canonical requirement to OWL Ontology concepts by the normalization and canonicalization engine (140);
- standardizing units using the Pint micro-service by the normalization and canonicalization engine (140);
- opening transaction with property-graph DB by the graph persistence layer (150);
- upserting requirement node with provenance hash by the graph persistence layer (150);
- upserting source artifact node by the graph persistence layer (150);
- upserting code entity nodes (if code symbols are present) by the graph persistence layer (150);
- creating edges using DERIVED_FROM (Req–Artifact), IMPLEMENTS (Req–Code), TESTED_BY (Req–Ontology Concept) by the graph persistence layer (150);
- computing edge weight as confidence score × decay factor by the graph persistence layer (150);
- committing transaction by the graph persistence layer (150);
- listening to VCS webhook (Git push) by the change detection agents (160);
- parsing changed files using AST diff (tree-sitter) by the change detection agents (160);
- identifying impacted code entity nodes by the change detection agents (160);
- re-running Extraction + Normalization only on changed symbols by the change detection agents (160);
- re-computing confidence score by the change detection agents (160);
- updating IMPLEMENTS edge weight with exponential decay by the change detection agents (160);
- checking if edge weight < drift threshold by the change detection agents (160);
- creating DRIFT_ALERT edge if threshold is crossed by change detection agents (160);
- notifying users via UI/Slack by the change detection agents (160);
- rendering graph with color-coded confidence and heatmaps by the user interaction and feedback UI (170);
- selecting nodes/edges for detail view by the user interaction and feedback UI (170);
- providing feedback (accept/reject/edit) by the user interaction and feedback UI (170);
- creating feedback node linked to requirement and artifact by the user interaction and feedback UI (170);
- pushing feedback to reinforcement learning loop (180) by the user interaction and feedback UI (170);
- translating feedback into reward (+1/+0.5/-1) by the reinforcement learning loop (180);
- updating LLM prompt parameters, LoRA adapter weights, and CRF feature weights via PPO agent by reinforcement learning loop (180);
- storing updated model weights in the Model Store by the reinforcement learning loop (180);
- scheduling nightly batch retraining by the reinforcement learning loop (180);
- gathering all feedback records by the analytics & ROI dashboard (190);
- building augmented training set (original corpus + new examples) by the analytics & ROI dashboard (190);
- retraining CRF model in supervised mode by the analytics & ROI dashboard (190);
- fine-tuning LLM (LoRA) on augmented data by the analytics & ROI dashboard (190);
- validating new models on hold-out set by the analytics & ROI dashboard (190);
- persisting retrained models to the Model Store by the analytics & ROI dashboard (190).
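Purely as an illustrative orchestration of the steps above, the following sketch chains hypothetical component objects in the listed order; it is a simplified sketch under assumed interfaces, not a definitive implementation of the claimed method.

```python
def process_artifact(raw, connectors, ingestion, extraction, normalizer, graph, min_confidence=0.7):
    """Single end-to-end pass over one artifact; every component object is a hypothetical
    stand-in for modules (110)-(150), and min_confidence is an illustrative threshold."""
    artifact = connectors.identify_source(raw)                  # source connectors (110)
    text = ingestion.normalize_and_extract(artifact)            # ingestion engine (120)
    for span in extraction.label_spans(text):                   # statistical NLP layer (131)
        requirement, confidence = extraction.generate(span)     # generative AI (132) + fusion (133)
        if confidence < min_confidence:
            continue                                            # discard low-confidence extraction
        canonical = normalizer.canonicalize(requirement)        # clustering, ontology, units (140)
        graph.persist(canonical, artifact, confidence)          # weighted traceability edges (150)
```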
While considerable emphasis has been placed herein on the specific elements of the preferred embodiment, it will be appreciated that many alterations and modifications can be made to the preferred embodiment without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
Claims: CLAIMS:
We claim,
1. A context-sensitive requirement engine system and method with graph-based bidirectional traceability;
wherein the context-sensitive requirement engine system (100) comprises one or more source connectors (110), an ingestion engine (120), an extraction module (130), a normalization and canonicalization engine (140), a graph persistence layer (150), one or more change detection agents (160), a user interaction and feedback loop (170), a learning loop (180), and an analytics and ROI dashboard (190); wherein the extraction module (130) further comprises a hybrid pipeline including a statistical NLP layer (131), a generative AI layer (132), and a confidence scoring (133); such that the interaction among all the components yields a self-healing, context-aware requirement knowledge graph that stays synchronized with the evolving codebase and source artifacts;
characterized in that:
the context-sensitive requirement engine system (100) employs a method with graph-based bidirectional traceability comprising the steps of;
- receiving raw artifact by the system, provided by the user from any source using at least one input device;
- identifying source type (email, ticket, chat, doc, VCS) by the source connectors (110);
- pushing artifact to the ingestion module (120) by source connectors (110);
- normalizing MIME/encoding by the ingestion module (120);
- extracting embedded text using OCR and NMT for non-English by the ingestion module (120);
- storing normalized text in the staging repository by the ingestion module (120);
- identifying requirement spans using statistical NLP (CRF) by the extraction module (130);
- passing spans to Generative AI (LLM + LoRA) by the extraction module (130);
- generating structured requirement JSON by the LLM in the extraction module (130);
- computing confidence using the posterior probability from the CRF, token log-probabilities from the LLM, and Bayesian fusion by the extraction module (130);
- checking whether confidence score is greater than minimum threshold using the extraction module (130);
- proceeding to normalization and canonicalization engine (140) if above threshold is reached or discarding low-confidence extraction by the extraction module (130);
- computing sentence-BERT embeddings by the normalization and canonicalization engine (140);
- clustering embeddings using HDBSCAN by the normalization and canonicalization engine (140);
- selecting canonical requirement node by the normalization and canonicalization engine (140);
- creating ALIAS_OF edges for duplicates by the normalization and canonicalization engine (140);
- mapping canonical requirement to OWL Ontology concepts by the normalization and canonicalization engine (140);
- standardizing units using the Pint micro-service by the normalization and canonicalization engine (140);
- opening transaction with property-graph DB by the graph persistence layer (150);
- upserting requirement node with provenance hash by the graph persistence layer (150);
- upserting source artifact node by the graph persistence layer (150);
- upserting code entity nodes (if code symbols are present) by the graph persistence layer (150);
- creating edges using DERIVED_FROM (Req–Artifact), IMPLEMENTS (Req–Code), TESTED_BY (Req–Ontology Concept) by the graph persistence layer (150);
- computing edge weight as confidence score × decay factor by the graph persistence layer (150);
- committing transaction by the graph persistence layer (150);
- listening to VCS webhook (Git push) by the change detection agents (160);
- parsing changed files using AST diff (tree-sitter) by the change detection agents (160);
- identifying impacted code entity nodes by the change detection agents (160);
- re-running Extraction + Normalization only on changed symbols by the change detection agents (160);
- re-computing confidence score by the change detection agents (160);
- updating IMPLEMENTS edge weight with exponential decay by the change detection agents (160);
- checking if edge weight < drift threshold by the change detection agents (160);
- creating DRIFT_ALERT edge if threshold is crossed by change detection agents (160);
- notifying users via UI/Slack by the change detection agents (160);
- rendering graph with color-coded confidence and heatmaps by the user interaction and feedback UI (170);
- selecting nodes/edges for detail view by the user interaction and feedback UI (170);
- providing feedback (accept/reject/edit) by the user interaction and feedback UI (170);
- creating feedback node linked to requirement and artifact by the user interaction and feedback UI (170);
- pushing feedback to reinforcement learning loop (180) by the user interaction and feedback UI (170);
- translating feedback into reward (+1/+0.5/-1) by the reinforcement learning loop (180);
- updating LLM prompt parameters, LoRA adapter weights, and CRF feature weights via PPO agent by reinforcement learning loop (180);
- storing updated model weights in the Model Store by the reinforcement learning loop (180);
- scheduling nightly batch retraining by the reinforcement learning loop (180);
- gathering all feedback records by the analytics & ROI dashboard (190);
- building augmented training set (original corpus + new examples) by the analytics & ROI dashboard (190);
- retraining CRF model in supervised mode by the analytics & ROI dashboard (190);
- fine-tuning LLM (LoRA) on augmented data by the analytics & ROI dashboard (190);
- validating new models on hold-out set by the analytics & ROI dashboard (190);
- persisting retrained models to the Model Store by the analytics & ROI dashboard (190).
2. The system and method as claimed in claim 1, wherein the source connectors (110) include adapters for email servers, ticketing systems, chat platforms, document repositories, and code repositories such that each connector streams raw artifacts to the ingestion engine (120).
3. The system and method as claimed in claim 1, wherein the ingestion engine (120) normalizes MIME types, extracts embedded text via OCR, performs language detection, and stores the raw text, using various modules connected to a staging store (125) including:
a. a source identification layer configured for source identification; wherein each incoming artifact passes through a connector registry that assigns a source type identifier, thereby preserving source context and enabling domain-specific parsing rules;
b. a content extraction module wherein, for digital text sources direct parsing is applied, and for PDFs and scanned images an OCR sub-module executes a CNN-based text detector (EAST) followed by a Transformer recognizer (TrOCR), enabling robust transcription across varied layouts and noisy inputs;
c. a language normalization unit; wherein the system applies language detection where if the detected language is non-English, the artifact is passed through a neural machine translation (NMT) service, ensuring all artifacts are harmonized into the system’s working language; and
d. a metadata enrichment service; wherein the artifact is augmented before storage with timestamps, author identifiers, thread IDs, and provenance hashes, ensuring traceability and audit compliance.
4. The system and method as claimed in claim 1, wherein the extraction module (130) provides a multi-layer hybrid pipeline comprising:
a. a statistical NLP layer (131) comprising a WordPiece tokenizer configured with a 30 k sub-token vocabulary, which performs foundational linguistic analysis to segment and annotate the text and enables tokenization that breaks down raw text into individual tokens; a feature engineering unit that generates linguistic and statistical features for each token, including the part-of-speech tag, dependency label, character n-grams, and term frequency–inverse document frequency (TF-IDF) of the surrounding 5-token context; and a conditional random field (CRF) sequence labeling model trained on a 150 k-statement corpus, or alternatively a BiLSTM-CRF neural model trained on a labeled requirement corpus, to identify and tag requirement spans categorized into domain-specific classes such as Functional Requirements (FUNC ≈ 30 %), Non-Functional Requirements (NFR ≈ 40 %), and Constraints and Risks (CONSTRAINT, RISK ≈ 30 %); such that the statistical NLP layer outputs BIO tags indicating requirement spans;
b. a generative AI layer (132) comprising a prompt construction module that generates a structured prompt template for each span identified by the statistical layer; such that the outputs from the statistical layer are fed into a fine-tuned Large Language Model (LLM) decoding engine adapted with LoRA fine-tuning (Δ = 2 M parameters) on the same labeled corpus, where the inference uses top-p sampling (p = 0.9) and low temperature (0.2) for deterministic yet context-aware outputs; thereby using these annotated spans to generate a structured requirement object expressed in JSON-LD format using a structured output formatter, where each requirement object typically includes fields such as id, type, description, rationale, acceptance criteria, and source ref, thereby adding semantic richness and contextual interpretation and turning statistically extracted spans into machine-readable, semantically grounded requirement objects;
c. confidence scoring (133) or confidence fusion; where a statistical confidence calculator computes c₁ = mean(pₜ) across all tokens in a span; an LLM confidence calculator derives c₂ = 1 − (average token log-probability normalized to [0, 1]), reflecting the model's certainty in the generated output; and a Bayesian fusion unit combines c₁ and c₂ using weighted averaging, c = (α·c₁ + β·c₂) / (α + β), where α = 0.6 and β = 0.4; such that the module ensures the robustness and trustworthiness of extracted requirements.
5. The system and method as claimed in claim 1, wherein the normalization & canonicalization engine (140) enables harmonizing diverse, redundant, and heterogeneous requirement expressions into a consistent and standardized representation, ensuring that requirements originating from multiple authors, sources, or formats can be compared, queried, and reasoned over without ambiguity; the engine operating as a multi-stage pipeline using sub-modules such as:
a. semantic clustering (141); that identifies semantically equivalent or near-duplicate requirements expressed in varied natural language by using sentence-BERT embeddings to capture semantic meaning at the sentence level and applying HDBSCAN to group semantically duplicate requirements;
b. ontology mapping (142); wherein the clustered requirements are aligned to a domain-specific ontology to enforce terminological consistency and ensure compatibility with established frameworks; such that the alignment is achieved using graph-based semantic similarity techniques, combining Jaccard similarity with edge weight propagation to map free-text requirement terms onto ontology concepts;
c. unit & metric standardization (143); that ensures uniform representation by employing a unit conversion microservice to normalize quantitative attributes; enabling direct comparison, automated validation, and consistency across requirement sets.
6. The system and method as claimed in claim 1, wherein the graph persistence layer (150) stores the requirement nodes (R), artifact nodes (A), code nodes (C), and edge types (E) such that each edge carries a weight w = c × decay_factor; and the graph is persisted in a property graph DB that supports ACID transactions and graph analytics.
7. The system and method as claimed in claim 1, wherein the change detection agents (160) subscribe to Git webhooks; perform an AST diff (using tree-sitter) on each push to identify added, removed, or modified symbols; run incremental re-extraction only on affected artifacts; update edge weights; and flag drift alerts when w falls below a pre-defined threshold.
8. The system and method as claimed in claim 1, wherein the user interaction and feedback UI (170) provides a graph visualizer that renders nodes and edges with confidence heat maps; whereafter a correction panel allows users to accept, reject, or edit generated requirements, which are further stored as feedback records (FR) and fed to a reinforcement learning (RL) trainer/loop (180).
9. The system and method as claimed in claim 1, wherein the said learning loop (180) enables supervised fine-tuning, wherein the aggregated feedback records (FRs) form a continuously expanding labeled dataset and the LLM and CRF models are fine-tuned nightly using parameter-efficient transfer learning; and enables policy-gradient RL, wherein the system treats the confidence c as a reward signal, such that a policy network learns to adjust prompt engineering and sampling temperature to maximize downstream traceability accuracy.
10. The system and method as claimed in claim 1, wherein the analytics & ROI dashboard (190) computes impact analysis metrics including the number of code entities linked per requirement, average confidence, and drift rate; and further estimates cost avoidance using industry benchmarks and time-to-market reduction.
| # | Name | Date |
|---|---|---|
| 1 | 202521083410-STATEMENT OF UNDERTAKING (FORM 3) [02-09-2025(online)].pdf | 2025-09-02 |
| 2 | 202521083410-POWER OF AUTHORITY [02-09-2025(online)].pdf | 2025-09-02 |
| 3 | 202521083410-FORM 1 [02-09-2025(online)].pdf | 2025-09-02 |
| 4 | 202521083410-FIGURE OF ABSTRACT [02-09-2025(online)].pdf | 2025-09-02 |
| 5 | 202521083410-DRAWINGS [02-09-2025(online)].pdf | 2025-09-02 |
| 6 | 202521083410-DECLARATION OF INVENTORSHIP (FORM 5) [02-09-2025(online)].pdf | 2025-09-02 |
| 7 | 202521083410-COMPLETE SPECIFICATION [02-09-2025(online)].pdf | 2025-09-02 |
| 8 | 202521083410-FORM-9 [26-09-2025(online)].pdf | 2025-09-26 |
| 9 | 202521083410-FORM 18 [01-10-2025(online)].pdf | 2025-10-01 |
| 10 | Abstract.jpg | 2025-10-08 |