
System And Method For Offline External Knowledge Augmentation Via Intent Driven Web Curation And Knowledge Packetization

Abstract: The present invention describes a system and method for offline external knowledge augmentation via intent-driven web curation and knowledge packetization. The system comprises subsystems including an Intent Extraction Algorithm as a closed system, Guided Web Curation Using Knowledge Graphs as an open system, Validation, Rebalancing, and Bias Correction, Knowledge Packet Generation and Archiving, and Reintegration and Inference as a closed system. The method comprises receiving user input in natural language within the closed system; extracting intent by applying semantic parsing and embedding-based summarization; generating an intent signature; transferring the intent signature securely to the open system via a stub or intermediate enclave; expanding entities and traversing knowledge graphs to identify semantically related and authoritative external content; retrieving and filtering documents using probabilistic scoring functions; validating and rebalancing curated content by detecting duplicates, applying entropy-based sampling, and correcting bias; compressing, encrypting, and versioning the knowledge packets; and reintegrating the packets into the closed system.


Patent Information

Application #: 202521082256
Filing Date: 29 August 2025
Publication Number: 40/2025
Publication Type: INA
Invention Field: COMPUTER SCIENCE

Applicants

Persistent Systems
Bhageerath, 402, Senapati Bapat Rd, Shivaji Cooperative Housing Society, Gokhale Nagar, Pune - 411016, Maharashtra, India

Inventors

1. Mr. Nitish Shrivastava
10764 Farallone Dr, Cupertino, CA 95014-4453, United States.
2. Mr. Pradeep Kumar Sharma
20200 Lucille Ave Apt 62, Cupertino, CA 95014, United States.

Specification

Description:
FIELD OF THE INVENTION
The present invention relates to methods for securely integrating curated internet-derived knowledge into air-gapped environments. More specifically, it relates to a structured and guided system and method for capturing user intent from an isolated system, curating relevant external data using advanced knowledge graph algorithms, validating and balancing the information, and reintegrating it into the closed system via structured, compressed knowledge packets for use in AI-based inferencing.

BACKGROUND OF THE INVENTION
Air-gapped networks are designed to safeguard critical systems, particularly those essential to the stock market, government infrastructure, defence, and industrial power sectors. These networks operate by physically isolating sensitive IT systems from the broader internet, which traditionally provides a high level of security. However, this isolation comes with inherent risks, particularly when data is transferred across the air-gap or to connected systems. While air-gapped systems offer excellent protection for data-at-rest, they are not immune to compromise, especially when they require interaction with the internet or other connected systems.
The principle behind air-gapping is straightforward: by eliminating internet connectivity, the attack surface is drastically reduced, making remote hacking, malware infiltration, and ransomware attacks significantly more difficult. However, while this separation offers substantial protection for data at rest, it also introduces complex operational challenges. Maintaining an air-gapped environment demands meticulous protocols for transferring information because direct electronic communication with the outside world is impossible. Consequently, organizations must rely on controlled, manual data exchanges using physical media such as USB drives, external hard drives, or other removable storage devices. Though these methods appear secure, they often represent the most vulnerable point of entry for cyber attackers aiming to bypass the air-gap.
Prior Art:
For instance, US8935275B2 describes a distributed peer-to-peer system where Human Operating System (HOS) applications enable the transmission and accumulation of knowledge packets among user nodes, using taxonomically classified data structures, search macros, and UKID (Universal Knowledge Information Databases) identifiers.
While US8935275B2 operates on the use of knowledge packets, structured knowledge classification, distributed access, and user-driven search, it lacks a dual-environment model and secured ingestion. NLP (natural language)-driven intent extraction and intent signature generation are also absent.
US11182416B2 discloses a system and method that leverages machine learning and cognitive computing to process files by converting data points into vector representations and enriching them with synonymous or related terms. It analyses word frequencies, applies them to similar terms, and uses these vectorized forms to match incoming communications with previously classified files based on semantic similarity. Because it focuses primarily on internal file classification and message mapping, it lacks a broader, more advanced architecture, such as intent-driven web knowledge acquisition, bias-aware validation, structured knowledge packetization, and integration with large language models (LLMs) for inference, which limits its functionality and scope.
Although the existing systems do provide operation on knowledge packets and structured knowledge classification, including vector-based representation, they fail to provide a dual-environment model with secured, intent-driven ingestion. The present invention, by contrast, comprises a secure system with structured and compressed knowledge packetization that uses AI (Artificial Intelligence)-based inferencing, enabling an isolated data system to interact with the public internet through a secure and intelligent pipeline carrying task-specific intent.

DEFINITIONS
The expression “system” used hereinafter in this specification refers to an ecosystem comprising, but not limited to, a system with a user, input and output devices, a processing unit, a plurality of mobile devices, a mobile device-based application to identify dependencies and relationships between diverse businesses, a visualization platform, and output; and is extended to computing systems like mobiles, laptops, computers, PCs, etc.
The expression “input unit” used hereinafter in this specification refers to, but is not limited to, mobile, laptops, computers, PCs, keyboards, mouse, pen drives or drives.
The expression “output unit” used hereinafter in this specification refers to, but is not limited to, an onboard output device, a user interface (UI), a display kit, a local display, a screen, a dashboard, or a visualization platform enabling the user to visualize, observe or analyze any data or scores provided by the system.
The expression “processing unit” refers to, but is not limited to, a processor of at least one computing device that optimizes the system.
The expression “large language model (LLM)” used hereinafter in this specification refers to a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.
The expression “Offline Environment/Closed System” refers to a computing system or network that operates without internet connectivity, often used in sensitive sectors like defence, healthcare, and finance, where data isolation is essential for security and confidentiality.
The expression “Online Environment/Open System” refers to a system that is connected to the internet and performs data acquisition, validation, and knowledge packet generation based on requests from the offline system.
The expression “Topic Modeling” refers to a technique that uses algorithms like Latent Dirichlet Allocation (LDA) to identify key themes or topics within text data.
The expression “Knowledge Graph” refers to a structured graph where nodes represent entities and edges represent relationships. Used for semantic traversal and identifying connected, relevant content during web curation.
The expression “Provenance Metadata” refers to information that describes the origin, trustworthiness, and context of each piece of content, including source URLs, timestamps, and relevance scores.
The expression “Knowledge Packet” refers to a multi-layered, structured data package containing raw web content, summaries, embeddings, and metadata.
The expression “MinHash and Jaccard Distance” refers to algorithms used for duplicate detection, especially to identify near-identical content in large datasets by measuring text similarity.
The expression “Smart Crawler” refers to an automated web crawler that respects domain filters and robots.txt, and collects content only from vetted and domain-specific sources.
The expression “Community Clustering” refers to the process of grouping related data into clusters using algorithms like Leiden or Louvain, ensuring diverse viewpoints are included in the final dataset.
OBJECTS OF THE INVENTION
The primary object of the present invention is to provide a secure, structured, and intelligent system and method for the integration of internet-derived knowledge into offline, air-gapped, or isolated computing environments.
Another object of the invention is to enable automated extraction of user intent in the offline system using natural language processing, topic modelling, and embedding-based summarization.
A further object of the invention is to securely transmit the intent signature to an online, connected system, which performs guided web curation through knowledge graph traversal, relevance scoring, and smart crawling.
Yet another object of the invention is to perform validation, redundancy removal, entropy-based rebalancing, and adversarial debiasing on the curated content to improve quality and minimize ideological or regional biases.
Another object of the invention is to package the curated content into multi-layer encrypted knowledge packets, incorporating raw data, LLM-generated summaries, semantic embeddings, and provenance metadata.
Another object of the invention is to provide a scalable solution to support secure, bidirectional knowledge transfer, enhancing the functionality of air-gapped AI systems, especially in domains such as defence and research, where real-time, relevant, and validated information is critical yet not directly accessible.
SUMMARY
Before the present invention is described, it is to be understood that the present invention is not limited to specific methodologies and materials described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention.
The present invention describes a system and method for offline external knowledge augmentation via intent-driven web curation and knowledge packetization. The system comprises an input unit, a processing unit, and an output unit, wherein the processing unit further comprises subsystems including the Intent Extraction Algorithm as a closed system, Guided Web Curation Using Knowledge Graphs as an open system, Validation, Rebalancing, and Bias Correction, Knowledge Packet Generation and Archiving, and Reintegration and Inference as a closed system. The Intent Extraction Algorithm subsystem implements a multi-layered intent extraction algorithm and consists of a natural language parsing module, a topic modeling module, an embedding-based summarization module, and an intent signature generation module. The intent signature generated by the Intent Extraction Algorithm subsystem serves as the input for the Guided Web Curation Using Knowledge Graphs subsystem, which performs knowledge-driven web curation through the Entity Expansion and Graph Traversal module, Relevance Scoring module, Smart Crawler Execution module, and Community Clustering module.

According to an aspect of the present invention, the Validation, Rebalancing, and Bias Correction subsystem ensures that the curated content is accurate, non-redundant, balanced across topics, and free from dominant bias, and comprises a Redundancy Detection module, Probabilistic Content Scoring module, Entropy-Based Rebalancing module, and Adversarial Debiasing module. The Knowledge Packet Generation and Archiving subsystem comprises a Multilayer Packet Structure module, Compression and Encryption module, and Versioning and Traceability module, and packages the validated and balanced content into structured, secure, and traceable knowledge packets. The Reintegration and Inference subsystem is a closed system that reintegrates curated knowledge packets into the closed system for downstream inference and reasoning. It comprises a Packet Mounting and Indexing module, Contextual Joining with Local Data module, and LLM Inference Pipeline module.

According to an aspect of the present invention, the method for offline external knowledge augmentation via intent-driven web curation and knowledge packetization comprises: receiving user input in natural language within the closed system; extracting intent by applying semantic parsing, topic modeling, and embedding-based summarization; generating an intent signature comprising entities, topics, semantic embeddings, and contextual summaries; transferring the intent signature securely to the open system via a stub or intermediate enclave; expanding entities and traversing knowledge graphs to identify semantically related and authoritative external content; retrieving and filtering documents using probabilistic scoring functions based on relevance, freshness, authority, and redundancy; validating and rebalancing curated content by detecting duplicates, applying entropy-based sampling, and correcting bias through adversarial debiasing; packaging curated content into multilayer knowledge packets containing raw documents, summaries, embeddings, and provenance metadata; compressing, encrypting, and versioning the knowledge packets using Zstandard, AES-256 encryption, semantic UUIDs, and hash chains for traceability; and reintegrating the packets into the closed system, indexing metadata and embeddings, and performing Retrieval-Augmented Generation (RAG)-based inference with both external and local data.
BRIEF DESCRIPTION OF DRAWINGS
A complete understanding of the present invention may be gained by reference to the following detailed description, which is to be taken in conjunction with the accompanying drawings. The accompanying drawings, which are incorporated into and constitute a part of the specification, illustrate one or more embodiments of the present invention and, together with the detailed description, serve to explain the principles and implementations of the invention.
FIG. 1 illustrates a flowchart of the workflow of the present invention.
FIG.2 illustrates the component diagram of the system of the present invention.
FIG.3 illustrates the flow diagram of the web curation logic of the system of the present invention.
DETAILED DESCRIPTION OF INVENTION
Before the present invention is described, it is to be understood that this invention is not limited to methodologies described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention. Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the invention to achieve one or more of the desired objects or results. Various embodiments of the present invention are described below. It is, however, noted that the present invention is not limited to these embodiments, but rather the intention is that modifications that are apparent are also included.
The present invention describes a system and method for offline external knowledge augmentation via intent-driven web curation and knowledge packetization. The present invention pertains to a structured and guided process of capturing user intent from an isolated system, curating relevant external data using advanced knowledge graph algorithms, validating and balancing the information, and reintegrating it into the closed system via structured, compressed knowledge packets for use in AI-based inferencing. The system and method of the present invention provides a secure, bidirectional, intelligent pipeline that can extract task-specific intent, gather and validate relevant online data, and reintegrate it efficiently and securely into an offline system for AI consumption.
According to the embodiment of the present invention, as described in FIG. 2, the system comprises an input unit, a processing unit, and an output unit, wherein the processing unit further comprises different subsystems including the Intent Extraction Algorithm (closed system), Guided Web Curation Using Knowledge Graphs (open system), Validation, Rebalancing, and Bias Correction, Knowledge Packet Generation and Archiving, and Reintegration and Inference (closed system). The Closed System (Offline Environment) is responsible for intent generation, secure packet ingestion, and inference, while the Open System (Connected Environment) is responsible for web data acquisition, validation, balancing, and packet creation.
According to the embodiment of the present invention, the Intent Extraction Algorithm subsystem implements a multi-layered intent extraction algorithm and consists of a natural language parsing module, a topic modeling module, an embedding-based summarization module, and an intent signature generation module. The Intent Extraction Algorithm is designed to identify and represent the underlying intent in a given input. The algorithm is implemented through a sequence of specialized modules, each performing a defined function to progressively refine the extracted intent.
Natural Language Parsing Module: This module analyzes the input text using dependency parsers (for example, spaCy or Stanford NLP). It identifies and extracts the key grammatical relationships, such as subject–verb–object triples. It further processes the text by normalizing words through lemmatization and standardizing entities using Named Entity Recognition (NER). This ensures that different forms of the same word or entity are treated uniformly.
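For illustration only, a minimal Python sketch of this parsing step, assuming the spaCy library with its en_core_web_sm English model; the subject–verb–object heuristic below is a simplified stand-in for the module's full logic, not the claimed algorithm:

import spacy

nlp = spacy.load("en_core_web_sm")

def parse_intent_text(text):
    doc = nlp(text)
    triples = []
    # Collect rough subject-verb-object triples from the dependency tree.
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [t for t in token.children if t.dep_ in ("nsubj", "nsubjpass")]
            objects = [t for t in token.children if t.dep_ in ("dobj", "attr")]
            for s in subjects:
                for o in objects:
                    triples.append((s.lemma_, token.lemma_, o.lemma_))  # lemmatized forms
    # Named entities recognized by the NER component.
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return {"triples": triples, "entities": entities}

print(parse_intent_text("Compare EV battery markets in India over the past 12 months."))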
Topic Modeling Module: Once the text is parsed, this module determines the main topics contained within the input. It applies machine learning techniques such as Latent Dirichlet Allocation (LDA) and BERTopic to discover recurring patterns and themes. The topics are stored as numerical vectors, which serve as part of the input for creating an intent signature.
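By way of a non-limiting example, LDA-based topic extraction may be sketched in Python with scikit-learn; the toy corpus and topic count below are placeholders, not parameters of the invention:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "EV battery market growth in India",
    "lithium battery chemistry research",
    "electric vehicle charging infrastructure policy",
]
X = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_vectors = lda.fit_transform(X)  # per-document topic distributions
print(topic_vectors)                  # numerical vectors fed into the intent signature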
Embedding-Based Summarization Module: This module converts the processed input (prompt and related context) into numerical embeddings using pre-trained language models such as e5-small-v2, Instructor-XL, or all-MiniLM-L6-v2. These embeddings capture the semantic meaning of the input. To represent the overall semantic intent, the system calculates the centroid (average position) of the embedding vectors, which becomes the semantic core representation.
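A minimal sketch of the semantic-core computation, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model named above:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
texts = ["compare EV battery markets", "focus on India", "past 12 months"]
embeddings = model.encode(texts)             # one vector per text segment
semantic_core = np.mean(embeddings, axis=0)  # centroid = average position of the vectors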
Intent Signature Generation Module: Finally, this module compiles the results of the previous stages into a structured representation known as an intent signature. The intent signature is stored in a machine-readable format, such as a JSON object. This object may include extracted entities, identified topics, summarized semantic vectors, and other metadata, providing a compact and consistent representation of the input intent.
For example:
{
"topic_vector": [...],
"key_entities": ["EV", "battery", "India"],
"intent_type": "market comparison",
"required_sources": ["news", "research"],
"timeframe": "past 12 months"
}
According to the embodiment of the present invention, as described in FIG. 3, the intent signature generated by the Intent Extraction Algorithm sub-system serves as the input for the Guided Web Curation Using Knowledge Graphs subsystem, which performs knowledge-driven web curation through the following modules: Entity Expansion and Graph Traversal module, Relevance Scoring module, Smart Crawler Execution module, and Community Clustering module.
Entity Expansion and Graph Traversal module: Entities from the intent signature are treated as seed nodes. A domain-specific knowledge graph is traversed using a weighted Breadth First Search (BFS). Edge weights are computed from a combination of semantic similarity and source authority, thereby ensuring traversal prioritizes semantically relevant and credible connections.
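One plausible reading of the weighted BFS, sketched below using only the Python standard library, is a best-first expansion in which higher-weight edges (blending semantic similarity and source authority) are explored first; the graph structure and weights are hypothetical placeholders:

import heapq

def traverse(graph, seeds, max_nodes=50):
    # graph: dict mapping node -> list of (neighbor, weight) pairs, where a
    # higher weight means a more similar and more authoritative connection.
    visited = set(seeds)
    frontier = [(0.0, s) for s in seeds]   # min-heap keyed on negated cumulative weight
    heapq.heapify(frontier)
    order = []
    while frontier and len(order) < max_nodes:
        cost, node = heapq.heappop(frontier)
        order.append(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                heapq.heappush(frontier, (cost - weight, neighbor))
    return order  # nodes in priority order of traversal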
Relevance Scoring module: Each candidate content item retrieved from the web is evaluated by computing a composite relevance score. The score includes cosine similarity between the document embedding and the intent vector, source authority of the originating domain, freshness based on the date of publication, and a redundancy penalty to reduce overlap with existing corpus content. The weights assigned to these parameters are dynamically tuned through contextual reinforcement, enabling the system to improve accuracy based on performance feedback.
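For example, the composite score may be sketched as a weighted sum; the weights below are illustrative assumptions standing in for the dynamically tuned parameters described above:

import numpy as np

def relevance_score(doc_emb, intent_emb, authority, freshness, redundancy,
                    w_sim=0.5, w_auth=0.2, w_fresh=0.2, w_red=0.1):
    # Cosine similarity between the document embedding and the intent vector.
    cos = float(np.dot(doc_emb, intent_emb) /
                (np.linalg.norm(doc_emb) * np.linalg.norm(intent_emb)))
    # Authority and freshness add to the score; redundancy penalizes overlap.
    return w_sim * cos + w_auth * authority + w_fresh * freshness - w_red * redundancy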
Smart Crawler Execution module: The domain-aware crawler module is used to gather data from vetted sources only. The crawler enforces domain filters, complies with robots.txt rules, and applies link heuristics to selectively expand its search within relevant and reliable boundaries.
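A minimal sketch of the crawler's gatekeeping, using only the Python standard library; the vetted-domain allow-list is a hypothetical placeholder:

from urllib import robotparser
from urllib.parse import urlparse

VETTED_DOMAINS = {"example.org", "example.gov"}  # placeholder allow-list

def may_fetch(url, user_agent="smart-crawler"):
    parsed = urlparse(url)
    if parsed.netloc not in VETTED_DOMAINS:
        return False                              # enforce the domain filter
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    rp.read()                                     # fetch and parse robots.txt
    return rp.can_fetch(user_agent, url)          # comply with robots.txt rules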
Community Clustering module: The curated content is then organized using clustering algorithms such as Leiden or Louvain. These algorithms operate on co-citation graphs or entity-linking graphs to form communities of related information. This ensures that the curated results reflect a diversity of perspectives and do not overrepresent a single viewpoint.
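By way of illustration, Louvain clustering over a toy co-citation graph using the networkx package, where edge weights stand in for co-citation counts:

import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("doc_a", "doc_b", 3), ("doc_b", "doc_c", 2),
    ("doc_d", "doc_e", 4), ("doc_e", "doc_f", 1),
])
communities = nx.community.louvain_communities(G, weight="weight", seed=0)
print(communities)  # groups of related documents, one community per cluster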
According to the embodiment of the present invention, the Validation, Rebalancing, and Bias Correction subsystem ensures that the curated content is accurate, non-redundant, balanced across topics, and free from dominant bias. It consists of the following modules: Redundancy Detection module, Probabilistic Content Scoring module, Entropy-Based Rebalancing module, and Adversarial Debiasing module.
Redundancy Detection module: This module removes duplicate or near-duplicate documents. Techniques such as MinHash combined with Jaccard distance are used, and a threshold (e.g., less than 0.75) is applied to filter out repetitive content.
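A minimal duplicate-detection sketch, assuming the datasketch package; the threshold mirrors the 0.75 figure noted above, and its placement on similarity rather than distance is an illustrative choice:

from datasketch import MinHash

def minhash(text, num_perm=128):
    m = MinHash(num_perm=num_perm)
    for token in set(text.lower().split()):
        m.update(token.encode("utf8"))   # hash each unique token
    return m

a = minhash("EV battery demand is rising across India")
b = minhash("EV battery demand is rising rapidly across India")
if a.jaccard(b) > 0.75:                  # estimated Jaccard similarity
    print("near-duplicate detected; keep only one copy")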
Probabilistic Content Scoring module: Each document is then evaluated using a logistic regression model trained on multiple features, such as keyword density, source reliability, text entropy (information richness), and topical alignment with the intent signature. The result is a probability score indicating the overall quality and relevance of the document.
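For illustration, the scoring model may be sketched with scikit-learn; the feature values and labels below are hypothetical placeholders for real training data:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Features per document: keyword density, source reliability, text entropy, topical alignment.
X_train = np.array([[0.10, 0.9, 4.1, 0.8], [0.02, 0.3, 2.0, 0.2],
                    [0.08, 0.7, 3.8, 0.7], [0.01, 0.2, 1.5, 0.1]])
y_train = np.array([1, 0, 1, 0])   # 1 = high quality/relevance, 0 = low
clf = LogisticRegression().fit(X_train, y_train)
p = clf.predict_proba([[0.07, 0.8, 3.5, 0.75]])[0, 1]  # probability score for a new document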
Entropy-Based Rebalancing module: To maintain diversity, the module calculates topic-wise entropy using the formula H = −Σᵢ pᵢ log(pᵢ), where pᵢ represents the proportion of documents in topic i. If entropy falls below a threshold (e.g., 0.65), the system detects underrepresented clusters and selectively resamples from them, ensuring balanced topic coverage.
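A minimal sketch of this check in Python, using the natural logarithm and the 0.65 threshold from the example above:

import math

def topic_entropy(doc_counts):
    total = sum(doc_counts)
    ps = [c / total for c in doc_counts if c > 0]  # distribution across topics
    return -sum(p * math.log(p) for p in ps)       # H = -sum(p_i * log(p_i))

counts = [45, 3, 2]                 # heavily skewed topic coverage
if topic_entropy(counts) < 0.65:    # low diversity detected
    print("resample underrepresented clusters")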
Adversarial Debiasing module: This is a discriminator module that is trained to detect regional or ideological bias within the content. Documents predicted to exhibit stronger bias are downweighted, with reweighting applied inversely to the predicted bias probability. This reduces over-representation of dominant perspectives and promotes balanced outputs.
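The reweighting step may be sketched as follows; the exact inverse mapping from predicted bias probability to document weight is an illustrative assumption:

def debias_weight(p_bias):
    # Higher predicted bias probability yields a proportionally lower weight.
    return 1.0 - p_bias

weights = [debias_weight(p) for p in (0.1, 0.5, 0.9)]  # e.g. [0.9, 0.5, 0.1]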
According to the embodiment of the present invention, the Knowledge Packet Generation and Archiving subsystem comprises a Multilayer Packet Structure module, Compression and Encryption module, and Versioning and Traceability module. The validated and balanced content is packaged into structured, secure, and traceable knowledge packets in this subsystem.
Multilayer Packet Structure module: Each packet is created with multiple layers, including Raw Content Layer (original HTML, text, PDF documents), Summary Layer (summaries generated by large language models), Embedding Layer (vector embeddings generated by models such as OpenAI Ada or BGE-base), and Provenance Metadata Layer (source domain, timestamp, relevance score, and other attributes).
Compression and Encryption module: To ensure efficiency and security, packets are archived using Zstandard compression and structured with TAR formatting. They are further encrypted using AES-256 encryption, with cryptographic hash values stored for integrity verification.
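A minimal sealing sketch, assuming the zstandard and cryptography packages and using AES-256 in GCM mode; key management and the TAR assembly step are omitted:

import hashlib
import os
import zstandard
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def seal_packet(tar_bytes, key):  # key: 32 random bytes, i.e. AES-256
    compressed = zstandard.ZstdCompressor().compress(tar_bytes)
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, compressed, None)
    digest = hashlib.sha256(ciphertext).hexdigest()  # stored for integrity verification
    return nonce + ciphertext, digest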
Versioning and Traceability module: Each packet is assigned a unique version ID and a semantic UUID derived from the intent vector. A hash chain is maintained across versions, enabling complete auditability and secure traceability of content updates and transfers.
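A minimal hash-chain sketch using the Python standard library; deriving the semantic UUID from the intent vector via uuid5 is an illustrative assumption:

import hashlib
import uuid

def next_version(prev_chain_hash, packet_digest, intent_vector_bytes):
    # Semantic UUID deterministically derived from the intent vector.
    semantic_uuid = uuid.uuid5(uuid.NAMESPACE_URL, intent_vector_bytes.hex())
    # Each version's chain hash commits to the previous one, making history tamper-evident.
    chain_hash = hashlib.sha256((prev_chain_hash + packet_digest).encode()).hexdigest()
    return {"semantic_uuid": str(semantic_uuid), "chain_hash": chain_hash}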
According to the embodiment of the present invention, the Reintegration and Inference subsystem is a closed system that reintegrates curated knowledge packets into the closed system for downstream inference and reasoning. It comprises a Packet Mounting and Indexing module, Contextual Joining with Local Data module, and LLM Inference Pipeline module.
Packet Mounting and Indexing module: Packets are mounted as read-only file systems or virtual data sources. Metadata is parsed and indexed into vector databases such as FAISS or Weaviate.
Contextual Joining with Local Data module: When a user query is received, the system performs k-nearest neighbor (k-NN) searches over the embedding index to retrieve the most relevant documents. External documents from the packets are combined with locally stored data for richer context.
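For illustration, indexing and k-NN retrieval with the FAISS library; the dimensions and vectors below are random placeholders:

import numpy as np
import faiss

dim = 384                                       # e.g. all-MiniLM-L6-v2 output size
packet_embeddings = np.random.rand(1000, dim).astype("float32")
index = faiss.IndexFlatL2(dim)                  # exact L2 index over packet vectors
index.add(packet_embeddings)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)         # ids of the 5 nearest documents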
LLM Inference Pipeline module: The final step integrates user prompts with matched documents from both packet and local sources. A Retrieval-Augmented Generation (RAG) pipeline produces summaries and answers, along with confidence scores. This ensures responses are both contextually accurate and verifiable against the curated knowledge base.
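A minimal prompt-assembly sketch for this step; the template and the downstream generation call are illustrative assumptions about the closed system's local LLM:

def build_rag_prompt(query, packet_docs, local_docs, max_docs=5):
    # Join retrieved packet and local documents into a single context block.
    context = "\n\n".join((packet_docs + local_docs)[:max_docs])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer with citations:"

prompt = build_rag_prompt("EV battery market trends in India?",
                          ["packet excerpt one"], ["local record one"])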
According to the embodiment of the present invention, FIG. 1 describes the method for offline external knowledge augmentation via intent-driven web curation and knowledge packetization, comprising:
• Receive user input in natural language within the closed system;
• Extract intent by applying semantic parsing, topic modeling, and embedding-based summarization;
• Generate an intent signature comprising entities, topics, semantic embeddings, and contextual summaries;
• Transfer the intent signature securely to the open system via a stub or intermediate enclave;
• Expand entities and traverse knowledge graphs to identify semantically related and authoritative external content;
• Retrieve and filter documents using probabilistic scoring functions based on relevance, freshness, authority, and redundancy;
• Validate and rebalance curated content by detecting duplicates, applying entropy-based sampling, and correcting bias through adversarial debiasing;
• Package curated content into multilayer knowledge packets containing raw documents, summaries, embeddings, and provenance metadata;
• Compress, encrypt, and version the knowledge packets using Zstandard, AES-256 encryption, semantic UUIDs, and hash chains for traceability;
• Reintegrate the packets into the closed system, index metadata and embeddings, and perform Retrieval-Augmented Generation (RAG)-based inference with both external and local data.
While considerable emphasis has been placed herein on the specific elements of the preferred embodiment, it will be appreciated that many alterations can be made and that many modifications can be made in preferred embodiment without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
Claims:
We claim,
1. A system and method for offline external knowledge augmentation via intent-driven web curation and knowledge packetization
characterized in that
the system comprises an input unit, a processing unit, and an output unit, wherein the processing unit further comprises subsystems including the Intent Extraction Algorithm as a closed system, Guided Web Curation Using Knowledge Graphs as an open system, Validation, Rebalancing, and Bias Correction, Knowledge Packet Generation and Archiving, and Reintegration and Inference as a closed system,
such that the Intent Extraction Algorithm subsystem implements a multi-layered intent extraction algorithm and consists of a natural language parsing module, a topic modeling module, an embedding-based summarization module, and an intent signature generation module;
the intent signature generated by the Intent Extraction Algorithm sub-system, serves as the input for the Guided Web Curation using Knowledge Graphs subsystem, which performs knowledge-driven web curation through the Entity Expansion and Graph Traversal module, Relevance Scoring module, Smart Crawler Execution module, and Community Clustering module;
the Validation, Rebalancing, and Bias Correction subsystem ensures that the curated content is accurate, non-redundant, balanced across topics, and free from dominant bias, and comprises a Redundancy Detection module, Probabilistic Content Scoring module, Entropy-Based Rebalancing module, and Adversarial Debiasing module;
the Knowledge Packet Generation and Archiving subsystem comprises a Multilayer Packet Structure module, Compression and Encryption module, and Versioning and Traceability module, and packages the validated and balanced content into structured, secure, and traceable knowledge packets;
the Reintegration and Inference subsystem is a closed system that reintegrates curated knowledge packets into the closed system for downstream inference and reasoning and comprises a Packet Mounting and Indexing module, Contextual Joining with Local Data module, and LLM Inference Pipeline module;
and the method for offline external knowledge augmentation via intent-driven web curation and knowledge packetization comprising:
• receive user input in natural language within the closed system;
• extract intent by applying semantic parsing, topic modeling, and embedding-based summarization;
• generate an intent signature comprising entities, topics, semantic embeddings, and contextual summaries;
• transfer the intent signature securely to the open system via a stub or intermediate enclave;
• expand entities and traverse knowledge graphs to identify semantically related and authoritative external content;
• retrieve and filter documents using probabilistic scoring functions based on relevance, freshness, authority, and redundancy;
• validate and rebalance curated content by detecting duplicates, applying entropy-based sampling, and correcting bias through adversarial debiasing;
• package curated content into multilayer knowledge packets containing raw documents, summaries, embeddings, and provenance metadata;
• compress, encrypt, and version the knowledge packets using Zstandard, AES-256 encryption, semantic UUIDs, and hash chains for traceability;
• reintegrate the packets into the closed system, index metadata and embeddings, and perform Retrieval-Augmented Generation (RAG)-based inference with both external and local data.

2. The system and method as claimed in claim 1, wherein the Closed System (Offline Environment) is responsible for intent generation, secure packet ingestion, and inference and Open System (Connected Environment) is responsible for web data acquisition, validation, balancing, and packet creation.

3. The system and method as claimed in claim 1, wherein in the Intent Extraction Algorithm subsystem, the Natural Language Parsing Module analyses the input text using dependency parsers, identifies and extracts the key grammatical relationships, and processes the text by normalizing words through lemmatization and standardizing entities; the Topic Modeling Module determines the main topics contained within the input and applies machine learning techniques to discover recurring patterns and themes; the Embedding-Based Summarization Module converts the processed input (prompt and related context) into numerical embeddings using pre-trained language models, these embeddings capturing the semantic meaning of the input; and the Intent Signature Generation Module compiles the results of the previous stages into a structured representation known as an intent signature that is stored in a machine-readable format.

4. The system and method as claimed in claim 1, wherein the intent signature generated by the Intent Extraction Algorithm sub-system, serves as the input for the Guided Web Curation Using Knowledge Graphs subsystem.

5. The system and method as claimed in claim 1, wherein in the Guided Web Curation Using Knowledge Graphs subsystem, the Entity Expansion and Graph Traversal module treats entities from the intent signature as seed nodes and traverses a domain-specific knowledge graph using a weighted Breadth First Search (BFS); in the Relevance Scoring module, each candidate content item retrieved from the web is evaluated by computing a composite relevance score that includes cosine similarity between the document embedding and the intent vector, source authority of the originating domain, freshness based on the date of publication, and a redundancy penalty to reduce overlap with existing corpus content; the Smart Crawler Execution module gathers data from vetted sources only, enforcing domain filters, complying with robots.txt rules, and applying link heuristics to selectively expand its search within relevant and reliable boundaries; and the Community Clustering module organizes the curated content using clustering algorithms that operate on co-citation graphs or entity-linking graphs to form communities of related information.

6. The system and method as claimed in claim 1, wherein in the Validation, Rebalancing, and Bias Correction subsystem, the Redundancy Detection module removes duplicate or near-duplicate documents; the Probabilistic Content Scoring module evaluates each document using a logistic regression model trained on multiple features, such as keyword density, source reliability, text entropy, and topical alignment with the intent signature, the result being a probability score indicating the overall quality and relevance of the document; the Entropy-Based Rebalancing module calculates topic-wise entropy using the formula H = −Σᵢ pᵢ log(pᵢ); and the Adversarial Debiasing module is trained to detect regional or ideological bias within the content such that documents predicted to exhibit stronger bias are downweighted, with reweighting applied inversely to the predicted bias probability.

7. The system and method as claimed in claim 1, wherein in the Knowledge Packet Generation and Archiving subsystem, the Multilayer Packet Structure module creates each packet with multiple layers, including a Raw Content Layer, Summary Layer, Embedding Layer, and Provenance Metadata Layer; the Compression and Encryption module archives the packets using Zstandard compression, structures them with TAR formatting, and encrypts them using AES-256 encryption, with cryptographic hash values stored for integrity verification; and the Versioning and Traceability module assigns each packet a unique version ID and a semantic UUID derived from the intent vector, with a hash chain maintained across versions, enabling complete auditability and secure traceability of content updates and transfers.

8. The system and method as claimed in claim 1, wherein in the Reintegration and Inference subsystem, the Packet Mounting and Indexing module mounts the packets as read-only file systems or virtual data sources; the Contextual Joining with Local Data module performs k-nearest neighbor (k-NN) searches over the embedding index to retrieve the most relevant documents, and external documents from the packets are combined with locally stored data for richer context; and the LLM Inference Pipeline module integrates user prompts with matched documents from both packet and local sources, with a Retrieval-Augmented Generation (RAG) pipeline producing summaries and answers, along with confidence scores.

9. The system and method as claimed in claim 1, wherein the system provides a structured and guided process of capturing user intent from an isolated system, curating relevant external data using advanced knowledge graph algorithms, validating and balancing the information, and reintegrating it into the closed system via structured, compressed knowledge packets for use in AI-based inferencing.

Documents

Application Documents

# Name Date
1 202521082256-STATEMENT OF UNDERTAKING (FORM 3) [29-08-2025(online)].pdf 2025-08-29
2 202521082256-POWER OF AUTHORITY [29-08-2025(online)].pdf 2025-08-29
3 202521082256-FORM 1 [29-08-2025(online)].pdf 2025-08-29
4 202521082256-FIGURE OF ABSTRACT [29-08-2025(online)].pdf 2025-08-29
5 202521082256-DRAWINGS [29-08-2025(online)].pdf 2025-08-29
6 202521082256-DECLARATION OF INVENTORSHIP (FORM 5) [29-08-2025(online)].pdf 2025-08-29
7 202521082256-COMPLETE SPECIFICATION [29-08-2025(online)].pdf 2025-08-29
8 Abstract.jpg 2025-09-19
9 202521082256-FORM-9 [26-09-2025(online)].pdf 2025-09-26
10 202521082256-FORM 18 [01-10-2025(online)].pdf 2025-10-01