Abstract: The present invention describes a system and method for dynamic large scale context mining for specific multi-agentic tasks. The system comprises an input unit, a processing unit and an output unit, wherein the processing unit comprises an agentic layer, a knowledge layer and a context mining brain layer. The context mining brain layer dynamically constructs token-bounded, semantically enriched, task-specific knowledge packets tailored for downstream task execution using LLMs, and it comprises modules including a task analyser, a data scooper, a linkage engine, a compressor and a packet builder. The system's dynamic task-driven design ensures more relevant and coherent LLM inputs; it also supports scalable multi-agentic applications and provides reliability and traceability via interlinked packets. The system enables agents to fully utilize large and diverse data while conforming to LLM limitations.
Description: FIELD OF INVENTION
The present invention relates to a system and method for dynamic large scale context mining for specific multi-agentic tasks. More particularly, it focuses on systems, methods, and architectures that enable intelligent agents to dynamically process large-scale, heterogeneous contextual data for efficient downstream execution using Large Language Models (LLMs), by generating optimized, task-specific context packets.
BACKGROUND
Large Language Models (LLMs), like the ones used in chatbots and AI tools, can do many things: they can generate text, translate languages, understand questions, classify information, analyze emotions, assist with coding, and even hold conversations. However, these models have a major limitation: they can only process a limited amount of text at once. These inherent input size limitations (token constraints) restrict their ability to operate over large datasets such as codebases, documentation, logs, and organizational knowledge. If the input is too long, it must be broken into smaller parts, which can cause the model to lose track of the bigger picture. This often results in responses that feel disconnected or incomplete, making it hard for the AI to give accurate answers for large or complex tasks. To deal with this issue, techniques such as document chunking and Retrieval-Augmented Generation (RAG) have been developed: document chunking splits long content into smaller sections, while RAG improves accuracy by pulling in relevant information from external sources. However, these existing techniques struggle to capture the full context necessary for high-fidelity task execution and to maintain a smooth, coherent understanding of the original content, often resulting in disjointed outputs, loss of semantic coherence, and poor reliability.
CN116756579B describes a training method for a large language model and a text processing method based on the large language model; it relates to the fields of artificial intelligence, cloud technology, natural language processing, machine learning and the like, and particularly to a language model in a pre-training language model. The method comprises the following steps: acquiring a training set and a pre-training language model corresponding to each task among a plurality of natural language processing tasks in the same target field; acquiring a second feature extraction network corresponding to each task; for each task, repeatedly executing a training operation on the second feature extraction network corresponding to that task, based on the task's training set, until a training ending condition is met, thereby obtaining a trained second feature extraction network for the task; and obtaining a target large language model of the target field based on the pre-training language model and the trained second feature extraction networks corresponding to each task. Based on this method, the accuracy of the text processing result output by the large language model can be improved.
US10360304B1 describes a language processing interface that configures a user interface of a device to receive an input, understands an intent from the input, and sends a second set of instructions to the device to configure the user interface to display a feedback request and receive a second input; and a control system that selects a model to transform the intent into an action to influence physical conditions of a building, determines a state of one or more components to perform the action, and sends instructions to the one or more components to alter their operations to achieve the state.
Therefore, there is a need for a solution that overcomes the limitations of current LLM-based systems and can operate over large datasets such as codebases, documentation, logs, and organizational knowledge.
DEFINITIONS:
The expression “system” used hereinafter in this specification refers to an ecosystem comprising, but not limited to, a system with a user, input and output devices, a processing unit, a plurality of mobile devices, a mobile device-based application to identify dependencies and relationships between diverse businesses, a visualization platform, and output; and extends to computing systems like mobiles, laptops, computers, PCs, etc.
The expression “input unit” used hereinafter in this specification refers to, but is not limited to, mobiles, laptops, computers, PCs, keyboards, mice, pen drives or other drives.
The expression “output unit” used hereinafter in this specification refers to, but is not limited to, an onboard output device, a user interface (UI), a display kit, a local display, a screen, a dashboard, or a visualization platform enabling the user to visualize, observe or analyse any data or scores provided by the system.
The expression “processing unit” refers to, but is not limited to, a processor of at least one computing device that optimizes the system.
The expression “large language model (LLM)” used hereinafter in this specification refers to a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.
The expression “agents” used hereinafter in this specification refers to software programs designed to interact with their environment, collect data, and autonomously perform tasks to achieve predetermined goals, with humans setting the goals but the agent independently choosing the best actions.
The expression “packets” used hereinafter in this specification refers to small, self-contained units of data that are transmitted across a network, like the internet, broken down from a larger piece of data, along with header information for routing and reassembly.
The expression “token” used hereinafter in this specification refers to a basic unit of text that the model processes, like a word or part of a word; a context window is the number of tokens the model considers at any given time, determining how much of the conversation it can "remember".
The expression “large scale” used hereinafter in this specification refers to any data set that is very large, for example a repository of 3 GB or a document of 20 MB. This is relative to the maximum context that any LLM can accept; if the data exceeds that limit (i.e., 128K tokens in most cases), it falls under this category.
The expression “multi-agent framework” or “multi-agent task” used hereinafter in this specification refers to a structure or architecture that allows for the creation and management of multiple AI agents. These agents can be simple or complex, and they can interact with each other and their environment to achieve a common goal.
The expression “attention scoring” used hereinafter in this specification refers to a method where a model assigns weights or scores to different parts of an input sequence, indicating how much attention should be paid to each element when making a prediction or generating output.
OBJECTS OF THE INVENTION:
The primary object of the present invention is to provide a system and method for dynamic large scale context mining for specific multi-agentic tasks.
Yet another object of the present invention is to provide a system and method for dynamic large scale context mining that enables agents to dynamically process large-scale, heterogeneous contextual data for efficient downstream execution using Large Language Models (LLMs).
Yet another object of the present invention is to provide a system and method for dynamic large scale context mining that enables agents to effectively collaborate and perform downstream tasks by maintaining context fidelity across code, tools, services, and datasets.
Yet another object of the present invention is to provide a system and method for dynamic large scale context mining whose dynamic task-driven design ensures more relevant and coherent LLM inputs.
Yet another object of the present invention is to provide a system and method for dynamic large scale context mining that maintains semantic integrity across diverse data types and supports scalable multi-agentic applications.
Yet another object of the present invention is to provide a system and method for dynamic large scale context mining that provides reliability and traceability via interlinked packets.
SUMMARY
Before the present invention is described, it is to be understood that the present invention is not limited to specific methodologies and materials described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention.
The present invention describes a system and method for dynamic large scale context mining for specific multi-agentic tasks. The system comprises an input unit, a processing unit and an output unit, wherein the processing unit comprises an agentic layer, a knowledge layer and a context mining brain layer. The system enables agents to fully utilize large and diverse data while conforming to LLM limitations. The Context Mining Brain delivers domain- and task-sensitive representations of knowledge, dramatically improving the outcome quality of multi-agent LLM-based systems.
According to an aspect of the present invention, the context mining brain layer dynamically constructs token-bounded, semantically enriched, task-specific knowledge packets tailored for downstream task execution using LLMs, and it comprises modules including a task analyser, a data scooper, a linkage engine, a compressor and a packet builder.
According to an aspect of the present invention, the method comprises the steps of: providing a task description as input; analysing the task by the task analyser to find intent in the input; scooping data by the data scooper to find relations; linking entities by the linkage engine to establish relationships; compressing context by the compressor to reduce the size and token count; building a context packet and an optimized context packet by the packet builder, which takes data and constructs the packets; storing content and paths in a database and reading from the database to feed to the LLM and get a structured format; inserting the structured data into a graph database and analyzing the graph to detect dead code, using the LLM to clean false positives and marking unreferenced nodes as dead code so that dead code is removed iteratively; and compiling and linking the code, checking the build of the context packets, and submitting the final code.
BRIEF DESCRIPTION OF DRAWINGS
A complete understanding of the present invention may be gained by reference to the following detailed description, which is to be taken in conjunction with the accompanying drawing. The accompanying drawing, which is incorporated into and constitutes a part of the specification, illustrates one or more embodiments of the present invention and, together with the detailed description, serves to explain the principles and implementations of the invention.
FIG. 1 illustrates a flowchart of the workflow of the present invention.
DETAILED DESCRIPTION OF INVENTION:
Before the present invention is described, it is to be understood that this invention is not limited to methodologies described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention. Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the invention to achieve one or more of the desired objects or results. Various embodiments of the present invention are described below. It is, however, noted that the present invention is not limited to these embodiments, but rather the intention is that modifications that are apparent are also included.
The present invention describes a system and method of dynamic large scale context mining for specific multi-agentic tasks. This system enables autonomous agents to effectively collaborate and perform downstream tasks by maintaining context fidelity across code, tools, services, and datasets. The system dynamically constructs token-bounded, semantically enriched, task-specific knowledge packets tailored for downstream task execution using LLMs. The key features of the invention comprise:
1. A multi-agent framework.
2. A tailored task-guided context-building pipeline.
3. Algorithms for semantic linking, context compression, and inter-dataset fusion.
4. Iterative context packet generation with intelligent pruning.
5. Token-window optimization strategies.
The system comprises an input unit, a processing unit and an output unit, wherein the processing unit further comprises an agentic layer, a knowledge layer and a context mining brain layer. The processing unit comprises a non-transitory machine-readable medium storing instructions to perform the method of dynamic large scale context mining for specific multi-agentic tasks. The Agentic Layer comprises agents that utilize LLMs, tools, and APIs. The agents communicate over an agentic workflow protocol: agents talk to each other or perform a task using a defined workflow, which constitutes the protocol. There are various agents in the system, such as agents to write code, review data, summarize content, book a flight, etc. Example: a coding agent generating code connects with a QA agent that validates the logic.
The Knowledge Layer is a database, knowledge base or graph database. It is a unified storage layer including structured (e.g., JIRA), semi-structured (e.g., JSON configs), and unstructured data (e.g., docs, code, logs). This layer supports search, retrieval, embedding, and versioning.
According to the present invention, the Context Mining Brain (CMB) dynamically constructs token-bounded, semantically enriched, task-specific knowledge packets and comprises different modules. The input to the Context Mining Brain may be any text or data; its output is a packet. The modules of the Context Mining Brain are:
a. Task Analyzer: Classifies task type using few-shot classification and sets parameters. Classification is the task of assigning data points to pre-defined categories, while parameters are the variables that a model learns from data to make predictions, like weights and biases in a neural network.
b. Data Scooper: Identifies data needed using graph traversal (Graph traversal is a mechanism to find relationships between datasets), keyword expansion, and embedding-based search.
c. Linkage Engine: Uses dependency graphs and entity linking (e.g., code identifiers ↔ tickets).
d. Compressor: Summarizes, deduplicates, and ranks using:
• Graph-based summarization (e.g., TextRank)
• Task-aware pruning via attention scoring
• Reinforcement-learning-based summarizers
e. Packet Builder: Packs context and validates token length and coherence. The packet builder takes data and constructs the packets. For example, given a large corpus of legal documents in excess of 10 GB, this process reduces the data without losing its value and then puts it in a packet for LLM communication.
Example:
A multi-agent system performs root cause analysis of a failed deployment. The agents first access deployment logs then retrieve code diffs, link tickets and developer notes and finally build a compressed, semantically connected packet to present to the LLM.
According to an embodiment of the present invention, the algorithms of the context mining brain modules are as follows:
1. Task Analyzer Algorithm:
# Input: Task Description
def analyze_task(task_description):
    label = classify_task_type(task_description)
    params = set_hyperparameters(label)
    return label, params
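By way of non-limiting illustration, the helper functions assumed above (classify_task_type and set_hyperparameters) may be sketched as follows. The invention contemplates few-shot classification; this self-contained sketch substitutes a simple keyword heuristic, and the labels, keywords and parameter values shown are hypothetical examples, not part of the invention:

# Illustrative stand-ins for the helpers used by analyze_task.
# Labels, keywords and parameter values are hypothetical examples.
TASK_KEYWORDS = {
    "debug": ["fix", "bug", "error", "crash", "stack trace"],
    "research": ["compare", "benchmark", "find", "survey"],
    "summarize": ["summarize", "summary", "condense"],
}
HYPERPARAMETERS = {
    "debug": {"token_limit": 3500, "max_hops": 2},
    "research": {"token_limit": 6000, "max_hops": 3},
    "summarize": {"token_limit": 2000, "max_hops": 1},
}

def classify_task_type(task_description):
    # Score each label by how many of its keywords appear in the text;
    # a production system would use few-shot LLM classification instead.
    text = task_description.lower()
    scores = {label: sum(kw in text for kw in kws)
              for label, kws in TASK_KEYWORDS.items()}
    return max(scores, key=scores.get)

def set_hyperparameters(label):
    # Fall back to generic defaults for unrecognized labels.
    return HYPERPARAMETERS.get(label, {"token_limit": 4000, "max_hops": 2})

For example, analyze_task("Fix bug in login module") would return ("debug", {"token_limit": 3500, "max_hops": 2}).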
2. Data Scooper Algorithm:
# Use hybrid retrieval: keyword, embedding, graph-based
def scope_data(task_embedding, knowledge_graph):
    scope_nodes = traverse_relevant_nodes(task_embedding, knowledge_graph)
    return scope_nodes
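By way of non-limiting illustration, traverse_relevant_nodes may be sketched as a breadth-first expansion from embedding-matched seed nodes. The sketch assumes the knowledge graph is a networkx-style graph whose nodes carry a precomputed "embedding" attribute; the similarity threshold and hop limit are illustrative assumptions:

from collections import deque
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def traverse_relevant_nodes(task_embedding, knowledge_graph,
                            threshold=0.5, max_hops=2):
    # Seed with nodes whose stored embedding resembles the task embedding
    # (hypothetical "embedding" node attribute), then expand outward a
    # bounded number of hops to pull in related data.
    seeds = [n for n, data in knowledge_graph.nodes(data=True)
             if cosine(task_embedding, data["embedding"]) >= threshold]
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, hops = frontier.popleft()
        if hops >= max_hops:
            continue
        for neighbor in knowledge_graph.neighbors(node):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return visited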
3. Linkage Engine Algorithm:
# Link code functions to JIRA tickets or documentation
def link_entities(entity_map, metadata):
    links = []
    for e in entity_map:
        related = find_cross_refs(e, metadata)
        links.append((e, related))
    return links
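By way of non-limiting illustration, find_cross_refs may be sketched as a pattern-based cross-referencer. The ticket-identifier convention (e.g., "PROJ-1234") and the shape of the metadata mapping are hypothetical assumptions:

import re

# Hypothetical convention: ticket identifiers look like "PROJ-1234" and
# metadata maps each source (commit message, doc, note) to its free text.
TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

def find_cross_refs(entity, metadata):
    # Collect ticket IDs from every metadata text that mentions the
    # entity (e.g., a code identifier such as a function name).
    refs = []
    for source, text in metadata.items():
        if entity in text:
            refs.extend(TICKET_PATTERN.findall(text))
    return sorted(set(refs))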
4. Compressor Algorithm:
# Summarize cluster using TextRank and filter using attention weights
def compress_context(clusters):
    compressed = []
    for c in clusters:
        summary = textrank_summary(c)
        if is_task_relevant(summary):
            compressed.append(summary)
    return compressed
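By way of non-limiting illustration, textrank_summary may be sketched using PageRank over a sentence-similarity graph; the is_task_relevant filter, which the invention implements via attention scoring, is left abstract here. The word-overlap similarity measure and sentence count are illustrative choices:

import re
import networkx as nx

def textrank_summary(text, num_sentences=3):
    # Split into sentences, build a graph weighted by word overlap,
    # rank with PageRank, and keep the top sentences in original order.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= num_sentences:
        return text
    words = [set(s.lower().split()) for s in sentences]
    graph = nx.Graph()
    graph.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            overlap = len(words[i] & words[j])
            if overlap:
                graph.add_edge(i, j, weight=overlap)
    ranks = nx.pagerank(graph, weight="weight")
    top = sorted(sorted(ranks, key=ranks.get, reverse=True)[:num_sentences])
    return " ".join(sentences[i] for i in top)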
5. Packet Builder Algorithm:
# Build final packet based on token limit
def build_packet(compressed_segments, token_limit):
    packet = []
    count = 0
    for seg in compressed_segments:
        if count + token_count(seg) <= token_limit:
            packet.append(seg)
            count += token_count(seg)
    return packet
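By way of non-limiting illustration, token_count may be approximated by a whitespace word count; a production system would use the target model's own tokenizer. The wiring of the five modules shown in the comments below is a hypothetical usage sketch:

def token_count(segment):
    # Crude proxy for an LLM tokenizer: one token per whitespace word.
    return len(segment.split())

# Hypothetical end-to-end wiring of the five modules:
# label, params = analyze_task(task_description)
# scope_nodes = scope_data(task_embedding, knowledge_graph)
# links = link_entities(entity_map, metadata)
# compressed = compress_context(clusters)
# packet = build_packet(compressed, params["token_limit"])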
According to an embodiment of the present invention, the method of dynamic large scale context mining for specific multi-agentic tasks as described in FIG. 1 comprises the steps of:
• Providing task description – Input is provided by the user.
• Analysing task- by the Task Analyzer, to find the intent in the input.
• Scooping data- by the Data Scooper, to find relations.
• Linking entities- by the Linkage Engine, to establish relationships.
• Compressing context- by the Compressor, to reduce the size and token count.
• Build Context Packet and Optimized Context Packet- by the Packet Builder, which takes data, constructs the packets, and validates token length and coherence.
• Crawl Repository- the system crawls the repository or database to retrieve, validate, and compare additional content if needed. This step helps enrich the context packet by incorporating verified and updated information.
• Store Content and Paths in Database- The processed content, along with its associated paths, is stored securely in the database. This ensures that the information is preserved in an organized manner and can be easily accessed for future tasks or reference.
• Read from Database- The system then retrieves the stored content and paths from the database. This data is extracted to prepare for processing in the next steps. Reading the stored information ensures that the required data is available for further analysis.
• Feed to LLM- The retrieved data is fed into a Large Language Model (LLM) for analysis and processing. The LLM interprets the content, identifies patterns, and extracts meaningful information to facilitate structured data transformation.
• Get Structured JSON- The LLM processes the data and converts it into structured JSON format. This structured format organizes the data in a machine-readable way, ensuring that the information is ready for insertion into the graph database. (JSON, or JavaScript Object Notation, is a standard, text-based format for storing and exchanging data, often used in web applications to transmit data between a server and a web application.)
• Insert JSON Data into Graph Database- The structured JSON data is inserted into a graph database. This step involves transforming the extracted data into nodes and relationships that form the basis of the graph, making it easier to analyze connections between various entities.
• Analyze Graph to Detect Dead Code- Once the data is in the graph database, the system analyzes the graph to identify dead code. Dead code refers to unreferenced or unused portions of the codebase that do not contribute to the functionality of the system. This analysis helps optimize the code and improve system efficiency (a minimal reachability sketch of this step, together with the preceding graph-insertion step, is given after this list).
• Use LLM to Clean False Positives- The Large Language Model (LLM) is used again to clean any false positives identified during the dead code detection process. This step ensures that only genuinely unreferenced or obsolete code is marked for removal, reducing errors and maintaining code integrity.
• Mark Unreferenced Nodes as Dead Code- The system marks unreferenced nodes as dead code. These nodes, representing obsolete or unnecessary code, are flagged for potential removal, helping maintain a cleaner and more efficient codebase.
• Remove Dead Code Iteratively- The flagged nodes are removed in successive passes, and after each pass the following steps verify that the system still builds correctly:
• Compile and Link the Code- After removing the dead code, the system compiles and links the remaining code. This step ensures that all necessary dependencies are correctly included and that the modified code is ready for execution. Compilation and linking verify that the system is error-free and ready to proceed.
• Check Build- At this decision point, the system checks whether the build process succeeds or fails.
• Undo Changes (If Build Fails)- If the build fails, the system undoes the recent changes to revert the codebase to its previous state. This step prevents any faulty or incomplete modifications from affecting the system's stability.
• Retry- After undoing the changes, the system retries the dead code removal and build cycle.
• Submit Final Code (If Build Succeeds)- If the build succeeds, the system proceeds to submit the final code. This step involves sending the optimized and error-free code for further deployment or integration.
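By way of non-limiting illustration, the graph-insertion and dead code detection steps referenced above may be sketched as follows. The sketch assumes the structured records emitted by the LLM name each function and the functions it calls; the record shape and entry points are hypothetical assumptions:

import networkx as nx

# Hypothetical structured record as the LLM might emit it:
# {"name": "login", "calls": ["validate", "audit_log"]}
def insert_into_graph(records):
    # Build a directed call graph: one node per function, one edge per call.
    graph = nx.DiGraph()
    for rec in records:
        graph.add_node(rec["name"])
        for callee in rec["calls"]:
            graph.add_edge(rec["name"], callee)
    return graph

def find_dead_code(code_graph, entry_points):
    # Nodes unreachable from every entry point are dead-code candidates,
    # pending LLM review to clean false positives (e.g., reflection).
    reachable = set()
    for entry in entry_points:
        if entry in code_graph:
            reachable.add(entry)
            reachable.update(nx.descendants(code_graph, entry))
    return set(code_graph.nodes) - reachable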
Example 1: Software Debug Task
• Task: "Fix bug in login module."
• Task Analyzer tags as Debug
• Data Scooper pulls login module code, related commits, tickets
• Linkage Engine connects stack traces with commits
• Compressor reduces stack traces to top calls, removes boilerplate
• Packet Builder constructs a 3,500-token packet for GPT-4
Example 2: Research Task (Competitive Benchmarking)
• Task: "Find product-market fit comparisons for XYZ."
• Task Analyzer detects research intent
• Data Scooper retrieves internal docs + crawled web data
• Linkage Engine maps internal vs. external claims
• Compressor summarises findings into insight pairs
• Output packet serves research agent for strategy planning
ADVANTAGES:
The system and method for dynamic large scale context mining for specific multi-agentic tasks has several advantages such as:
• Dynamic task-driven design ensures more relevant and coherent LLM inputs.
• Maintains semantic integrity across diverse data types.
• Supports scalable multi-agentic applications.
• Provides reliability and traceability via interlinked packets.
This invention enables agents to fully utilize large and diverse data while conforming to LLM limitations. The Context Mining Brain delivers domain- and task-sensitive representations of knowledge, dramatically improving the outcome quality of multi-agent LLM-based systems.
While considerable emphasis has been placed herein on the specific elements of the preferred embodiment, it will be appreciated that many alterations can be made and that many modifications can be made in preferred embodiment without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
Claims: We claim,
1. A system and method for dynamic large scale context mining for specific multi-agentic tasks
characterised in that
the system comprises an input unit, a processing unit and an output unit, wherein the processing unit comprises an agentic layer, a knowledge layer and a context mining brain layer,
such that the context mining brain layer dynamically constructs token-bounded, semantically enriched, task-specific knowledge packets tailored for downstream task execution using LLMs and comprises modules including a task analyser, a data scooper, a linkage engine, a compressor and a packet builder;
and the method comprises the steps of
• providing task description as input;
• analysing task by task analyser for finding intent in the input;
• scooping data by data scooper for finding relations;
• linking entities by linkage engine to establish relationships;
• compressing context by the compressor to reduce the size and tokens;
• building a context packet and an optimized context packet by the packet builder that takes data and constructs the packets;
• storing content and paths in a database and reading from the database to feed to the LLM and get a structured format;
• inserting the structured data into a graph database and analyzing the graph to detect dead code by using the LLM to clean false positives and marking unreferenced nodes as dead code to remove dead code iteratively;
• compiling and linking the code, checking the build of the context packets, and submitting the final code.
2. The system and method as claimed in claim 1, wherein the agentic layer comprises agents that utilize LLMs, tools, and APIs, the agents communicate over an agentic workflow protocol, and the knowledge layer is a unified storage layer including structured, semi-structured and unstructured data and supports search, retrieval, embedding, and versioning.
3. The system and method as claimed in claim 1, wherein the task analyser of context mining brain layer classifies task type using few-shot classification and sets parameters; data scooper identifies data needed using graph traversal, keyword expansion, and embedding-based search; linkage engine uses dependency graphs and entity linking.
4. The system and method as claimed in claim 1, wherein the compressor of context mining brain layer summarizes, deduplicates, and ranks using graph-based summarization; task-aware pruning via attention scoring and reinforcement-learning-based summarizers.
5. The system and method as claimed in claim 1, wherein packet builder of the context mining brain layer takes data and constructs context packets and validates token length and coherence.
6. The system and method as claimed in claim 1, wherein packet builder ensures semantic coherence while adhering to token constraints and the context packets are recursively refined by connecting datasets from multiple systems.
7. The system and method as claimed in claim 1, wherein the processing unit comprises a non-transitory machine-readable medium storing instructions to perform the method of dynamic large scale context mining for specific multi-agentic tasks.
8. The system and method as claimed in claim 1, wherein the system enables agents to fully utilize large and diverse data while conforming to LLM limitations and the context mining brain layer delivers domain- and task-sensitive representations of knowledge, dramatically improving the outcome quality of multi-agent LLM-based systems.