
System And Method For Adaptive Tool And Model Selection In Agentic Workflows

Abstract: A system and method for adaptive tool and model selection in agentic workflows using contextual graph reasoning and evolutionary bandit optimization; the system (10) comprises an input unit (1) to receive task data, user ID, and organizational ID; a processing unit (2) featuring a context representation layer (CRL) (21) with nodes (21.1), edges (21.2), a graph attention network (GAT) (21.3), and a context vector (21.4) to model task-tool-user relationships. A selector engine layer (SEL) (22) applies contextual multi-armed bandit logic (22.2) and Thompson sampling (22.1) to identify optimal tool-model pairs. An evolutionary feedback layer (EFL) (23) executes the selected pair, evaluates output (23.3), updates historical priors (23.4), and applies time-decay (23.5). The system enables context-aware, probabilistic, and evolving decision-making, offering advantages over rigid, rule-based systems.


Patent Information

Filing Date: 17 July 2025
Publication Number: 40/2025
Publication Type: INA
Invention Field: COMPUTER SCIENCE

Applicants

Persistent Systems
Bhageerath, 402, Senapati Bapat Rd, Shivaji Cooperative Housing Society, Gokhale Nagar, Pune - 411016, Maharashtra, India.

Inventors

1. Mr. Nitish Shrivastava
10764 Farallone Dr, Cupertino, CA 95014-4453, United States.

Specification

FIELD OF INVENTION
The present invention relates to the field of artificial intelligence and a system for intelligent orchestration and selection of computational tools and models within agentic workflows. More specifically, the invention relates to a system and method for adaptive tool and model selection in agentic workflows using contextual graph reasoning and evolutionary bandit optimization.

BACKGROUND
In modern intelligent systems, the ability to dynamically select the most appropriate computational models and tools for a given task is critical to ensuring optimal performance, scalability, and user satisfaction. Contextual reasoning over structured data representations, such as graphs, allows systems to capture complex relationships among tasks, tools, users, and historical outcomes. Graph-based methods enable the identification of contextually significant patterns and dependencies. As user preferences and task requirements change over time, making decisions under uncertainty requires probabilistic methods such as Bayesian multi-armed bandits.
Existing systems used to select tools or models for completing tasks in agent-based workflows often rely on fixed rules or decisions that do not adapt well to changing or unpredictable environments. Newer methods use machine learning or contextual bandit techniques to improve selection, but these usually work in isolation and do not consider the full task context. Such methods fail to adapt to dynamic environments or to leverage valuable historical data from users and organizations. These systems also lack a unified way to represent different types of entities, such as users, tools, models, and organizational settings. As a result, current approaches are often inaccurate, rigid, and ill-suited to complex workflows that require flexibility and personalization. As the complexity and diversity of tools and models grow, a new intelligent mechanism is needed to ensure optimal orchestration based on context, performance history, and user satisfaction.
Prior Arts:
US20230297861A1 discloses a system for graph recommendations for optimal model configurations; comprising a computing device may access a graph comprising one or more model nodes, one or more dataset nodes, and one or more edges, the model nodes having a plurality of features. The device may add one or more test dataset nodes and test edges to the graph; perform a series of iterative steps until a threshold is reached. For each iterative step: a selection probability is determined, the selection probability being based at least in part on a plurality of selection criteria; a particular model node is selected, the particular model node being selected based at least in part on the selection probability; the selection criterion is updated based at least in part on the particular model; and the plurality of features are updated based at least in part on the particular model. The device may provide the particular model node selected in the last iterative step.
202411068974 discloses an automatic keyword classification system, including a neural network (110) having an attention-based architecture (112) configured to process input textual data; wherein a preprocessing module (120) adapted to perform tokenization, removal of stop words, and stemming on the textual data to produce processed data; a training module (130) configured to train the neural network (110) utilizing a labelled dataset; the keywords within the dataset may be associated with predefined categories, and the attention mechanism may be optimized to emphasize contextually significant keywords; an inference module (150) configured to classify keywords within new, unlabelled textual data into corresponding categories based on the trained neural network (110) model; an output module (140) configured to generate and output a set of classified keywords, each associated with a respective category.
Whereas the prior arts focus on selecting model nodes through iterative probabilistic updates within a graph comprising models and datasets, and on a neural network-based keyword classification system utilizing an attention mechanism optimized for textual data, they are limited to static inference and do not support dynamic decision-making across diverse task-tool-model interactions. None of the prior arts address the need for a unified, real-time system that employs contextual graph reasoning, probabilistic tool-model selection using Bayesian bandit optimization, and evolutionary feedback to adaptively orchestrate intelligent workflows in high-variability environments.
To overcome these drawbacks, there is a need for a novel tool orchestration system that can intelligently represent contextual relationships among tasks, tools, models, and users, and dynamically select optimal tool-model combinations through probabilistic reasoning and adaptive feedback mechanisms within real-time agentic workflows.

DEFINITIONS
The expression “system” used hereinafter in this specification refers to an ecosystem comprising, but not limited to, a system for adaptive tool and model selection; with input and output devices, a processing unit, a plurality of mobile devices, and a mobile device-based application. It is extended to computing systems such as mobile phones, laptops, computers, PCs, and other digital computing devices.
The expression “input unit” used hereinafter in this specification refers to, but is not limited to, mobile, laptops, computers, PCs, keyboards, mouse, pen drives or drives.
The expression “output unit” used hereinafter in this specification refers to, but is not limited to, an onboard output device, a user interface (UI), a display unit, a local display, a screen, a dashboard, or a visualization platform enabling the user to visualize the graphs provided as output by the system.
The expression “bandit optimization” used hereinafter in this specification refers to the class of online optimization problems with limited feedback; namely, a decision maker uses only the objective value at the current point to make a new decision and does not have access to the gradient of the objective function.
The expression “Thompson sampler” used hereinafter in this specification refers to a Bayesian approach to decision-making, particularly in situations involving sequential choices with uncertain outcomes, such as the multi-armed bandit problem. It works by maintaining probability distributions representing the potential rewards of each action and then randomly sampling from these distributions to guide which action to take.
The expression “a multi-armed bandit” or “MAB” used hereinafter in this specification refers to a classic problem in probability theory and machine learning that models decision-making under uncertainty; it is called a “multi-armed bandit” because it is inspired by a gambler facing multiple slot machines (“one-armed bandits”) and needing to decide which one to play to maximize their total reward. Further, in machine learning and AI, MAB is used in online recommendation systems (e.g., Netflix, Amazon); A/B/n testing and adaptive experimentation; hyperparameter tuning; and adaptive agent workflows where tools/models are selected dynamically.
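For illustration, the Thompson sampling strategy defined above can be sketched as follows; the arm names, prior counts, and observed outcome here are hypothetical and not taken from the specification:

```python
import random

# Hypothetical arms: each tracks alpha (successes + 1) and beta (failures + 1).
arms = {
    "gpt-summarizer": {"alpha": 8.0, "beta": 2.0},   # historically strong
    "regex-extractor": {"alpha": 3.0, "beta": 3.0},  # mixed record
    "new-parser": {"alpha": 1.0, "beta": 1.0},       # untested (uniform prior)
}

def thompson_select(arms):
    """Sample theta_i ~ Beta(alpha_i, beta_i) per arm; pick the highest draw."""
    draws = {name: random.betavariate(p["alpha"], p["beta"])
             for name, p in arms.items()}
    return max(draws, key=draws.get)

def update(arms, name, success):
    """After observing the outcome, increment alpha on success, beta on failure."""
    key = "alpha" if success else "beta"
    arms[name][key] += 1.0

chosen = thompson_select(arms)
update(arms, chosen, success=True)  # hypothetical successful outcome
```

Because the selection is a random draw from each arm's Beta posterior, a well-tested arm is usually chosen, but under-tested arms still win occasionally, which is how the strategy balances exploitation and exploration.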

OBJECTS OF THE INVENTION
The primary object of the invention is to provide a system and method for adaptive tool and model selection in agentic workflows using contextual graph reasoning and evolutionary bandit optimization.
Another object of the invention is to provide a system and method that represents tasks, tools, models, users, and organizations in a contextual graph.
Yet another object of the invention is to provide a system and method that uses graph attention networks (GATs) to compute task-aware embeddings.
Yet another object of the invention is to provide a system and method that applies Thompson sampling in a Bayesian contextual multi-armed bandit setup for candidate selection.
Yet another object of the invention is to provide a system and method that incorporates an evolutionary learning mechanism that prioritizes recent, successful strategies while gradually forgetting outdated or ineffective ones.
Yet another object of the invention is to provide a system and method that learns and evolves optimal tool-model pairs based on performance signals such as latency, accuracy, user overrides, and acceptance.

SUMMARY
Before the present invention is described, it is to be understood that the present invention is not limited to specific methodologies and materials described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention.
The present invention discloses a system and method for adaptive tool and model selection in agentic workflows using contextual graph reasoning and evolutionary bandit optimization; wherein the system comprises an input unit to receive task information, user ID, and Org ID. The processing unit includes a context representation layer (CRL) consisting of nodes for entities like tasks, tools, users, models, organizations; edges capturing interaction outcomes; a graph attention network (GAT) that dynamically weighs contextual relationships; and a context vector as the task context embedder. The selector engine layer (SEL) includes Thompson sampling and contextual multi-armed bandit (MAB) models for optimal tool-model selection. The evolutionary feedback layer (EFL) comprises steps like execution, result storage, reward evaluation, alpha-beta update, and time-decay. The output unit generates the task result.
In an aspect, the method begins with inputting task, user, and organization data, which is structured by the CRL into a graph connecting tasks, tools, models, and users. The GAT extracts task-aware context to generate a context vector. This vector guides the selection of tool-model pairs using a contextual MAB framework, with Thompson Sampling estimating reward distributions. The pair with the highest expected reward is executed via the EFL. The output is evaluated for performance, and feedback is used to update the bandit’s success-failure statistics. A decay mechanism reduces the impact of outdated data while adjusting to user preferences over time, enabling continuous learning and adaptation.
In yet another aspect, the system surpasses traditional rule-based or static decision tree models by integrating dynamic context reasoning, probabilistic decision-making, and real-time learning. It overcomes the limitations of conventional MAB systems that ignore task semantics and user context. By unifying statistical decision theory, contextual graph reasoning, and evolutionary feedback, the system delivers highly adaptive and precise tool-model selection. Its modular, extensible architecture supports diverse applications such as agentic AI orchestration, intelligent developer tools, workflow automation, and personalized enterprise recommendation systems, making it suitable for complex, evolving decision-making environments.

BRIEF DESCRIPTION OF DRAWINGS
A complete understanding of the present invention may be made by reference to the following detailed description, which is to be taken in conjunction with the accompanying drawing. The accompanying drawing, which is incorporated into and constitutes a part of the specification, illustrates one or more embodiments of the present invention and, together with the detailed description, serves to explain the principles and implementations of the invention.
FIG.1. illustrates a schematic representation of the structural and functional components of the system.
FIG.2. illustrates an overall system architecture flowchart.
FIG.3. illustrates a stepwise method for adaptive tool and model selection in agentic workflows, employed by the system.

DETAILED DESCRIPTION OF INVENTION
Before the present invention is described, it is to be understood that this invention is not limited to methodologies described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention. Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the invention to achieve one or more of the desired objects or results. Various embodiments of the present invention are described below. It is, however, noted that the present invention is not limited to these embodiments, but rather the intention is that modifications that are apparent are also included.
The present invention describes a system and method for adaptive tool and model selection in agentic workflows using contextual graph reasoning and evolutionary bandit optimization. The system (10) comprises an input unit (1) configured to receive task information, user identification, and Org ID; a processing unit (2) comprising a context representation layer CRL (21) representing nodes (21.1), edges (21.2), a graph attention network (GAT) (21.3), and a context vector (21.4); a selector engine layer SEL (22) comprising Thompson sampling (22.1) and a multi-armed bandit (22.2); an evolutionary feedback layer EFL (23) employing a multi-step workflow; and an output unit (3).
In an embodiment of the invention, the input unit (1) receives, but is not limited to, task information along with user identification and organization identification (Org ID). A further embodiment of the invention discloses a processing unit (2) with a context representation layer CRL (21) comprising nodes (21.1) that represent entities such as tasks, tools, models, users, and organizations, representing the different components of the task ecosystem as graph nodes; and edges (21.2) that capture interaction outcomes and metadata. Yet another component of the CRL (21) is a graph attention network (GAT) (21.3) that dynamically learns contextual information of all entities, weighs important relationships, and produces the context vector C_task (21.4), which serves as the context embedding.
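To illustrate how an attention mechanism can weigh a task's graph neighbors into a single context vector, a minimal single-head sketch follows; the entity names, 3-dimensional embeddings, and dot-product scoring are all hypothetical simplifications of a full GAT:

```python
import math

# Toy embeddings for graph neighbors of a task node (hypothetical values).
task = [1.0, 0.0, 0.5]
neighbors = {
    "tool:formatter": [0.9, 0.1, 0.4],
    "user:alice": [0.2, 0.8, 0.1],
    "model:small-llm": [0.7, 0.2, 0.6],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Score each neighbor against the task, normalize to attention weights,
# and form the context vector as the attention-weighted sum of neighbors.
names = list(neighbors)
weights = softmax([dot(task, neighbors[n]) for n in names])
c_task = [sum(w * neighbors[n][d] for w, n in zip(weights, names))
          for d in range(3)]
```

A real GAT would learn the scoring function and use multiple heads and layers; the point here is only that relationships more relevant to the current task receive higher weight in C_task.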
In a next embodiment of the invention, the selector engine layer SEL (22) treats each tool-model pair as an arm in a contextual multi-armed bandit (22.2) problem; wherein Thompson sampling (22.1) refers to a statistical method used to estimate the reward distribution of each arm of the multi-armed bandit (22.2) using historical alpha/beta priors per tool, model, and context, sampling θ_i ~ Beta(α_i, β_i) to balance exploration and exploitation; the decision is then made by selecting the pair with the highest sampled θ_i using the selector tool (22.3). In other words, the sampler maintains α/β statistics per tool-model-context, and the system samples θ_i from Beta(α_i, β_i) to decide which pair to use. This strategy balances exploitation of well-performing tools/models and exploration of under-tested or emerging candidates. The reward (R) represents a quantitative score assigned to a tool-model pair after it completes a task; it reflects how well that pair performed and is used to update the alpha (α) and beta (β) values in the Bayesian multi-armed bandit, guiding future decisions; wherein the reward (R) is calculated using the formula:
R = w₁·Accuracy + w₂·(1 − Latency/MaxLatency) + w₃·AcceptanceRate + w₄·PreferenceScore.
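The reward formula above can be computed directly; note that the specification does not fix the weight values, so the weights and input measurements below are illustrative assumptions:

```python
def reward(accuracy, latency, max_latency, acceptance_rate, preference_score,
           w=(0.4, 0.2, 0.2, 0.2)):
    """R = w1*Accuracy + w2*(1 - Latency/MaxLatency)
         + w3*AcceptanceRate + w4*PreferenceScore.
    The weights w are hypothetical; the specification leaves them open."""
    w1, w2, w3, w4 = w
    return (w1 * accuracy
            + w2 * (1.0 - latency / max_latency)
            + w3 * acceptance_rate
            + w4 * preference_score)

# Illustrative measurement: 90% accuracy, 200 ms latency against a
# 1000 ms budget, 80% acceptance, 0.7 preference score.
r = reward(accuracy=0.9, latency=200.0, max_latency=1000.0,
           acceptance_rate=0.8, preference_score=0.7)
```

With these values R comes out to 0.82: the latency term rewards faster executions (it approaches w₂ as latency approaches zero), and the remaining terms scale their signals by the corresponding weights.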
In yet another embodiment of the invention, the “bandit (22.2)” and the “Thompson sampling (22.1)” refer to two components working together in the selector engine layer (SEL) of the system for adaptive tool-model selection; wherein the “bandit (22.2)” refers to a system implementation of a contextual multi-armed bandit (MAB) where each “arm” depicts a possible tool-model pair and has a context that includes the current task, user, organization, and preferences; the MAB manages many such arms, tracks their performance, and chooses among them, such that each tool-model combination tries to prove it is the best for a task. Further, the “Thompson sampling (22.1)” refers to pre-defined instructions used to decide which arm to pull (i.e., which tool-model pair to run); wherein, for each arm (tool-model-context combination), the system maintains a Beta distribution of success probability defined by α (the number of good outcomes) and β (the number of bad outcomes); from this Beta distribution, it samples a possible reward score θ_i ~ Beta(α_i, β_i), followed by selecting the arm with the highest sampled θ_i.
In yet a next embodiment of the invention, the evolutionary feedback layer (EFL) (23) comprises an execute pair (23.1) step that involves executing the selected tool-model pair on the given task to generate a result. This result is then captured in the output result (23.2) phase, which stores the output generated during execution. The evaluate reward (23.3) step follows, assessing the quality, performance, and utility of the output to determine how effective the chosen pair was. Based on this evaluation, the update alpha beta (23.4) step updates the success and failure statistics used for Thompson sampling, thereby improving future pair selection by refining probability estimates. The apply decay (23.5) step is a separate but critical process that gradually reduces the influence of outdated or less relevant data over time. This ensures that the system prioritizes recent and more relevant experiences when making decisions or predictions.
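The alpha-beta update (23.4) and decay (23.5) steps of the evolutionary feedback layer described above can be sketched as follows; the success threshold, decay rate, and prior counts are hypothetical choices, since the specification does not fix them:

```python
def evaluate_and_update(priors, pair, reward, threshold=0.5, decay=0.99):
    """EFL-style update: treat reward >= threshold as a success, bump the
    executed pair's alpha/beta counts, then decay every pair's counts
    multiplicatively toward the uniform prior (1, 1) so that stale
    evidence gradually fades. Threshold and decay are illustrative."""
    a, b = priors[pair]
    if reward >= threshold:
        a += 1.0   # record a success for this tool-model pair
    else:
        b += 1.0   # record a failure
    priors[pair] = (a, b)
    # Time-decay: shrink all counts toward (1, 1), the uninformative prior.
    for p, (pa, pb) in priors.items():
        priors[p] = (1.0 + decay * (pa - 1.0), 1.0 + decay * (pb - 1.0))
    return priors

priors = {("summarizer", "small-llm"): (5.0, 2.0)}
evaluate_and_update(priors, ("summarizer", "small-llm"), reward=0.82)
```

Decaying toward (1, 1) rather than toward zero keeps each Beta distribution well-defined while ensuring that a pair's old track record carries progressively less weight than recent outcomes.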
In preferred embodiment of the invention, the system employs a method for adaptive tool and model selection in agentic workflows using contextual graph reasoning and evolutionary bandit optimization as illustrated in FIG.2., comprises the steps of:
- inputting the task info, user ID and organizational identification (Org. ID) using at least one input unit (1),
- receiving the task details along with the user and organization identifiers by the system,
- transforming raw inputs into a structured representation for processing using the context representation layer CRL (21),
- connecting tasks, tools, models, and users in a unified relational graph by the knowledge graph,
- applying attention mechanisms to extract task-aware context from the graph using the graph attention network (GAT) (21.3),
- generating a context vector encapsulating relevant task and user information,
- utilizing the context vector to guide the selection of optimal tool-model pairs by the selector engine (22),
- implementing probabilistic exploration and exploitation to balance selection strategy by the Thompson sampling (22.1) bandit (22.2),
- choosing the most promising tool and model combination for the task using the select best tool-model pair (22.3),
- activating the execution engine that is an evolutionary feedback-layer EFL (23),
- executing the selected pair on the given task by the execution result (23.2) that captures and stores the output generated by the execution,
- evaluating the execution output using the evaluation feedback mechanism to assess quality and performance, and updating historical priors, thereby updating success and failure statistics based on feedback for future learning,
- applying time decay and preference weighting (23.5) to reduce the influence of older data and adjust weights to reflect user preferences over time.
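The stepwise method above can be sketched end-to-end in a single loop iteration; every tool, model, reward value, threshold, and decay rate below is a hypothetical stand-in for components the specification leaves open:

```python
import random

# Hypothetical priors per tool-model arm: [alpha, beta].
priors = {("summarizer", "llm-a"): [2.0, 1.0],
          ("extractor", "llm-b"): [1.0, 1.0]}

def run_task(task, priors):
    # Steps 1-6: receive task/user/org inputs and build the context
    # (the GAT context vector is elided; selection here is context-free).
    # Steps 7-9: Thompson sampling over tool-model arms.
    draws = {arm: random.betavariate(a, b) for arm, (a, b) in priors.items()}
    arm = max(draws, key=draws.get)
    # Steps 10-11: execute the chosen pair and evaluate (stubbed reward).
    r = 0.9 if arm[0] == "summarizer" else 0.4
    a, b = priors[arm]
    priors[arm] = [a + 1.0, b] if r >= 0.5 else [a, b + 1.0]
    # Step 12: apply time decay toward the uniform prior on all arms.
    for k, (pa, pb) in priors.items():
        priors[k] = [1.0 + 0.98 * (pa - 1.0), 1.0 + 0.98 * (pb - 1.0)]
    return arm, r

arm, r = run_task("summarize the quarterly report", priors)
```

Repeated over many tasks, the loop concentrates probability mass on pairs that keep earning high rewards while the decay step prevents any pair's early record from dominating indefinitely.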
According to yet another embodiment, the system and method of the present invention offers substantial advantages such as eliminating reliance on rigid, manually crafted rules and static decision trees. Unlike conventional multi-armed bandit (MAB) systems that overlook task semantics and user-specific contexts, the present system integrates context reasoning, probabilistic inference, and real-time learning within a scalable, extensible framework; using statistical decision theory, graph-based reasoning, and evolutionary learning to enable adaptive, high-precision decision-making. These capabilities make it particularly valuable for applications such as agentic AI orchestration systems, intelligent developer assistants, workflow automation platforms, and enterprise-level personalized recommendation engines. Its modular design and ability to evolve with usage context render it a powerful tool for optimizing complex, dynamic agentic workflows across diverse operational environments.
While considerable emphasis has been placed herein on the specific elements of the preferred embodiment, it will be appreciated that many alterations can be made and that many modifications can be made in preferred embodiment without departing from the principles of the invention. These and other changes in the preferred embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
CLAIMS:
We claim,
1. A system and method for adaptive tool and model selection in agentic workflows using contextual graph reasoning and evolutionary bandit optimization;
wherein the system (10) comprises an input unit (1); a processing unit (2) comprising a context representation layer CRL (21) representing nodes (21.1), edges (21.2), a graph attention network (GAT) (21.3), and a context vector (21.4); a selector engine layer SEL (22) comprising Thompson sampling (22.1) and a multi-armed bandit (22.2); an evolutionary feedback layer EFL (23) employing a multi-step workflow; and an output unit (3);

characterized in that:
the method for adaptive tool and model selection in agentic workflows comprises the steps of;
- inputting the task info, user ID and organizational identification (Org. ID) using at least one input unit (1),
- receiving the task details along with the user and organization identifiers by the system,
- transforming raw inputs into a structured representation for processing using the context representation layer CRL (21),
- connecting tasks, tools, models, and users in a unified relational graph by the knowledge graph,
- applying attention mechanisms to extract task-aware context from the graph using the graph attention network (GAT) (21.3),
- generating a context vector encapsulating relevant task and user information,
- utilizing the context vector to guide the selection of optimal tool-model pairs by the selector engine (22),
- implementing probabilistic exploration and exploitation to balance selection strategy by the Thompson sampling (22.1) bandit (22.2),
- choosing the most promising tool and model combination for the task using the select best tool-model pair (22.3),
- activating the execution engine that is an evolutionary feedback-layer EFL (23),
- executing the selected pair on the given task by the execution result (23.2) that captures and stores the output generated by the execution,
- evaluating the execution output using the evaluation feedback mechanism to assess quality and performance, and updating historical priors, thereby updating success and failure statistics based on feedback for future learning,
- applying time decay and preference weighting (23.5) to reduce the influence of older data and adjust weights to reflect user preferences over time.

2. The system and method as claimed in claim 1, wherein the input unit (1) receives, but is not limited to, task information along with user identification and organization identification (Org ID).

3. The system and method as claimed in claim 1, wherein the context representation layer CRL (21) includes:
- nodes (21.1) that represent entities such as tasks, tools, models, users, and organizations;
- edges (21.2) that capture interaction outcomes and metadata;
- a graph attention network (GAT) (21.3) that dynamically learns contextual information of all entities, weighs important relationships, and produces the context vector C_task (21.4); and
- the context vector (21.4) that acts as the context embedder.

4. The system and method as claimed in claim 1, wherein the selector engine layer SEL (22) treats each tool-model pair as an arm in a contextual multi-armed bandit (22.2) problem; such that Thompson sampling (22.1) is used to estimate the reward distribution of each arm of the multi-armed bandit (22.2) using historical alpha/beta priors per tool, model, and context, sampling θ_i ~ Beta(α_i, β_i) to balance exploration and exploitation; the decision is then made by selecting the pair with the highest sampled θ_i using the selector tool (22.3).
5. The system and method as claimed in claim 1, wherein the reward (R) represents a quantitative score assigned to a tool-model pair after it completes a task, reflecting how well that pair performed, and is used to update the alpha (α) and beta (β) values in the Bayesian multi-armed bandit, guiding future decisions.

6. The system and method as claimed in claim 1, wherein the evolutionary feedback layer (EFL) (23) employs a stepwise workflow comprising:
- the execute pair (23.1) step, involves executing the selected tool-model pair on the given task to generate a result;
- the output result (23.2) phase, captures the results and stores the output generated during execution;
- the evaluate reward (23.3) step, enables assessing the quality, performance, and utility of the output to determine the effectiveness of the chosen pair;
- the update alpha beta (23.4) step, updates the success and failure statistics used for Thompson sampling based on the evaluation, thereby improving future pair selection by refining probability estimates;
- the apply decay (23.5) step, gradually reduces the influence of outdated or less relevant data over time, ensuring that the system prioritizes recent and more relevant experiences when making decisions or predictions.
Dated this 17th day of July, 2025.

Documents

Application Documents

# Name Date
1 202521068260-STATEMENT OF UNDERTAKING (FORM 3) [17-07-2025(online)].pdf 2025-07-17
2 202521068260-POWER OF AUTHORITY [17-07-2025(online)].pdf 2025-07-17
3 202521068260-FORM 1 [17-07-2025(online)].pdf 2025-07-17
4 202521068260-FIGURE OF ABSTRACT [17-07-2025(online)].pdf 2025-07-17
5 202521068260-DRAWINGS [17-07-2025(online)].pdf 2025-07-17
6 202521068260-DECLARATION OF INVENTORSHIP (FORM 5) [17-07-2025(online)].pdf 2025-07-17
7 202521068260-COMPLETE SPECIFICATION [17-07-2025(online)].pdf 2025-07-17
8 Abstract.jpg 2025-08-04
9 202521068260-FORM-9 [26-09-2025(online)].pdf 2025-09-26
10 202521068260-FORM 18 [01-10-2025(online)].pdf 2025-10-01