
System And Method For Model Orchestration In Automated Inferencing Workflows

Abstract: The present invention provides a system and method for automated orchestration of machine learning models during inferencing workflows in agentic environments. The system analyzes user intent, constructs a goal-tree, and decomposes the request into intermediate tasks organized as a directed acyclic graph. For each task, a model fitness score is computed using a weighted formula based on performance, cost, latency, benchmark scores, and historical success rates. The top-ranked models are executed in parallel through an agentic execution layer. Their outputs are evaluated by collaborative validation agents using predefined metrics, and the most suitable result is selected based on consensus. A feedback and learning loop logs all orchestration decisions and outcomes, enabling continuous refinement of model selection strategies. The architecture is modular, model-agnostic, and capable of operating across various domains, providing a reliable, adaptive, and high-quality inferencing framework suitable for real-time, multi-agent artificial intelligence workflows.


Patent Information

Application #
Filing Date
17 July 2025
Publication Number
40/2025
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

Persistent Systems
Bhageerath, 402, Senapati Bapat Rd, Shivaji Cooperative Housing Society, Gokhale Nagar, Pune - 411016, Maharashtra, India

Inventors

1. Mr. Nitish Shrivastava
10764 Farallone Dr, Cupertino, CA 95014-4453, United States
2. Mr. Pradeep Kumar Sharma
20200 Lucille Ave, Apt 62, Cupertino, CA 95014, United States

Specification

Description:
FIELD OF THE INVENTION
The present invention relates to the field of artificial intelligence and agent-based systems. More particularly, it pertains to a system and method for automated orchestration of models during inferencing workflows.
BACKGROUND OF THE INVENTION
As artificial intelligence technologies such as large language models (LLMs) become more advanced, many companies and systems now use automated workflows, also known as agentic workflows, to handle tasks such as writing content, analyzing data, making decisions, and managing systems. These workflows are designed to operate on their own, with little or no human help.
However, as these systems become more widely used, they handle increasingly important or sensitive information and are expected to deliver reliable results. The problem is that most existing systems follow a fixed process or a set path of models. They do not adjust based on what the user wants, how complex the task is, or which model works best in a specific situation. They also do not check or compare model answers before using them, which can lead to mistakes or poor results.
Due to these issues, there is a need for a smarter and more flexible system that can understand the user's intent, break the task into smaller steps, and choose the most suitable models for each step based on factors like speed, accuracy, and cost. Such a system should also be able to run multiple models at once, compare their outputs, and select the best result through a validation process.
Prior Art:
For instance, US11900169B1 discloses systems and methods for inference flow orchestration. It presents a dynamic pipeline that routes outputs from one machine learning task to another in response to an analysis request. While this system allows for sequential and interdependent ML task execution, it primarily focuses on managing pre-defined workflows rather than dynamically selecting models based on evolving performance metrics, user intent, or task-specific needs. It lacks a multi-model validation mechanism or a feedback-driven orchestration strategy that adapts over time.
US20230196201A1 describes a system for managing artificial intelligence inference platforms, where orchestrators handle various model executions in response to contextual triggers like sensor data. Although the system includes model execution orchestration, it is limited in terms of generalized task decomposition and does not include a dynamic scoring system for model selection. It also does not propose agent-based validation of multiple outputs or consensus-based decision-making to optimize inference quality, cost, or latency.
WO2023018815A1 outlines methods for artificial intelligence inference orchestration using historical performance data to guide model selection. While the system includes dynamic model routing and performance monitoring, it does not decompose complex user inputs into multiple subtasks, nor does it use a per-task fitness scoring formula that accounts for accuracy, cost, speed, and historical outcomes. It also lacks a structured, agentic execution layer where multiple model outputs are compared and validated collaboratively before reaching a final output.
Although existing systems enable some level of model orchestration or dynamic task routing, they do not offer a comprehensive framework that includes intent recognition, task graph planning, multi-model execution, and collaborative validation across all stages of an inference workflow. Current approaches also lack adaptive scoring mechanisms, agent-based output comparison, and reinforcement-driven feedback loops that refine orchestration strategies over time.
DEFINITIONS
The expression “system” used hereinafter in this specification refers to an ecosystem comprising, but not limited to, a system with a user, input and output devices, a processing unit, a plurality of mobile devices, a mobile device-based application to identify dependencies and relationships between diverse businesses, a visualization platform, and output; and extends to computing devices such as mobiles, laptops, computers, and PCs.
The expression “input unit” used hereinafter in this specification refers to, but is not limited to, mobiles, laptops, computers, PCs, keyboards, mice, pen drives, or other drives.
The expression “output unit” used hereinafter in this specification refers to, but is not limited to, an onboard output device, a user interface (UI), a display kit, a local display, a screen, a dashboard, or a visualization platform enabling the user to visualize, observe or analyse any data or scores provided by the system.
The expression “processing unit” refers to, but is not limited to, a processor of at least one computing device that optimizes the system.
The expression “large language model (LLM)” used hereinafter in this specification refers to a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.
The expression “agentic workflow”, as used hereinafter in this specification, refers to an autonomous execution process involving one or more artificial intelligence agents, wherein the agents independently perform tasks such as code generation, summarization, decision-making, or orchestration by invoking inference or fine-tuning operations without human intervention.
The expression “intent analyzer”, as used hereinafter in this specification, refers to the system component responsible for parsing the user request using prompt analysis and goal-tree construction to determine the core intent, objective, output type, and applicable success criteria.
The expression “model fitness score”, as used hereinafter in this specification, refers to the weighted score computed for each candidate model per task, based on factors such as performance, cost, speed, historical success rate, and benchmark evaluation scores, and used to rank and select models.
The expression “orchestration decision engine”, as used hereinafter in this specification, refers to the system module that calculates model fitness scores and selects the top candidate models for each task, dynamically adjusting weights based on task-specific requirements such as latency, accuracy, or cost sensitivity.
The expression “agentic execution layer”, as used hereinafter in this specification, refers to the layer of the system wherein selected models are executed through independent agents, each equipped with input preprocessing, model invocation, and output formatting capabilities.
The expression “collaborative validation agents”, as used hereinafter in this specification, refers to the set of agents that compare outputs from multiple candidate models, apply evaluation metrics, and select the most suitable result using consensus rules or fallback strategies.
OBJECTS OF THE INVENTION
The primary object of the present invention is to provide a system for automated orchestration of multiple models during inferencing workflows in agentic environments.
Another object of the invention is to analyze user intent and decompose complex requests into structured intermediate tasks for dynamic model execution.
Yet another object of the invention is to select and prioritize multiple candidate models per task using a weighted fitness scoring formula based on performance, cost, speed, historical accuracy, and benchmark evaluation.
A further object of the invention is to execute selected models through agentic wrappers and validate outputs using collaborative comparison to ensure high-quality and reliable results.
An additional object of the invention is to continuously learn from execution outcomes through a feedback loop, thereby refining future model selection and orchestration strategies.
SUMMARY
Before the present invention is described, it is to be understood that the present invention is not limited to specific methodologies and materials described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention.
The present invention relates to a system and method for dynamic model orchestration during inferencing requests in agentic workflows. The invention introduces an adaptive orchestration engine that analyzes user intent, decomposes it into intermediate tasks, and selects appropriate models based on a weighted fitness formula using performance, cost, latency, benchmark scores, and historical success rates. Selected models are executed through agentic wrappers, and their outputs are validated using collaborative agents that apply evaluation metrics and decision rules. Unlike traditional fixed pipelines, the system enables real-time, goal-aware model selection and includes a feedback and learning loop that improves orchestration performance over time.
According to an aspect of the invention, the process flow begins when a user submits a request to the input unit of the system. The request is first analyzed by the user intent analyzer, which parses the prompt and identifies the user’s goal, intent, expected output type, and success criteria. This information is passed to the task planner, which converts the goal into a structured set of intermediate tasks arranged as a directed acyclic graph (DAG). Each task is tagged with a required sub-skill, output format, and evaluation metric. For every task, the orchestration decision engine computes a model fitness score for candidate models using weighted parameters such as performance, cost, speed, benchmark scores, and historical success rate. The top three models are selected and executed through the agentic execution layer. Their outputs are passed to the collaborative validation module, where they are evaluated and compared. The best result is selected through consensus, and the final output is delivered via the output unit.
BRIEF DESCRIPTION OF DRAWINGS
A complete understanding of the present invention may be obtained by reference to the following detailed description, which is to be taken in conjunction with the accompanying drawing. The accompanying drawing, which is incorporated into and constitutes a part of the specification, illustrates one or more embodiments of the present invention and, together with the detailed description, serves to explain the principles and implementations of the invention.
FIG. 1 illustrates the architecture of the system for automated model orchestration during inferencing workflows.
FIG. 2 illustrates the flow diagram representing the stepwise orchestration and validation process.
FIG. 3 illustrates the sequence diagram showing interactions among components during model selection and output validation.

DETAILED DESCRIPTION OF INVENTION:
Before the present invention is described, it is to be understood that this invention is not limited to methodologies described, as these may vary as per the person skilled in the art. It is also to be understood that the terminology used in the description is for the purpose of describing the particular embodiments only and is not intended to limit the scope of the present invention. Throughout this specification, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps. The use of the expression “at least” or “at least one” suggests the use of one or more elements or ingredients or quantities, as the use may be in the embodiment of the invention to achieve one or more of the desired objects or results. Various embodiments of the present invention are described below. It is, however, noted that the present invention is not limited to these embodiments, but rather the intention is that modifications that are apparent are also included.
The present invention relates to a system and method for dynamic model orchestration during inferencing requests in agentic workflows. The invention introduces an adaptive orchestration engine that analyzes user intent, decomposes it into intermediate tasks, and selects appropriate models based on a weighted fitness formula using performance, cost, latency, benchmark scores, and historical success rates. Selected models are executed through agentic wrappers, and their outputs are validated using collaborative agents that apply evaluation metrics and decision rules. Unlike traditional fixed pipelines, the system enables real-time, goal-aware model selection and includes a feedback and learning loop that improves orchestration performance over time.
To address these limitations, the present invention introduces a dynamic and goal-aware orchestration system for agentic workflows involving inferencing tasks. The system analyzes each user request by detecting intent, constructing a goal tree, and converting the goal into a sequence of intermediate tasks. For each task, a model selection formula evaluates candidate models based on performance, cost, latency, benchmark scores, and historical success rates. The top-ranked models are executed through agentic wrappers, and their outputs are evaluated by collaborative validation agents. The system selects the best output using consensus rules or fallback logic. All orchestration decisions, model scores, and outcomes are logged, enabling continuous learning through a feedback loop and supporting auditability and system improvement over time.
According to the embodiment of the present invention, the system comprises an input unit, a processing unit, and an output unit, wherein the processing unit further includes a user intent analyzer, a task planner, a model repository and benchmark cluster, an orchestration decision engine, an agentic execution layer, a collaborative validation module, and a feedback and learning loop. The input unit receives user requests that initiate agentic workflows for tasks such as summarization, extraction, generation, or analysis. The user intent analyzer parses the prompt, identifies the goal and objective, and outputs the intent, desired output type, and success criteria. The task planner converts the goal into a set of intermediate tasks organized as a directed acyclic graph (DAG), tagging each task with required sub-skills and evaluation metrics. The orchestration decision engine selects the top candidate models for each task using a dynamic fitness scoring formula based on performance, cost, latency, benchmark scores, and historical success rates. These models are executed through the agentic execution layer, and their outputs are evaluated by collaborative validation agents. The final output is selected through consensus or fallback rules and returned via the output unit. The feedback and learning loop logs model decisions and outcomes, enabling ongoing improvement of orchestration strategies.
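The specification describes the task planner's output only in prose. As a minimal, illustrative sketch, the intermediate tasks and their dependencies could be held in a structure such as the following Python code, in which the Task and TaskGraph classes, the attribute names, and the topological_order helper are assumptions introduced for illustration rather than elements of the claimed system.

# Illustrative sketch only: a minimal task-graph representation for a DAG of
# intermediate tasks; names and fields are hypothetical, not prescribed by the specification.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Task:
    task_id: str
    sub_skill: str           # e.g. "summarization", "entity_extraction"
    output_format: str       # e.g. "markdown", "json"
    evaluation_metric: str   # e.g. "ROUGE-L", "F1"
    depends_on: List[str] = field(default_factory=list)

@dataclass
class TaskGraph:
    tasks: Dict[str, Task] = field(default_factory=dict)

    def add(self, task: Task) -> None:
        self.tasks[task.task_id] = task

    def topological_order(self) -> List[Task]:
        # Kahn's algorithm: tasks whose dependencies are satisfied run first.
        indegree = {tid: len(t.depends_on) for tid, t in self.tasks.items()}
        ready = [tid for tid, d in indegree.items() if d == 0]
        order: List[Task] = []
        while ready:
            tid = ready.pop()
            order.append(self.tasks[tid])
            for other in self.tasks.values():
                if tid in other.depends_on:
                    indegree[other.task_id] -= 1
                    if indegree[other.task_id] == 0:
                        ready.append(other.task_id)
        if len(order) != len(self.tasks):
            raise ValueError("cycle detected: task graph must be acyclic")
        return order

# Example: a two-step workflow derived from a "summarize and extract entities" goal.
graph = TaskGraph()
graph.add(Task("t1", "summarization", "markdown", "ROUGE-L"))
graph.add(Task("t2", "entity_extraction", "json", "F1", depends_on=["t1"]))
print([t.task_id for t in graph.topological_order()])  # ['t1', 't2']

In this sketch, each node carries the sub-skill, output-format, and evaluation-metric tags described above, and the topological ordering simply guarantees that a task runs only after the tasks it depends on.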
The orchestration decision engine computes a model fitness score for each candidate model associated with a given task. This score is based on multiple weighted parameters including model performance, cost, speed (latency), historical success rate, and benchmark evaluation scores. Each task is evaluated using a dynamic weight assignment strategy that adjusts the importance of each factor depending on the nature of the task, for example prioritizing speed for latency-sensitive tasks or accuracy for factual outputs. The top three models with the highest fitness scores are selected for execution. These models are grouped and retrieved from the benchmark cluster, which maintains metadata such as model type, benchmark scores (e.g., ROUGE, BLEU, F1), cost per token, context window constraints, and historical task success rates. The model selection process ensures that the most contextually appropriate and efficient models are chosen dynamically for each step in the workflow.
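The specification does not fix a particular scoring formula. One plausible instantiation is a normalized weighted sum of the form S = w_perf*P + w_cost*(1 - C/C_max) + w_lat*(1 - L/L_max) + w_hist*H + w_bench*B, with the weights adjusted per task. The sketch below assumes this form; the ModelProfile fields, weight names, and select_top_models helper are hypothetical.

# Illustrative sketch only: one possible weighted fitness score of the kind the
# specification describes; the normalization, weight values, and helper names are assumptions.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ModelProfile:
    name: str
    performance: float         # normalized quality score in [0, 1]
    cost_per_1k_tokens: float  # e.g. USD per 1,000 tokens
    latency_ms: float
    historical_success: float  # fraction of past tasks completed successfully
    benchmark_score: float     # e.g. averaged ROUGE/BLEU/F1 in [0, 1]

def fitness_score(m: ModelProfile, weights: Dict[str, float],
                  max_cost: float, max_latency: float) -> float:
    # Cost and latency are inverted so that cheaper and faster models score higher.
    return (weights["performance"] * m.performance
            + weights["cost"] * (1.0 - min(m.cost_per_1k_tokens / max_cost, 1.0))
            + weights["latency"] * (1.0 - min(m.latency_ms / max_latency, 1.0))
            + weights["history"] * m.historical_success
            + weights["benchmark"] * m.benchmark_score)

def select_top_models(candidates: List[ModelProfile], weights: Dict[str, float],
                      k: int = 3) -> List[ModelProfile]:
    max_cost = max(c.cost_per_1k_tokens for c in candidates)
    max_latency = max(c.latency_ms for c in candidates)
    ranked = sorted(candidates,
                    key=lambda c: fitness_score(c, weights, max_cost, max_latency),
                    reverse=True)
    return ranked[:k]

# Dynamic weight assignment: a latency-sensitive task shifts weight toward speed.
latency_sensitive = {"performance": 0.2, "cost": 0.1, "latency": 0.4,
                     "history": 0.15, "benchmark": 0.15}

candidates = [
    ModelProfile("fast-small", 0.72, 0.2, 300.0, 0.88, 0.70),
    ModelProfile("large-accurate", 0.91, 1.5, 2200.0, 0.93, 0.89),
    ModelProfile("balanced", 0.84, 0.6, 900.0, 0.90, 0.81),
    ModelProfile("legacy", 0.60, 0.1, 400.0, 0.75, 0.55),
]
# Prints the names of the three highest-scoring models for this weight profile.
print([m.name for m in select_top_models(candidates, latency_sensitive)])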
The collaborative validation module compares the outputs generated by the top three selected models for each task. These outputs are evaluated using predefined metrics such as BLEU, ROUGE, or other task-specific evaluation scores. A decision rule is then applied based on the level of agreement between the outputs. If at least two of the three model outputs reach consensus or meet the evaluation threshold, the result is selected and passed to the next stage in the workflow. If no consensus is reached or the outputs fall below the required quality threshold, the system either escalates the task to a fallback model or re-triggers the orchestration process for that task. This validation mechanism ensures that unreliable or low-quality model outputs are filtered out, contributing to a more accurate and trustworthy inference result.
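As an illustration of the two-of-three decision rule, the following sketch uses a simple token-overlap measure as a stand-in for the BLEU/ROUGE-style metrics named above; the agreement function, the threshold value, and the select_by_consensus helper are assumptions introduced for illustration only.

# Illustrative sketch only: a 2-of-3 consensus check over candidate model outputs.
from itertools import combinations
from typing import List, Optional

def agreement(a: str, b: str) -> float:
    # Simple token-overlap (Jaccard) ratio; a real deployment would plug in
    # BLEU, ROUGE, or another task-specific metric here.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def select_by_consensus(outputs: List[str], threshold: float = 0.6) -> Optional[str]:
    # Count, for each candidate, how many peers it agrees with above the threshold.
    votes = [0] * len(outputs)
    for i, j in combinations(range(len(outputs)), 2):
        if agreement(outputs[i], outputs[j]) >= threshold:
            votes[i] += 1
            votes[j] += 1
    best = max(range(len(outputs)), key=lambda i: votes[i])
    # At least two of the three outputs must agree; otherwise signal fallback or re-orchestration.
    return outputs[best] if votes[best] >= 1 else None

result = select_by_consensus(["The quarterly revenue grew 12 percent.",
                              "Quarterly revenue grew by 12 percent.",
                              "The report covers supply chain risks."])
print(result)  # a consensus output, or None when the validator should escalate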
According to an embodiment of the invention, the process flow begins when a user submits a request to the input unit of the system. The request is first analyzed by the user intent analyzer, which parses the prompt and identifies the user’s goal, intent, expected output type, and success criteria. This information is passed to the task planner, which converts the goal into a structured set of intermediate tasks arranged as a directed acyclic graph (DAG). Each task is tagged with a required sub-skill, output format, and evaluation metric. For every task, the orchestration decision engine computes a model fitness score for candidate models using weighted parameters such as performance, cost, speed, benchmark scores, and historical success rate. The top three models are selected and executed through the agentic execution layer. Their outputs are passed to the collaborative validation module, where they are evaluated and compared. The best result is selected through consensus, and the final output is delivered via the output unit.
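A minimal sketch of the agentic execution layer described above might wrap each selected model in an agent that performs input preprocessing, model invocation, and output formatting, and run the top-ranked agents in parallel. The ModelAgent class and the stand-in call_model callables below are hypothetical; a real deployment would supply concrete model clients.

# Illustrative sketch only: agent wrappers around selected models, executed in parallel.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

class ModelAgent:
    def __init__(self, name: str, call_model: Callable[[str], str]):
        self.name = name
        self.call_model = call_model  # e.g. a client wrapping a model endpoint

    def preprocess(self, prompt: str) -> str:
        # Task-specific prompt shaping (truncation, templating, etc.) would go here.
        return prompt.strip()

    def format_output(self, raw: str) -> str:
        # Normalize the raw model output to the task's expected output format.
        return raw.strip()

    def run(self, prompt: str) -> str:
        return self.format_output(self.call_model(self.preprocess(prompt)))

def execute_in_parallel(agents: List[ModelAgent], prompt: str) -> List[str]:
    # Each selected model runs through its own agent; the outputs go to validation.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        return list(pool.map(lambda a: a.run(prompt), agents))

# Example with stand-in model callables:
agents = [ModelAgent("model-a", lambda p: "Summary A of: " + p),
          ModelAgent("model-b", lambda p: "Summary B of: " + p),
          ModelAgent("model-c", lambda p: "Summary C of: " + p)]
print(execute_in_parallel(agents, "  Summarize the attached report.  "))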

According to an aspect of the present invention, the method for dynamically orchestrating models during inferencing requests in agentic workflows, as illustrated in FIG. 1, comprises the following steps:
● Receiving the user request at the input unit (FIG. 1): The process begins when a user initiates a request, such as summarization, content generation, or data analysis. This request is received by the input unit of the system and passed to the processing unit.
● Analyzing intent using the user intent analyzer (FIG. 1): The request is processed by the intent analyzer, which uses prompt parsing and goal-tree construction to identify the user’s core objective, expected output type, and success criteria.
● Planning tasks through the task planner (FIG. 1): The identified goal is decomposed into a sequence of intermediate tasks by the task planner. These tasks are structured as a directed acyclic graph (DAG), where each node is tagged with required sub-skills, evaluation metrics, and expected output formats.
● Selecting models using the orchestration decision engine (FIG. 2): For each task, the orchestration decision engine evaluates candidate models using a weighted fitness score formula. Parameters include performance, cost, speed, benchmark scores, and historical task success rates. The top three models with the highest fitness scores are selected for each task.
● Executing models in the agentic execution layer (FIG. 2): The selected models are executed via the agentic execution layer, where each model is run through its respective agent, including an input preprocessor, model wrapper, and output formatter.
● Validating outputs through collaborative validation agents (FIG. 3): The outputs from the top three models are passed to the collaborative validation agents, which compare the results using predefined evaluation metrics (e.g., BLEU, ROUGE). A decision rule is applied:
If two or more outputs reach consensus or exceed quality thresholds, the best result is selected;
If no valid consensus is reached, the system either escalates to a fallback model or re-orchestrates the task.
● Producing the final output through the output unit (FIG. 1): The selected validated output is forwarded to the output unit and delivered as the final response to the user.
● Learning and logging through the feedback and learning loop (FIG. 2): Each step, including model choices, scores, validation outcomes, and final decisions, is logged. This data is used by the feedback and learning loop to periodically update the orchestration logic using reinforcement learning, enabling improved model selection and task execution over time.
According to an embodiment of the invention, the present invention discloses a method for orchestrating machine learning models in automated inferencing workflows using a dynamic and adaptive multi-model decision system. The method begins by analyzing the user's request through the intent analyzer, which identifies the objective, output type, and success criteria. This analysis is followed by the task planner, which decomposes the goal into a sequence of intermediate tasks arranged in a directed acyclic graph structure. Each task is tagged with required sub-skills, evaluation metrics, and expected output types.
For every task, the orchestration decision engine computes a model fitness score for available models using a weighted formula that considers performance, cost, latency, historical accuracy, and benchmark scores. The top three models with the highest scores are selected for parallel execution. The outputs from these models are processed by the collaborative validation agents, which compare results based on predefined evaluation metrics such as BLEU or ROUGE. Based on this evaluation, the system determines whether the outputs meet the quality threshold for consensus selection. If they do, the best output is selected; if not, the task is either reprocessed with a fallback model or re-orchestrated.
All model selection decisions, scoring data, validation outcomes, and final outputs are recorded by the feedback and learning loop. This enables continuous refinement of orchestration strategies, improving future performance, cost-efficiency, and reliability in agentic workflows over time.
According to the embodiment of the present invention, in the given system and method, every request generated within an agentic workflow is processed by the orchestration system before model execution. Upon receiving a user request, the system first analyzes it using the intent analyzer to extract the underlying goal, output type, and success criteria. The task planner then converts this goal into a sequence of intermediate tasks structured as a directed acyclic graph (DAG). Each task is assigned relevant attributes such as required sub-skills, evaluation metrics, and output format.
The orchestration decision engine evaluates available models for each task using a dynamic model fitness scoring formula. This score considers performance, cost, latency, benchmark results, and historical success rate. The top three models with the highest scores are selected and executed in parallel through the agentic execution layer.
The outputs from these models are assessed by collaborative validation agents. If two or more outputs reach consensus or meet the required evaluation threshold, the best result is selected and passed forward. If consensus is not achieved or the output quality is low, the system either escalates the task to a fallback model or re-initiates orchestration for that step.
All key actions including user intent, model scores, validation results, and final output are logged by the feedback and learning loop. This enables ongoing improvement of orchestration decisions, contributing to higher quality, reliability, and cost-efficiency across dynamic, agentic inferencing workflows.
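The specification refers generally to reinforcement-driven refinement of orchestration strategies. The sketch below is one hedged illustration of how the feedback and learning loop could log outcomes and nudge the scoring weights; the FeedbackLoop class, its update rule, and the learning_rate parameter are assumptions, not the claimed learning mechanism.

# Illustrative sketch only: logging orchestration outcomes and adjusting scoring weights.
import json
import time
from typing import Dict, List

class FeedbackLoop:
    def __init__(self, weights: Dict[str, float], learning_rate: float = 0.05):
        self.weights = weights
        self.learning_rate = learning_rate
        self.log: List[dict] = []

    def record(self, task_id: str, model: str, fitness: float,
               validated: bool, metric_score: float) -> None:
        # Every orchestration decision and validation outcome is logged for audit and learning.
        self.log.append({"ts": time.time(), "task": task_id, "model": model,
                         "fitness": fitness, "validated": validated,
                         "metric": metric_score})

    def update_weights(self) -> Dict[str, float]:
        # Crude credit assignment: if validated outputs score well on the evaluation
        # metric, slightly increase the benchmark weight and renormalize all weights.
        wins = [r for r in self.log if r["validated"]]
        if wins:
            avg_metric = sum(r["metric"] for r in wins) / len(wins)
            self.weights["benchmark"] += self.learning_rate * avg_metric
            total = sum(self.weights.values())
            self.weights = {k: v / total for k, v in self.weights.items()}
        return self.weights

    def export(self) -> str:
        # The exported log supports auditability as well as offline analysis.
        return json.dumps(self.log, indent=2)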

According to the embodiment of the present invention, the automated orchestration system performs a structured sequence of operations on incoming requests within agentic workflows to ensure that each step is executed by the most suitable models, and the outputs meet defined quality and performance criteria. The orchestration process includes:
1. Intent Analysis: The system begins by parsing the user’s prompt using the intent analyzer, which identifies the core objective, expected output type, and success criteria for the request. This ensures that downstream orchestration aligns with the user’s intended goal.
2. Task Planning: The task planner transforms the identified goal into a series of intermediate tasks arranged in a directed acyclic graph (DAG). Each node in the graph is annotated with task-specific attributes such as required sub-skills, evaluation metrics, and expected output format.
3. Model Fitness Scoring: For each task, the orchestration decision engine calculates a fitness score for available models using a dynamic formula that considers performance metrics, latency, cost, historical success rates, and benchmark scores. The top three models with the highest fitness scores are selected.
4. Agentic Execution: The selected models are executed via the agentic execution layer. Each model runs through a wrapper that handles preprocessing, execution, and output formatting.
5. Collaborative Validation: The outputs from the three models are evaluated by collaborative validation agents. Using predefined evaluation metrics (e.g., ROUGE, BLEU), the system determines if a consensus exists:
If two or more outputs agree or exceed quality thresholds, the best output is selected.
If no consensus is reached, the task is escalated to a fallback model or reprocessed.
6. Decision Logic: The orchestration logic selects the final output for each task based on consensus evaluation and output quality. This step ensures that unreliable or inconsistent results are filtered out before composing the final response.
7. Feedback and Logging: The system logs all orchestration actions including model scores, selection history, validation results, fallback invocations, and final outputs. These logs feed into a feedback and learning loop, which continuously refines the orchestration strategy using historical data and reinforcement learning techniques.
This modular and auditable orchestration framework ensures intelligent, high-quality, and adaptive model selection within agentic inferencing workflows.
Advantages:
The disclosed method enables real-time, dynamic orchestration of multiple machine learning models based on user intent and task-specific needs. It ensures accurate and efficient output generation through adaptive model selection using a weighted scoring formula that accounts for performance, cost, speed, and historical success.
By breaking down user goals into intermediate tasks and validating model outputs through collaborative agents, the system improves reliability and minimizes low-quality or inconsistent results. Its modular design allows seamless integration into existing artificial intelligence workflows, with minimal manual intervention.
The feedback loop continuously enhances orchestration performance by learning from past outcomes. As a model-agnostic and scalable framework, it supports diverse deployment environments and is applicable to workflow automation, multi-agent systems, and compliance-sensitive applications.
Claims:
We claim,
1. A system and method for model orchestration in automated inferencing workflows
characterised in that
the system comprises an input unit, a processing unit, and an output unit, wherein the processing unit further includes a user intent analyzer, a task planner, a model repository and benchmark cluster, an orchestration decision engine, an agentic execution layer, a collaborative validation module, and a feedback and learning loop;
the method for model orchestration in automated inferencing workflows comprises the steps of
• receiving the user request including summarization, content generation, or data analysis at the input unit;
• analyzing intent using the user intent analyzer that uses prompt parsing and goal-tree construction to identify the user’s core objective, expected output type, and success criteria;
• planning tasks through the task planner, where each task is structured as a directed acyclic graph, wherein each node is tagged with required sub-skills, evaluation metrics, and expected output formats;
• selecting models for each task using the orchestration decision engine that uses a weighted fitness score formula with parameters that include performance, cost, speed, benchmark scores, and historical task success rates and the top three models with the highest fitness scores are selected for each task;
• executing models in the agentic execution layer where each model is run through its respective agent, including an input preprocessor, model wrapper, and output formatter;
• validating outputs from top three models through collaborative validation agents which compare the results using predefined evaluation metrics and applying decision rules;
• producing the final output through the output unit and delivering as the final response to the user;
• learning and logging through the feedback and learning loop.

2. The system and method as claimed in claim 1, wherein the user intent analyzer parses the prompt, identifies the goal and objective, and outputs the intent, desired output type, and success criteria and the task planner converts the goal into a set of intermediate tasks organized as a directed acyclic graph (DAG), tagging each task with required sub-skills and evaluation metrics.

3. The system and method as claimed in claim 1, wherein the orchestration decision engine selects the top candidate models for each task using a dynamic fitness scoring formula based on performance, cost, latency, benchmark scores, and historical success rates.

4. The system and method as claimed in claim 1, wherein the orchestration decision engine computes a model fitness score for each candidate model associated with a given task and this score is based on multiple weighted parameters including model performance, cost, speed, historical success rate, and benchmark evaluation scores.

5. The system and method as claimed in claim 1, wherein each task is evaluated using a dynamic weight assignment strategy that adjusts the importance of each factor depending on the nature of the task, for example prioritizing speed for latency-sensitive tasks or accuracy for factual outputs, and the top three models with the highest fitness scores are selected for execution, and these models are grouped and retrieved from the benchmark cluster, which maintains metadata such as model type, benchmark scores, cost per token, context window constraints, and historical task success rates.

6. The system and method as claimed in claim 1, wherein the collaborative validation module compares the outputs generated by the top three selected models for each task and these outputs are evaluated using predefined metrics or other task-specific evaluation scores and the decision rule applied is based on the level of agreement between the outputs such that if at least two of the three model outputs reach consensus or meet the evaluation threshold, the result is selected and passed to the next stage in the workflow and if no consensus is reached or the outputs fall below the required quality threshold, the system either escalates the task to a fallback model or re-triggers the orchestration process for that task.

Documents

Application Documents

# Name Date
1 202521068254-STATEMENT OF UNDERTAKING (FORM 3) [17-07-2025(online)].pdf 2025-07-17
2 202521068254-POWER OF AUTHORITY [17-07-2025(online)].pdf 2025-07-17
3 202521068254-FORM 1 [17-07-2025(online)].pdf 2025-07-17
4 202521068254-FIGURE OF ABSTRACT [17-07-2025(online)].pdf 2025-07-17
5 202521068254-DRAWINGS [17-07-2025(online)].pdf 2025-07-17
6 202521068254-DECLARATION OF INVENTORSHIP (FORM 5) [17-07-2025(online)].pdf 2025-07-17
7 202521068254-COMPLETE SPECIFICATION [17-07-2025(online)].pdf 2025-07-17
8 Abstract.jpg 2025-08-02
9 202521068254-FORM-9 [26-09-2025(online)].pdf 2025-09-26
10 202521068254-FORM 18 [01-10-2025(online)].pdf 2025-10-01