Abstract: PURPOSIVE AUTONOMOUS AGENT FRAMEWORK FOR DECISION DRIVEN TASK ORCHESTRATION Existing agent frameworks define and control an agent’s behavior using a system prompt, which results in significant randomness and inconsistency across different runs, making them unsuitable for enterprise automation. The present disclosure provides a framework which detects intent from a received user query and captures contexts from the query. The framework recommends one or more tasks based on the detected user intent and the captured contexts. A vector normalized database is queried based on the recommended one or more tasks to obtain one or more candidate domain specific Standard Operating Procedures (SOP’s) comprising associated one or more sub tasks. The system dynamically selects a candidate agent and a domain specific SOP amongst the one or more candidate domain-specific SOP’s. An assigned sub task is executed by the candidate agent and the execution outcome is evaluated. The process is repeated until the best fit agent and best fit domain specific SOP are identified for execution of the associated assigned sub tasks. [To be published with FIG. #2]
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
PURPOSIVE AUTONOMOUS AGENT FRAMEWORK FOR DECISION DRIVEN TASK ORCHESTRATION
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description:
The following specification particularly describes the invention and the manner in
which it is to be performed.
TECHNICAL FIELD
[001] The disclosure herein generally relates to systems that, in operation, enable application of autonomous agents (AAs) across a plurality of problem domains, and, more particularly, to a purposive autonomous agent framework for decision driven task orchestration.
BACKGROUND
[002] Autonomous agents (AAs) have gained significant attention in recent years as a promising technology for addressing complex tasks across various problem domains. The autonomous agents (AAs) possess the ability to perceive their environment, make decisions, and take actions autonomously, thereby reducing the need for direct human intervention. However, existing systems that employ autonomous agents face several limitations and challenges that hinder their widespread adoption and effectiveness. For instance, existing autonomous agent frameworks define and control the agent’s behavior using only a system prompt and let the agent plan and act on its own. This results in significant randomness and inconsistency across different runs, making such frameworks unsuitable for enterprise automation and delivering an unsatisfactory user experience.
[003] Standard operating procedures (SOPs) are another existing framework for agent orchestration; they have the limitation of being static by nature and do not allow SOP tasks to change based on an outcome from any of the preceding tasks. They also add the administrative overhead of constantly managing and updating standard operating procedures (SOPs) across multiple task requirements.
SUMMARY
[004] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for purposive autonomous agent framework for decision driven task orchestration is provided. The method includes receiving, via one or
more hardware processors, a business query from a user via one or more artificial intelligence agents; detecting, by an intent detection module executed via the one or more hardware processors, an intent associated with the business query and capturing one or more contexts from one or more associated topics comprised in the received business query; recommending, by a task recommender module executed via the one or more hardware processors, one or more tasks based on the detected user intent and the captured one or more contexts; querying, via one or more hardware processors, a vector normalized database based on the recommended one or more tasks to obtain one or more candidate domain specific Standard Operating Procedures (SOP’s) comprising associated one or more sub tasks; and iteratively perform, via one or more hardware processors - dynamically selecting a candidate agent and at least one domain specific Standard Operating Procedure (SOP) amongst the one or more candidate domain-specific SOP’s; executing at least one assigned sub tasks by the candidate agent; determining an execution outcome of the at least one assigned sub tasks; evaluating the candidate agent and an associated domain specific Standard Operating Procedure (SOP) based on the execution outcome; and until at least one best fit agent and a best fit domain specific Standard Operating Procedure (SOP) are identified for execution of an associated assigned sub task.
[005] In another aspect, there is provided a system for purposive autonomous agent framework for decision driven task orchestration. The system comprises: a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: receive a business query from a user via one or more artificial intelligence agents. The system further includes detecting an intent associated with the business query and capturing one or more contexts from one or more associated topics comprised in the received business query; recommending one or more tasks based on the detected user intent and the captured one or more contexts; querying a vector normalized database based on the recommended one or more tasks to obtain one or more candidate domain specific Standard Operating
Procedures (SOP’s) comprising associated one or more sub tasks; and iteratively perform - dynamically selecting a candidate agent and at least one domain specific Standard Operating Procedure (SOP) amongst the one or more candidate domain-specific SOP’s; executing at least one assigned sub tasks by the candidate agent; determining an execution outcome of the at least one assigned sub tasks; evaluating the candidate agent and an associated domain specific Standard Operating Procedure (SOP) based on the execution outcome; and until at least one best fit agent and a best fit domain specific Standard Operating Procedure (SOP) are identified for execution of an associated assigned sub task.
[006] In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause receiving a business query from a user via one or more artificial intelligence agents; detecting an intent associated with the business query and capturing one or more contexts from one or more associated topics comprised in the received business query; recommending one or more tasks based on the detected user intent and the captured one or more contexts; querying a vector normalized database based on the recommended one or more tasks to obtain one or more candidate domain specific Standard Operating Procedures (SOP’s) comprising associated one or more sub tasks; and iteratively perform - dynamically selecting a candidate agent and at least one domain specific Standard Operating Procedure (SOP) amongst the one or more candidate domain-specific SOP’s; executing at least one assigned sub tasks by the candidate agent; determining an execution outcome of the at least one assigned sub tasks; evaluating the candidate agent and an associated domain specific Standard Operating Procedure (SOP) based on the execution outcome; and until at least one best fit agent and a best fit domain specific Standard Operating Procedure (SOP) are identified for execution of an associated assigned sub task.
[007] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[008] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[009] FIG. 1 illustrates an exemplary system for purposive autonomous agent framework for decision driven task orchestration, according to some embodiments of the present disclosure.
[010] FIGS. 2A and 2B illustrate a functional block diagram of the system for purposive autonomous agent framework for decision driven task orchestration, according to some embodiments of the present disclosure.
[011] FIGS. 3A and 3B are flow diagrams illustrating the steps involved in the method for purposive autonomous agent framework for decision driven task orchestration, according to some embodiments of the present disclosure.
[012] FIG. 4 illustrates an interaction model between the components in the autonomous agent framework in conjunction with the method for purposive autonomous agent framework for decision driven task orchestration, according to some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[013] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[014] Businesses need to constantly decide on the best possible response to changing dynamics. For example, in case of banks/financial institution, a bank manager may be interested in retaining their existing customer base and therefore may ask “In light of the recent bank run events, what is the strategy to be followed to retain the customer base?” Or he/she could seek assistance with “our competitor
bank in the mid-west has increased the deposit rate by 0.5%, and some of our key customers have held-away accounts in this competitor bank; what should be the strategy to retain the customer base?”
[015] Such dynamic business asks cannot be met with static standard operating procedures (SOPs). Also, relying on the current state of the art in chain of thought (COT) based agent action flows may lead to differing outcomes even if there is a slight change in the granularity of the business request. The present disclosure aims to provide a more decision driven flow that offers flexibility in responding to dynamic requests while maintaining a higher level of accuracy in response.
[016] The two business use cases as provided above are different and need different strategies. Per existing agent architecture, there is a need to create two different standard operating procedures (SOPs) or Symbolic Plans to handle these use cases. Each standard operating procedure (SOP) would define various agent states and would define sub goals per agent for each state defined. This would make the management of various use cases complicated and cumbersome with an increasing number of standard operating procedures (SOPs). Such a static approach to standard operating procedures (SOPs) limits the applicability of one or more agents to dynamically changing business needs as above. Further, it is inconceivable for users to think of all possible business situations to create dedicated standard operating procedures (SOPs) for action and then maintain them in order to cater to constant changes in choices that may be needed to get the best results. There can also be situations where certain business validations and eligibility checks will need to be performed that can necessitate a different plan of action than originally defined. Similarly applying Chain-of-Thought (COT) based agent planning may lead to differential outcomes and recommendations for retention.
[017] Also, the agent framework, which has tried to overcome the existing problems of Chain-of-Thought (COT) frameworks, still has one major shortcoming in terms of static standard operating procedure (SOP). Herein the existing problems of Chain-of-Thought (COT) frameworks include a variability in COT based agent response that may provide different business responses even if the invoking event
and intent are the same. As a result, the states defined in the config file (please refer to line 3 of Table 1) lack the dynamism to handle complex and varied use cases.
[018] In the listing below (taken from the Agents paper, known in the art: [2309.07870] Agents: An Open-source Framework for Autonomous Language Agents (arxiv.org)), it is clearly seen that the standard operating procedure (SOP) is merely static in nature and is loaded from a JSON file at line number 4 as depicted in Table 1. This makes the standard operating procedure (SOP) less flexible and drives up the overall maintenance effort as the number of use cases increases.
1 def main():
2     # agents is a dict of one or multiple agents.
3     agents = Agent.from_config("./config.json")
4     sop = SOP.from_config("./config.json")
5     environment = Environment.from_config("./config.json")
6     run(agents, sop, environment)
Table 1 - Configuration file (JSON file)
[019] Static standard operating procedures (SOPs) require a huge overhead of maintaining sub tasks and subgoals for each agent state in the JSON file. In an ever-changing business context, this translates to frequent updates to the standard operating procedures (SOPs). Also, for each business use case one must create a separate standard operating procedure (SOP) even if there is only a slight deviation from an existing use case, which adds unnecessary complexity.
[020] To overcome the above-mentioned drawbacks of existing techniques, embodiments of present disclosure provide a purposive autonomous agent framework for decision driven task orchestration. The present framework detects an intent associated with the business query and captures one or more contexts from one or more associated topics comprised in the received business query. Based on the detected user intent and the captured one or more contexts, one or more tasks are recommended. Further a vector normalized database is queried based on the recommended one or more tasks to obtain one or more candidate domain specific Standard Operating Procedures (SOP’s) comprising associated one
or more sub tasks. A candidate agent and at least one domain specific Standard Operating Procedure (SOP) are dynamically selected amongst the one or more candidate domain-specific SOP’s. Furthermore at least one assigned sub task is executed by the candidate agent and an execution outcome of the at least one assigned sub tasks is determined. Based on the execution outcome, the candidate agent, and the associated domain specific Standard Operating Procedure (SOP) are evaluated, and this process is continued until at least one best fit agent and a best fit domain specific Standard Operating Procedure (SOP) are identified for execution of an associated assigned sub task.
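The iterative selection described above can be read as a simple search loop over candidate (agent, SOP) pairs. The following is a minimal Python sketch under stated assumptions: the function name find_best_fit, the callables execute and is_acceptable, and the toy agent and SOP names are all hypothetical and are not part of the disclosure, which does not prescribe a particular implementation.

```python
from typing import Callable, Iterable, Optional, Tuple


def find_best_fit(
    candidates: Iterable[Tuple[str, str]],
    execute: Callable[[str, str], object],
    is_acceptable: Callable[[object], bool],
) -> Optional[Tuple[str, str, object]]:
    """Try (agent, SOP) pairs in order until one yields an acceptable outcome."""
    for agent, sop in candidates:
        outcome = execute(agent, sop)       # execute the assigned sub task
        if is_acceptable(outcome):          # evaluate the execution outcome
            return agent, sop, outcome      # best fit found, stop iterating
    return None                             # no candidate passed evaluation


# Toy usage with stand-in callables (illustrative values only).
pairs = [("agent_a", "sop_1"), ("agent_b", "sop_2")]
result = find_best_fit(
    pairs,
    execute=lambda agent, sop: 80 if agent == "agent_a" else 100,
    is_acceptable=lambda outcome: outcome >= 100,
)
print(result)
```

In this sketch the candidates are tried in a fixed order; in the framework described here the order would come from the ranking and decision modules.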
[021] Referring now to the drawings, and more particularly to FIG. 1 through FIG. 4, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments, and these embodiments are described in the context of the following exemplary system and/or method.
[022] FIG. 1 illustrates an exemplary system for purposive autonomous agent framework for decision driven task orchestration, according to some embodiments of the present disclosure. In an embodiment, the system 100 includes or is otherwise in communication with hardware processors 102, at least one memory such as a memory 104, and an I/O interface 112. The hardware processors 102, memory 104, and the Input /Output (I/O) interface 112 may be coupled by a system bus such as a system bus 108 or a similar mechanism. In an embodiment, the hardware processors 102 can be one or more hardware processors.
[023] The I/O interface 112 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, a printer and the like. Further, the I/O interface 112 may enable the system 100 to communicate with other devices, such as web servers, and external databases.
[024] The I/O interface 112 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for
example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. For this purpose, the I/O interface 112 may include one or more ports for connecting several computing systems or devices with one another or to another server computer.
[025] The one or more hardware processors 102 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, node machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 102 are configured to fetch and execute computer-readable instructions stored in the memory 104.
[026] The memory 104 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 104 includes a plurality of modules 106. The memory 104 also includes a data repository (or repository) 110 for storing data processed, received, and generated by the plurality of modules 106.
[027] The plurality of modules 106 includes programs or coded instructions that supplement applications or functions performed by the system 100 for purposive autonomous agent framework for decision driven task orchestration. The plurality of modules 106, amongst other things, can include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types. The plurality of modules 106 may also be used as signal processor(s), node machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the plurality of modules 106 can be used by hardware, by computer-readable instructions executed by the one or more hardware processors 102, or by a combination thereof. The plurality of modules 106 can include various sub-modules (not shown). In an embodiment, the modules 106 include an input module 202, a Machine Learning based intent detection module 204, a Machine Learning based best fit task recommender 206, a retrieval augmented generation (RAG) based subtask planning module 208, a Machine Learning based next-best agent 210, a Multi agent decider module 212, a Multi agent debate module 214, an Outcome evaluation module 216, an agent action module 218, a final output module 220, a Large Language Model (LLM) based response generation module 222, a business response module 224, a business feedback module 226 and a Reinforcement Learning (RL) agent 228. The modules are depicted in FIGS. 2A and 2B.
[028] The data repository (or repository) 110 may include a plurality of abstracted pieces of code for refinement and data that is processed, received, or generated as a result of the execution of the module(s) 106.
[029] Although the data repository 110 is shown internal to the system 100, it will be noted that, in alternate embodiments, the data repository 110 can also be implemented external to the system 100, where the data repository 110 may be stored within a database (repository 110) communicatively coupled to the system 100. The data contained within such an external database may be periodically updated. For example, new data may be added into the database (not shown in FIG. 1) and/or existing data may be modified and/or non-useful data may be deleted from the database. In one example, the data may be stored in an external system, such as a Lightweight Directory Access Protocol (LDAP) directory and a Relational Database Management System (RDBMS).
[030] FIGS. 3A and 3B are flow diagrams illustrating a method for purposive autonomous agent framework for decision driven task orchestration using the system 100 of FIGS. 1-2, according to some embodiments of the present disclosure. Steps of the method of FIGS. 3A and 3B shall be described in conjunction with the components of FIG. 2. At step 302 of the method 300, the input module 202 executed via one or more hardware processors 102 receives a business query from a user via at least one of one or more artificial intelligence agents. Herein the business query refers to the user query, and the one or more artificial intelligence agents refer to a chatbot interface through which the user enters the user query.
[031] At step 304 of the method 300, the machine learning based intent detection module 204 executed via the one or more hardware processors 102 detects an intent associated with the business query and captures one or more contexts from one or more associated topics comprised in the received business query.
Example - 1: A Relationship Manager of a bank can seek advice from a chatbot assistant by asking the following question: “In light of the recent bank run events, what is the strategy to be followed to retain the customer base?” Taking this example as a use case, the steps outlined in the present disclosure are illustrated below. (The assumption here is that this Federal Deposit Insurance Corporation (FDIC)-insured bank automatically covers deposit insurance up to $250000, and any deposit amount above $250000 is not insured.)
Step 1 in conjunction with example – 1: The machine learning based intent detection module 204 detects the intent as customer retention and the associated context topic as Deposit Run.
[032] At step 306 of the method 300, the machine learning based best fit task recommender 206 executed via the one or more hardware processors 102 recommends one or more tasks based on the detected user intent and the captured one or more contexts.
Step 2 in conjunction with example – 1: The machine learning based best fit task recommender 206 identifies the task as "High value customer retention against deposit withdrawal."
[033] At step 308 of the method 300, the RAG based subtask planning module 208 executed via the one or more hardware processors 102 queries a vector normalized database based on the recommended one or more tasks to obtain one or more candidate domain specific Standard Operating Procedures (SOP’s) comprising associated one or more sub tasks.
Step 3 in conjunction with example – 1: For the best fit task generated in Step 2 (of example 1), knowledge repositories are searched (similarity search) for action sub task examples for similar task domains (represented by the retrieval augmented generation (RAG) based subtask planning module 208). A typical task planning prompt template is provided below; additional context found in the knowledge repository is appended to enrich the template, which is then input to a Large Language Model (LLM) in a retrieval augmented generation (RAG) approach (known in the art). Further, example 2 below illustrates domain specific Standard Operating Procedures (SOP’s) for a given task, wherein a domain specific Standard Operating Procedure (SOP) is specified for each sub task to execute that sub task.
Example - 2
{
  'task': 'High value customer retention',
  'task parameters': {
    'parameter_1': 'deposit withdrawal'
  },
  'possible sub_tasks': ['get_high_value_customer', 'get_next_best_offer', 'check_offer eligibility', 'generate offer'],
  'sub_task_1': {
    'name': 'get_high_value_customer',
    'sub_task description': 'select deposit customers',
    'context': 'deposit_value',
    'param_1_description': 'top N',
    'param_1_value': 100,
    'sub_task_outcome': 'ranked customer list'
  },
  'sub_task_2': {
    'name': 'get_next_best_offer',
    'sub_task description': 'offer recommendation',
    'context': 'retention',
    'param_1_description': 'rank',
    'param_1_value': 1,
    'sub_task_outcome': 'customer-offer list'
  }
}
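The similarity search mentioned in Step 3 can be illustrated with a small cosine-similarity lookup. This is only a sketch under stated assumptions: the three-dimensional vectors, the SOP names other than the retention task, and the helper names cosine and top_k_sops are hypothetical; a real deployment would use learned embeddings and a vector database, neither of which is specified in the disclosure.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_sops(
    query_vec: List[float], sop_index: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k SOP names whose vectors are most similar to the query."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in sop_index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hand-made toy embeddings (hypothetical values, for illustration only).
sop_index = {
    "high_value_customer_retention": [0.9, 0.1, 0.0],
    "loan_default_follow_up": [0.1, 0.9, 0.2],
    "new_account_onboarding": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for the embedded recommended task
hits = top_k_sops(query, sop_index, k=2)
print([name for name, _ in hits])
```

The retrieved candidates would then be appended to the task planning prompt as additional context, as described in Step 3.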
[034] At step 310 of the method 300, the machine learning based next-best agent 210, the Multi agent decider module 212, and the Multi agent debate module 214 executed via the one or more hardware processors 102 iteratively perform the following steps. First, a candidate agent and at least one domain specific Standard Operating Procedure (SOP) are dynamically selected amongst a plurality of agents and the one or more candidate domain-specific SOPs respectively. Example - 2 illustrates that the agents are dynamically chosen using the machine learning based next-best agent 210. Further, at least one assigned sub task is executed by the dynamically selected candidate agent.
[035] Step 4 in conjunction with example – 1: Based on the sub tasks identified in step 3, the machine learning based next-best agent 210 identifies the highest ranked agents. It is assumed by the system 100 that, for the first sub task ‘get_high_value_customer’, the recommended agent is ‘client-insight’ (the selection of the agent may entail multi agent debate as explained above). This agent will invoke the tool to construct and execute SQL against the DB (an example SQL output from the tool is given below).
select top 100 deposit_value from customers where deposit_balance >=500000 order by deposit_value desc
The above-mentioned SQL output is evaluated (represented by the outcome evaluation module 216) against the expected output (it is expected to fetch one hundred customer records, as N=100 in the sub task example above). Let us assume that the total number of depositors fetched by executing the above query is eighty, which is less than the expected count. In this event, the deposit value threshold is recalibrated (lowered by a predefined percentage; let us assume 5% in this case). Hence the query is auto corrected and re-executed. This auto correction continues until the top hundred customers, as defined, are found.
Recalibrated query:
select top 100 deposit_value from customers where deposit_balance >=475000 order by deposit_value desc
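The auto correction described above (lower the threshold by a predefined percentage and re-run until N records are returned) can be sketched as follows. This is an illustrative Python sketch, not the disclosed implementation: an in-memory list stands in for the customers table, the function names are hypothetical, and the max_rounds guard is an added assumption so that the loop terminates when fewer than N customers exist at all.

```python
from typing import List, Tuple


def fetch_top_customers(balances: List[int], threshold: int, n: int) -> List[int]:
    """Stand-in for the SQL query: top n balances at or above the threshold."""
    eligible = sorted((b for b in balances if b >= threshold), reverse=True)
    return eligible[:n]


def recalibrate(
    balances: List[int],
    threshold: int,
    n: int,
    step_pct: int = 5,
    max_rounds: int = 50,
) -> Tuple[int, List[int]]:
    """Lower the threshold by step_pct percent until n rows are fetched."""
    rows = fetch_top_customers(balances, threshold, n)
    rounds = 0
    while len(rows) < n and rounds < max_rounds:
        threshold = threshold * (100 - step_pct) // 100  # e.g. 500000 -> 475000
        rows = fetch_top_customers(balances, threshold, n)
        rounds += 1
    return threshold, rows


# Toy data: only two balances meet the initial 500000 threshold.
balances = [600000, 520000, 490000, 480000, 300000]
final_threshold, top_rows = recalibrate(balances, threshold=500000, n=4)
print(final_threshold, top_rows)
```

With a 5% step, the first recalibration moves the threshold from 500000 to 475000, matching the recalibrated query shown above.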
[036] Step 5 in conjunction with example – 1: Similarly, using multi agent debate as explained above, best-fit agent ‘offer-oracle’ would be chosen for the next sub task and corresponding recommendation model is invoked by this agent to get the recommended offers for the customers identified in the previous sub task.
[037] As discussed in the previous section, based on the sub tasks identified, the machine learning based next-best agent 210 identifies the one or more highest ranked agents. If the one or more agents are closely ranked, the multi agent debate (represented by the Multi agent debate module 214) is initiated; otherwise only the highest ranked agent (Single Agent) is assigned. In a multi agent debate setting, the agents individually propose and jointly debate their responses to arrive at a single common answer. Given a query, each agent generates an individual candidate answer, reads and critiques the responses of all other agents, and uses this content to update its own answer. This step is then repeated over several iterations. This process induces agents to construct answers that are consistent with their internal critic and also sensible in light of the responses of other agents. The resulting quorum of Large Language Models (LLMs) underneath the agents participating in the multi agent debate also maintains multiple possible answers simultaneously before proposing the final consensus answer.
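The choice between a single agent and a multi agent debate can be sketched as a margin test on the ranking scores. The disclosure does not define how close a "close ranking" is, so the margin value, the function name decide_agents, the scores, and the agent name 'promo-scout' below are all assumptions made for illustration.

```python
from typing import Dict, List


def decide_agents(scores: Dict[str, float], margin: float = 0.05) -> List[str]:
    """Return every agent whose score is within `margin` of the top score.

    One name in the result means single-agent assignment; more than one
    means the agents are closely ranked and a debate would be initiated.
    """
    if not scores:
        return []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_score = ranked[0][1]
    return [name for name, score in ranked if top_score - score <= margin]


# Hypothetical ranking scores for the 'get_next_best_offer' sub task.
scores = {"offer-oracle": 0.91, "client-insight": 0.62, "promo-scout": 0.89}
selected = decide_agents(scores)
mode = "debate" if len(selected) > 1 else "single agent"
print(selected, mode)
```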
[038] Example for multi agent debate (known in the art: [2305.14325] Improving Factuality and Reasoning in Language Models through Multiagent Debate (arxiv.org)): the sub task (get_next_best_offer) mentioned above tries to find the next best offer to recommend to the customer. Let us assume there are multiple agents with different underlying Large Language Models (LLMs) that are fit for this sub task. Let us consider that the machine learning based next-best agent 210 has identified four closely ranked agents. As a result, the multi agent decider module 212 decides to go with these four agents for this sub task. To illustrate how the multi agent debate works as part of the multi agent debate module 214, let us assume that, out of these four agents, three agents (Agent 1, Agent 2 and Agent 3) are assigned to find the next best offer for the user. The fourth agent (Agent 4) is assigned as the evaluator agent: Agent 4 acts as a judge that considers the final responses of these three agents, evaluates them, and declares the best agent and its response. It is to be noted that multiple rounds of iteration happen in a typical multi agent debate.
The order and task prompts of agents in the first round of iteration, as assigned by the multi agent debate procedure, are illustrated below:
Agent1 - task prompt: find the next best offer for the customer.
Agent2 - task prompt: find the next best offer for the customer.
Agent3 - task prompt: find the next best offer for the customer.
The order and task prompts of agents in the second round of iteration are illustrated below:
Agent1 - task prompt: Taking into consideration the next best offers generated by the other agents as additional information, Agent1 gives an updated response (in the first iteration Agent1 had not considered the other agents’ answers, so it now revises its response based on them).
Agent2 (role: Negative) - task prompt: Taking into consideration the next best offers generated by the other agents as additional information, Agent2 gives an updated response.
Agent3 (role: Affirmative) - task prompt: Taking into consideration the next best offers generated by the other agents as additional information, Agent3 gives an updated response.
Agent4 (Debate Judge) - task prompt: Evaluate the offers generated by all three agents (Agent1, Agent2 and Agent3), then select and declare the best agent (say, offer-oracle) and the next best offer generated by that agent. If all agents generate the same offer, any one of them is picked; if a majority (say, 2 out of 3 agents) vote for the same answer, the majority-voted answer is picked.
[039] It is to be noted that, as explained in one of the papers on reasoning based calculation (known in the art), the one or more agents typically converge to a final answer over multi-round iterations. If the final answers of these agents do not converge, the majority voted answer (using an odd number of agents) can be taken as the final consensus answer.
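The debate rounds and the majority-vote fallback can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the agents are plain functions rather than LLM-backed agents, the offer strings are invented, the judge (Agent 4) is not modelled, and the names debate, majority_answer, make_stubborn and make_follower are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional

# An agent maps (task prompt, peers' previous answers) to its own answer.
Agent = Callable[[str, List[str]], str]


def debate(agents: List[Agent], prompt: str, rounds: int = 2) -> List[str]:
    """Round 1 has no peer context; later rounds see the peers' last answers."""
    answers = [agent(prompt, []) for agent in agents]
    for _ in range(rounds - 1):
        answers = [
            agent(prompt, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    return answers


def majority_answer(answers: List[str]) -> Optional[str]:
    """Return the strict-majority answer, or None if there is none."""
    if not answers:
        return None
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes > len(answers) / 2 else None


def make_stubborn(offer: str) -> Agent:
    return lambda prompt, peers: offer  # never changes its answer


def make_follower(initial: str) -> Agent:
    def agent(prompt: str, peers: List[str]) -> str:
        # Adopt the most common peer answer once peers have spoken.
        return Counter(peers).most_common(1)[0][0] if peers else initial
    return agent


agents = [make_stubborn("fee waiver"), make_follower("rate bump"), make_stubborn("fee waiver")]
final_answers = debate(agents, "find the next best offer for the customer", rounds=2)
print(final_answers, majority_answer(final_answers))
```

When majority_answer returns None, the sketch leaves the decision open; in the example above that is where the judge agent would evaluate the competing responses.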
[040] The above example shows how multiple agents are involved: each agent generates an individual candidate answer to a query, reads and critiques the responses of all other agents, and uses this content to update its own answer, after which the best answer is selected.
[041] In an embodiment of the present disclosure, the outcome evaluation module 216 is configured to evaluate an execution outcome of the at least one assigned sub task against the expected output. The agent action module 218 is configured to evaluate the candidate agent and the associated domain specific Standard Operating Procedure (SOP) based on the execution outcome from the outcome evaluation module 216. The iterative sub steps (310A to 310D) of step 310 are repeated until at least one best fit agent and a best fit domain specific Standard Operating Procedure (SOP) are identified for execution of an associated assigned sub task. Further, evaluation of the candidate agent and the associated domain specific Standard Operating Procedure (SOP) is based on one or more inputs obtained from the user interacting with at least one associated environment. Herein, the environment refers to at least one of one or more external surroundings and a context in which the one or more artificial intelligence agents are operating. The one or more inputs refer to the feedback from the user. As an example of feedback from the user, once the next best offer is generated, the user can either approve or reject the recommended offer, which gets translated to rewards/penalties.
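The iteration over sub steps 310A to 310D can be sketched as a simple loop. This is only an illustration under stated assumptions: the `Candidate` type, the `execute` and `evaluate` callables, the ordered scan over candidates and the numeric `threshold` standing in for the evaluation gate are all hypothetical and not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    agent: str  # candidate agent name (hypothetical)
    sop: str    # candidate domain specific SOP identifier (hypothetical)


def find_best_fit(
    candidates: Iterable[Candidate],
    execute: Callable[[Candidate], str],
    evaluate: Callable[[str], float],
    threshold: float = 0.8,
) -> Tuple[Optional[Candidate], float]:
    """Repeat select/execute/evaluate until a candidate passes the gate."""
    best, best_score = None, float("-inf")
    for cand in candidates:          # 310A: select a candidate agent and SOP
        outcome = execute(cand)      # 310B/310C: execute sub task, get outcome
        score = evaluate(outcome)    # 310D: evaluate against expected output
        if score > best_score:
            best, best_score = cand, score
        if score >= threshold:       # assumed stopping gate
            break
    return best, best_score
```

If no candidate passes the gate, the sketch returns the highest-scoring candidate seen so far; how the framework handles that case is not specified in the text.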
[042] The final output module 220 is configured to receive the validated response of the outcome evaluation module 216. The Large Language Model (LLM) based response generation module 222 is configured to generate a response to the business query received from the user, using the validated response from the final output module 220. The business response module 224 is configured to provide the response to the business query generated by the LLM based response generation module 222 to the user. The business feedback module 226 is configured
to obtain feedback from the user based on the response provided to the business query.
[043] Step 6 in conjunction with example – 1: The final outcome (represented by the final output module 220), on meeting the evaluation gates, is then passed to an LLM (LLM based response generation module 222) to generate a response for the business user (represented by the business response module 224) based on the offer generated in the previous step (i.e., the offer generated by the selected best agent passes through the outcome evaluation module 216). The email below depicts a probable sample email to a customer offering complimentary additional insurance above $250000 covering his entire deposit amount.
Dear ,
Congratulations! We are delighted to inform you that you have been selected as a valued customer to receive complimentary additional insurance coverage for amounts exceeding $250000 in your deposit account.
At [Bank Name (XYZ Bank)], we are committed to providing our customers with enhanced financial security and peace of mind. As a token of our appreciation for your continued trust in our services, we are pleased to offer this exclusive benefit. The additional insurance coverage will also safeguard amounts above $250000 in your deposit account, ensuring that your financial assets are well protected.
We believe that this enhancement will contribute to your overall banking experience and provide you with an extra layer of confidence in managing your funds with us. If you have any questions or require further details regarding this complimentary insurance coverage, please feel free to reach out to our dedicated customer support team at [Customer Support contact details].
Thank you for choosing [Bank Name].
Best Regards,
[Email Signature]
[044] In an embodiment of the present disclosure, the Reinforcement Learning agent 228, also referred to as the online learning agent, interacts with an environment and with human feedback, and applies an ε-greedy policy to learn which combinations of one or more agents and one or more tools work better, based on one or more rewards obtained by the online learning agent. Learning from, and improving over, wrong outcomes caused by incorrect choices eliminates the randomness and inconsistency that is prevalent in current agent frameworks. Such online learning thus leads to improved choices of the one or more agents and the one or more tools within a recommended task, and can also lead to improvement in task recommendation based on the derived intent. Such learning-based task orchestration leads to increased autonomy in decisions while incorporating human feedback. Herein, the terms “human” and “user” can be used interchangeably.
[045] The online learning agent in the environment is rewarded or penalized, based on the feedback of the human (user), for its choice of executions of various tools, prompts and other agents. Using off-policy learning (known in the art), learning agents can guide other agents to choose the correct tools, prompts and other agents for better outcome and accuracy. Once an agent has completed an action, its action state (agent-tool combination) can be recorded in a Q-table along with its reward or penalty value derived from either (1) human feedback or (2) the environment._observed() value. As the online learning agent starts interacting with the environment, it applies an ε-greedy policy to learn which combinations of agents and tools work better based on the rewards it obtains. Further, the online learning agent employs ‘off-policy’ learning to explore and discover new Standard Operating Procedure (SOP) paths for subtasks to execute.
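The ε-greedy choice over agent-tool combinations and the recording of rewards in a Q-table can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the +1/-1 mapping of approve/reject feedback, the learning rate `alpha`, the flat Q-table keyed by (agent, tool) and all function names are assumptions made for this example.

```python
import random
from typing import Dict, List, Tuple

Action = Tuple[str, str]  # (agent, tool) combination, e.g. ("a1", "t1")


def reward_from_feedback(approved: bool) -> float:
    """Assumed mapping: user approval -> reward, rejection -> penalty."""
    return 1.0 if approved else -1.0


def choose_action(
    q: Dict[Action, float],
    actions: List[Action],
    epsilon: float,
    rng: random.Random,
) -> Action:
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)                    # explore a random combination
    return max(actions, key=lambda a: q.get(a, 0.0))  # exploit the best known one


def record_feedback(q: Dict[Action, float], action: Action, reward: float,
                    alpha: float = 0.5) -> None:
    """Move the stored Q-value for this combination toward the observed reward."""
    old = q.get(action, 0.0)
    q[action] = old + alpha * (reward - old)
```

A small `epsilon` keeps the agent mostly on the best known agent-tool combination while still occasionally trying alternatives, which is how new combinations get discovered.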
[046] As already stated above, the online learning agent receives feedback from the human supervisor (user), which is translated into rewards/penalties. For example, based on the reward received, the online learning agent incorporates the feedback for taking a certain action in a specific Standard Operating Procedure (SOP) state. This may increase the Q-value to encourage similar actions in the future. Over time, as the online learning agent accumulates more data and updates its Q-values based on both the environmental response/environment._observed() value (a result of the reward received by executing the action and by observing the next state of the environment) and the human feedback, it converges to an optimal policy that maximizes rewards.
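For reference, the standard tabular Q-learning update (a well-known off-policy rule) is one way the Q-value adjustment described above could be written. The disclosure does not state the exact rule it uses, so the learning rate $\alpha$ and discount factor $\gamma$ below are assumptions introduced for illustration. Here $s$ is the current SOP state, $a$ the chosen action (agent-tool combination), $r$ the reward or penalty derived from human feedback or the environment-observed value, and $s'$ the next state:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

A positive $r$ (for instance, an approved offer) raises $Q(s, a)$ and so makes the same choice more likely in that state later, while a penalty lowers it.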
[047] Table. 2 depicts a Q-value table based on two agents (a1, a2) and two tools (t1, t2). Each agent can choose from the available tools; the reward of each choice is expressed as a q value using a Q-function, and the tool with the highest q value is selected.
| SOP states | Outcome_1 | Outcome_2 |
|---|---|---|
| (a1-t1, a2-t2) | q1 | q2 |
| (a1-t2, a2-t1) | q3 | q4 |
Table. 2 – Q value table
[048] Step 7 in conjunction with example – 1: Based on the outcome at each step (Step 1 to Step 6), the user can provide feedback, which is then passed to a Q-learning agent (Reinforcement Learning agent 228) so that the learning is incorporated into sub-task and agent selection. This allows for multi-level learning, either at the highest outcome level or at the granular agent and tool selection level, providing inputs to agents for course correction based on constant learning and resulting in better accuracy.
[049] Example- 1 thus helps to outline how a user intent can guide the composition of tasks and sub tasks for actions to be performed by an agent or a group of agents. The retrieval augmented generation (RAG) based subtask planning module 208, the Machine Learning based next-best agent 210 and the outcome evaluation module 216 iterate based on the output from the previous step and can also course correct in the future (if needed) based on the human feedback received. The introduction of this decision-based plan using ML-LLM (Machine Learning - Large Language Model) orchestration (as explained above) offers the opportunity to provide fine-grained control of an agent’s behavior while allowing decision intelligence to be embedded to avoid static Standard Operating Procedures (SOP’s).
[050] The range of use cases that can be realized using this generic framework is vast. The framework of the present disclosure can potentially be applied across multiple domains to realize domain specific use cases. Another use case from the banking domain is provided below for better understanding. Example - 3: a Relationship Manager of a bank can seek advice from a chatbot assistant by asking the following question: “In light of the recent bank run events, what is the strategy to be followed to retain the customer base?” Or he could seek assistance with: “Our competitor bank in the mid-west has increased the deposit rate by 0.5%, and some of our key customers have held-away accounts in this competitor bank; what should be the strategy to retain the customer base?”
[051] Essentially, the chatbot here identifies the impacted customer base based on the event types and can determine the next best action, thereby generating suitable personalized recommendations (e.g., rate adjustments for eligible customers) as explained under the ‘business problem’ section above. As a result, customer churn can be minimized by targeting these customers with personalized offers and recommendations.
[052] The examples (example- 1 and example – 3) can be more accurately crafted and better managed & maintained with the agent framework as base and with the dynamic decision-based Standard Operating Procedures (SOP’s) and the learning agent introduced as per the proposed invention model.
[053] FIG. 4 illustrates an interaction model between the components in the autonomous agent framework in conjunction with the method for purposive autonomous agent framework for decision driven task orchestration, according to some embodiments of the present disclosure.
[054] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do
not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[055] Existing agent frameworks define and control an agent’s behavior using a system prompt, which results in significant randomness and inconsistency across different runs, making them unsuitable for enterprise automation. The embodiments of the present disclosure provide a purposive autonomous agent framework for decision driven task orchestration. Moreover, the embodiments herein further provide an autonomous framework which proposes decision-driven SOPs using a combination of ML models, business rules and datasets to bring intelligence and fine-grained control to the task orchestration process.
[056] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[057] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a
computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[058] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[059] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory,
nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[060] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
We Claim:
1. A processor implemented method (300), comprising:
receiving (302), via one or more hardware processors, a business query from a user via one or more artificial intelligence agents;
detecting (304), by an intent detection module executed via the one or more hardware processors, an intent associated with the business query and capturing one or more contexts from one or more associated topics comprised in the received business query;
recommending (306), by a task recommender module executed via the one or more hardware processors, one or more tasks based on the detected user intent and the captured one or more contexts;
querying (308), via the one or more hardware processors, a vector normalized database based on the recommended one or more tasks to obtain one or more candidate domain specific Standard Operating Procedures (SOP’s) comprising associated one or more sub tasks; and
iteratively performing (310), via the one or more hardware processors:
dynamically selecting (310A) a candidate agent and at least one
domain specific Standard Operating Procedure (SOP) amongst the one or
more candidate domain specific SOP’s;
executing (310B) at least one assigned sub tasks by the candidate
agent;
determining (310C) an execution outcome of the at least one
assigned sub tasks;
evaluating (310D) the candidate agent and an associated domain
specific Standard Operating Procedure (SOP) based on the execution
outcome; and
until at least one best fit agent and a best fit domain specific Standard
Operating Procedure (SOP) are identified for execution of an associated
assigned sub task (310E).
2. The processor implemented method as claimed in claim 1, wherein the step of evaluation of the candidate agent and the associated domain specific Standard Operating Procedure (SOP) is further based on one or more inputs obtained from the user interacting with at least one of an associated environment.
3. A system (100), comprising:
a memory (104) storing instructions;
one or more communication interfaces (112); and
one or more hardware processors (102) coupled to the memory (104) via the one or more communication interfaces (112), wherein the one or more hardware processors (102) are configured by the instructions to:
receive a business query from a user via at least one of one or more artificial intelligence agents;
detect an intent associated with the business query and capturing one or more contexts from one or more associated topics comprised in the received business query;
recommend one or more tasks based on the detected user intent and the captured one or more contexts;
query a vector normalized database based on the recommended one or more tasks to obtain one or more candidate domain specific Standard Operating Procedures (SOP’s) comprising associated one or more sub tasks; and
iteratively perform:
dynamically selecting a candidate agent and at least one domain
specific Standard Operating Procedure (SOP) amongst the one or more
candidate domain-specific SOP’s;
executing at least one assigned sub tasks by the candidate agent;
determining an execution outcome of the at least one assigned sub
tasks;
evaluating the candidate agent and an associated domain specific
Standard Operating Procedure (SOP) based on the execution outcome; and
until at least one best fit agent and a best fit domain specific Standard Operating Procedure (SOP) are identified for execution of an associated assigned sub task (310E).
4. The system as claimed in claim 3, wherein the step of evaluation of the candidate agent and the associated domain specific Standard Operating Procedure (SOP) is further based on one or more inputs obtained from the user interacting with at least one of an associated environment.
| # | Name | Date |
|---|---|---|
| 1 | 202421024887-STATEMENT OF UNDERTAKING (FORM 3) [27-03-2024(online)].pdf | 2024-03-27 |
| 2 | 202421024887-REQUEST FOR EXAMINATION (FORM-18) [27-03-2024(online)].pdf | 2024-03-27 |
| 3 | 202421024887-FORM 18 [27-03-2024(online)].pdf | 2024-03-27 |
| 4 | 202421024887-FORM 1 [27-03-2024(online)].pdf | 2024-03-27 |
| 5 | 202421024887-FIGURE OF ABSTRACT [27-03-2024(online)].pdf | 2024-03-27 |
| 6 | 202421024887-DRAWINGS [27-03-2024(online)].pdf | 2024-03-27 |
| 7 | 202421024887-DECLARATION OF INVENTORSHIP (FORM 5) [27-03-2024(online)].pdf | 2024-03-27 |
| 8 | 202421024887-COMPLETE SPECIFICATION [27-03-2024(online)].pdf | 2024-03-27 |
| 9 | 202421024887-FORM-26 [08-05-2024(online)].pdf | 2024-05-08 |
| 10 | Abstract1.jpg | 2024-05-22 |
| 11 | 202421024887-Proof of Right [13-06-2024(online)].pdf | 2024-06-13 |
| 12 | 202421024887-FORM-26 [22-05-2025(online)].pdf | 2025-05-22 |