
Method And System For Agent Based Generative Artificial Intelligence (GenAI) Model Selection

Abstract: This disclosure relates to a method (300) and a system (100) for agent-based Generative Artificial Intelligence (GenAI) model selection. The method (300) includes retrieving (302) model metadata (216) of each of a plurality of GenAI models for a GenAI agent (214) based on benchmark details of each of a plurality of benchmark datasets. The method (300) may include selecting (304) a set of GenAI models from the plurality of GenAI models corresponding to the GenAI agent (214) based on predefined selection criteria through a semantic analysis. For each GenAI model of the set of GenAI models, the method (300) may include generating (310), via the GenAI model, a response corresponding to a prompt of the GenAI agent (214), and computing (312), via the GenAI model, a performance score of the response based on a set of evaluation metrics using a judge prompt. The method (300) may include selecting (314) one or more of the set of GenAI models to respond to prompts received through the GenAI agent (214). [To be published with FIG. 6]


Patent Information

Application #
Filing Date
11 July 2025
Publication Number
31/2025
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Parent Application

Applicants

HCL Technologies Limited
806, Siddharth, 96, Nehru Place, New Delhi, 110019, India

Inventors

1. Navin Sabharwal
N 3A Jangpura Extension, New Delhi, 110014, India
2. Atul Singh
C406 SJR Brooklyn, ITPL Main Road, Bengaluru, Karnataka, 560037, India
3. Arvind Maurya
B1/212, Tower-9, Silvercity-2, Sec Pi-2, Greater Noida, Uttar Pradesh, 201310, India
4. Rupesh Prasad
1259/86, Shanti Nagar, Tri Nagar, Near Axis Bank, New Delhi, 110035, India
5. Mohammad Faiyaz
72/92-A, Gandhi Road, Devakottai, Sivagangai, Tamil Nadu, 630302, India

Specification

DESCRIPTION
Technical Field
[001] This disclosure relates generally to agentic Artificial Intelligence (AI), and more particularly to a method and a system for agent-based Generative AI (GenAI) model selection.
Background
[002] Agentic systems, powered by advancements in Generative Artificial Intelligence (GenAI) models (for example, Large Language Models (LLMs)), are at the forefront of software innovation. The agentic systems may include a plurality of applications (i.e., agents). The plurality of applications may autonomously execute tasks to achieve objectives without direct human intervention. The plurality of applications may utilize available tools to interact with the environment and accordingly carry out functions. Agents may be classified as either GenAI-based agents or non-GenAI based agents. GenAI-based agents may be built on GenAI models.
[003] Selection of an LLM used to build the GenAI-based agent may impact operational cost and performance of the agent. By way of an example, a first agent may be tasked to generate Python code and a second agent may be tasked to create service documentation. A single LLM may potentially handle tasks of both the first agent and the second agent. However, the first agent and the second agent may be ineffective if the LLM lacks training to generate the Python code.
[004] In the present state of art, agents may be configured to use a static GenAI model. The GenAI model may be selected by developer/designer of the agent at design time, based on their experience and knowledge. However, in such scenarios, the agents may rely on potentially outdated or inaccurate information when the GenAI model gets outdated. Moreover, the predetermined and fixed structure of the connections between the agents and the GenAI models decided during the design phase can lead to the selection of less than ideal GenAI models when better GenAI models are introduced for the tasks of the agents.
[005] Thus, the present invention is directed to overcome one or more limitations stated above or any other limitations associated with the known arts.
SUMMARY
[006] In one embodiment, a method for agent-based Generative Artificial Intelligence (GenAI) model selection is disclosed. In one example, the method may include retrieving model metadata of each of a plurality of GenAI models for a GenAI agent based on benchmark details of each of a plurality of benchmark datasets. The GenAI agent corresponds to a prompt. The method may further include selecting a set of GenAI models from the plurality of GenAI models corresponding to the GenAI agent based on predefined selection criteria through a semantic analysis. The predefined selection criteria are based on the benchmark details, the model metadata, model availability, and user preferences. For each GenAI model of the set of GenAI models, the method may further include generating, via the GenAI model, a response corresponding to the prompt of the GenAI agent. The method may further include computing, via the GenAI model, a performance score of the response based on a set of evaluation metrics using a judge prompt. The judge prompt may include the response and a set of evaluation instructions. The method may further include selecting one or more of the set of GenAI models to respond to prompts received through the GenAI agent.
[007] In one embodiment, a system for agent-based GenAI model selection is disclosed. In one example, the system may include a processor and a computer-readable medium communicatively coupled to the processor. In one example, the computer-readable medium may store processor-executable instructions, which, on execution, may cause the processor to retrieve model metadata of each of a plurality of GenAI models for a GenAI agent based on benchmark details of each of a plurality of benchmark datasets. The GenAI agent corresponds to a prompt. The processor-executable instructions, on execution, may further cause the processor to select a set of GenAI models from the plurality of GenAI models corresponding to the GenAI agent based on predefined selection criteria through a semantic analysis. The predefined selection criteria may be based on the benchmark details, the model metadata, model availability, and user preferences. For each GenAI model of the set of GenAI models, the processor-executable instructions, on execution, may further cause the processor to generate, via the GenAI model, a response corresponding to the prompt of the GenAI agent. The processor-executable instructions, on execution, may further cause the processor to compute, via the GenAI model, a performance score of the response based on a set of evaluation metrics using a judge prompt. The judge prompt may include the response and a set of evaluation instructions. The processor-executable instructions, on execution, may further cause the processor to select one or more of the set of GenAI models to respond to prompts received through the GenAI agent.
[008] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[009] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles.
[010] FIG. 1 is a block diagram of an exemplary system for agent-based Generative Artificial Intelligence (GenAI) model selection, in accordance with some embodiments.
[011] FIG. 2 illustrates a functional block diagram of a system for agent-based GenAI model selection, in accordance with some embodiments.
[012] FIG. 3 illustrates a flow diagram of an exemplary process for agent-based GenAI model selection, in accordance with some embodiments.
[013] FIG. 4 illustrates a flow diagram of an exemplary process for retrieving the model metadata of the plurality of GenAI models for the GenAI agent, in accordance with some embodiments.
[014] FIG. 5 illustrates a flow diagram for performing a Parameter Efficient Fine-Tuning (PEFT) on a first GenAI model of a GenAI model pair using a second GenAI model of the GenAI model pair, in accordance with some embodiments.
[015] FIG. 6 illustrates a flow chart of a detailed exemplary process for agent-based GenAI model selection, in accordance with an embodiment.
[016] FIG. 7A and FIG. 7B illustrate a database schema and a corresponding exemplary dataset table, in accordance with an embodiment.
[017] FIG. 8A and FIG. 8B illustrate a database schema and a corresponding exemplary benchmark table, in accordance with an embodiment.
[018] FIG. 9 illustrates an exemplary interface to configure the model invoker and profiler, in accordance with an embodiment.
[019] FIG. 10 is a block diagram of an exemplary computer system for implementing embodiments consistent with the present disclosure.
DETAILED DESCRIPTION
[020] Exemplary embodiments are described with reference to the accompanying drawings. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope and spirit being indicated by the following claims.
[021] Referring now to FIG. 1, an exemplary system 100 for agent-based Generative Artificial Intelligence (GenAI) model selection is illustrated, in accordance with some embodiments. The system 100 may include a computing device 102 (for example, a server, a desktop, a laptop, a notebook, a netbook, a tablet, a smartphone, a mobile phone, or any other computing device), in accordance with some embodiments. The computing device 102 may select an optimal GenAI model for a given agent.
[022] As will be described in greater detail in conjunction with FIGS. 2 – 10, the computing device 102 may retrieve model metadata of each of a plurality of GenAI models for a GenAI agent based on benchmark details of each of a plurality of benchmark datasets. The GenAI agent may correspond to a prompt. The computing device 102 may further select a set of GenAI models from the plurality of GenAI models corresponding to the GenAI agent based on predefined selection criteria through a semantic analysis. The predefined selection criteria are based on the benchmark details, the model metadata, model availability, and user preferences. For each GenAI model of the set of GenAI models, the computing device 102 may further generate, via the GenAI model, a response corresponding to the prompt of the GenAI agent. The computing device 102 may further compute, via the GenAI model, a performance score of the response based on a set of evaluation metrics using a judge prompt. The judge prompt may include the response and a set of evaluation instructions. The computing device 102 may further select one or more of the set of GenAI models to respond to prompts received through the GenAI agent.
[023] In some embodiments, the computing device 102 may include one or more processors 104 and a memory 106. The memory 106 may include a cache memory. The memory 106 may store instructions that, when executed by the one or more processors 104, may cause the one or more processors 104 to perform agent-based GenAI model selection, in accordance with aspects of the present disclosure. The memory 106 may also store various data (for example, benchmark details, a plurality of benchmark datasets, model metadata, user preferences, and the like) that may be captured, processed, and/or required by the system 100.
[024] The system 100 may further include a display 108. The system 100 may interact with a user via a user interface 110 accessible via the display 108. The system 100 may also include one or more external devices 112. In some embodiments, the computing device 102 may interact with the one or more external devices 112 over a communication network 114 for sending or receiving various data. The external devices 112 may include, but may not be limited to, a remote server, a digital device, or another computing system.
[025] Referring now to FIG. 2, a functional block diagram of a system 200 for agent-based GenAI model selection is illustrated, in accordance with some embodiments. FIG. 2 is explained in conjunction with FIG. 1. The system 200 may be analogous to the system 100. The system 200 may include, within the memory 106, a model metadata retrieving module 202, a GenAI model selecting module 204, a response generating module 206, a performance score computing module 208, a model routing module 210, and a training module 212.
[026] Initially, the model metadata retrieving module 202 may receive a GenAI agent 214 for which optimal GenAI models are to be identified. In an embodiment, the GenAI agent 214 may correspond to a prompt configured for a predefined task. The model metadata retrieving module 202 may then retrieve a model metadata 216 of each of a plurality of GenAI models 218 for the GenAI agent 214 based on benchmark details of each of a plurality of benchmark datasets. To retrieve the model metadata 216 of the plurality of GenAI models 218, the model metadata retrieving module 202 may extract dataset details of each of the plurality of benchmark datasets through a web scraping technique. By way of an example, the dataset details may include a plurality of data types (for example, int, varchar(255), text, timestamp, etc.), a column name, and a key corresponding to each of the plurality of datatypes. The web scraping technique may be used to extract data from a plurality of websites. By way of an example, the web scraping technique may be a text pattern matching, a Hyper Text Transfer Protocol (HTTP) programming, a Hyper Text Markup Language (HTML) parsing, a Document Object Model (DOM) parsing, a vertical aggregation, a semantic annotation recognizing, a computer vision web-page analysis, an Artificial Intelligence (AI) powered document understanding, or the like.
[027] Further, the model metadata retrieving module 202 may generate, via a description expansion GenAI model, a detailed dataset description of the dataset details. Further, the model metadata retrieving module 202 may store the dataset details and the corresponding detailed dataset description in a dataset table (for example, a dataset_table (explained in greater detail in conjunction with FIGs. 7A and 7B)). The dataset table may include the dataset details in a tabular form. Additionally, the model metadata retrieving module 202 may extract the benchmark details associated with each of the plurality of benchmark datasets in the dataset table. The benchmark details may include a plurality of datatypes (for example, int, varchar(255), float, Boolean, varchar(100), decimal(10,2), etc.), the column name, and the key corresponding to each of the plurality of datatypes.
[028] Further, the model metadata retrieving module 202 may store the benchmark details in a benchmark table (for example, a benchmark_table (explained in conjunction with FIGs. 8A and 8B)). The benchmark table may include the benchmark details in the tabular form. Further, the model metadata retrieving module 202 may select a first list of GenAI models corresponding to each of the plurality of benchmark datasets based on an associated benchmark rank. By way of an example, the first list of GenAI models may include GenAI model 1, GenAI model 2, GenAI model 3, up to GenAI model n. Finally, the model metadata retrieving module 202 may retrieve the model metadata 216 of the plurality of GenAI models 218. The plurality of GenAI models 218 may include the first list of GenAI models corresponding to each of the plurality of benchmark datasets.
[029] Further, the GenAI model selecting module 204 may select a set of GenAI models from the plurality of GenAI models 218 corresponding to the GenAI agent 214 based on predefined selection criteria through a semantic analysis. The predefined selection criteria may be based on the benchmark details, the model metadata 216, model availability, and user preferences. To select the set of GenAI models from the plurality of GenAI models 218 corresponding to the GenAI agent 214, the GenAI model selecting module 204 may identify one or more of the plurality of benchmark datasets based on a semantic analysis between the prompt of the GenAI agent 214 and information corresponding to each of the plurality of benchmark datasets in the dataset table. For each of the one or more of the plurality of benchmark datasets, the GenAI model selecting module 204 may select a second list of GenAI models from the first list of GenAI models based on the associated benchmark rank, the user preferences, and the model availability, to obtain the set of GenAI models. It should be noted that the second list of GenAI models may include fewer GenAI models than the first list of GenAI models. In continuation of the above example, the second list of GenAI models may include GenAI model 1, GenAI model 2, GenAI model 3, up to GenAI model m, where (m < n).
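By way of illustration only, the two selection stages described above may be sketched as follows. The disclosure does not specify how the semantic analysis is computed, so this sketch substitutes a simple bag-of-words cosine similarity as a stand-in; the function names, the dictionary keys ("id", "description"), and the treatment of user preferences as an exclusion set are hypothetical assumptions made for the example.

```python
import math
import re
from collections import Counter


def _bow(text):
    """Lower-cased bag-of-words counts (a stand-in for a semantic embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between the bag-of-words vectors of two texts."""
    va, vb = _bow(a), _bow(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def select_model_set(prompt, datasets, first_lists, available, excluded, m, top_k=1):
    """Pick the top_k datasets closest to the prompt, then keep at most m
    models per dataset from its rank-ordered first list.

    first_lists maps dataset id -> model names already sorted by benchmark rank.
    available / excluded model sets stand in for model availability and
    user preferences (assumption).
    """
    ranked = sorted(
        datasets, key=lambda d: cosine(prompt, d["description"]), reverse=True
    )[:top_k]
    selected = []
    for d in ranked:
        kept = [
            name
            for name in first_lists[d["id"]]
            if name in available and name not in excluded
        ]
        for name in kept[:m]:
            if name not in selected:
                selected.append(name)
    return selected
```

In practice an embedding-based similarity could replace the cosine helper without changing the surrounding selection logic.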
[030] Further, for each GenAI model of the set of GenAI models, the response generating module 206 may generate, via the GenAI model, a response corresponding to the prompt of the GenAI agent 214. Further, the performance score computing module 208 may compute, via the GenAI model, a performance score of the response based on a set of evaluation metrics using a judge prompt. The judge prompt may include the response and a set of evaluation instructions.
[031] Finally, the model routing module 210 may select one or more of the set of GenAI models (i.e., selected GenAI models 220) to respond to prompts received through the GenAI agent 214. The model routing module 210 may select the selected GenAI models 220 based on a plurality of predefined policies. The predefined policies may include, but may not be limited to, a round robin policy, a performance optimal policy, a quality optimal policy, and a cost optimal policy.
[032] Upon selecting the selected GenAI models 220, the model routing module 210 may send the set of GenAI models to the training module 212. Further, the training module 212 may identify one or more GenAI model pairs from the set of GenAI models based on the performance score and a size of each of the set of GenAI models. To identify the one or more GenAI model pairs, the training module 212 may calculate an average performance score based on the performance score of each of the set of GenAI models selected for the GenAI agent 214. For each of the set of GenAI models, the training module 212 may then compare the performance score with the average performance score. Further, the training module 212 may identify the one or more GenAI model pairs based on the comparing and the size of each of the set of GenAI models. Further, for each of the one or more GenAI model pairs, the training module 212 may perform a Parameter Efficient Fine-Tuning (PEFT) (for example, a Low-Rank Adaptation (LoRA), a prompt tuning, a prefix tuning, adapters, a BitFit, a Quantized LoRA (QLoRA), hybrid methods, etc.) on a first GenAI model of a GenAI model pair using a second GenAI model of the GenAI model pair. This is explained in greater detail in conjunction with FIG. 6.
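By way of illustration only, a judge prompt that combines the response with a set of evaluation instructions, and the computation of a performance score from the judge's reply, may be sketched as follows. The prompt wording, the 1 to 10 scale, the reply format, the averaging of per-metric scores, and all function names are assumptions made for the example; judge_model is any callable that maps a prompt string to a reply string.

```python
import re


def build_judge_prompt(response, metrics):
    """Compose a judge prompt from the response and evaluation instructions."""
    instructions = "\n".join(f"- Rate {m} from 1 to 10." for m in metrics)
    return (
        "You are an evaluator. Score the response below.\n"
        f"Evaluation instructions:\n{instructions}\n"
        "Reply with one line per metric as '<metric>: <score>'.\n"
        f"Response:\n{response}\n"
    )


def parse_scores(judge_reply, metrics):
    """Extract '<metric>: <number>' pairs from the judge's reply."""
    scores = {}
    for m in metrics:
        match = re.search(
            rf"{re.escape(m)}\s*:\s*(\d+(?:\.\d+)?)", judge_reply, re.IGNORECASE
        )
        if match:
            scores[m] = float(match.group(1))
    return scores


def performance_score(judge_model, response, metrics):
    """Average the per-metric scores returned by the judge (assumed aggregation)."""
    reply = judge_model(build_judge_prompt(response, metrics))
    scores = parse_scores(reply, metrics)
    if len(scores) != len(metrics):
        raise ValueError("judge reply is missing one or more metric scores")
    return sum(scores.values()) / len(metrics)
```

A stub judge such as `lambda prompt: "accuracy: 8\nclarity: 6"` is enough to exercise the scoring path without calling a real model.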
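By way of illustration only, the four named policies may be sketched as a small router. The disclosure names the policies but does not define them, so the interpretations here (performance optimal as lowest latency, quality optimal as highest performance score, cost optimal as lowest cost) and the field names "latency_ms", "score", and "cost" are assumptions made for the example.

```python
from itertools import cycle


def make_router(models, policy):
    """Return a zero-argument function that yields the next model name
    to use under the given policy."""
    if policy == "round_robin":
        rotation = cycle(models)
        return lambda: next(rotation)["name"]

    # Each remaining policy picks a single best model by one criterion.
    criteria = {
        "performance_optimal": lambda m: m["latency_ms"],  # lower is better
        "quality_optimal": lambda m: -m["score"],          # higher is better
        "cost_optimal": lambda m: m["cost"],               # lower is better
    }
    if policy not in criteria:
        raise ValueError(f"unknown policy: {policy}")
    best = min(models, key=criteria[policy])["name"]
    return lambda: best
```

Returning a callable keeps the round robin state inside the router, so the caller handles every policy the same way.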
[033] It should be noted that all such aforementioned modules 202 – 212 may be represented as a single module or a combination of different modules. Further, as will be appreciated by those skilled in the art, each of the modules 202 – 212 may reside, in whole or in parts, on one device or multiple devices in communication with each other. In some embodiments, each of the modules 202 – 212 may be implemented as dedicated hardware circuit comprising custom application-specific integrated circuit (ASIC) or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. Each of the modules 202 – 212 may also be implemented in a programmable hardware device such as a field programmable gate array (FPGA), programmable array logic, programmable logic device, and so forth. Alternatively, each of the modules 202 – 212 may be implemented in software for execution by various types of processors (e.g., processor 104). An identified module of executable code may, for instance, include one or more physical or logical blocks of computer instructions, which may, for instance, be organized as an object, procedure, function, or other construct. Nevertheless, the executables of an identified module or component need not be physically located together, but may include disparate instructions stored in different locations which, when joined logically together, include the module and achieve the stated purpose of the module. Indeed, a module of executable code could be a single instruction, or many instructions, and may even be distributed over several different code segments, among different applications, and across several memory devices.
[034] As will be appreciated by one skilled in the art, a variety of processes may be employed for agent-based GenAI model selection. For example, the exemplary system 100 and the associated computing device 102 may perform agent-based GenAI model selection by the processes discussed herein. In particular, as will be appreciated by those of ordinary skill in the art, control logic and/or automated routines for performing the techniques and steps described herein may be implemented by the system 100 and the associated computing device 102 either by hardware, software, or combinations of hardware and software. For example, suitable code may be accessed and executed by the one or more processors on the system 100 to perform some or all of the techniques described herein. Similarly, application specific integrated circuits (ASICs) configured to perform some or all of the processes described herein may be included in the one or more processors on the system 100.
[035] Referring now to FIG. 3, an exemplary process 300 for agent-based GenAI model selection is depicted via a flowchart, in accordance with some embodiments. FIG. 3 is explained in conjunction with FIGS. 1 and 2. The process 300 may be implemented by the computing device 102 of the system 100. The process 300 may include retrieving, by a model metadata retrieving module (for example, the model metadata retrieving module 202), model metadata (for example, the model metadata 216) of each of a plurality of GenAI models (for example, the plurality of GenAI models 218) for a GenAI agent (for example, the GenAI agent 214) based on benchmark details of each of a plurality of benchmark datasets, at step 302. The GenAI agent may correspond to a prompt.
[036] Further, the process 300 may include selecting, by a GenAI model selecting module (for example, the GenAI model selecting module 204), a set of GenAI models from the plurality of GenAI models corresponding to the GenAI agent based on predefined selection criteria through a semantic analysis, at step 304. The predefined selection criteria may be based on the benchmark details, the model metadata, model availability, and user preferences. The step 304 may include step 306 and step 308. To select the set of GenAI models from the plurality of GenAI models corresponding to the GenAI agent, the process 300 may include identifying, by the GenAI model selecting module, one or more of the plurality of benchmark datasets based on a semantic analysis between the prompt of the GenAI agent and information corresponding to each of the plurality of benchmark datasets in the dataset table, at step 306. Further, for each of the one or more of the plurality of benchmark datasets, the process 300 may include selecting, by the GenAI model selecting module, a second list of GenAI models from the first list of GenAI models based on the associated benchmark rank, the user preferences, and the model availability, to obtain the set of GenAI models, at step 308.
[037] Further, for each GenAI model of the set of GenAI models, the process 300 may include generating, by a response generating module (for example, the response generating module 206) via the GenAI model, a response corresponding to the prompt of the GenAI agent, at step 310. Further, the process 300 may include computing, by a performance score computing module (for example, the performance score computing module 208) via the GenAI model, a performance score of the response based on a set of evaluation metrics using a judge prompt, at step 312. The judge prompt may include the response and a set of evaluation instructions. Further, the process 300 may include selecting, by a model routing module (for example, the model routing module 210), one or more of the set of GenAI models (for example, the selected GenAI models 220) to respond to prompts received through the GenAI agent, at step 314.
[038] Referring now to FIG. 4, a flow diagram of an exemplary process 400 for retrieving the model metadata of the plurality of GenAI models for the GenAI agent is illustrated, in accordance with an embodiment. FIG. 4 is explained in conjunction with FIGS. 1, 2, and 3. The process 400 may be implemented by the computing device 102 of the system 100. The process 400 may include retrieving, by a model metadata retrieving module (for example, the model metadata retrieving module 202), model metadata (for example, the model metadata 216) of each of a plurality of GenAI models (for example, the plurality of GenAI models 218) for a GenAI agent (for example, the GenAI agent 214) based on benchmark details of each of a plurality of benchmark datasets, at step 302. The step 302 may include step 402, step 404, step 406, step 408, step 410, step 412, and step 414.
[039] To retrieve the model metadata of the plurality of GenAI models for the GenAI agent, the process 400 may include extracting, by the model metadata retrieving module, dataset details of each of the plurality of benchmark datasets through a web scraping technique, at step 402. Further, the process 400 may include generating, by the model metadata retrieving module via a description expansion GenAI model, a detailed dataset description of the dataset details, at step 404. Further, the process 400 may include storing, by the model metadata retrieving module, the dataset details and the corresponding detailed dataset description in a dataset table, at step 406.
[040] Simultaneously, the process 400 may include extracting, by the model metadata retrieving module, the benchmark details associated with each of the plurality of benchmark datasets in the dataset table, at step 408. Further, the process 400 may include storing, by the model metadata retrieving module, the benchmark details in a benchmark table, at step 410. Further, the process 400 may include selecting, by the model metadata retrieving module, a first list of GenAI models corresponding to each of the plurality of benchmark datasets based on an associated benchmark rank, at step 412. Further, the process 400 may include retrieving, by the model metadata retrieving module, the model metadata of the plurality of GenAI models, at step 414. It should be noted that the plurality of GenAI models includes the first list of GenAI models corresponding to each of the plurality of benchmark datasets.
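By way of illustration only, the selection of a first list of GenAI models per benchmark dataset based on benchmark rank (step 412) may be sketched as follows. The row keys "dataset_id", "model_name", and "rank", and the convention that a lower rank is better, are assumptions made for the example, since the benchmark table columns are described elsewhere.

```python
from collections import defaultdict


def first_list_per_dataset(benchmark_rows, n):
    """Group benchmark rows by dataset and keep the n best-ranked model
    names for each dataset (rank 1 is treated as best)."""
    by_dataset = defaultdict(list)
    for row in benchmark_rows:
        by_dataset[row["dataset_id"]].append(row)
    return {
        dataset_id: [
            r["model_name"] for r in sorted(rows, key=lambda r: r["rank"])[:n]
        ]
        for dataset_id, rows in by_dataset.items()
    }
```

The union of the returned lists would then be the plurality of GenAI models whose metadata is retrieved at step 414.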
[041] Referring now to FIG. 5, a flow diagram of an exemplary process 500 for performing a Parameter Efficient Fine-Tuning (PEFT) on a first GenAI model of a GenAI model pair is illustrated, in accordance with an embodiment. FIG. 5 is explained in conjunction with FIGS. 1 - 4. The process 500 may be implemented by the computing device 102 of the system 100. The process 500 may include identifying, by a performance score computing module (for example, the performance score computing module 208), one or more GenAI model pairs from the set of GenAI models based on the performance score and a size of each of the set of GenAI models, at step 502. The step 502 may include step 504, step 506, and step 508. To identify the one or more GenAI model pairs, the process 500 may include calculating, by the performance score computing module, an average performance score based on the performance score of each of the set of GenAI models selected for the GenAI agent, at step 504.
[042] Further, for each of the set of GenAI models, the process 500 may include comparing, by the performance score computing module, the performance score with the average performance score, at step 506. Further, the process 500 may include identifying, by the performance score computing module, the one or more GenAI model pairs based on the comparing and the size of each of the set of GenAI models, at step 508. Further, for each of the one or more GenAI model pairs, the process 500 may include performing, by a training module (for example, the training module 212), a PEFT on a first GenAI model of a GenAI model pair using a second GenAI model of the GenAI model pair, at step 510.
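By way of illustration only, steps 504 to 508 may be sketched as follows. The disclosure states that pairs are identified from the comparison with the average performance score and from model size, but does not give the exact pairing rule. This sketch assumes one plausible reading: each below-average model (the first GenAI model) is paired with the best-scoring larger model that is at or above average (the second GenAI model). The keys "score" and "size_b" are hypothetical, and the fine-tuning of step 510 itself is not shown.

```python
from statistics import mean


def identify_pairs(models):
    """Return (first_model, second_model) name pairs under the assumed rule."""
    average = mean(m["score"] for m in models)            # step 504
    at_or_above = [m for m in models if m["score"] >= average]  # step 506
    below = [m for m in models if m["score"] < average]

    pairs = []
    for first in below:                                   # step 508
        larger = [m for m in at_or_above if m["size_b"] > first["size_b"]]
        if larger:
            second = max(larger, key=lambda m: m["score"])
            pairs.append((first["name"], second["name"]))
    return pairs
```

Under this reading, a below-average model with no larger, better-scoring counterpart is simply left unpaired.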
[043] Referring now to FIG. 6, a flow chart of a detailed exemplary process 600 for agent-based GenAI model selection is illustrated, in accordance with an embodiment. FIG. 6 is explained in conjunction with FIGS. 1 - 5. The process 600 may include populating, by the model metadata retrieving module 202, the model metadata 216 corresponding to the plurality of GenAI models 218. By way of an example, the plurality of GenAI models 218 may include GenAI model 1, GenAI model 2, GenAI model 3, up to GenAI model n. To populate the model metadata 216, the process 600 may include inputting, by the model metadata retrieving module 202, data source (i.e., a list of dataset sources (such as websites)) generated by web crawling through a crawler 602 via internet 604. Further, the process 600 may include initializing, by the model metadata retrieving module 202, a plurality of variables. To initialize the plurality of variables, the process 600 may include creating, by the model metadata retrieving module 202, an empty dataset table for storing dataset details and an empty benchmark table for storing benchmark details.
[044] Further, the process 600 may include scraping, by the model metadata retrieving module 202, the dataset details from the dataset sources. Further, the process 600 may include extracting, by the model metadata retrieving module 202, the dataset details through a web scraping technique. The dataset details may include task name, dataset link, dataset description, or the like. Further, the process 600 may include storing, by the model metadata retrieving module 202, the extracted dataset details in the dataset table. Further, the process 600 may include generating, by the model metadata retrieving module 202, dataset descriptions using a GenAI model (such as the description expansion GenAI model) based on the scraped dataset details. Further, the process 600 may include adding (or storing), by the model metadata retrieving module 202, the generated descriptions to the dataset table.
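By way of illustration only, the HTML parsing variant of the web scraping technique may be sketched with the Python standard library. The page layout assumed here (one table row per dataset, with task name, dataset link, and description in the first three cells) and the names TableRowParser and scrape_dataset_details are hypothetical; an actual dataset source would need its own parsing rules.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows that contain no <td> cells
                self.rows.append(self._row)
            self._row = None


def scrape_dataset_details(html):
    """Map each parsed row to the dataset details named in the description."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return [
        {"task_name": r[0], "dataset_link": r[1], "description": r[2]}
        for r in parser.rows
        if len(r) >= 3
    ]
```

The returned dictionaries could then be stored in the dataset table and passed to the description expansion GenAI model.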
[045] Referring now to FIG. 7A and FIG. 7B, a database schema 700a and a corresponding exemplary dataset table 700b are illustrated, in accordance with an embodiment. In FIG. 7A, the database schema 700a corresponding to the dataset table 700b may include a data type column, a column name column, and a key column. Entries in the column name column may correspond to dataset details to be populated in the dataset table 700b. Entries in the data type column may describe a data type of values corresponding to each property in the dataset table 700b. Entries in the key column may correspond to a key type corresponding to each property in the dataset table 700b.
[046] By way of an example, for a dataset detail “id”, the data type may be “INT” and the key may be “Primary Key (PK)”. For a dataset detail “task_name”, the data type may be “VARCHAR(255)” and the key may not be defined. For a dataset detail “dataset_link”, the data type may be “TEXT”, and the key may not be defined. For a dataset detail “description”, the data type may be “TEXT”, and the key may not be defined. For a dataset detail “created_at”, the data type may include “TIMESTAMP”, and the key may not be defined.
[047] In FIG. 7B, the exemplary dataset table 700b is shown. The dataset table 700b may be populated based on the database schema 700a. Columns of the dataset table 700b may correspond to the dataset details defined in the “column name” column of the database schema 700a. Thus, the dataset table 700b may include an “id” column, a “task_name” column, a “dataset_link” column, a “description” column, and a “created_at” column. It should be noted that the dataset details may be retrieved from the model metadata 216. The description (i.e., the detailed dataset description) may be generated using the description expansion GenAI model. By way of an example, for id “1”, the “task_name” may be “Natural Language Understanding (NLU)”, the “dataset_link” may be a link to “GLUE Benchmark” web page, the “description” may include “A collection of resources for training, evaluating, and analyzing NLU systems”, and the “created_at” timestamp may be “2025-03-24 12:00:00”.
[048] For id “2”, the “task_name” may be “Question Answering”, the “dataset_link” may be a link to “SQuAD” web page, the “description” may include “A reading comprehension dataset consisting of questions posed on Wikipedia articles”, and the “created_at” timestamp may be “2025-03-24 12:05:00”. For id “3”, the “task_name” may be “Commonsense Reasoning”, the “dataset_link” may be a link to “HellaSwag” web page, the “description” may include “A dataset for evaluating commonsense reasoning in AI models”, and the “created_at” timestamp may be “2025-03-24 12:10:00”. For id “4”, the “task_name” may be “Mathematical Problem Solving”, the “dataset_link” may be a link to “MATH” web page, the “description” may include “A dataset containing 12,500 competition-level mathematics problems”, and the “created_at” timestamp may be “2025-03-24 12:15:00”. For id “5”, the “task_name” may be “Truthfulness Evaluation”, the “dataset_link” may be a link to “TruthfulQA” web page, the “description” may be “A benchmark to measure how truthful language models are in generating answers”, and the “created_at” timestamp may be “2025-03-24 12:20:00”.
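By way of a non-limiting illustration, the dataset table of FIG. 7A and FIG. 7B may be sketched as follows. The use of SQLite, the table name `dataset`, and the placeholder link strings are assumptions made only for this sketch; the disclosure does not name a database engine or give the actual links.

```python
import sqlite3

# In-memory database used purely for illustration (assumption: SQLite).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dataset (
        id           INTEGER PRIMARY KEY,  -- "INT", Primary Key (PK)
        task_name    VARCHAR(255),
        dataset_link TEXT,
        description  TEXT,
        created_at   TIMESTAMP
    )
    """
)

# Two of the example rows of the dataset table 700b; links are placeholders.
rows = [
    (1, "Natural Language Understanding (NLU)", "<link to GLUE Benchmark web page>",
     "A collection of resources for training, evaluating, and analyzing NLU systems",
     "2025-03-24 12:00:00"),
    (2, "Question Answering", "<link to SQuAD web page>",
     "A reading comprehension dataset consisting of questions posed on Wikipedia articles",
     "2025-03-24 12:05:00"),
]
conn.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?, ?)", rows)

# Read the task names back in id order.
task_names = [r[0] for r in conn.execute("SELECT task_name FROM dataset ORDER BY id")]
print(task_names)
```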
[049] Referring back to FIG. 6, the process 600 may include crawling, by the model metadata retrieving module 202, for benchmarks. To crawl for benchmarks, for each dataset link, the process 600 may include extracting, by the model metadata retrieving module 202, benchmark information by visiting the associated benchmark pages. Further, the process 600 may include storing, by the model metadata retrieving module 202, the benchmark details in the benchmark table. Further, the process 600 may include downloading, by the model metadata retrieving module 202, the benchmarks. Further, the process 600 may include storing, by the model metadata retrieving module 202, the benchmark details locally for future use.
[050] Further, the process 600 may include selecting, by the model metadata retrieving module 202, a first list of GenAI models for each of the benchmark datasets, to obtain the plurality of GenAI models 218. By way of an example, for each retrieved benchmark in the benchmark table, the model metadata retrieving module 202 may select the top twenty GenAI models. Thus, the plurality of GenAI models 218 may include the top 20 GenAI models for each benchmark. Further, the process 600 may include extracting, by the model metadata retrieving module 202, the model metadata 216 to obtain additional information for the benchmark table. By way of an example, the additional information may include, but may not be limited to, a license type, a GPU usage, a cost, and an inference time. Further, the benchmark table may be populated with the additional information for each of the first list of GenAI models for each of the benchmark datasets.
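By way of a non-limiting illustration, the selection of the first list of GenAI models per benchmark may be sketched as follows. The function name `top_models_per_benchmark`, the row layout, and the model names are hypothetical; the sketch also assumes that a benchmark rank of 1 denotes the best model.

```python
from collections import defaultdict


def top_models_per_benchmark(leaderboard_rows, k=20):
    """Return {benchmark_name: [model_name, ...]} with at most k best-ranked models each."""
    by_benchmark = defaultdict(list)
    for row in leaderboard_rows:
        by_benchmark[row["benchmark_name"]].append(row)
    result = {}
    for name, rows in by_benchmark.items():
        ranked = sorted(rows, key=lambda r: r["benchmark_rank"])  # rank 1 assumed best
        result[name] = [r["model_name"] for r in ranked[:k]]
    return result


# Hypothetical leaderboard rows; k=2 keeps the example short (the text uses twenty).
rows = [
    {"benchmark_name": "GLUE Benchmark", "model_name": "model-c", "benchmark_rank": 3},
    {"benchmark_name": "GLUE Benchmark", "model_name": "model-a", "benchmark_rank": 1},
    {"benchmark_name": "GLUE Benchmark", "model_name": "model-b", "benchmark_rank": 2},
    {"benchmark_name": "SQuAD", "model_name": "model-b", "benchmark_rank": 1},
]
first_list = top_models_per_benchmark(rows, k=2)

# The plurality of GenAI models is the union of the per-benchmark lists.
plurality_of_models = sorted({m for models in first_list.values() for m in models})
print(first_list, plurality_of_models)
```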
[051] Referring now to FIG. 8A and FIG. 8B, a database schema 800a and a corresponding exemplary benchmark table 800b are illustrated, in accordance with an embodiment. In FIG. 8A, the database schema 800a corresponding to the benchmark table 800b may include a data type column, a column name column, and a key column. Entries in the column name column may correspond to benchmark details to be populated in the benchmark table 800b. Entries in the data type column may describe a data type of values corresponding to each property in the benchmark table 800b. Entries in the key column may correspond to a key type corresponding to each property in the benchmark table 800b.
[052] By way of an example, for a benchmark detail “id”, the data type may be “INT” and the key may be “Primary Key (PK)”. For a benchmark detail “dataset_id”, the data type may be “INT” and the key may be “Foreign Key (FK) referencing dataset table”. For a benchmark detail “benchmark_name”, the data type may be “VARCHAR(255)” and the key may not be defined. For a benchmark detail “model_name”, the data type may be “VARCHAR(255)” and the key may not be defined. For a benchmark detail “model_size”, the data type may be “INT” and the key may not be defined. For a benchmark detail “accuracy”, the data type may be “FLOAT” and the key may not be defined. For a benchmark detail “benchmark_rank”, the data type may be “FLOAT” and the key may not be defined. For a benchmark detail “gpu_usage”, the data type may be “BOOLEAN” and the key may not be defined. For a benchmark detail “license_type”, the data type may be “VARCHAR(100)” and the key may not be defined. For a benchmark detail “cost”, the data type may be “DECIMAL(10,2)” and the key may not be defined.
[053] In FIG. 8B, the exemplary benchmark table 800b is shown. The benchmark table 800b may be populated based on the database schema 800a. Columns of the benchmark table 800b may correspond to the benchmark details defined in the “column name” column of the database schema 800a. Thus, the benchmark table 800b may include an “id” column, a “dataset_id” column, a “benchmark_name” column, a “model_name” column, an “accuracy” column, a “benchmark_rank” column, a “gpu_usage” column, a “license_type” column, a “cost” column, and a “parameter_size” column. By way of an example, for id “1”, the “dataset_id” may be “1”, the “benchmark_name” may be “GLUE Benchmark”, the “model_name” may be “BERT”, the “accuracy” may be “0.85”, the “benchmark_rank” may be “1”, the “gpu_usage” may be “TRUE”, the “license_type” may be “Apache 2.0”, the “cost” may be “0”, and the “parameter_size” may be “110M”.
[054] For id “2”, the “dataset_id” may be “2”, the “benchmark_name” may be “SQuAD Benchmark”, the “model_name” may be “RoBERTa”, the “accuracy” may be “0.88”, the “benchmark_rank” may be “2”, the “gpu_usage” may be “TRUE”, the “license_type” may be “MIT”, the “cost” may be “0”, and the “parameter_size” may be “125M”.
[055] For id “3”, the “dataset_id” may be “3”, the “benchmark_name” may be “MMLU Benchmark”, the “model_name” may be “GPT-3”, the “accuracy” may be “0.92”, the “benchmark_rank” may be “1”, the “gpu_usage” may be “TRUE”, the “license_type” may be “OpenAI”, the “cost” may be “0.02”, and the “parameter_size” may be “175B”.
[056] For id “4”, the “dataset_id” may be “4”, the “benchmark_name” may be “HellaSwag”, the “model_name” may be “GPT-2”, the “accuracy” may be “0.78”, the “benchmark_rank” may be “3”, the “gpu_usage” may be “TRUE”, the “license_type” may be “OpenAI”, the “cost” may be “0”, and the “parameter_size” may be “1.5B”.
[057] For id “5”, the “dataset_id” may be “5”, the “benchmark_name” may be “TruthfulQA”, the “model_name” may be “GPT-4”, the “accuracy” may be “0.95”, the “benchmark_rank” may be “1”, the “gpu_usage” may be “TRUE”, the “license_type” may be “OpenAI”, the “cost” corresponding to the fifth id may be “0.03”, and the “parameter_size” may be “Large (Undisclosed)”.
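By way of a non-limiting illustration, the benchmark table of FIG. 8A and FIG. 8B and its foreign key to the dataset table may be sketched as follows. SQLite and the table names are assumptions made only for this sketch. Because the example values of “parameter_size” are strings such as “110M”, the sketch stores that column as text, which is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE dataset (
        id        INTEGER PRIMARY KEY,
        task_name VARCHAR(255)
    );
    CREATE TABLE benchmark (
        id             INTEGER PRIMARY KEY,
        dataset_id     INTEGER REFERENCES dataset(id),  -- FK referencing dataset table
        benchmark_name VARCHAR(255),
        model_name     VARCHAR(255),
        accuracy       FLOAT,
        benchmark_rank FLOAT,
        gpu_usage      BOOLEAN,
        license_type   VARCHAR(100),
        cost           DECIMAL(10,2),
        parameter_size TEXT  -- assumption: kept as text, e.g. "110M"
    );
    """
)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?)",
    [(1, "Natural Language Understanding (NLU)"), (2, "Question Answering")],
)
conn.executemany(
    "INSERT INTO benchmark VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "GLUE Benchmark", "BERT", 0.85, 1, True, "Apache 2.0", 0, "110M"),
        (2, 2, "SQuAD Benchmark", "RoBERTa", 0.88, 2, True, "MIT", 0, "125M"),
    ],
)

# Join each benchmark row to the task of its dataset.
joined = conn.execute(
    """
    SELECT d.task_name, b.model_name
    FROM benchmark AS b JOIN dataset AS d ON d.id = b.dataset_id
    ORDER BY b.id
    """
).fetchall()
print(joined)
```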
[058] Referring back to FIG. 6, the process 600 may include returning, by the model metadata retrieving module 202, a populated dataset table (i.e., the dataset table 700b) and a populated benchmark table (i.e., the benchmark table 800b). The populated dataset table may include a table storing the task names, the dataset links, and the descriptions. The populated benchmark table may include a table storing benchmark data. Further, the process 600 may include managing, by the GenAI model selecting module 204, the GenAI model through a smart adapter 606 for the agent 214. The smart adapter 606 may include a model selector 608, a model invoker and profiler 610, and a model trainer 612. Further, the process 600 may include utilizing, by the GenAI model selecting module 204 via the model selector 608, the model metadata 216 and a list of configured models within the environment for executing the prompt. Further, the process 600 may include selecting, by the GenAI model selecting module 204 through the model selector 608, a plurality of top models from the models in the model metadata 216.
[059] To select the plurality of top models, the process 600 may include receiving, by the GenAI model selecting module 204 through the model selector 608, an input. The input may include a query. The query may be an agent prompt string. The input may further include the dataset table with the dataset details. The input may further include the benchmark table with the benchmark details. The input may further include user preferences. The user preferences may include a plurality of preferences for ranking the GenAI models. The input may further include available models. The available models may include a set of available models.
[060] Further, the process 600 may include finding, by the GenAI model selecting module 204 through the model selector 608, matching datasets from the dataset table providing a semantic match with the query. Further, the process 600 may include storing, by the GenAI model selecting module 204 through the model selector 608, matching datasets in matched datasets list. Further, for each of the matched datasets, the process 600 may include extracting, by the GenAI model selecting module 204 through the model selector 608, benchmark data where benchmark rank is greater than m (for example, m = 20). Further, the process 600 may include storing, by the GenAI model selecting module 204 through the model selector 608, the benchmark data in retrieved benchmarks.
[061] Further, the process 600 may include filtering, by the GenAI model selecting module 204 through the model selector 608, the retrieved benchmarks by the user preferences (including price) to get a set of candidate models. Further, the process 600 may include returning, by the GenAI model selecting module 204 through the model selector 608, a set of top GenAI models from the set of candidate models based on the model availability (i.e., return candidate_models ∩ available_models). It should be noted that the set of top GenAI models selected by the model selector 608 may be smaller than the plurality of models for which the model metadata 216 is retrieved.
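By way of a non-limiting illustration, the steps performed by the model selector 608 may be sketched as follows. The function names, the `max_cost` preference key, and the similarity threshold are hypothetical. A simple word-overlap score stands in for the semantic match, which the disclosure does not specify. The rank filter keeps rows ranked within the top m, assuming that a rank of 1 denotes the best model; this is one reading of the rank condition described above.

```python
def token_overlap(a, b):
    """Jaccard overlap of lowercase word sets; a crude stand-in for semantic matching."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0


def select_top_models(query, dataset_table, benchmark_table, user_preferences,
                      available_models, m=20, min_similarity=0.2):
    # Step 1: datasets whose description matches the query.
    matched_datasets = [d["id"] for d in dataset_table
                        if token_overlap(query, d["description"]) >= min_similarity]
    # Step 2: benchmark rows of the matched datasets, limited by rank (assumption: 1 is best).
    retrieved_benchmarks = [b for b in benchmark_table
                            if b["dataset_id"] in matched_datasets
                            and b["benchmark_rank"] <= m]
    # Step 3: filter by user preferences (only a maximum cost in this sketch).
    max_cost = user_preferences.get("max_cost", float("inf"))
    candidate_models = {b["model_name"] for b in retrieved_benchmarks
                        if b["cost"] <= max_cost}
    # Step 4: candidate_models intersected with available_models.
    return candidate_models & set(available_models)


dataset_table = [
    {"id": 1, "description": "A collection of resources for training, evaluating, and analyzing NLU systems"},
    {"id": 2, "description": "A reading comprehension dataset consisting of questions posed on Wikipedia articles"},
]
benchmark_table = [
    {"dataset_id": 1, "model_name": "BERT", "benchmark_rank": 1, "cost": 0.0},
    {"dataset_id": 2, "model_name": "RoBERTa", "benchmark_rank": 2, "cost": 0.0},
    {"dataset_id": 2, "model_name": "hypothetical-paid-model", "benchmark_rank": 1, "cost": 0.05},
    {"dataset_id": 2, "model_name": "hypothetical-low-rank-model", "benchmark_rank": 25, "cost": 0.0},
]
top_models = select_top_models(
    "answer questions posed on Wikipedia articles",
    dataset_table, benchmark_table,
    user_preferences={"max_cost": 0.01},
    available_models=["BERT", "RoBERTa", "hypothetical-low-rank-model"],
)
print(top_models)
```

In this sketch only the second dataset matches the query, the paid model is removed by the cost preference, and the low-ranked model is removed by the rank filter.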
[062] Once the plurality of top models is selected, the process 600 may include executing, by the response generating module 206 through the model invoker and profiler 610, the prompt on each of the selected GenAI models. The process 600 may include recording, by the response generating module 206 through the model invoker and profiler 610, the execution time, and evaluating the response through a judge prompt, executed on the same or a different GenAI model, to generate a response score for the GenAI model. Further, the process 600 may include recording, by the response generating module 206 through the model invoker and profiler 610 through the model selector 608, statistics on the performance score and the time for each selected GenAI model and prompt tuple.
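By way of a non-limiting illustration, invoking a GenAI model, timing the execution, and recording the score may be sketched as follows. The function `invoke_and_profile`, the record keys, and the two stub callables are hypothetical; in practice the stubs would be replaced by calls to a GenAI model and to a judge model.

```python
import time


def invoke_and_profile(model_name, model_fn, prompt, judge_fn, log):
    """Run the prompt on one model, time it, score the output, and append a record."""
    start = time.perf_counter()
    output = model_fn(prompt)
    elapsed = time.perf_counter() - start
    score = judge_fn(output)
    log.append({"model": model_name, "input": prompt, "output": output,
                "time": elapsed, "score": score})
    return output


# Stubs standing in for a GenAI model and a judge model (assumptions for this sketch).
def stub_model(prompt):
    return "- point one\n- point two\n- point three"


def stub_judge(output):
    return 8


log = []
invoke_and_profile("hypothetical-model", stub_model,
                   "Summarize the following news article in 3 bullet points: [article text]",
                   stub_judge, log)
print(log[0]["score"], log[0]["time"] >= 0.0)
```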
[063] The process 600 may include selecting, by the model routing module 210 through the model invoker and profiler 610, one or more of the set of top GenAI models for the agent 214. The available set of GenAI models may be selected based on a plurality of predefined policies. By way of an example, the predefined policies may include, but may not be limited to, a round robin policy, a performance optimal policy, and a quality optimal policy. The round robin policy may select one model after another in a round robin fashion. The performance optimal policy may select models having the lowest time in a model invoker metadata. The model invoker metadata may include an agent ID, a system prompt, a context, a time, an input, an output, a score, and the GenAI model for each invocation. The performance optimal policy may include an initial warmup phase. The initial warmup phase may be invoked in the round robin fashion to obtain measurements of the performance of the available selected models.
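By way of a non-limiting illustration, the round robin policy and the performance optimal policy with its warmup phase may be sketched as follows. The class `ModelRouter`, its parameters, and the model names are hypothetical, and the use of the mean recorded time as the performance measure is an assumption.

```python
from collections import defaultdict
from itertools import cycle


class ModelRouter:
    def __init__(self, models, policy="round_robin", warmup_calls=1):
        self.models = list(models)
        self.policy = policy
        self.warmup_calls = warmup_calls
        self._cycle = cycle(self.models)
        self._times = defaultdict(list)  # model name -> recorded execution times

    def record(self, model, elapsed_seconds):
        self._times[model].append(elapsed_seconds)

    def next_model(self):
        if self.policy == "round_robin":
            return next(self._cycle)
        # Performance optimal: warm up in round robin fashion until every
        # model has at least warmup_calls measurements.
        if any(len(self._times[m]) < self.warmup_calls for m in self.models):
            return next(self._cycle)
        # Afterwards, pick the model with the lowest mean recorded time.
        return min(self.models,
                   key=lambda m: sum(self._times[m]) / len(self._times[m]))


router = ModelRouter(["model-a", "model-b"], policy="performance_optimal")
first = router.next_model()   # warmup call
router.record(first, 2.0)
second = router.next_model()  # warmup call
router.record(second, 0.5)
third = router.next_model()   # warmup finished, fastest model is chosen
print(first, second, third)
```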
[064] Referring now to FIG. 9, an exemplary interface 900 to configure the model invoker and profiler 610 is illustrated, in accordance with an embodiment. The interface 900 may include a policy section 902, a judge prompt section 904 and a save button 906. The policy section 902 may include a plurality of radio buttons to select a predefined policy. By way of an example, the predefined policy may include a round robin policy, a performance optimal policy, a cost optimal policy, and a quality optimal policy. Further, the judge prompt section 904 may include a text box to input the judge prompt. It should be noted that the judge prompt may be written by following a plurality of standards. The plurality of standards may include role clarity, evaluation criteria, and context completeness, ensuring consistent and meaningful scoring. A well-crafted judge prompt may guide the evaluation model to assess outputs based on a plurality of dimensions (for example, accuracy, clarity, relevance, etc.), reducing ambiguity. Further, the save button 906 may save the selected predefined policy and the judge prompt.
[065] By way of an example, an agent persona corresponding to the GenAI agent 214 and a prompt may be passed to the smart adapter 606. The agent persona may be “You are an expert summarizer. Generate concise, factual 3-point summaries.” The prompt may be “Summarize the following news article in 3 bullet points: [article text]”. Further, a GenAI model may produce an output to the prompt based on the agent persona as follows.
- The government announced new subsidies for electric vehicles starting next year.
- Experts predict a 30% increase in EV adoption due to the incentives.
- Environmental groups praised the move as a step toward carbon neutrality.
[066] Further, a judge prompt may be constructed and sent to a judge model. In an embodiment, the judge model may be the GenAI model given the persona of an evaluation model. The judge prompt may be as follows.
"You are a fact-checker and editorial reviewer. Rate the summary for:
1. Accuracy
2. Clarity
3. Relevance
Give a score (1-10) and short justification."
[067] Further, the judge model may provide an output including a score for the output of the GenAI model and a justification of the score. The score may be ‘8’ and the justification may be “Summary is clear and mostly accurate, but misses a counterpoint from opposition parties”.
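By way of a non-limiting illustration, constructing the judge prompt from the response and the evaluation instructions, and extracting the score from the judge output, may be sketched as follows. The function names, the “Summary to evaluate” separator, and the assumed output format “Score: N. justification” are hypothetical; a real judge model may format its output differently.

```python
import re

JUDGE_INSTRUCTIONS = (
    "You are a fact-checker and editorial reviewer. Rate the summary for:\n"
    "1. Accuracy\n"
    "2. Clarity\n"
    "3. Relevance\n"
    "Give a score (1-10) and short justification."
)


def build_judge_prompt(response, instructions=JUDGE_INSTRUCTIONS):
    """Combine the evaluation instructions with the response to be evaluated."""
    return f"{instructions}\n\nSummary to evaluate:\n{response}"


def parse_judge_score(judge_output):
    """Return the integer score from text such as 'Score: 8. ...', or None if absent."""
    match = re.search(r"score\s*[:=]?\s*(\d{1,2})", judge_output, flags=re.IGNORECASE)
    if match is None:
        return None
    score = int(match.group(1))
    return score if 1 <= score <= 10 else None


response = (
    "- The government announced new subsidies for electric vehicles starting next year.\n"
    "- Experts predict a 30% increase in EV adoption due to the incentives.\n"
    "- Environmental groups praised the move as a step toward carbon neutrality."
)
judge_prompt = build_judge_prompt(response)

# Example judge output written in the assumed format.
judge_output = ("Score: 8. Summary is clear and mostly accurate, "
                "but misses a counterpoint from opposition parties")
score = parse_judge_score(judge_output)
print(score)
```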
[068] Referring back to FIG. 6, the process 600 may include ranking, by the performance score computing module 208 through the model trainer 612, the plurality of top models (selected by the model invoker and profiler 610). Further, the process 600 may include finding, by the model routing module 210 through the model trainer 612, a pair of GenAI models including a small size GenAI model and a top performing large size GenAI model. The small size GenAI model may have a performance within m% of the top performing large size GenAI model. It should be noted that the value of m may lie anywhere between a 5% and a 20% relative difference. If such a pair is found, then the model trainer 612 may use the input and the output data from the top performing large size GenAI model to finetune and improve the performance of the small size GenAI model, helping to reduce cost, as the small size GenAI models may be operated using low-capacity hardware.
[069] Further, the process 600 may include finding, by the model routing module 210 through the model trainer 612, a small size GenAI model to be trained by a top performing large size GenAI model. To find the small size GenAI model, the process 600 may include creating, by the model routing module 210 through the model trainer 612, an empty trainable model set (for example, trainable_model_set = {}). Further, the process 600 may include grouping, by the model routing module 210 through the model trainer 612, the model invoker metadata based on the agent ID and the GenAI model columns in the model invoker metadata. Further, the process 600 may include averaging, by the model routing module 210 through the model trainer 612, the score column for each group. Further, the process 600 may include augmenting, by the model routing module 210 through the model trainer 612, the tuple of GenAI model and average score with model size to create a triad of the GenAI model, the model size, and the average score. Further, for every distinct pair of GenAI model triads, the process 600 may include checking, by the model routing module 210 through the model trainer 612, if the small size GenAI model has a performance score within m% of the top performing large size GenAI model. In an embodiment, the value of m may be within a range of 5% to 20% relative difference. If true, the process 600 may include adding, by the model routing module 210 through the model trainer 612, the top performing large size GenAI model and the small size GenAI model name tuple to the trainable model set.
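By way of a non-limiting illustration, building the trainable model set may be sketched as follows. The function name, the record keys, the `model_sizes` mapping, and the model names are hypothetical. The sketch assumes that the top performing large size GenAI model is the model with the highest average score for an agent, and that the relative difference is measured against that model's average score.

```python
from collections import defaultdict
from itertools import permutations


def find_trainable_pairs(invoker_metadata, model_sizes, m_percent=10.0):
    trainable_model_set = set()
    # Group by (agent ID, GenAI model) and collect the scores of each group.
    scores = defaultdict(list)
    for rec in invoker_metadata:
        scores[(rec["agent_id"], rec["model"])].append(rec["score"])
    # Build (model, size, average score) triads per agent.
    triads_by_agent = defaultdict(list)
    for (agent_id, model), values in scores.items():
        triads_by_agent[agent_id].append(
            (model, model_sizes[model], sum(values) / len(values)))
    # Check every ordered pair of distinct triads.
    for triads in triads_by_agent.values():
        top_score = max(t[2] for t in triads)
        for (large, large_size, large_avg), (small, small_size, small_avg) in permutations(triads, 2):
            if large_avg != top_score or small_size >= large_size:
                continue  # "large" must be the top performer and strictly bigger
            if large_avg > 0 and (large_avg - small_avg) / large_avg * 100 <= m_percent:
                trainable_model_set.add((large, small))
    return trainable_model_set


model_sizes = {"large-model": 70_000_000_000, "small-model": 7_000_000_000,
               "tiny-model": 1_000_000_000}
invoker_metadata = [
    {"agent_id": "summarizer", "model": "large-model", "score": 9},
    {"agent_id": "summarizer", "model": "large-model", "score": 9},
    {"agent_id": "summarizer", "model": "small-model", "score": 8},
    {"agent_id": "summarizer", "model": "small-model", "score": 8.4},
    {"agent_id": "summarizer", "model": "tiny-model", "score": 5},
]
pairs = find_trainable_pairs(invoker_metadata, model_sizes, m_percent=10.0)
print(pairs)
```

Here the small model averages 8.2 against 9 for the large model, a relative difference of about 8.9%, so the pair qualifies at m = 10; the tiny model does not.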
[070] Once the trainable model set is obtained, the process 600 may include training, by the training module 212 through the model trainer 612, the small size GenAI model by the top performing large size GenAI model through the PEFT. It should be noted that the data stored in the model invoker metadata is used to train the small size GenAI model. The input and output columns for the top performing large size GenAI model may be used to finetune the small size GenAI model. Alternatively, the process 600 may include training, by the training module 212 through the model trainer 612, the small size GenAI model from the large size GenAI model using distillation along with context and the model invoker metadata.
[071] By way of an example, in the set of trainable models, the large size GenAI model may be a teacher model and the small size GenAI model may be a student model. The process 600 may include generating, by the training module 212 through the model trainer 612, outputs on the dataset from the model invoker and profiler 610 using the teacher model. Further, the process 600 may include collecting, by the training module 212 through the model trainer 612, teacher-generated data (for example, soft targets, hard labels, or full responses). Further, the process 600 may include using, by the training module 212 through the model trainer 612, teacher-generated data to train the student model with the PEFT technique (for example, LoRA, adapters, etc.). Alternatively, the process 600 may include using, by the training module 212 and the model trainer 612, distillation for training the student model. Further, the process 600 may include getting, by the training module 212 and the model trainer 612, a lightweight model. The lightweight model may mimic the teacher model and may be efficient for deployment.
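By way of a non-limiting illustration, collecting the teacher-generated data (full responses) from the model invoker metadata into training records may be sketched as follows. The function name, the record keys, and the prompt/completion layout are hypothetical. The sketch covers only the data preparation; the PEFT or distillation training itself is not shown.

```python
import json


def build_finetuning_records(invoker_metadata, teacher_model):
    """Keep the teacher's invocations and turn each into a prompt/completion record."""
    records = []
    for rec in invoker_metadata:
        if rec["model"] != teacher_model:
            continue
        records.append({
            "prompt": f'{rec["system_prompt"]}\n\n{rec["input"]}',
            "completion": rec["output"],
        })
    return records


invoker_metadata = [
    {"model": "large-model", "system_prompt": "You are an expert summarizer.",
     "input": "Summarize: [article text]", "output": "- point one\n- point two"},
    {"model": "small-model", "system_prompt": "You are an expert summarizer.",
     "input": "Summarize: [article text]", "output": "- a weaker summary"},
]
records = build_finetuning_records(invoker_metadata, teacher_model="large-model")

# One JSON object per line, a common layout for fine-tuning data (assumption).
jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl)
```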
[072] As will be also appreciated, the above-described techniques may take the form of computer or controller implemented processes and apparatuses for practicing those processes. The disclosure can also be embodied in the form of computer program code containing instructions embodied in tangible media, such as floppy diskettes, solid state drives, CD-ROMs, hard drives, or any other computer-readable storage medium, wherein, when the computer program code is loaded into and executed by a computer or controller, the computer becomes an apparatus for practicing the invention. The disclosure may also be embodied in the form of computer program code or signal, for example, whether stored in a storage medium, loaded into and/or executed by a computer or controller, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. When implemented on a general-purpose microprocessor, the computer program code segments configure the microprocessor to create specific logic circuits.
[073] The disclosed methods and systems may be implemented on a conventional or a general-purpose computer system, such as a personal computer (PC) or server computer. Referring now to FIG. 10, an exemplary computing system 1000 that may be employed to implement processing functionality for various embodiments (e.g., as a SIMD device, client device, server device, one or more processors, or the like) is illustrated. Those skilled in the relevant art will also recognize how to implement the invention using other computer systems or architectures. The computing system 1000 may represent, for example, a user device such as a desktop, a laptop, a mobile phone, personal entertainment device, DVR, and so on, or any other type of special or general-purpose computing device as may be desirable or appropriate for a given application or environment. The computing system 1000 may include one or more processors, such as a processor 1002 that may be implemented using a general or special purpose processing engine such as, for example, a microprocessor, microcontroller or other control logic. In this example, the processor 1002 is connected to a bus 1004 or other communication medium. In some embodiments, the processor 1002 may be an Artificial Intelligence (AI) processor, which may be implemented as a Tensor Processing Unit (TPU), or a Graphics Processing Unit (GPU), or a Quantum Processing Unit (QPU), or a custom programmable solution Field-Programmable Gate Array (FPGA).
[074] The computing system 1000 may also include a memory 1006 (main memory), for example, Random Access Memory (RAM) or other dynamic memory, for storing information and instructions to be executed by the processor 1002. The memory 1006 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by the processor 1002. The computing system 1000 may likewise include a read only memory (“ROM”) or other static storage device coupled to bus 1004 for storing static information and instructions for the processor 1002.
[075] The computing system 1000 may also include storage devices 1008, which may include, for example, a media drive 1010 and a removable storage interface. The media drive 1010 may include a drive or other mechanism to support fixed or removable storage media, such as a hard disk drive, a floppy disk drive, a magnetic tape drive, an SD card port, a USB port, a micro USB, an optical disk drive, a CD or DVD drive (R or RW), or other removable or fixed media drive. A storage media 1012 may include, for example, a hard disk, magnetic tape, flash drive, or other fixed or removable medium that is read by and written to by the media drive 1010. As these examples illustrate, the storage media 1012 may include a computer-readable storage medium having stored therein particular computer software or data.
[076] In alternative embodiments, the storage devices 1008 may include other similar instrumentalities for allowing computer programs or other instructions or data to be loaded into the computing system 1000. Such instrumentalities may include, for example, a removable storage unit 1014 and a storage unit interface 1016, such as a program cartridge and cartridge interface, a removable memory (for example, a flash memory or other removable memory module) and memory slot, and other removable storage units and interfaces that allow software and data to be transferred from the removable storage unit 1014 to the computing system 1000.
[077] The computing system 1000 may also include a communications interface 1018. The communications interface 1018 may be used to allow software and data to be transferred between the computing system 1000 and external devices. Examples of the communications interface 1018 may include a network interface (such as an Ethernet or other NIC card), a communications port (such as for example, a USB port, a micro USB port), Near field Communication (NFC), etc. Software and data transferred via the communications interface 1018 are in the form of signals which may be electronic, electromagnetic, optical, or other signals capable of being received by the communications interface 1018. These signals are provided to the communications interface 1018 via a channel 1020. The channel 1020 may carry signals and may be implemented using a wireless medium, wire or cable, fiber optics, or other communications medium. Some examples of the channel 1020 may include a phone line, a cellular phone link, an RF link, a Bluetooth link, a network interface, a local or wide area network, and other communications channels.
[078] The computing system 1000 may further include Input/Output (I/O) devices 1022. Examples may include, but are not limited to a display, keypad, microphone, audio speakers, vibrating motor, LED lights, etc. The I/O devices 1022 may receive input from a user and also display an output of the computation performed by the processor 1002. In this document, the terms “computer program product” and “computer-readable medium” may be used generally to refer to media such as, for example, the memory 1006, the storage devices 1008, the removable storage unit 1014, or signal(s) on the channel 1020. These and other forms of computer-readable media may be involved in providing one or more sequences of one or more instructions to the processor 1002 for execution. Such instructions, generally referred to as “computer program code” (which may be grouped in the form of computer programs or other groupings), when executed, enable the computing system 1000 to perform features or functions of embodiments of the present invention.
[079] In an embodiment where the elements are implemented using software, the software may be stored in a computer-readable medium and loaded into the computing system 1000 using, for example, the removable storage unit 1014, the media drive 1010 or the communications interface 1018. The control logic (in this example, software instructions or computer program code), when executed by the processor 1002, causes the processor 1002 to perform the functions of the invention as described herein.
[080] Thus, the disclosed method and system try to overcome the technical problem of agent-based Generative Artificial Intelligence (GenAI) model selection. The disclosed method and system may retrieve model metadata of each of a plurality of GenAI models for a GenAI agent based on benchmark details of each of a plurality of benchmark datasets. The GenAI agent corresponds to a prompt. Further, the disclosed method and system may select a set of GenAI models from the plurality of GenAI models corresponding to the GenAI agent based on predefined selection criteria through a semantic analysis. The predefined selection criteria is based on the benchmark details, the model metadata, model availability, and user preferences. Further, for each GenAI model of the set of GenAI models, the disclosed method and system may generate, via the GenAI model, a response corresponding to the prompt of the GenAI agent. Further, the disclosed method and system may compute, via the GenAI model, a performance score of the response based on a set of evaluation metrics using a judge prompt. The judge prompt may include the response and a set of evaluation instructions. Further, the disclosed method and system may select one or more of the set of GenAI models to respond to prompts received through the GenAI agent.
[081] As will be appreciated by those skilled in the art, the techniques described in the various embodiments discussed above are not routine, or conventional, or well understood in the art. The techniques may include a dynamic model selection. The smart adapter may enable dynamic selection of the most suitable GenAI model for a given agent task, based on benchmark-driven performance and cost-efficiency. The dynamic model selection may avoid hardcoding models at design time and may improve flexibility across use cases. Further, the techniques may include operational cost optimization. By profiling GenAI models for latency, inference cost, and compute resource usage, the techniques may ensure that low-cost models are prioritized when appropriate, leading to significant cost savings in high-frequency or production workloads. The techniques may further be scalable across domains. The smart adapters may support multiple agents with diverse tasks, from coding assistance to document generation, by mapping each task to benchmark datasets and retrieving the top-performing LLM for specific task domains.
[082] In light of the above-mentioned advantages and the technical advancements provided by the disclosed method and system, the claimed steps as discussed above are not routine, conventional, or well understood in the art, as the claimed steps enable the following solutions to the existing problems in conventional technologies. Further, the claimed steps clearly bring an improvement in the functioning of the device itself as the claimed steps provide a technical solution to a technical problem.
[083] The specification has described a method and system for agent-based Generative Artificial Intelligence (GenAI) model selection. The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments.
[084] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[085] It is intended that the disclosure and examples be considered as exemplary only, with a true scope and spirit of disclosed embodiments being indicated by the following claims.
CLAIMS
I/WE CLAIM:
1. A method (300) for agent-based Generative Artificial Intelligence (GenAI) model selection, the method comprising:
retrieving (302), by a computing device (102), model metadata (216) of each of a plurality of Generative Artificial Intelligence (GenAI) models for a GenAI agent (214) based on benchmark details of each of a plurality of benchmark datasets, wherein the GenAI agent (214) corresponds to a prompt;
selecting (304), by the computing device (102), a set of GenAI models from the plurality of GenAI models corresponding to the GenAI agent (214) based on predefined selection criteria through a semantic analysis, wherein the predefined selection criteria is based on the benchmark details, the model metadata (216), model availability, and user preferences;
for each GenAI model of the set of GenAI models,
generating (310), by the computing device (102) via the GenAI model, a response corresponding to the prompt of the GenAI agent (214);
computing (312), by the computing device (102) via the GenAI model, a performance score of the response based on a set of evaluation metrics using a judge prompt, wherein the judge prompt comprises the response and a set of evaluation instructions; and
selecting (314), by the computing device (102), one or more of the set of GenAI models to respond to prompts received through the GenAI agent (214).
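
As an illustration only, the generate, score, and select steps (310), (312), and (314) of claim 1 might be sketched as below. Every name here (`Model`, `build_judge_prompt`, `select_models`, `top_k`) is hypothetical and does not come from the specification. The sketch assumes that the judge is supplied as a separate callable returning a numeric score and that the final selection keeps the highest-scoring models; the claim itself does not fix either choice.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in: a "model" is any callable mapping a prompt to text.
Model = Callable[[str], str]


def build_judge_prompt(response: str, instructions: str) -> str:
    """Combine the response and the evaluation instructions into a judge prompt."""
    return f"{instructions}\n\nResponse to evaluate:\n{response}"


def select_models(
    prompt: str,
    candidates: Dict[str, Model],
    judge: Callable[[str], float],
    instructions: str,
    top_k: int = 1,
) -> List[str]:
    """Score each candidate's response with the judge and keep the top_k names."""
    scores: Dict[str, float] = {}
    for name, model in candidates.items():
        response = model(prompt)  # step (310): generate a response
        judge_prompt = build_judge_prompt(response, instructions)
        scores[name] = judge(judge_prompt)  # step (312): performance score
    # Step (314): assumed here to mean "keep the best-scoring models".
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:top_k]
```

In practice the `judge` callable would wrap a call to a GenAI model and parse a score from its output; that parsing is omitted here.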

2. The method (300) as claimed in claim 1, wherein retrieving the model metadata (216) of the plurality of GenAI models for the GenAI agent (214) comprises:
extracting (402) dataset details of each of the plurality of benchmark datasets through a web scraping technique;
generating (404), via a description expansion GenAI model, a detailed dataset description of the dataset details; and
storing (406) the dataset details and the corresponding detailed dataset description in a dataset table.

3. The method (300) as claimed in claim 2, comprising:
extracting (408) the benchmark details associated with each of the plurality of benchmark datasets in the dataset table;
storing (410) the benchmark details in a benchmark table;
selecting (412) a first list of GenAI models corresponding to each of the plurality of benchmark datasets based on an associated benchmark rank; and
retrieving (414) the model metadata (216) of the plurality of GenAI models, wherein the plurality of GenAI models comprises the first list of GenAI models corresponding to each of the plurality of benchmark datasets.

4. The method (300) as claimed in claim 3, wherein selecting a set of GenAI models from the plurality of GenAI models corresponding to the GenAI agent (214) comprises:
identifying (306) one or more of the plurality of benchmark datasets based on a semantic analysis between the prompt of the GenAI agent (214) and information corresponding to each of the plurality of benchmark datasets in the dataset table; and
for each of the one or more of the plurality of benchmark datasets, selecting (308) a second list of GenAI models from the first list of GenAI models based on the associated benchmark rank, the user preferences, and the model availability, to obtain the set of GenAI models.
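
The two steps of claim 4 can be pictured with a rough sketch such as the following. It swaps in a plain bag-of-words cosine similarity for the semantic analysis, which is an assumption made only to keep the example self-contained; an embedding model would be the more likely choice in practice. The names `match_datasets`, `second_list`, `threshold`, `excluded`, and `top_n` are hypothetical, and the first list is assumed to be already ordered by benchmark rank.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set


def _vector(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = _vector(a), _vector(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def match_datasets(
    prompt: str, dataset_table: Dict[str, str], threshold: float = 0.5
) -> List[str]:
    """Step (306): datasets whose description is similar enough to the prompt."""
    return [
        name
        for name, description in dataset_table.items()
        if cosine(prompt, description) >= threshold
    ]


def second_list(
    first_list: List[str], available: Set[str], excluded: Set[str], top_n: int
) -> List[str]:
    """Step (308): filter a rank-ordered first list by availability and preferences."""
    kept = [m for m in first_list if m in available and m not in excluded]
    return kept[:top_n]
```

Here the user preferences are reduced to an exclusion set for brevity; richer preferences (cost limits, licensing, hosting) would need additional filters.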

5. The method (300) as claimed in claim 1, comprising:
identifying (502) one or more GenAI model pairs from the set of GenAI models based on the performance score and a size of each of the set of GenAI models; and
for each of the one or more GenAI model pairs, performing (510) a Parameter Efficient Fine-Tuning (PEFT) on a first GenAI model of a GenAI model pair using a second GenAI model of the GenAI model pair.

6. The method (300) as claimed in claim 5, wherein identifying the one or more GenAI model pairs comprises:
calculating (504) an average performance score based on the performance score of each of the set of GenAI models selected for the GenAI agent (214);
for each of the set of GenAI models, comparing (506) the performance score with the average performance score; and
identifying (508) the one or more GenAI model pairs based on the comparing and the size of each of the set of GenAI models.
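
One plausible reading of the pairing steps (504) to (508) is sketched below: a below-average model becomes the first model of a pair (the one to be fine-tuned), and a larger, at-or-above-average model becomes the second model. This pairing rule is an assumption, since the claims only state that pairs are identified from the comparison and the model sizes. `Candidate`, `size_b`, and `identify_pairs` are hypothetical names, and the PEFT step (510) itself is not shown.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float   # performance score from the judge prompt
    size_b: float  # model size in billions of parameters (hypothetical unit)


def identify_pairs(models: List[Candidate]) -> List[Tuple[str, str]]:
    """Return (first_model, second_model) name pairs under the assumed rule."""
    if not models:
        return []
    average = mean(m.score for m in models)            # step (504)
    above = [m for m in models if m.score >= average]  # step (506)
    below = [m for m in models if m.score < average]
    pairs: List[Tuple[str, str]] = []
    for first in below:                                # step (508)
        # Assumed rule: the second model must be larger than the first.
        larger = [m for m in above if m.size_b > first.size_b]
        if larger:
            second = max(larger, key=lambda m: m.score)
            pairs.append((first.name, second.name))
    return pairs
```

Each returned pair would then be handed to whatever PEFT routine the implementation uses, with the first model as the one being tuned.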

7. A system (100) for agent-based GenAI model selection, the system (100) comprising:
a processor (104); and
a memory (106) communicatively coupled to the processor (104), wherein the memory (106) stores processor instructions, which when executed by the processor (104), cause the processor (104) to:
retrieve (302) model metadata (216) of each of a plurality of GenAI models for a GenAI agent (214) based on benchmark details of each of a plurality of benchmark datasets, wherein the GenAI agent (214) corresponds to a prompt;
select (304) a set of GenAI models from the plurality of GenAI models corresponding to the GenAI agent (214) based on predefined selection criteria through a semantic analysis, wherein the predefined selection criteria is based on the benchmark details, the model metadata (216), model availability, and user preferences;
for each GenAI model of the set of GenAI models,
generate (310), via the GenAI model, a response corresponding to the prompt of the GenAI agent (214);
compute (312), via the GenAI model, a performance score of the response based on a set of evaluation metrics using a judge prompt, wherein the judge prompt comprises the response and a set of evaluation instructions; and
select (314) one or more of the set of GenAI models to respond to prompts received through the GenAI agent (214).

8. The system (100) as claimed in claim 7, wherein to retrieve the model metadata (216) of the plurality of GenAI models for the GenAI agent (214) the processor instructions, on execution, cause the processor (104) to:
extract (402) dataset details of each of the plurality of benchmark datasets through a web scraping technique;
generate (404), via a description expansion GenAI model, a detailed dataset description of the dataset details; and
store (406) the dataset details and the corresponding detailed dataset description in a dataset table.

9. The system (100) as claimed in claim 8, wherein the processor instructions, on execution, cause the processor (104) to:
extract (408) the benchmark details associated with each of the plurality of benchmark datasets in the dataset table;
store (410) the benchmark details in a benchmark table;
select (412) a first list of GenAI models corresponding to each of the plurality of benchmark datasets based on an associated benchmark rank; and
retrieve (414) the model metadata (216) of the plurality of GenAI models, wherein the plurality of GenAI models comprises the first list of GenAI models corresponding to each of the plurality of benchmark datasets.

10. The system (100) as claimed in claim 9, wherein to select a set of GenAI models from the plurality of GenAI models corresponding to the GenAI agent (214), the processor instructions, on execution, cause the processor (104) to:
identify (306) one or more of the plurality of benchmark datasets based on a semantic analysis between the prompt of the GenAI agent (214) and information corresponding to each of the plurality of benchmark datasets in the dataset table; and
for each of the one or more of the plurality of benchmark datasets, select (308) a second list of GenAI models from the first list of GenAI models based on the associated benchmark rank, the user preferences, and the model availability, to obtain the set of GenAI models.

11. The system (100) as claimed in claim 7, wherein the processor instructions, on execution, cause the processor (104) to:
identify (502) one or more GenAI model pairs from the set of GenAI models based on the performance score and a size of each of the set of GenAI models; and
for each of the one or more GenAI model pairs, perform (510) a Parameter Efficient Fine-Tuning (PEFT) on a first GenAI model of a GenAI model pair using a second GenAI model of the GenAI model pair.

12. The system (100) as claimed in claim 11, wherein to identify the one or more GenAI model pairs, the processor instructions, on execution, cause the processor (104) to:
calculate (504) an average performance score based on the performance score of each of the set of GenAI models selected for the GenAI agent (214);
for each of the set of GenAI models, compare (506) the performance score with the average performance score; and
identify (508) the one or more GenAI model pairs based on the comparing and the size of each of the set of GenAI models.

Documents

Application Documents

# Name Date
1 202511066321-STATEMENT OF UNDERTAKING (FORM 3) [11-07-2025(online)].pdf 2025-07-11
2 202511066321-REQUEST FOR EXAMINATION (FORM-18) [11-07-2025(online)].pdf 2025-07-11
3 202511066321-REQUEST FOR EARLY PUBLICATION(FORM-9) [11-07-2025(online)].pdf 2025-07-11
4 202511066321-PROOF OF RIGHT [11-07-2025(online)].pdf 2025-07-11
5 202511066321-POWER OF AUTHORITY [11-07-2025(online)].pdf 2025-07-11
6 202511066321-FORM-9 [11-07-2025(online)].pdf 2025-07-11
7 202511066321-FORM 18 [11-07-2025(online)].pdf 2025-07-11
8 202511066321-FORM 1 [11-07-2025(online)].pdf 2025-07-11
9 202511066321-FIGURE OF ABSTRACT [11-07-2025(online)].pdf 2025-07-11
10 202511066321-DRAWINGS [11-07-2025(online)].pdf 2025-07-11
11 202511066321-DECLARATION OF INVENTORSHIP (FORM 5) [11-07-2025(online)].pdf 2025-07-11
12 202511066321-COMPLETE SPECIFICATION [11-07-2025(online)].pdf 2025-07-11