Abstract: This disclosure relates to a system and method for a distributed training of meta-learning models. Herein, a generic approach is proposed to accelerate the training process of meta-learning algorithms by leveraging a distributed training setup. The training is conducted on multiple nodes with each node processing a subset of the tasks, in a distributed training paradigm. A QMAML, a distributed variant of the MAML algorithm, is proposed to illustrate the efficacy of the distributed training setup. The learning tasks in QMAML are run on multiple nodes in order to accelerate the training process. Further, similar to the distributed training paradigm, gradients for learning tasks are consolidated to update the meta-model. Finally, initialization parameters are learned upon convergence of the epochs to generalize the meta-learning models for a new task. [To be published with FIG. 2]
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
SYSTEM AND METHOD FOR DISTRIBUTED TRAINING OF
META-LEARNING MODELS
Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the
manner in which it is to be performed.
TECHNICAL FIELD
[001] The disclosure herein generally relates to the field of training of machine learning models and more specifically, to a system and method for a distributed training of meta-learning models.
BACKGROUND
[002] Deep learning has achieved a lot of success in learning a single task using a large amount of data (such as images, video, text, etc.), without the need for feature engineering. However, neural networks remain limited in real-world scenarios, where data is scarce and requirements for model accuracy and speed are critical. Artificial intelligence (AI) is now trying to emulate the way human beings learn and adapt via meta-learning. Various approaches in gradient-based meta-learning propose models which are hierarchical in nature and may be computationally expensive but attain the expected accuracy. However, the model training process is time consuming and compute-intensive in most meta-learning algorithms.
[003] Existing solutions train a model over different tasks, such that learning new tasks can be achieved using only a small number of training samples. Model Agnostic Meta-Learning (MAML) is hierarchical in nature, which makes it computationally very expensive. The major challenge encountered was to enable a distributed deep learning training framework to work with data in the form of tasks, since meta-learning models represent input data in the form of tasks.
SUMMARY
[004] Embodiments of the disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a system and method for a distributed training of meta-learning models is provided.
[005] In one aspect, a processor-implemented method for a distributed training of meta-learning models is provided. The method includes one or more
steps such as receiving a plurality of tasks corresponding to one or more nodes of a meta-learning algorithm to train meta-learning models, distributing each iteration of an inner loop of the meta-learning algorithm parallelly to update task specific parameters of the meta-learning models, and scheduling each of the plurality of tasks corresponding to the one or more nodes of the meta-learning algorithm for a parallel training of independent local copies of the meta-learning models, wherein each of the one or more nodes has an independent copy of the meta-learning models. Herein, the plurality of tasks is sampled into a plurality of meta-batches of a predefined batch-size. In the next step, the method includes consolidating gradients of each inner loop of the meta-learning algorithm from the one or more nodes when training on the one or more nodes is completed, and updating the meta-learning model in an outer loop of the meta-learning algorithm using the averaged gradients of the inner loop. It would be appreciated that the consolidated gradients are averaged and each iteration of the outer loop updation is one epoch. Finally, the method includes learning initialization parameters for the meta-learning model upon convergence of each of one or more epochs to generalize the meta-learning model for a new task.
[006] In another aspect, a system for a distributed training of meta-learning models is provided. The system includes an input/output interface to receive a plurality of tasks corresponding to one or more nodes of a meta-learning algorithm to train meta-learning models, one or more hardware processors and at least one memory storing a plurality of instructions, wherein the one or more hardware processors are configured to execute the plurality of instructions stored in the at least one memory. Further, the system is configured to distribute each iteration of an inner loop of the meta-learning algorithm parallelly to update task specific parameters of the meta-learning models, and schedule each of the plurality of tasks corresponding to the one or more nodes of the meta-learning algorithm for a parallel training of independent local copies of the meta-learning models. Herein, the plurality of tasks is sampled into a plurality of meta-batches of a predefined batch-size and each of the one or more nodes has an independent copy of the meta-learning models. In the next step, the system is configured to
consolidate gradients of each inner loop of the meta-learning algorithm from the one or more nodes when training on the one or more nodes is completed, and update the meta-learning model in an outer loop of the meta-learning algorithm using the averaged gradients of the inner loop. It would be appreciated that the consolidated gradients are averaged and each iteration of the outer loop updation is one epoch. Finally, the system is configured to learn initialization parameters for the meta-learning model upon convergence of each of one or more epochs to generalize the meta-learning model for a new task.
[007] In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions, which when executed by one or more hardware processors cause a method for a distributed training of meta-learning models. The method includes one or more steps such as receiving a plurality of tasks corresponding to one or more nodes of a meta-learning algorithm to train meta-learning models, distributing each iteration of an inner loop of the meta-learning algorithm parallelly to update task specific parameters of the meta-learning models, and scheduling each of the plurality of tasks corresponding to the one or more nodes of the meta-learning algorithm for a parallel training of independent local copies of the meta-learning models, wherein each of the one or more nodes has an independent copy of the meta-learning models. Herein, the plurality of tasks is sampled into a plurality of meta-batches of a predefined batch-size. In the next step, the method includes consolidating gradients of each inner loop of the meta-learning algorithm from the one or more nodes when training on the one or more nodes is completed, and updating the meta-learning models in an outer loop of the meta-learning algorithm using the averaged gradients of the inner loop. It would be appreciated that the consolidated gradients are averaged and each iteration of the outer loop updation is one epoch. Finally, the method includes learning initialization parameters for the meta-learning model upon convergence of each of one or more epochs to generalize the meta-learning models for a new task.
[008] It is to be understood that both the foregoing general description
and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[009] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[010] FIG. 1 illustrates a network diagram of an exemplary system for a distributed training of meta-learning models, according to an embodiment of the present disclosure.
[011] FIG. 2 is a functional block diagram to illustrate a system for a distributed training of meta-learning models, according to an embodiment of the present disclosure.
[012] FIG. 3 is a flow diagram to illustrate MAML algorithm, in accordance with some embodiments of the present disclosure.
[013] FIG. 4 is a schematic diagram to illustrate parallel distribution of inner loop of the meta-learning algorithms, in accordance with some embodiments of the present disclosure.
[014] FIG. 5 is a flow chart to illustrate a method for a distributed training of meta-learning models, according to an embodiment of the present disclosure.
[015] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems and devices embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes, which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
DETAILED DESCRIPTION OF EMBODIMENTS
[016] Exemplary embodiments are described with reference to the
accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[017] The embodiments herein provide a system and method for a distributed training of meta-learning models. It would be appreciated that meta-learning, also known as learning-to-learn, is a form of generalization. Meta-learning uses prior experience and knowledge to train a model that will quickly adapt and generalize to newer tasks. Meta-learning requires a large number of iterations over tasks, where a task contains multiple data points of several data classes. Various approaches in gradient-based meta-learning propose models which are hierarchical in nature and may be computationally expensive but attain the expected accuracy. However, the model training process is time consuming and compute-intensive in most meta-learning algorithms. Herein, the system and method propose a distributed training approach for meta-learning algorithms that reduces the training time and also provides additional resources for training, by distributing tasks across multiple nodes.
[018] The distributed training is effective in reducing the training time for complex models and applications with large data sets. Meta-learning models learn parameters from the distributed data at the nodes, in resource-constrained environments, without moving the raw data to a centralized location. Herein, a generic approach to distribute meta-learning algorithms, which enables high scalability and reduces the training time, is disclosed.
[019] It is to be noted that the model agnostic meta-learning (MAML) algorithm trains a model over different tasks, such that learning new tasks can be achieved using only a small number of training samples. The MAML is hierarchical in nature and aims to learn the initialization parameters for a meta-learning model. It expedites the process of adapting the model to a new task with fewer training steps, as the model is already initialized with parameters learned over previous tasks. Formally, a task may be defined as a distribution over the input data samples, a distribution over the labels, and a loss function. The tasks are sampled from a distribution of tasks. It would be appreciated that the MAML algorithm has two nested loops, i.e., an inner loop and an outer loop. Step sizes for the updates are hyperparameters that are fixed for the inner loop and the outer loop. When the model adapts to a new task, the task specific parameters are updated, and an updated parameter vector is computed. The meta-optimization in the outer loop is performed using the updated parameters. The MAML algorithm uses the concept of rapid learning, in which the parameter setting of the outer loop meta-initialization is such that adaptation happens faster in the inner loop.
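The two nested loops described above can be sketched in a minimal first-order form. The following is a pure-Python illustration on a toy scalar problem, not the implementation disclosed herein; the task targets `mu`, the step sizes `alpha` and `beta`, and the function names are illustrative assumptions:

```python
# First-order MAML sketch on a toy 1-D problem (illustrative only).
# Each task asks a scalar model theta to match a task-specific target mu,
# with per-task loss(theta) = (theta - mu)^2.

def inner_update(theta, mu, alpha=0.1):
    """One inner-loop step: adapt theta to a single task."""
    grad = 2.0 * (theta - mu)              # d/dtheta of (theta - mu)^2
    return theta - alpha * grad

def meta_train(task_targets, theta=0.0, alpha=0.1, beta=0.05, epochs=100):
    """Outer loop: average post-adaptation gradients across tasks, then step."""
    for _ in range(epochs):
        meta_grads = []
        for mu in task_targets:            # inner loop over the task batch
            adapted = inner_update(theta, mu, alpha)
            meta_grads.append(2.0 * (adapted - mu))  # first-order meta-gradient
        theta -= beta * sum(meta_grads) / len(meta_grads)
    return theta

# The learned initialization settles near the centre of the task targets,
# so adaptation to any one task needs only a few inner steps.
theta0 = meta_train([0.8, 1.0, 1.2])
```

Note that full MAML differentiates through the inner update; the first-order variant above drops those second-order terms for brevity.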
[020] Referring now to the drawings, and more particularly to FIG. 1 through FIG. 5, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[021] FIG. 1 illustrates a network diagram of an exemplary system (100) for a distributed training of meta-learning models, in accordance with an example embodiment. Although the present disclosure is explained considering that the system (100) is implemented on a server, it may be understood that the system (100) may comprise one or more computing devices (102), such as a laptop computer, a desktop computer, a notebook, a workstation, a cloud-based computing environment, and the like. It will be understood that the system (100) may be accessed through one or more input/output interfaces 104-1, 104-2... 104-N, collectively referred to as I/O interface (104). Examples of the I/O interface (104) may include, but are not limited to, a user interface, a portable computer, a personal digital assistant, a handheld device, a smartphone, a tablet computer, a workstation, and the like. The I/O interface (104) is communicatively coupled to the system (100) through a communication network (106).
[022] In an embodiment, the communication network (106) may be a wireless or a wired network, or a combination thereof. In an example, the
communication network (106) can be implemented as a computer network, as one of the different types of networks, such as virtual private network (VPN), intranet, local area network (LAN), wide area network (WAN), the internet, and such. The communication network (106) may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), and Wireless Application Protocol (WAP), to communicate with each other. Further, the communication network (106) may include a variety of network devices, including routers, bridges, servers, computing devices, and storage devices. The network devices within the communication network (106) may interact with the system (100) through communication links.
[023] The system (100) supports various connectivity options such as BLUETOOTH®, USB, ZigBee and other cellular services. The communication network (106) environment enables connection of various components of the system (100) using any communication link including Internet, WAN, MAN, and so on. In an exemplary embodiment, the system (100) is implemented to operate as a stand-alone device. In another embodiment, the system (100) may be implemented to work as a loosely coupled device to a smart computing environment. The components and functionalities of the system (100) are described further in detail.
[024] Referring to FIG. 2, a functional block diagram (200) illustrates a system for a distributed training of meta-learning models, in accordance with an example embodiment. The system (100) comprises at least one memory with a plurality of instructions, one or more databases (112), one or more input/output (I/O) interfaces (104), and one or more hardware processors (108) which are communicatively coupled with the at least one memory to execute a plurality of modules therein.
[025] The one or more I/O interfaces (104) are configured to receive a plurality of tasks corresponding to one or more nodes of a meta-learning algorithm to train meta-learning models. The plurality of received tasks is
sampled into a plurality of meta-batches of a predefined batch-size. The one or more I/O interfaces (104) are configured for learning initialization parameters for the meta-learning model upon convergence of each of one or more epochs to generalize the meta-learning models for a new task.
[026] In the preferred embodiment, the system (100) is configured to distribute each iteration of an inner loop of the meta-learning algorithm parallelly to update task specific parameters of the meta-learning models. Further, the system (100) schedules each of the plurality of tasks corresponding to the one or more nodes of the meta-learning algorithm for a parallel training of independent local copies of the meta-learning models. It would be appreciated that each of the one or more nodes of the meta-learning algorithm has an independent copy of the meta-learning models.
[027] Referring to FIG. 3, a functional flow chart (300) illustrates the functioning of the MAML algorithm, in accordance with some embodiments of the present disclosure. Herein, the inner loop iterates over the plurality of tasks in a single batch. The number of tasks in a single batch is divided by the number of nodes. The DistributedSampler of PyTorch is given a single batch as input. It randomly loads one task on each node and trains the model on the node. For example, if the size of a batch is 5, i.e., there are 5 tasks in a batch, and the number of nodes is 2, then the DistributedSampler may first load one task on each node. After both the nodes train their model, it may then load the next two tasks, one on each node, followed by the final task.
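The partitioning just described can be mimicked with a small sketch. Here `assign_tasks` is a hypothetical helper, not the PyTorch DistributedSampler API; the real sampler additionally shuffles the tasks, which is omitted to keep the schedule deterministic:

```python
# Hypothetical round-robin partition of a task batch across nodes,
# mimicking the loading order described above (shuffling omitted).

def assign_tasks(task_ids, num_nodes):
    """Node k receives every num_nodes-th task starting at offset k."""
    return {node: task_ids[node::num_nodes] for node in range(num_nodes)}

# A batch of 5 tasks on 2 nodes: node 0 gets tasks 0, 2, 4; node 1 gets 1, 3,
# matching the two rounds of paired loads plus one final task.
schedule = assign_tasks([0, 1, 2, 3, 4], num_nodes=2)
```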
[028] Furthermore, the gradients are consolidated after each batch is processed at a node in the distributed training. The gradients are shared across nodes and the meta-learning model is updated. A new epoch starts when the meta-learning model has been trained and updated using all the meta-batches. When the plurality of tasks in the single meta-batch have updated a local copy of the model, the gradients are then consolidated, and the model copy is updated in the outer loop. The optimization function of the distributed deep learning training framework is called only after all the tasks are processed at each of the nodes. This signifies one epoch.
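The consolidation step above amounts to an element-wise average of the per-node gradients, sketched below. This is a pure-Python stand-in for a collective all-reduce such as the one provided by distributed deep learning frameworks; `consolidate` is an illustrative name, not a framework API:

```python
# Simulated gradient consolidation after a meta-batch: average each
# parameter's gradient across the nodes (a stand-in for an all-reduce).

def consolidate(per_node_grads):
    """Element-wise average of gradient vectors gathered from all nodes."""
    n = len(per_node_grads)
    return [sum(g) / n for g in zip(*per_node_grads)]

# Two nodes, two parameters: the averaged gradient is [0.5, -0.25].
avg = consolidate([[0.25, -0.5], [0.75, 0.0]])
```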
[029] In one instance, a Quick MAML (QMAML), which is a distributed variant of the MAML, is explained to illustrate the generic approach to accelerate the training process of the meta-learning algorithms by leveraging a distributed training setup. Herein, the training is conducted on one or more nodes with each node processing a subset of the plurality of tasks in a distributed training paradigm. Similar to the distributed training paradigm, gradients for learning tasks are consolidated to update the meta-learning model. It is to be noted that a lightweight distributed training library is leveraged to implement QMAML, but this approach can be implemented on any distributed deep learning training framework.
[030] Referring to FIG. 4, a schematic diagram (400) illustrates parallel distribution of the inner loop of the meta-learning algorithms, in accordance with some embodiments of the present disclosure. It is to be noted that QMAML is a parallelized version of MAML, wherein the inner loop of the QMAML is executed in parallel on different processes on a single machine. The number of parallel processes was equal to the meta-batch size. It is to be noted that this approach of parallelization cannot scale beyond a single machine.
[031] In one instance, each node in the distributed deep learning training framework gets a copy of the model, and the plurality of tasks in the predefined batch-size are distributed across all the nodes. The inner loop of the QMAML runs in a distributed manner on multiple nodes of the distributed deep learning training framework. The task specific parameters of the QMAML are independently determined on each node, and the local copy of the model is updated. Once every node has iterated through the plurality of tasks in the batch, the inner loop of the QMAML algorithm is complete. The gradients are accumulated and the parameters of the meta-model are then updated. This signifies one epoch or one iteration of the outer loop. A broadcast is then sent to all the nodes with an updated copy of the model for the next epoch. The MAML algorithm requires a large number of iterations for convergence and hence the time required for training also increases significantly. Distributing tasks across multiple nodes facilitates training on different tasks in parallel and thus enables the meta-learner to learn faster.
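One epoch of the flow just described — distributed inner loop, gradient consolidation, averaged outer update, broadcast — could be simulated on a toy scalar model as follows. This is an illustrative sketch under assumed step sizes `alpha` and `beta`, not the disclosed implementation; in practice each node's loop runs as a separate process and the broadcast is a framework collective:

```python
# One simulated QMAML epoch on a toy scalar model theta, with per-task
# loss (theta - mu)^2. node_tasks maps a node id to its task targets.

def qmaml_epoch(theta, node_tasks, alpha=0.1, beta=0.05):
    grads = []
    for tasks in node_tasks.values():      # in practice, one process per node
        for mu in tasks:                   # inner loop over the node's tasks
            local = theta - alpha * 2.0 * (theta - mu)   # adapt a local copy
            grads.append(2.0 * (local - mu))             # post-adaptation grad
    theta -= beta * sum(grads) / len(grads)  # consolidated, averaged update
    return theta                             # broadcast to all nodes next epoch

# Two nodes sharing a batch of tasks; the meta-parameters move toward
# an initialization that adapts quickly to every task.
theta1 = qmaml_epoch(0.0, {0: [1.0, 0.8], 1: [1.2]})
```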
[032] Referring to FIG. 5, a flow diagram (500) illustrates a processor-implemented method for a distributed training of meta-learning models, according to an embodiment of the present disclosure.
[033] Initially, at the step (502), receiving a plurality of tasks corresponding to one or more nodes of a meta-learning algorithm to train a meta-learning model. The plurality of tasks is sampled into a plurality of meta-batches of a predefined batch-size.
[034] At the next step (504), distributing each iteration of an inner loop of the meta-learning algorithm parallelly to update task specific parameters of the meta-learning models.
[035] At the next step (506), scheduling each of the plurality of tasks corresponding to the one or more nodes of the meta-learning algorithm for a parallel training of independent local copies of the meta-learning models. It is to be noted that each of the one or more nodes has an independent copy of the meta-learning models.
[036] At the next step (508), consolidating gradients of each inner loop of the meta-learning algorithm from the one or more nodes when training on the one or more nodes is completed, wherein the consolidated gradients are averaged.
[037] At the next step (510), updating the meta-learning models in an outer loop of the meta-learning algorithm using the averaged gradients of the inner loop, wherein each iteration of the outer loop updation is one epoch.
[038] At the last step (512), learning initialization parameters for the meta-learning models upon convergence of each of one or more epochs to generalize the meta-learning model for a new task.
[039] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent
elements with insubstantial differences from the literal language of the claims.
[040] The embodiments of the present disclosure herein address the unresolved problem of accelerating the training of meta-learning models. At present there is no common framework available which provides a distributed training setup for meta-learning algorithms. Therefore, embodiments herein provide a system and method for a distributed training of meta-learning models. Herein, a generic approach is proposed to accelerate the training process of meta-learning algorithms by leveraging a distributed training setup. The training is conducted on multiple nodes with each node processing a subset of the tasks, in a distributed training paradigm. A QMAML, a distributed variant of the MAML algorithm, is proposed to illustrate the efficacy of the distributed training setup. The learning tasks in QMAML are run on multiple nodes in order to accelerate the training process. Further, similar to the distributed training paradigm, gradients for learning tasks are consolidated to update the meta-model.
[041] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means, and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[042] The embodiments herein can comprise hardware and software
elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[043] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[044] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform
steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[045] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
We Claim:
1. A processor-implemented method (500) comprising:
receiving (502), via an input/output interface, a plurality of tasks corresponding to one or more nodes of a meta-learning algorithm to train a meta-learning model;
distributing (504), via the one or more hardware processors, each iteration of an inner loop of the meta-learning algorithm parallelly to update task specific parameters of the meta-learning model; scheduling (506), via one or more hardware processors, each of the plurality of tasks corresponding to the one or more nodes of the meta-learning algorithm for a parallel training of independent local copies of the meta-learning model, wherein each of the one or more nodes has an independent copy of the meta-learning model;
consolidating (508), via the one or more hardware processors, gradients of each inner loop of the meta-learning algorithm from the one or more nodes when training on the one or more nodes is completed, wherein an average is calculated of the consolidated gradients;
updating (510), via the one or more hardware processors, the meta-learning model in an outer loop of the meta-learning algorithm using the averaged gradients of the inner loop, wherein the meta-learning model in the outer loop is updated in one or more iterations, and wherein each iteration is one epoch; and learning (512), via one or more hardware processors, a plurality of initialization parameters for the meta-learning model upon convergence of each of one or more epochs to generalize the meta-learning model for a new task.
2. The processor-implemented method (500) of claim 1, wherein the plurality of tasks is sampled into a plurality of meta-batches of a predefined batch-size.
3. The processor-implemented method (500) of claim 1, wherein the task specific parameters of the meta-learning model are updated at each of the one or more nodes after each iteration.
4. A system (100) comprising:
an input/output interface (104) to receive a plurality of tasks
corresponding to one or more nodes of a meta-learning algorithm
to train a meta-learning model;
one or more hardware processors (108);
a memory in communication with the one or more hardware
processors, wherein the one or more hardware processors (108)
are configured to execute programmed instructions stored in the
memory, to:
distribute each iteration of an inner loop of the meta-learning algorithm parallelly to update task specific parameters of the meta-learning model;
schedule each of the plurality of tasks corresponding to the one or more nodes of the meta-learning algorithm for a parallel training of independent local copies of the meta-learning model, wherein each of the one or more nodes has an independent copy of the meta-learning model; consolidate gradients of each inner loop of the meta-learning algorithm from the one or more nodes when training on the one or more nodes is completed, wherein an average is calculated of the consolidated gradients; update the meta-learning models in an outer loop of the meta-learning algorithm using the averaged gradients of
the inner loop, wherein the meta-learning model in the outer loop is updated in one or more iteration, and wherein each iteration is one epoch; and
learn a plurality of initialization parameters for the meta-learning models upon convergence of each of one or more epochs to generalize the meta-learning model for a new task.
5. The system (100) of claim 4, wherein the plurality of tasks is sampled into a plurality of meta-batches of a predefined batch-size.
6. The system (100) of claim 4, wherein the task specific parameters of the meta-learning model are updated at each of the one or more nodes after each iteration.
7. A non-transitory computer readable medium storing one or more instructions which when executed by one or more processors on a system, cause the one or more processors to perform a method comprising:
receiving, via an input/output interface, a plurality of tasks corresponding to one or more nodes of a meta-learning algorithm to train a meta-learning model;
distributing, via the one or more hardware processors, each iteration of an inner loop of the meta-learning algorithm parallelly to update task specific parameters of the meta-learning model; scheduling, via one or more hardware processors, each of the plurality of tasks corresponding to the one or more nodes of the meta-learning algorithm for a parallel training of independent local copies of the meta-learning models, wherein each of the one or more nodes has an independent copy of the meta-learning model;
consolidating, via the one or more hardware processors, gradients of each inner loop of the meta-learning algorithm from the one or more nodes when training on the one or more nodes is completed, wherein the consolidated gradients are averaged;
updating, via the one or more hardware processors, the meta-learning models in an outer loop of the meta-learning algorithm using the averaged gradients of the inner loop, wherein the meta-learning model in the outer loop is updated in one or more iteration, and wherein each iteration is one epoch; and learning, via one or more hardware processors, a plurality of initialization parameters for the meta-learning model upon convergence of each of one or more epochs to generalize the meta-learning model for a new task.
| # | Name | Date |
|---|---|---|
| 1 | 202121029174-STATEMENT OF UNDERTAKING (FORM 3) [29-06-2021(online)].pdf | 2021-06-29 |
| 2 | 202121029174-REQUEST FOR EXAMINATION (FORM-18) [29-06-2021(online)].pdf | 2021-06-29 |
| 3 | 202121029174-PROOF OF RIGHT [29-06-2021(online)].pdf | 2021-06-29 |
| 4 | 202121029174-FORM 18 [29-06-2021(online)].pdf | 2021-06-29 |
| 5 | 202121029174-FORM 1 [29-06-2021(online)].pdf | 2021-06-29 |
| 6 | 202121029174-FIGURE OF ABSTRACT [29-06-2021(online)].jpg | 2021-06-29 |
| 7 | 202121029174-DRAWINGS [29-06-2021(online)].pdf | 2021-06-29 |
| 8 | 202121029174-DECLARATION OF INVENTORSHIP (FORM 5) [29-06-2021(online)].pdf | 2021-06-29 |
| 9 | 202121029174-COMPLETE SPECIFICATION [29-06-2021(online)].pdf | 2021-06-29 |
| 10 | 202121029174-FORM-26 [22-10-2021(online)].pdf | 2021-10-22 |
| 11 | Abstract1..jpg | 2021-12-13 |
| 12 | 202121029174-FER.pdf | 2023-01-24 |
| 13 | 202121029174-PETITION UNDER RULE 137 [24-05-2023(online)].pdf | 2023-05-24 |
| 14 | 202121029174-OTHERS [24-05-2023(online)].pdf | 2023-05-24 |
| 15 | 202121029174-FER_SER_REPLY [24-05-2023(online)].pdf | 2023-05-24 |
| 16 | 202121029174-CORRESPONDENCE [24-05-2023(online)].pdf | 2023-05-24 |
| 17 | 202121029174-CLAIMS [24-05-2023(online)].pdf | 2023-05-24 |
| 18 | 202121029174-PatentCertificate06-11-2024.pdf | 2024-11-06 |
| 19 | 202121029174-IntimationOfGrant06-11-2024.pdf | 2024-11-06 |