
System And Method For Automatic Code Refactoring For Training Parallelization

Abstract: Existing techniques for training machine learning models suffer from inherent drawbacks and do not utilize the available infrastructure to the full extent. Embodiments herein provide a method and system for automatic code refactoring for distributed training on multiple processing units. It is a tree based automatic code refactoring system which refactors the training code for distributed training on multiple processing units. This refactoring of the code involves creating data partitions for multiple parallel processes and ensuring that each process works on its assigned data partition, and pinning the devices to processes. It also involves defining a new optimizer and strategies which work with distributed processes, and running the training by synchronizing the intermediate data for weights and biases and their updates. This entire code flow is orchestrated with the help of functions offered by the underlying frameworks. [To be published with FIG. 2]


Patent Information

Application #
202221025056
Filing Date
28 April 2022
Publication Number
44/2023
Publication Type
INA
Invention Field
COMPUTER SCIENCE

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Inventors

1. GOEL, Ishank
Tata Consultancy Services Limited, S1 Poorna, Sahyadri Park, Plot No. 2, 3, Rajiv Gandhi infotech Park, Phase III, Hinjawadi-Maan, Pune 411057, Maharashtra, India
2. KALELE, Amit
Tata Consultancy Services Limited, S2 Torna, Sahyadri Park, Plot No. 2, 3, Rajiv Gandhi infotech Park, Phase III, Hinjawadi-Maan, Pune 411057, Maharashtra, India
3. PANWAR, Nitendra Singh
Tata Consultancy Services Limited SJM Towers, 18, Seshadri Rd, Gandhi Nagar, Bangalore 560009, Karnataka, India
4. JAIN, Anubhav
Tata Consultancy Services Limited Phase-II, Sector-80, Block C, Noida 201305, Uttar Pradesh, India
5. SUBBIAH, Ravindran
Tata Consultancy Services Limited, Peepul Park, TECHNOPARK PARK CAMPUS, KARIAVATTOM PO, THIRUVANANTHAPURAM 695581, Kerala, India

Specification

Description: FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:
SYSTEM AND METHOD FOR AUTOMATIC CODE REFACTORING FOR TRAINING PARALLELIZATION

Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

Preamble to the description:
The following specification particularly describes the invention and the manner in which it is to be performed.


TECHNICAL FIELD
[001] The disclosure herein generally relates to the field of automatic code refactoring and more specifically, to a method and system for an automatic code refactoring for distributed training on multiple processing units.

BACKGROUND
[002] With the wide adoption of Artificial Intelligence (AI) & Machine Learning (ML) in industry, models are being used for business critical applications and decisions. These models go through a long development cycle which involves many compute intensive steps including model training. Training deep learning models on large data sets is a time consuming and resource intensive task and typically involves days of computing for a single training. These computations increase many fold when training is combined with hyper parameter tuning. This translates into unacceptable compute time. Such cases are typically handled with parallel computing, where training is carried out on multiple CPUs and GPUs in a distributed manner.
[003] A majority of the frameworks used for developing and training deep learning models provide mechanisms and distributed strategies to facilitate distributed training. An automated mechanism to carry out the code transformation or code refactoring is highly desirable, as data scientists and ML engineers can then focus on their main task of developing models rather than on efficiently running the training and optimizing training cost and performance.
[004] However, developing such an automated mechanism has its own challenges. To carry out the code refactoring or transformation, one of the main hurdles is to identify the variables and code statements defined by the user for data processing and for defining the optimizer and callbacks, as each programmer has a unique way of writing code and defining variables. This makes it almost impossible to detect them using patterns or keywords.

SUMMARY
[005] Embodiments of the disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method and system for an automatic code refactoring for distributed training on multiple processing units is provided.
[006] In one aspect, a processor-implemented method for an automatic code refactoring for distributed training on multiple processing units is provided. The processor-implemented method comprising receiving the training code to enable a distributed training of a plurality of machine learning models, parsing the received training code to get a tree format of the training code using an abstract syntax tree model, analyzing the received training code to determine a list of imports to be used in the training code, detecting a location of a last import node from the determined list of imports, identifying at least one node containing a variable declaration of optimizer, modifying the identified at least one optimizer node to split a plurality of tasks among one or more workers, identifying at least one node containing declaration of callbacks used in the training based on a pattern matching with a node containing callback related keywords, modifying the identified at least one node containing declaration of callbacks to broadcast training parameters and variables required for the training, and converting, via the one or more hardware processors, the modified nodes of the tree to a source code to get a refactored training code.
[007] In another aspect, a system for an automatic code refactoring for distributed training on multiple processing units is provided. The system includes an input/output interface configured to receive the training code to enable a distributed training of a plurality of machine learning models, one or more hardware processors and at least one memory storing a plurality of instructions, wherein the one or more hardware processors are configured to execute the plurality of instructions stored in the at least one memory.
[008] Further, the system is configured to parse the received training code to get a tree format of the training code using an abstract syntax tree model, analyze the received training code to determine a list of imports to be used in the training code, detect location of a last import node from the determined list of imports, identify at least one node containing a variable declaration of optimizer, modify the identified at least one optimizer node to split a plurality of tasks among one or more workers, identify at least one node containing declaration of callbacks used in the training based on a pattern matching with a node containing callback related keywords, modify the identified at least one node containing declaration of callbacks to broadcast training parameters and variables required for the training, and convert the modified nodes of the tree to a source code to get a refactored training code.
[009] In yet another aspect, one or more non-transitory machine-readable information storage mediums are provided comprising one or more instructions, which when executed by one or more hardware processors cause a method for an automatic code refactoring for distributed training on multiple processing units to be performed. The processor-implemented method comprising receiving the training code to enable a distributed training of a plurality of machine learning models, parsing the received training code to get a tree format of the training code using an abstract syntax tree model, analyzing the received training code to determine a list of imports to be used in the training code, detecting a location of a last import node from the determined list of imports, identifying at least one node containing a variable declaration of optimizer, modifying the identified at least one optimizer node to split a plurality of tasks among one or more workers, identifying at least one node containing declaration of callbacks used in the training based on a pattern matching with a node containing callback related keywords, modifying the identified at least one node containing declaration of callbacks to broadcast training parameters and variables required for the training, and converting, via the one or more hardware processors, the modified nodes of the tree to a source code to get a refactored training code.
[010] It is to be understood that the foregoing general descriptions and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS
[011] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[012] FIG. 1 illustrates a block diagram of an exemplary system for an automatic code refactoring for distributed training on multiple processing units, in accordance with some embodiments of the present disclosure.
[013] FIG. 2 is a flowchart of the system for an automatic code refactoring for distributed training on multiple processing units, in accordance with some embodiments of the present disclosure.
[014] FIG. 3 is a flow diagram to illustrate a method for an automatic code refactoring for distributed training on multiple processing units, in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS
[015] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[016] The embodiments herein provide a method and system for an automatic code refactoring for distributed training on multiple processing units. It is a tree based automatic code refactoring approach which refactors the training code for distributed training on multiple CPUs and GPUs. The method and system support popular machine learning frameworks, i.e., TensorFlow and Keras. This refactoring of the code involves creating data partitions for multiple parallel processes and ensuring that each process works on its assigned data partition, and pinning the devices to processes. It also involves defining a new optimizer and strategies which work with distributed processes, and running the training by synchronizing the intermediate data for weights and biases and their updates. This entire code flow is orchestrated with the help of functions offered by the underlying frameworks, as sketched below.
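By way of illustration only, a minimal sketch of what such an orchestrated, distributed training script can look like is given below. It uses the open-source Horovod library with TensorFlow/Keras purely as one example of a framework offering such functions; the library choice, file names, dataset, and hyper-parameter values are illustrative assumptions and not part of the disclosure.

    # Illustrative sketch of a distributed Keras training script (Horovod used
    # only as an example of "functions offered by underlying frameworks").
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()                                            # start the worker process group

    # Pin each worker process to its own GPU.
    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

    # Each worker reads only its own data partition (shard) of the dataset.
    dataset = tf.data.Dataset.from_tensor_slices(
        (tf.random.uniform((1024, 8)), tf.random.uniform((1024, 1))))
    dataset = dataset.shard(hvd.size(), hvd.rank()).batch(32)

    model = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                                 tf.keras.layers.Dense(1)])

    # Per the disclosure, the learning rate is divided by the number of workers,
    # and the optimizer is wrapped so weight updates are synchronized.
    opt = tf.keras.optimizers.Adam(learning_rate=0.1 / hvd.size())
    opt = hvd.DistributedOptimizer(opt)
    model.compile(optimizer=opt, loss="mse")

    # Broadcast initial parameters to all workers; only the master saves models.
    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
    if hvd.rank() == 0:
        callbacks.append(tf.keras.callbacks.ModelCheckpoint("best_model.h5"))

    model.fit(dataset, epochs=5, callbacks=callbacks)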
[017] The code refactoring with distributed strategies needs to be defined carefully to ensure that it uses the underlying hardware to the fullest and no resources remain idle. Some amount of system engineering knowledge is required to refactor code so that it runs efficiently on a distributed system, and the complexity is higher for heterogeneous systems which involve a mix of CPUs and GPUs. However, developing such an automated mechanism has its own challenges. To carry out the code refactoring or transformation, one of the main hurdles is to identify the variables and code statements defined by the user for data processing and for defining the optimizer and callbacks, as each programmer has a unique way of writing code and defining variables. This makes it almost impossible to detect them using patterns or keywords.
[018] Referring now to the drawings, and more particularly to FIG. 1 through FIG. 3, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[019] FIG. 1 illustrates a block diagram of a system (100) for an automatic code refactoring for a distributed training on multiple processing units, in accordance with an example embodiment. Although the present disclosure is explained considering that the system (100) is implemented on a server, it may be understood that the system (100) may comprise one or more computing devices (102), such as a laptop computer, a desktop computer, a notebook, a workstation, a cloud-based computing environment and the like. It will be understood that the system (100) may be accessed through one or more input/output interfaces 104-1, 104-2... 104-N, collectively referred to as I/O interface (104). Examples of the I/O interface (104) may include, but are not limited to, a user interface, a portable computer, a personal digital assistant, a handheld device, a smartphone, a tablet computer, a workstation, and the like. The I/O interfaces (104) are communicatively coupled to the system (100) through a network (106).
[020] In an embodiment, the network (106) may be a wireless or a wired network, or a combination thereof. In an example, the network (106) can be implemented as a computer network, as one of the different types of networks, such as virtual private network (VPN), intranet, local area network (LAN), wide area network (WAN), the internet, and such. The network (106) may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), and Wireless Application Protocol (WAP), to communicate with each other. Further, the network (106) may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices. The network devices within the network (106) may interact with the system (100) through communication links.
[021] The system (100) supports various connectivity options such as BLUETOOTH®, USB, ZigBee, and other cellular services. The network environment enables connection of various components of the system (100) using any communication link including Internet, WAN, MAN, and so on. In an exemplary embodiment, the system (100) is implemented to operate as a stand-alone device. In another embodiment, the system (100) may be implemented to work as a loosely coupled device to a smart computing environment. Further, the system (100) comprises at least one memory with a plurality of instructions, one or more databases (112), and one or more hardware processors (108) which are communicatively coupled with the at least one memory to execute a plurality of modules (114) therein. The components and functionalities of the system (100) are described further in detail.
[022] Referring to FIG. 2, a flowchart (200) of the system (100) for an automatic code refactoring for a distributed training on multiple processing units is illustrated, in accordance with an example embodiment. A tree based approach enables efficient detection of places where new statements need to be added and of user variables, and inserting new nodes allows one to also maintain the code and data flow. It requires no intervention from the user and works automatically for various underlying frameworks. This reduces the huge effort required by experts to learn new tools and syntax to run their machine learning code more efficiently. It enables the model training process to make efficient use of the underlying hardware and in turn optimizes the training time and cost.
[023] Herein, the one or more I/O interfaces (104) are configured to receive the training code to enable a distributed training of a plurality of machine learning models. The training code comprises one or more statements. Typically, deep learning or machine learning training code has a well-defined structure and flow. Some of the code structures are relatively easy to detect and modify automatically; however, detecting and modifying user defined variables and statements for defining the model optimizer, callbacks, and model checkpointing is extremely challenging. These variables and statements can practically take varying forms and names; hence, usual text based searching and parsing is not a feasible option.
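For illustration, the following is a hypothetical user-written Keras training script of the kind targeted here; the variable names my_opt and cb_list are deliberately arbitrary (chosen by the user), showing why plain text search for keywords such as "optimizer" or "callbacks" is unreliable.

    # Hypothetical user-written training script: optimizer and callback list
    # carry user-chosen names, so keyword search alone cannot locate them.
    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

    x_train = np.random.rand(256, 8)
    y_train = np.random.rand(256, 1)

    model = Sequential([Dense(16, activation="relu"), Dense(1)])

    my_opt = Adam(learning_rate=0.1)                      # user-named optimizer variable
    cb_list = [EarlyStopping(patience=3),
               ModelCheckpoint("best_model.h5")]          # user-named callback list

    model.compile(optimizer=my_opt, loss="mse")
    model.fit(x_train, y_train, epochs=5, callbacks=cb_list)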
[024] Further, to detect and modify user variables and statements, it is required to identify the structure of the training code including a syntax, a type of the training code statement, one or more function definitions, one or more calls, etc. A tree based representation of the training code enables efficient detection of places where new statements need to be added and of user variables, and inserting new nodes allows one to also maintain the code and data flow. The tree consists of a root node, and each statement of the code is represented by a node. Each node in the tree may be a child node to the root node or a parent node to multiple child nodes depending on the training code statements. This representation allows the system (100) to detect variables and statements by applying conditions on node properties and to manipulate the code flow by inserting new nodes.
[025] In another embodiment, the system (100) is configured to parse the received training code to get a tree format of the training code using an abstract syntax tree model. Each of the one or more statements of the training code is represented by a node in the tree format. The tree based approach enables efficient detection of places where new statements need to be added and of user variables, and inserting new nodes allows one to also maintain the code and data flow. It does not require any intervention from the user and works automatically for various underlying frameworks. It reduces the huge effort required by data scientists to learn new tools and syntax to run their machine learning code more efficiently.
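As a simple illustration (assuming Python's standard ast module and a hypothetical input file name train.py), the training code can be parsed into such a tree as follows; each top-level statement becomes a node of the tree body.

    import ast

    # Parse the single-process training script into an abstract syntax tree.
    # Each top-level statement becomes a node in tree.body; nested statements
    # appear as child nodes, mirroring the parent/child structure described above.
    with open("train.py") as f:          # "train.py" is an illustrative file name
        source = f.read()

    tree = ast.parse(source)
    for node in tree.body:
        print(type(node).__name__, node.lineno)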
[026] In yet another embodiment, the system (100) is configured to analyze the received training code to determine a list of imports to be used in the training code. Usually, one or more import statements in a code are declared at the start of the program. When a model is trained, the optimizer is the main component determining how the model weights are updated. There are different types of optimizers available to reduce the loss function during machine learning model training. Some of these are Adam, RMSProp, SGD, AdaGrad, etc. When defining the optimizer, parameters like learning rate, decay, beta, etc. are also defined, which control how fast or slowly the loss function converges.
[027] Further, the system (100) is configured to detect a location of a last import node from the determined list of imports. It is to be noted that parallelization based imports are placed after the detected location. The new import nodes required for distributed strategies and parallel computation are inserted at this location. When a new import node has to be inserted, the system (100) loops through all the parent nodes of the tree.
[028] In one example, when the import statements have ended, the system (100) stops the loop and saves the location of this last import node, after which the new import code will be inserted. Further, a new node for the required import is created, for example using new_node = ast.Import(params), and the system inserts this new node at the detected location, for example using tree.body.insert(detected_location, new_node).
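A minimal sketch of this import handling, written with Python's standard ast module, is shown below; the helper name insert_import_after_last and the module name passed to it are illustrative assumptions.

    import ast

    def insert_import_after_last(tree: ast.Module, module_name: str) -> ast.Module:
        # Remember the position of the last top-level import statement.
        last_import_idx = -1
        for idx, node in enumerate(tree.body):
            if isinstance(node, (ast.Import, ast.ImportFrom)):
                last_import_idx = idx
        # Create a new import node and insert it right after the last import.
        new_node = ast.Import(names=[ast.alias(name=module_name, asname=None)])
        tree.body.insert(last_import_idx + 1, new_node)
        ast.fix_missing_locations(tree)
        return tree

    # Example: add an import needed for distributed execution (the module name
    # is illustrative; the disclosure does not mandate a specific library).
    tree = ast.parse(open("train.py").read())
    tree = insert_import_after_last(tree, "horovod.tensorflow.keras")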
[029] In another embodiment, the system (100) is configured to identify at least one node containing a variable declaration of optimizer, wherein the variable declaration comprises a general optimizer text. For example, a general optimizer declaration is optimizer = Adam(lr=0.1, beta=0.01), where optimizer is the name of the variable, Adam is the optimizer function, and lr & beta are the arguments to the optimizer function. Since there are a limited number of optimizer functions available in an ML framework, the system (100) detects the optimizer function call node by detecting optimizer related keywords, like Adam in the example mentioned above.
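A minimal sketch of such keyword-based detection, using Python's standard ast module and an illustrative (non-exhaustive) set of optimizer names, is given below.

    import ast

    KNOWN_OPTIMIZERS = {"Adam", "SGD", "RMSprop", "Adagrad", "Adadelta", "Nadam"}

    def find_optimizer_nodes(tree: ast.Module):
        # Report assignments whose right-hand side is a call to a known optimizer
        # constructor, regardless of the variable name chosen by the user.
        found = []
        for node in ast.walk(tree):
            if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
                func = node.value.func
                # Handles both Adam(...) and tf.keras.optimizers.Adam(...).
                name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
                if name in KNOWN_OPTIMIZERS:
                    found.append(node)
        return found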
[030] In one aspect, when the machine learning model is trained, the input is passed through all the layers of the neural network. At the end of the last layer, the resultant output is compared with the actual labelled output and the error is calculated. The optimizer is the core component defining how the weights of the layers may be updated. Different optimizer functions follow different approaches to reduce the error and achieve a minimum value for the loss function. So, based on the calculated error and the suitable optimizer function selected, the layer weights are updated through back propagation. This process is repeated for all the training samples passing through the network, and the complete cycle is repeated multiple times until the model learns the input-output relationship and gives the lowest error value.
[031] Further, the system (100) is configured to modify the identified at least one optimizer node to split a plurality of tasks among one or more workers. So, when the system (100) detects the optimizer node, since the optimization task has to be split between multiple workers, the learning rate has to be divided by the number of workers available. Hence the system (100) detects the way the optimizer variable is defined, i.e., whether the learning rate is assigned a value directly or through a variable, and whether arguments or keyword arguments are used. If assigned directly, the float value is modified; else, the variable containing the value is modified. For example, if the optimizer is defined as optimizer = Adam(lr=0.1, beta=0.01) and the number of workers available is 4, the system (100) detects how the learning rate is defined and the resulting modified optimizer node will be Adam(lr=0.1/4, beta=0.01).
[032] The optimizer statement node can have several inputs or arguments like learning rate, beta, decay rates, etc. These arguments can potentially be defined in many ways, including assigning a value to them directly or first defining a variable and its value and passing it to the optimizer. All of these scenarios are checked for, and then the optimizer node is modified in a modify function.
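A minimal sketch of this modification step is given below; the helper name scale_learning_rate is an illustrative assumption, and the sketch handles both of the cases described above (a directly assigned value and a value passed through a variable).

    import ast

    def scale_learning_rate(call: ast.Call, num_workers: int) -> None:
        # Divide the learning-rate argument of a detected optimizer call by the
        # number of workers, per the disclosure's example Adam(lr=0.1/4, ...).
        for kw in call.keywords:
            if kw.arg in ("lr", "learning_rate"):
                # Works whether the value is a literal (lr=0.1 -> lr=0.1 / 4)
                # or a variable (lr=base_lr -> lr=base_lr / 4).
                kw.value = ast.BinOp(left=kw.value, op=ast.Div(),
                                     right=ast.Constant(value=num_workers))

    # Example usage with the detection helper sketched earlier:
    #   for node in find_optimizer_nodes(tree):
    #       scale_learning_rate(node.value, 4)
    #   ast.fix_missing_locations(tree)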
[033] Furthermore, the system (100) is configured to identify at least one node containing declaration of callbacks used in the training based on a pattern matching with a node containing callback related keywords. To run the training in the distributed manner, it is required to orchestrate the process, which involves broadcasting the training parameters to all worker processes and sharing local and global parameters and variables among worker processes. This is achieved with the help of callbacks which are provided by the underlying framework. It is required to insert new callbacks to facilitate the distributed training.
[034] The callback node containing the declaration of the list of callbacks is detected based on the underlying framework properties of the callbacks and their definitions, which are stored in R_callback. Once the callbacks list declaration node is found, additional callbacks required for distributed strategies are inserted into the callback list node. These additional callbacks help broadcast training parameters and local as well as global variables to all worker processes required for the training. In case no callback declaration is found, the training code is assumed to be of the batch training type and the refactoring process moves to the next step of detecting and modifying the model writer.
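A minimal sketch of this callback handling is given below; the callback keyword set and the Horovod broadcast callback appended to the list are illustrative assumptions, not a prescribed implementation.

    import ast

    CALLBACK_KEYWORDS = {"ModelCheckpoint", "EarlyStopping", "TensorBoard",
                         "ReduceLROnPlateau", "CSVLogger"}

    def append_broadcast_callback(tree: ast.Module) -> bool:
        # Find an assignment whose value is a list literal containing calls to
        # known callback constructors, and append an extra broadcast callback
        # node required for distributed training.
        for node in ast.walk(tree):
            if isinstance(node, ast.Assign) and isinstance(node.value, ast.List):
                names = set()
                for elt in node.value.elts:
                    if isinstance(elt, ast.Call):
                        f = elt.func
                        names.add(f.attr if isinstance(f, ast.Attribute) else getattr(f, "id", ""))
                if names & CALLBACK_KEYWORDS:
                    extra = ast.parse(
                        "hvd.callbacks.BroadcastGlobalVariablesCallback(0)",
                        mode="eval").body            # illustrative callback expression
                    node.value.elts.append(extra)
                    ast.fix_missing_locations(tree)
                    return True
        return False   # no callback list found: treated as batch-training code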
[035] Further, the system (100) is configured to modify the identified at least one node containing declaration of callbacks to broadcast training parameters and variables required for the training. In general, several parameters and model weights and biases are logged during the training process. In a distributed training scenario, this logging may need to be done only by the master process. To enable this, it is required to detect the corresponding node for the model parameter logging and modify it by adding a conditional check to it. In the case of batch training, the model with the best metric is saved to disk, provided the metric has a better value than in previous epochs. This model writing needs to be done by only one worker.
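A minimal sketch of adding such a conditional check is given below; the hvd.rank() == 0 test is an illustrative stand-in for any "is this the master process" condition, and the sketch assumes the statement to be guarded is a top-level statement of the script.

    import ast

    def guard_with_master_check(tree: ast.Module, stmt: ast.stmt) -> None:
        # Wrap a detected model-saving/logging statement in a rank check so that
        # only the master worker performs it.
        idx = tree.body.index(stmt)
        condition = ast.parse("hvd.rank() == 0", mode="eval").body
        tree.body[idx] = ast.If(test=condition, body=[stmt], orelse=[])
        ast.fix_missing_locations(tree)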
[036] In another embodiment, the system (100) is configured to convert the modified nodes of the tree to a source code to get a refactored training code. The new modified tree is converted back to source code and then saved in the same directory as the original file. This new source code is the refactored code which is capable of carrying out distributed training.
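A minimal sketch of this final conversion step, assuming Python 3.9 or later for ast.unparse and a hypothetical output file name, is given below; older Python versions could use third-party libraries such as astor for the same purpose.

    import ast

    def write_refactored_code(tree: ast.Module, out_path: str) -> None:
        # Convert the modified tree back to source text and save it to disk,
        # next to the original file.
        ast.fix_missing_locations(tree)
        source = ast.unparse(tree)
        with open(out_path, "w") as f:
            f.write(source)

    # Example: write_refactored_code(tree, "train_distributed.py")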
[037] Referring to FIG. 3, a processor-implemented method (300) for an automatic code refactoring for distributed training on multiple processing units is illustrated.
[038] Initially, at the step (302), receiving, via an input/output interface, the training code to enable a distributed training of a plurality of machine learning models, wherein the training code comprises one or more statements.
[039] At the next step (304), parsing the received training code to get a tree format of the training code using an abstract syntax tree model. Each of the one or more statements of the training code is represented by a node in the tree format. The tree format of the training code comprises of a syntax, and a variable information of the code.
[040] At the next step (306), analyzing the tree format of the training code to determine a list of imports used in the training code.
[041] At the next step (308), detecting a location of a last import node from the determined list of imports, wherein parallelization based imports are placed at the end of the detected location.
[042] At the next step (310), identifying at least one node containing a variable declaration of optimizer, wherein the variable declaration comprises a common optimizer text.
[043] At the next step (312), modifying the identified at least one node to split a plurality of the model training processes among one or more hardware units.
[044] At the next step (314), identifying at least one node containing declaration of callbacks used in the training based on a pattern matching with a node containing callback related keywords.
[045] At the next step (316), modifying the identified at least one node containing declaration of callbacks to broadcast training parameters and variables required for the training.
[046] At the last step (318), converting the modified nodes of the tree to a source code to get a refactored training code.
[047] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[048] The embodiments of the present disclosure herein address the underutilization of the underlying hardware by a training process. In most observed cases, training of machine learning models does not utilize the available infrastructure to the full extent. All well-known machine learning frameworks by default consume only one GPU resource even if multiple GPUs are available to share the training load. The system herein offers additional distributed strategies and modules which need to be used for developing or transforming the training code to leverage the multiple hardware units. Embodiments herein provide a method and system for an automatic code refactoring for distributed training on multiple processing units. It is a tree based automatic code refactoring system which refactors the training code for the distributed training on multiple CPUs and GPUs.
[049] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means, and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[050] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[051] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[052] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.

Claims:

We Claim:
1. A processor-implemented method (300) for an automatic code refactoring for a distributed training on multiple processing units comprising steps of:
receiving (302), via an input/output interface, a training code to enable the distributed training of a plurality of machine learning models, wherein the training code comprising one or more statements;
parsing (304), via the one or more hardware processors, the received training code to get a tree format of the training code using an abstract syntax tree model, wherein the tree format of the training code comprises of a syntax, and a variable information of the code;
analyzing (306), via the one or more hardware processors, the tree format of the training code to determine a list of imports used in the training code;
detecting (308), via the one or more hardware processors, a location of a last import node from the determined list of imports;
identifying (310), via the one or more hardware processors, at least one node containing a variable declaration of an optimizer, wherein the variable declaration comprises a common optimizer text;
modifying (312), via the one or more hardware processors, the identified at least one node to split a plurality of the model training processes among one or more hardware units;
identifying (314), via the one or more hardware processors, at least one node containing a declaration of callbacks used in the training based on a pattern matching with nodes containing callback related keywords;
modifying (316), via the one or more hardware processors, the identified at least one node containing the declaration of callbacks to broadcast training parameters and variables required for the training; and
converting (318), via the one or more hardware processors, the modified nodes of the tree to a source code to get a refactored training code.
2. The processor-implemented method (300) of claim 1, wherein each of the one or more statements of the training code is represented by a node in the tree format.
3. The processor-implemented method (300) of claim 1, wherein parallelization based imports are placed at the end of the detected location.
4. The processor-implemented method (300) of claim 1, wherein the structure of the training code includes a syntax, a type of the training code statement, one or more function definitions, and one or more calls.
5. The processor-implemented method (300) of claim 1, wherein each node in the tree format is either a child node to the root node or a parent node to one or more child nodes depending on the training code statements.
6. A system (100) for an automatic code refactoring for distributed training on multiple processing units comprising:
an input/output interface (104) to receive a training code to enable the distributed training of a plurality of machine learning models, wherein the training code comprising one or more statements;
a memory (110) in communication with the one or more hardware processors (108), wherein the one or more hardware processors are configured to execute programmed instructions stored in the memory to:
parse the received training code to get a tree format of the training code using an abstract syntax tree model, wherein the tree format of the training code comprises of a syntax, and a variable information of the code;
analyze the tree format of the training code to determine a list of imports used in the training code;
detect a location of a last import node from the determined list of imports, wherein parallelization based imports are placed at the end of the detected location;
identify at least one node containing a variable declaration of an optimizer, wherein the variable declaration comprises a common optimizer text;
modify the identified at least one node to split a plurality of the model training processes among one or more hardware units;
identify at least one node containing a declaration of callbacks used in the training based on a pattern matching with nodes containing callback related keywords;
modify the identified at least one node containing the declaration of callbacks to broadcast training parameters and variables required for the training; and
convert the modified nodes of the tree to a source code to get a refactored training code.
7. A non-transitory computer readable medium storing one or more instructions which, when executed by one or more processors on a system, cause the one or more processors to perform a method comprising:
receiving, via an input/output interface, a training code to enable the distributed training of a plurality of machine learning models, wherein the training code comprising one or more statements;
parsing, via one or more hardware processors, the received training code to get a tree format of the training code using an abstract syntax tree model, wherein the tree format of the training code comprises of a syntax, and a variable information of the code;
analyzing, via one or more hardware processors, the tree format of the training code to determine a list of imports used in the training code;
detecting, via the one or more hardware processors, a location of a last import node from the determined list of imports, wherein parallelization based imports are placed at the end of the detected location;
identifying, via the one or more hardware processors, at least one node containing a variable declaration of an optimizer, wherein the variable declaration comprises a common optimizer text;
modifying, via the one or more hardware processors, the identified at least one node to split a plurality of the model training processes among one or more hardware units;
identifying, via the one or more hardware processors, at least one node containing a declaration of callbacks used in the training based on a pattern matching with nodes containing callback related keywords;
modifying, via the one or more hardware processors, the identified at least one node containing the declaration of callbacks to broadcast training parameters and variables required for the training; and
converting, via the one or more hardware processors, the modified nodes of the tree to a source code to get a refactored training code.

Dated this 28th day of April 2022

Tata Consultancy Services Limited
By their Agent & Attorney

(Adheesh Nargolkar)
of Khaitan & Co
Reg No IN-PA-1086

Documents

Application Documents

# Name Date
1 202221025056-STATEMENT OF UNDERTAKING (FORM 3) [28-04-2022(online)].pdf 2022-04-28
2 202221025056-REQUEST FOR EXAMINATION (FORM-18) [28-04-2022(online)].pdf 2022-04-28
3 202221025056-PROOF OF RIGHT [28-04-2022(online)].pdf 2022-04-28
4 202221025056-FORM 18 [28-04-2022(online)].pdf 2022-04-28
5 202221025056-FORM 1 [28-04-2022(online)].pdf 2022-04-28
6 202221025056-DRAWINGS [28-04-2022(online)].pdf 2022-04-28
7 202221025056-FIGURE OF ABSTRACT [28-04-2022(online)].jpg 2022-04-28
8 202221025056-DECLARATION OF INVENTORSHIP (FORM 5) [28-04-2022(online)].pdf 2022-04-28
9 202221025056-COMPLETE SPECIFICATION [28-04-2022(online)].pdf 2022-04-28
10 202221025056-FORM-26 [23-06-2022(online)].pdf 2022-06-23
11 Abstract1.jpg 2022-08-08
12 202221025056-FER.pdf 2025-04-02

Search Strategy

1 refcE_16-03-2024.pdf