
Method And System To Process Asynchronous And Distributed Training Tasks

Abstract: This disclosure generally relates to a method and system to process asynchronous and distributed training tasks. Training a large-scale deep neural network (DNN) model with large-scale training data is time-consuming. The method creates a work queue (Q) with a predefined number of tasks, each comprising training data. A set of central processing units (CPUs) information and a set of graphics processing units (GPUs) information are fetched from the current environment to initiate a parallel process asynchronously on the work queue (Q) to train a set of deep learning models with optimized resources: a data pre-processing technique computes transformed training data, and an asynchronous model training technique trains the set of deep learning models on each GPU asynchronously with the transformed training data based on a set of asynchronous model parameters. [To be published with FIG. 3]


Patent Information

Application #
202221014863
Filing Date
17 March 2022
Publication Number
38/2023
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
Parent Application

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Inventors

1. KALELE, Amit
Tata Consultancy Services Limited, S2 Torna, Sahyadri Park, Plot No. 2, 3, Rajiv Gandhi infotech Park, Phase III, Hinjawadi-Maan, Pune 411057, Maharashtra, India
2. SUBBIAH, Ravindran
Tata Consultancy Services Limited Peepul Park, TECHNOPARK PARK CAMPUS, KARIAVATTOM PO, THIRUVANANTHAPURAM 695581, Kerala, India
3. JAIN, Anubhav
Tata Consultancy Services Limited Phase-II, Sector-80, Block C, Noida 201305, Uttar Pradesh, India

Specification

Claims: We Claim:

1. A processor implemented method to process asynchronous and distributed training tasks, the method comprising: creating (302), via one or more hardware processors (104), a work queue (Q) with a predefined number of tasks, wherein each task comprises training data obtained from one or more sources, and allocating estimated resources to process the work queue (Q) asynchronously; fetching (304), via the one or more hardware processors (104), at least one of a set of central processing units (CPUs) information and a set of graphics processing units (GPUs) information of the current environment where the task is being processed; computing (306), via the one or more hardware processors (104), by using a resource allocator, a number of parallel processes (p) queued on each CPU, a number of parallel processes (q) queued on each GPU, a number of iterations, and a flag status; and initiating (308), via the one or more hardware processors (104), a parallel process asynchronously on the work queue (Q) to train a set of deep learning models for resource optimization by: processing each task by using a data pre-processing technique to compute transformed training data based on at least one of the training data, the number of iterations, and the number of parallel processes (p) queued on each CPU; and training, by using an asynchronous model training technique, the set of deep learning models on each GPU asynchronously with the transformed training data based on a set of asynchronous model parameters.

2. The processor implemented method of claim 1, wherein computing the transformed training data of each task using the data pre-processing technique comprises: obtaining the training data, the number of iterations, and the number of parallel processes (p) to be queued on each CPU; creating empty queues for the work queue (Q) and an output queue; appending the work queue (Q) with the training data and a data transformation function based on the number of iterations; creating (p) parallel processes to be queued to execute the task and scan the work queue (Q); and checking if the work queue (Q) is not null to process the task, wherein if the flag status is zero, the transformed training data is computed from the data transformation function and saved into a data storage with a unique identifier, and if the flag status is non-zero, the training data is computed with a user process without decoupling and the output data is written into the output queue, the task being deleted from the work queue after processing.

3. The processor implemented method of claim 1, wherein the set of asynchronous model parameters comprises (i) a selected deep learning model to be trained, (ii) the number of iterations, (iii) the transformed training data, (iv) a file path of the transformed training data, (v) the number of parallel processes (q) queued on each GPU, and (vi) a number of available GPUs.
4. The processor implemented method of claim 1, wherein training the set of deep learning models on each GPU with the transformed training data using the asynchronous model training technique comprises: obtaining the set of asynchronous model parameters and initializing an empty list of processed files and a count of processed files to zero; and checking that the count of processed files is not equal to the number of iterations and iteratively performing, until the number of iterations is processed: scanning for a new training data file at a specified path based on the flag status and, if the new training data file is detected, determining the file processing status; iteratively scanning for the new training data files being processed in the writing mode, marking them as processed files, and updating the new training data file; loading the new training data file with the transformed training data; and training the set of deep learning models on each GPU with parallel processes (q) queued on the GPU with the transformed training data and its corresponding weights, and saving the set of deep learning models.

5. A system (100) to process asynchronous and distributed training tasks, comprising: a memory (102) storing instructions; one or more communication interfaces (106); and one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to: create a work queue (Q) with a predefined number of tasks, wherein each task comprises training data obtained from one or more sources, and allocate estimated resources to process the work queue (Q) asynchronously; fetch at least one of a set of central processing units (CPUs) information and a set of graphics processing units (GPUs) information of the current environment where the task is being processed; compute, by using a resource allocator, a number of parallel processes (p) queued on each CPU, a number of parallel processes (q) queued on each GPU, a number of iterations, and a flag status; and initiate a parallel process asynchronously on the work queue (Q) to train a set of deep learning models with optimized resources by: processing each task by using a data pre-processing technique to compute transformed training data based on at least one of the training data, the number of iterations, and the number of parallel processes (p) queued on each CPU; and training, by using an asynchronous model training technique, the set of deep learning models on each GPU asynchronously with the transformed training data based on a set of asynchronous model parameters.
6. The system of claim 5, wherein computing the transformed training data of each task using the data pre-processing technique comprises: obtaining the training data, the number of iterations, and the number of parallel processes (p) to be queued on each CPU; creating empty queues for the work queue (Q) and an output queue; appending the work queue (Q) with the training data and a data transformation function based on the number of iterations; creating (p) parallel processes to be queued to execute the task and scan the work queue (Q); and checking if the work queue (Q) is not null to process the task, wherein if the flag status is zero, the transformed training data is computed from the data transformation function and saved into a data storage with a unique identifier, and if the flag status is non-zero, the training data is computed with a user process without decoupling and the output data is written into the output queue, the task being deleted from the work queue after processing.

7. The system of claim 5, wherein the set of asynchronous model parameters comprises (i) a selected deep learning model to be trained, (ii) the number of iterations, (iii) the transformed training data, (iv) a file path of the transformed training data, (v) the number of parallel processes (q) queued on each GPU, and (vi) a number of available GPUs.

8. The system of claim 5, wherein training the set of deep learning models on each GPU with the transformed training data using the asynchronous model training technique comprises: obtaining the set of asynchronous model parameters and initializing an empty list of processed files and a count of processed files to zero; and checking that the count of processed files is not equal to the number of iterations and iteratively performing, until the number of iterations is processed: scanning for a new training data file at a specified path based on the flag status and, if the new training data file is detected, determining the file processing status; iteratively scanning for the new training data files being processed in the writing mode, marking them as processed files, and updating the new training data file; loading the new training data file with the transformed training data; and training the set of deep learning models on each GPU with parallel processes (q) queued on the GPU with the transformed training data and its corresponding weights, and saving the set of deep learning models.

Dated this 17th Day of March 2022
Tata Consultancy Services Limited
By their Agent & Attorney (Adheesh Nargolkar) of Khaitan & Co, Reg No IN-PA-1086

Description: FORM 2
THE PATENTS ACT, 1970 (39 of 1970) & THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)

Title of invention: METHOD AND SYSTEM TO PROCESS ASYNCHRONOUS AND DISTRIBUTED TRAINING TASKS

Applicant: Tata Consultancy Services Limited, a company incorporated in India under the Companies Act, 1956, having address: Nirmal Building, 9th floor, Nariman Point, Mumbai 400021, Maharashtra, India

Preamble to the description: The following specification particularly describes the invention and the manner in which it is to be performed.

TECHNICAL FIELD

The disclosure herein generally relates to an asynchronous training process, and, more particularly, to a method and system to process asynchronous and distributed training tasks.

BACKGROUND

The tremendous evolution of deep learning (DL) in various fields has increased its use of large-scale data and models with higher accuracy, specifically in areas such as natural language processing and computer vision.
Training a large-scale deep neural network (DNN) model with large-scale training data is time-consuming. Traditional Artificial Intelligence (AI) techniques have significant resource requirements, rendering modeling efforts infeasible in resource constrained environments for training the model. Deep learning models achieve high accuracy, but training them requires huge data and processing resources. Training of deep learning models is typically carried out in a distributed fashion on multiple CPUs and GPUs with various frameworks and mechanisms. To speed up the training of such massive DNN models, a parallel distributed training methodology is widely adopted. Synchronous distributed training in general has a good convergence rate across all workers. However, the synchronization overhead becomes larger as the number of workers and the size of the model increase, which degrades the training performance; the synchronization overhead can exceed 70% of the entire training as more workers are added. Hence, performance issues can be more serious in heterogeneous environments where there are workers with different training speeds. In general, there is a tradeoff between model accuracy and training performance: reducing the synchronization overhead improves training performance, but it inevitably results in significant differences among the local models of the workers. However, existing state-of-the-art techniques do not consider the scale of the training data and the training performance while limiting the degradation of model accuracy.

SUMMARY

Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a system to process asynchronous and distributed training tasks is provided. The system includes creating a work queue (Q) with a predefined number of tasks, where each task comprises training data obtained from one or more sources, and allocating estimated resources to process the work queue (Q) asynchronously. Further, at least one of a set of central processing units (CPUs) information and a set of graphics processing units (GPUs) information of the current environment where the task is being processed is fetched. Further, a resource allocator computes a number of parallel processes (p) queued on each CPU, a number of parallel processes (q) queued on each GPU, a number of iterations, and a flag status. Then, a parallel process is initiated asynchronously on the work queue (Q) to train a set of deep learning models with optimized resources by processing each task using a data pre-processing technique to compute transformed training data based on at least one of the training data, the number of iterations, and the number of parallel processes (p) queued on each CPU. Further, by using an asynchronous model training technique, the set of deep learning models is trained on each GPU asynchronously with the transformed training data based on a set of asynchronous model parameters.
In accordance with an embodiment of the present disclosure, the transformed training data of each task is computed using the data pre-processing technique by performing the steps of: obtaining the training data, the number of iterations, and the number of parallel processes (p) to be queued on each CPU; creating empty queues for the work queue (Q) and an output queue; appending the work queue (Q) with the training data and a data transformation function based on the number of iterations; creating (p) parallel processes to be queued to execute the task and scan the work queue (Q); and checking if the work queue (Q) is not null to process the task, where if the flag status is zero, the transformed training data is computed from the data transformation function and saved into a data storage with a unique identifier, and if the flag status is non-zero, the training data is computed with a user process without decoupling and the output data is written into the output queue, the task being finally deleted from the work queue after processing.

In accordance with an embodiment of the present disclosure, the set of asynchronous model parameters comprises (i) a selected deep learning model to be trained, (ii) the number of iterations, (iii) the transformed training data, (iv) a file path of the transformed training data, (v) the number of parallel processes (q) queued on each GPU, and (vi) a number of available GPUs.

In accordance with an embodiment of the present disclosure, training the set of deep learning models on each GPU with the transformed training data using the asynchronous model training technique comprises: obtaining the set of asynchronous model parameters and initializing an empty list of processed files and a count of processed files to zero; and checking that the count of processed files is not equal to the number of iterations and iteratively performing, until the number of iterations is processed: scanning for a new training data file at a specified path based on the flag status and, if the new training data file is detected, determining the file processing status; iteratively scanning for the new training data files being processed in the writing mode, marking them as processed files, and updating the new training data file; loading the new training data file with the transformed training data; and training the set of deep learning models on each GPU with parallel processes (q) queued on the GPU with the transformed training data and its corresponding weights, and saving the set of deep learning models.

In another aspect, a method to process asynchronous and distributed training tasks is provided. The method includes creating a work queue (Q) with a predefined number of tasks, where each task comprises training data obtained from one or more sources, and allocating estimated resources to process the work queue (Q) asynchronously. Further, at least one of a set of central processing units (CPUs) information and a set of graphics processing units (GPUs) information of the current environment where the task is being processed is fetched. Further, a resource allocator computes a number of parallel processes (p) queued on each CPU, a number of parallel processes (q) queued on each GPU, a number of iterations, and a flag status.
Then, a parallel process is initiated asynchronously on the work queue (Q) to train a set of deep learning models with optimized resources by processing each task using a data pre-processing technique to compute transformed training data based on at least one of the training data, the number of iterations, and the number of parallel processes (p) queued on each CPU. Further, by using an asynchronous model training technique, the set of deep learning models is trained on each GPU asynchronously with the transformed training data based on a set of asynchronous model parameters.

In yet another aspect, a non-transitory computer readable medium is provided for creating a work queue (Q) with a predefined number of tasks, where each task comprises training data obtained from one or more sources, and allocating estimated resources to process the work queue (Q) asynchronously. Further, at least one of a set of central processing units (CPUs) information and a set of graphics processing units (GPUs) information of the current environment where the task is being processed is fetched. Further, a resource allocator computes a number of parallel processes (p) queued on each CPU, a number of parallel processes (q) queued on each GPU, a number of iterations, and a flag status. Then, a parallel process is initiated asynchronously on the work queue (Q) to train a set of deep learning models with optimized resources by processing each task using a data pre-processing technique to compute transformed training data based on at least one of the training data, the number of iterations, and the number of parallel processes (p) queued on each CPU. Further, by using an asynchronous model training technique, the set of deep learning models is trained on each GPU asynchronously with the transformed training data based on a set of asynchronous model parameters.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:

FIG. 1 illustrates an exemplary system for asynchronous and distributed processing of training tasks performed in parallel, in accordance with some embodiments of the present disclosure.

FIG. 2 illustrates an exemplary functional block diagram of the system showing the process of asynchronous and distributed training tasks, in accordance with some embodiments of the present disclosure.

FIG. 3 illustrates a process flow diagram of a method of deep learning models trained in asynchronous mode, in accordance with some embodiments of the present disclosure.

FIG. 4 illustrates a graphical representation of central processing units (CPUs) and graphics processing units (GPUs) plotted against time for a sequential training process in deep learning models, in accordance with some embodiments of the present disclosure.

FIG. 5 illustrates a graphical representation of CPUs and GPUs multiprocessing plotted against time for synchronous distributed training processes in deep learning models, in accordance with some embodiments of the present disclosure.
FIG. 6 illustrates a graphical representation of CPUs and GPUs plotted against time for an asynchronous and distributed training process in deep learning models, in accordance with some embodiments of the present disclosure.

FIG. 7 illustrates experimental results plotted for GPU utilization with the asynchronous training process, in accordance with some embodiments of the present disclosure.

FIG. 8 illustrates experimental results of the performance improvement of the optimized asynchronous training process for deep learning models in comparison with baseline models, in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS

Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.

Embodiments herein provide a method and system to process asynchronous and distributed training tasks. The disclosed method enables training the training data of deep learning models asynchronously with improved training performance. With the increasing scale of models and data, every iteration of parallel distributed training includes computation, data execution and communication. Training a large-scale deep neural network (DNN) model with large-scale data is time-consuming. To speed up the training of massive DNN models, data-parallel distributed training has been widely adopted. In general, synchronous training suffers from synchronization overhead, specifically in heterogeneous environments. To reduce the synchronization overhead, asynchronous-based training employs asynchronous communication between a data pre-processing module and an asynchronous model training module such that each task is executed independently, eliminating waiting time. Conventionally, training the deep learning model involves a data pre-processing phase and an asynchronous model training phase. The method of the present disclosure performs parallel execution of such processes asynchronously, distributing the simultaneous data pre-processing phase or transformation followed by training the deep learning models. Such an asynchronous approach accelerates training and optimizes resource utilization cost. Also, the system and method of the present disclosure are time efficient, accurate and scalable with the employed asynchronous approach. The disclosed system is further explained with the method as described in conjunction with FIG. 1 to FIG. 8 below.

Referring now to the drawings, and more particularly to FIG. 1 through FIG. 8, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.

FIG. 1 illustrates an exemplary system for asynchronous and distributed processing of training tasks performed in parallel, in accordance with some embodiments of the present disclosure.
In an embodiment, the system 100 includes one or more hardware processors 104, communication interface device(s) or input/output (I/O) interface(s) 106 (also referred to as interface(s)), and one or more data storage devices or memory 102 operatively coupled to the one or more hardware processors 104. The one or more processors 104 may be one or more software processing components and/or hardware processors. In an embodiment, the hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is/are configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud, and the like.

The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like, and can facilitate multiple communications within a wide variety of network (N/W) and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.

The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 102 further comprises (or may further comprise) information pertaining to input(s)/output(s) of each step performed by the systems and methods of the present disclosure. In other words, input(s) fed at each step and output(s) generated at each step are comprised in the memory 102 and can be utilized in further processing and analysis.

FIG. 2 illustrates an exemplary functional block diagram of the system showing the process of asynchronous and distributed training tasks, in accordance with some embodiments of the present disclosure. FIG. 2 includes the system 100 comprising a data pre-processing module and an asynchronous model training module, where the tasks of each model are executed in parallel and asynchronously based on the number of available resources. Here, a set of graphics processing units (GPUs) and a set of central processing units (CPUs) available in the system process the input data utilizing the data pre-processing module and the asynchronous model training module. Randomized pre-processing computations are executed in parallel, and their results overlap asynchronously. Here, one process performs the asynchronous model training on the set of GPUs and is triggered as soon as the next epoch's data is available to train the model. This reduces the GPU waiting time by feeding pre-processed data rapidly, resulting in overlapping of pre-processing and training asynchronously.
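To make the overlap concrete, the following is a minimal sketch, not the claimed implementation, of the producer/consumer behaviour described for FIG. 2: CPU-side processes transform data and hand it over, and a training process consumes each batch as soon as it is ready, so neither side waits on the other. The names preprocess_worker, train_worker and user_transform, and the use of Python's multiprocessing module, are illustrative assumptions rather than elements of the specification.

import multiprocessing as mp
import time

def user_transform(sample):
    # Stand-in for a compute-intensive data transformation (assumption).
    time.sleep(0.01)
    return sample * 2

def preprocess_worker(work_q, data_q):
    # CPU side: pull raw tasks, transform them, feed the trainer asynchronously.
    while True:
        task = work_q.get()
        if task is None:            # sentinel: no more work
            data_q.put(None)
            break
        data_q.put(user_transform(task))

def train_worker(data_q):
    # GPU side (simulated): train whenever the next transformed batch arrives,
    # instead of waiting at a fixed synchronisation point.
    while True:
        batch = data_q.get()
        if batch is None:
            break
        print("training on batch", batch)

if __name__ == "__main__":
    work_q, data_q = mp.Queue(), mp.Queue()
    for t in range(8):
        work_q.put(t)
    work_q.put(None)
    producer = mp.Process(target=preprocess_worker, args=(work_q, data_q))
    consumer = mp.Process(target=train_worker, args=(data_q,))
    producer.start(); consumer.start()
    producer.join(); consumer.join()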
The method of the present disclosure first decouples the sequential tasks into separate tasks and carries out the computation in an asynchronous distributed manner, which results in: refactoring the sequential tasks into a set of decoupled independent tasks; removing synchronization between the parallel processes such that each process picks up new data transformation tasks without waiting for other processes to finish their data transformation tasks, which enables optimal utilization of resources; and removing synchronization between the data transformation module and the asynchronous model training module. Training the tasks by processing the available transformed data, without any explicit communication of processed data and without synchronization, ensures that the training process occupies the provided resources so that utilization is maximized. The present disclosure is further explained considering an example, where the system 100 processes a set of training tasks received from the user using the system of FIG. 1 and FIG. 2.

FIG. 3 depicts a flow diagram illustrating a method of deep learning models trained in asynchronous mode, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 comprises one or more data storage devices or the memory 102 operatively coupled to the processor(s) 104 and is configured to store instructions for execution of the steps of the method 300 by the processor(s) or one or more hardware processors 104. The steps of the method 300 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG. 2 through FIG. 8. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.

Referring now to the steps of the method 300, at step 302, the one or more hardware processors 104 create a work queue (Q) with a predefined number of tasks, where each task comprises training data obtained from one or more sources, and allocate estimated resources to process the work queue (Q) asynchronously. Consider an example where the system 100 receives a set of tasks from one or more external sources for training a set of deep learning models. The work queue (Q) is created with the predefined number of tasks. Each task is processed from the work queue (Q), where each task includes training data to train the set of deep learning models. Further, resources are estimated and the estimated resources are allocated to process each task asynchronously. In one embodiment, the system 100 is decoupled into the data pre-processing module and the asynchronous model training module, where the modules are executed in parallel to perform each training task. The input sequential tasks are referred to as the user process function, which the system 100 fetches and processes by using the data pre-processing followed by the model training, as sketched below.
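As a rough illustration of step 302 only, the work queue (Q) can be pictured as a shared queue holding one task per chunk of training data together with its transformation function. The helper names create_work_queue and d_process, and the use of Python's multiprocessing.Queue, are assumptions for this sketch; the specification does not prescribe a particular library.

import multiprocessing as mp

def d_process(chunk):
    # Placeholder data transformation function (assumption).
    return [x * 2 for x in chunk]

def create_work_queue(training_chunks, n_iter):
    # Step 302: a work queue (Q) with a predefined number of tasks, where each
    # task carries a chunk of training data and its transformation function.
    q = mp.Queue()
    for _ in range(n_iter):
        for chunk in training_chunks:
            q.put((chunk, d_process))
    return q

if __name__ == "__main__":
    chunks = [list(range(i, i + 4)) for i in range(0, 16, 4)]
    Q = create_work_queue(chunks, n_iter=2)
    print("queued", 2 * len(chunks), "tasks")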
FIG. 4 illustrates a graphical representation of CPUs and GPUs plotted against time for a sequential training process in deep learning models, in accordance with some embodiments of the present disclosure. FIG. 4 depicts the training involving multiple tasks such as loading data, processing of data involving different data transformations, and finally training the model with the processed data. These tasks are sequential in nature and are carried out one after the other due to the dependency on the outcome of the previous task. All the distributed training frameworks and mechanisms in the state of the art carry out these steps in sequence on different data partitions in a parallel or distributed fashion. This distributed computing approach inherently assumes that the training step is the most time consuming in the entire processing. However, many computer vision applications involve data pre-processing and transformations which are equally compute intensive and time consuming. These transformations are applied at run-time for augmenting the training data to achieve model robustness. This creates a huge imbalance in typical distributed training and results in compute delays at one end while, at the other end, a large number of expensive resources are wasted.

Referring back now to the steps of the method 300, at step 304, the one or more hardware processors 104 fetch at least one of a set of central processing units (CPUs) information and a set of graphics processing units (GPUs) information of the current environment where the task is being processed. For the input task the system 100 fetches the information related to the CPUs, denoted as n_cpu, and the GPUs, denoted as n_gpu. It is to be noted that the randomized process has a large processing time with variations, where n_cpu and n_gpu represent the number of CPU cores and the number of available GPUs for computations.

Referring now to the steps of the method 300, at step 306, the one or more hardware processors 104 compute, by using a resource allocator, a number of parallel processes (p) queued on each CPU, a number of parallel processes (q) queued on each GPU, a number of iterations, and a flag status.

Table 1 - Asynchronous process training
Data: number of iterations n_iter, user_process()
Data: flag
Output: Trained model M
n_cpu, n_gpu = fetch_resource_info();
p, q = resource_allocator(n_iter, M, d_process(), train());
if flag == 0 then
    d_process(), train() = refactor(user_process());
    dispatch.run(f1(p, d_process()));
    dispatch.run(f2(q, train(), M));
else
    dispatch.run(f1(p, user_process()));
end

Referring to the above example, to process the task the resource allocator computes the number of parallel processes (p) queued on each CPU and the number of parallel processes (q) queued on each GPU, as depicted in Table 1.

Referring now to the steps of the method 300, at step 308, the one or more hardware processors 104 initiate a parallel process asynchronously on the work queue (Q) to train a set of deep learning models with optimized resources. Each task is performed asynchronously, decoupled into the data pre-processing technique and the asynchronous model training technique. The data pre-processing technique (referring now to FIG. 6) computes transformed training data based on at least one of the training data, the number of iterations, and the number of parallel processes (p) queued on each CPU. FIG. 6 illustrates a graphical representation of CPUs and GPUs plotted against time for an asynchronous and distributed training process in deep learning models.
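One possible reading of Table 1 in plain Python is sketched below. The resource_allocator heuristic, the trivial refactor() split and the dispatch/worker helpers are assumptions introduced for illustration, since Table 1 gives only pseudocode; they are not the patented allocator.

import multiprocessing as mp
import os

def fetch_resource_info():
    # n_cpu, n_gpu for the current environment; the GPU count is a stub here.
    return os.cpu_count() or 1, 1

def resource_allocator(n_cpu, n_gpu, n_iter):
    # Assumed heuristic: one pre-processing process per spare CPU core,
    # one training process per available GPU.
    return max(1, n_cpu - 1), max(1, n_gpu)

def d_process(task):
    # Transformation half of the decoupled user routine (illustrative).
    return ("transformed", task)

def train(batch):
    # Training half of the decoupled user routine (illustrative).
    print("train on", batch)

def user_process(task):
    # Original sequential routine: transform, then train, one after the other.
    train(d_process(task))

def refactor(user_proc):
    # Decouple the sequential routine into independent halves (f1 and f2).
    return d_process, train

def worker(fn, tasks):
    for t in tasks:
        fn(t)

def dispatch(n, fn, tasks):
    # Start n processes, each taking a strided share of the tasks.
    procs = [mp.Process(target=worker, args=(fn, tasks[i::n])) for i in range(n)]
    for pr in procs:
        pr.start()
    return procs

if __name__ == "__main__":
    tasks, n_iter, flag = list(range(8)), 1, 0
    n_cpu, n_gpu = fetch_resource_info()
    p, q = resource_allocator(n_cpu, n_gpu, n_iter)
    if flag == 0:
        f1, f2 = refactor(user_process)
        # In the full method f2 would consume the transformed data written by f1;
        # here both are fed the same toy tasks to keep the sketch self-contained.
        procs = dispatch(p, f1, tasks) + dispatch(q, f2, tasks)
    else:
        procs = dispatch(p, user_process, tasks)
    for pr in procs:
        pr.join()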
The data pre-processing technique (Table 2) performs the following steps on each task being processed:
Step 1 – obtaining a set of parameters comprising the training data, the number of iterations, and the number of parallel processes (p) to be queued on each CPU.
Step 2 – creating empty queues for the work queue (Q) and an output queue.
Step 3 – appending the work queue (Q) with the training data and a data transformation function based on the number of iterations.
Step 4 – creating (p) parallel processes to be queued to execute the task and scan the work queue (Q).
Step 5 – checking if the work queue (Q) is not null to process the task, and (a) if the flag status is zero, computing the transformed training data from the data transformation function and saving the transformed training data into a data storage with a unique identifier, (b) if the flag status is non-zero, computing the training data with a user process without decoupling and writing the output data into the output queue, and (c) deleting the task from the work queue after processing the task.

Table 2 - Data pre-processing technique
Data: Training data D, number of iterations n_iter
Data: Processes p, compute functions d_process()
Result: D_trans_p
Initialize Q, Q_out ← NULL;
for i in range(n_iter) do
    append Q ← D_p, d_process();
end
if p
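A minimal Python sketch of Steps 1-5 follows, under stated assumptions: the output directory layout, the uuid-based unique identifiers, pickle storage, and the stand-in d_process()/user_process() functions are illustrative choices, not elements taken from the specification.

import multiprocessing as mp
import os, pickle, uuid

def d_process(chunk):
    # Stand-in for the data transformation function (assumption).
    return [x * 2 for x in chunk]

def user_process(chunk):
    # Stand-in for the original, undecoupled user routine (assumption).
    return sum(chunk)

def preprocess_worker(work_q, out_q, flag, out_dir):
    # Steps 4-5: each of the p processes scans the work queue until it is empty.
    while True:
        task = work_q.get()
        if task is None:                      # sentinel: queue drained
            break
        chunk, transform = task
        if flag == 0:
            transformed = transform(chunk)    # decoupled transformation
            path = os.path.join(out_dir, f"{uuid.uuid4().hex}.pkl")
            with open(path, "wb") as f:       # save under a unique identifier
                pickle.dump(transformed, f)
        else:
            out_q.put(user_process(chunk))    # no decoupling: use output queue
        # consuming the task removes it from the work queue

if __name__ == "__main__":
    p, n_iter, flag, out_dir = 2, 2, 0, "transformed_data"
    os.makedirs(out_dir, exist_ok=True)
    Q, Q_out = mp.Queue(), mp.Queue()         # Step 2: empty Q and output queue
    chunks = [list(range(i, i + 4)) for i in range(0, 16, 4)]
    for _ in range(n_iter):                   # Step 3: append data + d_process
        for chunk in chunks:
            Q.put((chunk, d_process))
    for _ in range(p):
        Q.put(None)                           # one sentinel per worker process
    workers = [mp.Process(target=preprocess_worker,
                          args=(Q, Q_out, flag, out_dir)) for _ in range(p)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

A companion training process would then watch the storage location for new files, load each one, train on the available GPUs and mark the file as processed, mirroring the asynchronous model training technique recited in the claims.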

Documents

Application Documents

# Name Date
1 202221014863-STATEMENT OF UNDERTAKING (FORM 3) [17-03-2022(online)].pdf 2022-03-17
2 202221014863-REQUEST FOR EXAMINATION (FORM-18) [17-03-2022(online)].pdf 2022-03-17
3 202221014863-PROOF OF RIGHT [17-03-2022(online)].pdf 2022-03-17
4 202221014863-FORM 18 [17-03-2022(online)].pdf 2022-03-17
5 202221014863-FORM 1 [17-03-2022(online)].pdf 2022-03-17
6 202221014863-FIGURE OF ABSTRACT [17-03-2022(online)].jpg 2022-03-17
7 202221014863-DRAWINGS [17-03-2022(online)].pdf 2022-03-17
8 202221014863-DECLARATION OF INVENTORSHIP (FORM 5) [17-03-2022(online)].pdf 2022-03-17
9 202221014863-COMPLETE SPECIFICATION [17-03-2022(online)].pdf 2022-03-17
10 202221014863-FORM-26 [22-06-2022(online)].pdf 2022-06-22
11 Abstract1.jpg 2022-07-15
12 202221014863-Power of Attorney [17-04-2023(online)].pdf 2023-04-17
13 202221014863-Form 1 (Submitted on date of filing) [17-04-2023(online)].pdf 2023-04-17
14 202221014863-Covering Letter [17-04-2023(online)].pdf 2023-04-17
15 202221014863-CORRESPONDENCE(IPO)-(WIPO DAS)-01-05-2023.pdf 2023-05-01
16 202221014863-FORM 3 [21-07-2023(online)].pdf 2023-07-21
17 202221014863-FER.pdf 2025-03-18
18 202221014863-Information under section 8(2) [17-06-2025(online)].pdf 2025-06-17
19 202221014863-FORM 3 [17-06-2025(online)].pdf 2025-06-17

Search Strategy

1 SearchStrategyE_28-02-2024.pdf