Abstract: This disclosure relates generally to a system and method for efficient model selection and hyper-parameter tuning. Conventional systems are time inefficient and often render inaccurate results. The disclosed system performs hyperparameter tuning in two phases, thereby enabling efficient hyperparameter tuning and ensembling of models in a time-efficient manner. In one embodiment, a first phase of hyperparameter tuning is a shallow search phase while the second phase is a deep search phase. During the shallow search phase, a quick and time-efficient preliminary model selection is performed based on performance-based voting. During the deep search phase, instead of tuning all the candidate models, a deep search is performed only on the models selected in the first phase. Further, the models are tuned only on one fold (or section of the dataset under consideration) and fitted over the remaining folds. This two-phase learning process facilitates reducing the time complexity of the overall process. [To be published with FIG. 2]
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
SYSTEM AND METHOD FOR EFFICIENT MODEL SELECTION AND HYPER-PARAMETER TUNING
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
[001] The disclosure herein generally relates to model selection and hyperparameter tuning and, more particularly, to a system and method for efficient model selection and hyperparameter tuning.
BACKGROUND
[002] A hyperparameter is a parameter whose value is set before a machine learning (ML) model begins the learning process. In other words, hyperparameters refer to settings that can be tuned to control the behavior of the ML model/algorithm. The process of selecting hyperparameters is a key aspect of ML algorithms. Selection of optimal hyperparameters for a model is also referred to as hyperparameter tuning or hyperparameter optimization. Typically, hyperparameter tuning for a model is a computationally intensive task.
[003] In some scenarios, instead of relying on a single model to generate an output, an ensemble (having a plurality of models) can be utilized to generate an output. The average prediction (i.e., the average of the outputs) of the plurality of models in the ensemble tends to outperform the prediction of individual models in the ensemble. The hyperparameter tuning on the ensemble of models can be performed by some known methods including, but not limited to, the Brute-Force method, the Bayesian method, Tree of Parzen Estimators (TPE) methods, and so on. These methods are, however, time inefficient and often do not give accurate results. Additionally, the conventional techniques such as the tuning process, the k-fold fitting process and the ensembling process, when combined, result in high time complexity, which makes the ML pipeline impractical as a whole.
SUMMARY
[004] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
[005] For example, in one embodiment, a method for efficient model selection and hyper-parameter tuning is provided. The method includes splitting, via one or more hardware processors, a dataset into a plurality of sections based on a user input. Each section of the plurality of sections includes a training dataset and a validation dataset. Further, the method includes determining, by using an optimization model/technique, an optimized ensemble of models comprising a plurality of optimized models from amongst a plurality of candidate models for fitting on the plurality of sections of the dataset, via the one or more hardware processors. The optimization technique includes iteratively performing a first phase and a second phase of hyper-parameter tuning for the plurality of sections of the dataset, and fitting the second set of models on the plurality of sections of the dataset to obtain the optimized ensemble of models. Herein, performing the first phase of the hyper-parameter tuning comprises performing a shallow hyper-parameter tuning of the plurality of candidate models on a first section of the dataset, and selecting a first set of models from amongst the plurality of candidate models based on the shallow hyper-parameter tuning, the first phase of the hyper-parameter tuning performed till a first termination criteria is met. Also, performing the second phase of the hyper-parameter tuning comprises performing a deep search for the first set of models on the first section of the dataset to obtain a second set of models, the second phase of the hyper-parameter tuning performed till a second termination criteria is met, wherein the first termination criteria is more lenient than the second termination criteria.
[006] In another aspect, a system for efficient model selection and hyper-parameter tuning is provided. The system includes a memory storing instructions, one or more communication interfaces, and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to split a dataset into a plurality of sections based on a user input. Each section of the plurality of sections includes a training dataset and a validation dataset. Further, the one or more hardware processors are configured to determine, by using an optimization technique/model, an optimized ensemble of models comprising a plurality of optimized models from amongst a plurality of candidate models for fitting on the plurality of sections of the dataset. The optimization technique includes iteratively performing a first phase and a second phase of hyper-parameter tuning for the plurality of sections of the dataset, and fitting the second set of models on the plurality of sections of the dataset to obtain the optimized ensemble of models. Herein, performing the first phase of the hyper-parameter tuning comprises performing a shallow hyper-parameter tuning of the plurality of candidate models on a first section of the dataset, and selecting a first set of models from amongst the plurality of candidate models based on the shallow hyper-parameter tuning, the first phase of the hyper-parameter tuning performed till a first termination criteria is met. Also, performing the second phase of the hyper-parameter tuning comprises performing a deep search for the first set of models on the first section of the dataset to obtain a second set of models, the second phase of the hyper-parameter tuning performed till a second termination criteria is met, wherein the first termination criteria is more lenient than the second termination criteria.
[007] In yet another aspect, a non-transitory computer readable medium for a method for efficient model selection and hyper-parameter tuning is provided. The method includes splitting, via one or more hardware processors, a dataset into a plurality of sections based on a user input. Each section of the plurality of sections includes a training dataset and a validation dataset. Further, the method includes determining, by using an optimization model, an optimized ensemble of models comprising a plurality of optimized models from amongst a plurality of candidate models for fitting on the plurality of sections of the dataset, via the one or more hardware processors. The optimization model includes iteratively performing a first phase and a second phase of hyper-parameter tuning for the plurality of sections of the dataset, and fitting the second set of models on the plurality of sections of the dataset to obtain the optimized ensemble of models. Herein, performing the first phase of the hyper-parameter tuning comprises performing a shallow hyper-parameter tuning of the plurality of candidate models on a first section of the dataset, and selecting a first set of models from amongst the plurality of candidate models based on the shallow hyper-parameter tuning, the first phase of the hyper-parameter tuning performed till a first termination criteria is met. Also, performing the second phase of the hyper-parameter tuning comprises performing a deep search for the first set of models on the first section of the dataset to obtain a second set of models, the second phase of the hyper-parameter tuning performed till a second termination criteria is met, wherein the first termination criteria is more lenient than the second termination criteria.
[008] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[009] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[010] FIG. 1 illustrates an example network implementation of a system for efficient hyper-parameter tuning and ensembling of models according to some embodiments of the present disclosure.
[011] FIG. 2 is a flow diagram for a method for efficient hyper-parameter tuning and ensembling of models, according to some embodiments of the present disclosure.
[012] FIG. 3 is a block diagram of an exemplary computer system for implementing embodiments consistent with the present disclosure.
[013] FIG. 4A illustrates a graph indicative of an objective function value (or root mean square error (RMSE)) comparison (on the validation set) between the deep search and the shallow search, according to an example embodiment of the present disclosure.
[014] FIG. 4B illustrates a graph representing a comparison between the time taken to complete the Shallow search and Deep Search, according to an example embodiment of the present disclosure.
[015] FIG. 4C illustrates a graph representing comparison of RMSE on the validation set between the disclosed method and standard (known) method, according to an example embodiment of the present disclosure.
[016] FIG. 4D illustrates a comparison of the disclosed method vis-à-vis the standard (known) method, according to an example embodiment of the present disclosure.
[017] FIG. 4E illustrates a comparison of an overall time taken by the disclosed method vis-à-vis the standard (known) method, according to an example embodiment of the present disclosure.
[018] FIGS. 5A, 5B and 5C illustrate performance of the shallow search, the deep search and the disclosed method (of FIG. 2) on the validation set, respectively, in accordance with an example embodiment.
DETAILED DESCRIPTION OF EMBODIMENTS
[019] Predictive modeling is a process of using known results to create, process and validate one or more models that can be used to predict future outcomes. Predictive models predict future outcomes based on input parameters received (for a particular task), as a function of their respective learned parameters.
[020] Supervised learning techniques are capable of producing predictive models that produce high-accuracy predictions in an automated manner. Machine learning techniques utilize a learning algorithm, which learns parameters of a predictive model that cause performance of the predictive model to be optimized by utilizing a training and a validation data set. Such a learning algorithm includes configuration parameters such as a learning rate. Such parameters may hereinafter be referred to as hyper-parameters. Hyperparameters of learning models may be employed to learn parameters of a predictive model. Examples of hyperparameters include, but are not limited to, learning rate, number of hidden units, convolution kernel width, and so on.
[021] Various methods used for hyper-parameter tuning of ensembles include Brute-Force, Bayesian, TPE, and so on. Random search and grid search are time inefficient because they do not use the history of previous model performances to refine the search space. In other words, they do not use historical performances to refine their region of search. These methods are time inefficient and often do not give very good results. The hyperparameter tuning process, k-fold fitting process and ensembling processes of the prior art, when combined, result in very high time complexity, which makes the machine learning pipeline impractical as a whole.
[022] Various embodiments described herein include a method and a system for efficient hyperparameter tuning and ensembling of models based on an optimization model that leverages parameters of differential evolution optimization. The disclosed method facilitates reducing the time complexity of hyper-parameter tuning and ensembling of models with k-fold fitting. For example, in one embodiment, using the optimization for hyper-parameter tuning includes providing two learning phases or passes, known as a first phase and a second phase. The first phase of hyperparameter tuning is a shallow search phase while the second phase is a deep search phase. During the shallow search phase, a quick and time-efficient preliminary model selection is performed based on performance-based voting. During the deep search phase, instead of tuning all the candidate models, a deep search is performed only on the models selected in the first phase. Further, the models are tuned only on one fold (or section of the dataset under consideration) and fitted over the remaining folds. This two-phase learning process facilitates reducing the time complexity of the overall process.
[023] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is
intended that the following detailed description be considered as exemplary only, with the true scope being indicated by the following claims.
[024] Referring now to the drawings, and more particularly to FIGS. 1 through 5C, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[025] FIG. 1 illustrates an example network implementation 100 of a system 102 for efficient hyperparameter tuning and ensembling of models in accordance with an example embodiment. In an embodiment, the system 102 utilizes an optimization model that leverages parameters of differential evolution for hyper-parameter tuning and ensembling of models. The optimization includes two passes, a first pass and a second pass, where during the first pass, a first hyper-parameter tuning is performed in a time-efficient manner to get a rough idea of which models should be selected for further deep tuning. The models selected by the first pass are thus further tuned to perform a more comprehensive search of hyper-parameters till a terminating criterion is reached, to finally obtain a set of n tuned models. The n tuned models may then be fitted over all k folds of data, thereby resulting in n*k fitted models (or the ensemble of models).
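By way of a non-limiting illustration only, the two-pass flow described above may be sketched in Python as follows. The helpers shallow_tune, deep_tune and fit_on_fold, the dictionary of candidate models, and the value of n_top are hypothetical placeholders introduced for illustration and are not asserted to be the claimed implementation; the individual steps are sketched more concretely alongside the method steps of FIG. 2 below.

# Illustrative sketch of the two-pass pipeline; shallow_tune, deep_tune and
# fit_on_fold are hypothetical placeholders, not the claimed implementation.
def build_ensemble(candidate_models, folds, n_top=3):
    first_fold = folds[0]
    # First pass: quick, lenient shallow tuning of every candidate model.
    shallow_rmse = {name: shallow_tune(model, first_fold)
                    for name, model in candidate_models.items()}
    # Performance-based voting: keep only the n_top best-performing candidates.
    selected = sorted(shallow_rmse, key=shallow_rmse.get)[:n_top]
    # Second pass: deep tuning of the selected models only, on the first fold.
    tuned = {name: deep_tune(candidate_models[name], first_fold) for name in selected}
    # Fit each of the n tuned models on every fold, giving n*k fitted models.
    return [fit_on_fold(params, fold) for params in tuned.values() for fold in folds]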
[026] Accordingly, in an embodiment, a small population size is selected for differential evolution with a very lenient termination criterion (low gtol and high ftol). Herein, gtol is indicative of a number of iterations used as a lookback or patience parameter, and ftol is a tolerance parameter indicative of the maximum amount of change allowed over the patience iterations for convergence. In the deep tuning step (or the second pass), the population size for differential evolution is set to a relatively higher value and the terminating criterion is made stricter (relatively higher gtol and lower ftol). The solution for the optimization is said to converge when:
cost[-gtol] - cost[-1] <= ftol (1)
[027] In an alternate embodiment, a plurality of candidate solutions are obtained in a plurality of iterations of the optimization model such that the plurality
of candidate solutions defines a population. In this embodiment, the first termination criteria and the second termination criteria are associated with a standard deviation of population energies associated with the population, and atol, tol, and a mean of the population energies, based on a criterion as:
Standard Deviation of (Population Energies) <= atol + tol * |Mean(Population Energies)|   (2)
Here,
tol: Relative Tolerance (default: 0.01)
atol: Absolute Tolerance (default: 0)
Population Energies are the objective function values calculated over the population members.
[028] The above equations (1) and (2) mean that all the solutions are very similar to each other, because multiple solutions evolve over a number of iterations. If all the solutions are similar to each other in consecutive iterations, then the solution may be considered to have converged. Additionally or alternatively, when the number of iterations exceeds a threshold number of maximum iterations, the solution may be considered to have converged. The threshold number of maximum iterations may be set by a user.
[029] Herein, a population member may include a candidate solution vector and an objective function value (e.g., an RMSE value). Population energies refer to the objective function values of the candidate solution vectors. As per equation (2), if the standard deviation of the population energies is less than or equal to [atol + tol * |Mean(Population Energies)|], then the optimization has converged.
[030] The system also keeps track of the number of iterations and performs the search within user-defined iteration limits for the shallow and deep searches. If the number of iterations exceeds 10 (the iteration limit for the shallow search), then the system may stop the shallow search, and if it exceeds 50 (the iteration limit for the deep search), the algorithm stops the deep search. This criterion overrides the previous criterion (of equation (2)) given by the population energies.
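As a non-limiting illustration, the convergence checks of equations (1) and (2) and the iteration-limit override may be expressed together as follows; the function and variable names (has_converged, cost_history, population_energies, iteration) are assumptions introduced for illustration only.

import numpy as np

def has_converged(cost_history, population_energies, iteration,
                  gtol=10, ftol=1e-3, tol=0.01, atol=0.0, max_iter=10):
    # Iteration-limit override, e.g. 10 for the shallow search and 50 for the deep search.
    if iteration >= max_iter:
        return True
    # Equation (1): the best cost changed by at most ftol over the last gtol iterations.
    if len(cost_history) >= gtol and (cost_history[-gtol] - cost_history[-1]) <= ftol:
        return True
    # Equation (2): the population energies are tightly clustered around their mean.
    energies = np.asarray(population_energies)
    return float(np.std(energies)) <= atol + tol * abs(float(np.mean(energies)))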
[031] The process of tuning models on one fold and fitting them on the other folds is time efficient since the models do not need to be tuned over all of the k folds. Also, the first pass (which is a surface elimination stage) contributes to the time efficiency of the pipeline. This is a major contributor to reducing the time complexity of pipeline generation for ensemble models.
[033] Although the present disclosure is explained considering that the system 102 is implemented on a server, it may be understood that the system 102 may also be implemented in a variety of computing systems 104, such as a laptop computer, a desktop computer, a notebook, a workstation, a cloud-based computing environment and the like. It will be understood that the system 102 may be accessed through one or more devices 106-1, 106-2... 106-N, collectively referred to as devices 106 hereinafter, or applications residing on the devices 106. Examples of the devices 106 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, a Smartphone, a tablet computer, a workstation and the like. The devices 106 are communicatively coupled to the system 102 through a network 108.
[034] In an embodiment, the network 108 may be a wireless or a wired network, or a combination thereof. In an example, the network 108 can be implemented as a computer network, as one of the different types of networks, such as a virtual private network (VPN), intranet, local area network (LAN), wide area network (WAN), the internet, and such. The network 108 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), and Wireless Application Protocol (WAP), to communicate with each other. Further,
the network 108 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices. The network devices within the network 108 may interact with the system 102 through communication links.
[035] As discussed above, the system 102 may be implemented in a computing device 104, such as a hand-held device, a laptop or other portable computer, a tablet computer, a mobile phone, a PDA, a smartphone, and a desktop computer. The system 102 may also be implemented in a workstation, a mainframe computer, a server, and a network server. In an embodiment, the system 102 may be coupled to a data repository, for example, a repository 112. The repository 112 may store data processed, received, and generated by the system 102. In an alternate embodiment, the system 102 may include the data repository 112. In an embodiment, the data repository may store training and/or validation dataset. In an embodiment, the data set may include labelled data to enable supervised learning for the prediction. In an embodiment, the data set may be a single data set that may be split into a plurality of folds or sections such that each fold of the dataset includes a training data and a validation data. Accordingly, for example, if the dataset is to be utilized for a task (such as predicting a process outcome parameter as target based on process input parameters), the dataset in multiple folds may include process input parameters with respective process outcome parameter as label.
[036] The network environment 100 supports various connectivity options such as BLUETOOTH®, USB, ZigBee, and cellular services. The network environment enables connection of devices 106, such as a Smartphone, with the server 104, and accordingly with the database 112, using any communication link including the Internet, WAN, MAN, and so on. In an exemplary embodiment, the system 102 is implemented to operate as a stand-alone device. In another embodiment, the system 102 may be implemented to work as a loosely coupled device to a smart computing environment. The components and functionalities of the system 102 are described further in detail with reference to FIGS. 2 through 5C.
[037] FIG. 2 is a flow diagram of a method 200 for efficient hyperparameter tuning and ensembling of models in accordance with some examples of present disclosure. The method 200 depicted in the flow chart may be
executed by a system, for example, the system 102 of FIG. 1. In an example embodiment, the system 102 may be embodied in a computing device.
[038] Operations of the flowchart, and combinations of operations in the flowchart, may be implemented by various means, such as hardware, firmware, processor, circuitry and/or other device associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described in various embodiments may be embodied by computer program instructions. In an example embodiment, the computer program instructions, which embody the procedures described in various embodiments, may be stored by at least one memory device of a system and executed by at least one processor in the system. Any such computer program instructions may be loaded onto a computer or other programmable system (for example, hardware) to produce a machine, such that the resulting computer or other programmable system embodies means for implementing the operations specified in the flowchart. It will be noted herein that the operations of the method 200 are described with the help of the system 102. However, the operations of the method 200 can be described and/or practiced by using any other system.
[039] The method 200 facilitates in creating an optimized ensemble of models from a plurality of candidate models. The optimization process for creating the optimized ensemble of models begins by training a model on the training set (for a first fold from amongst the plurality of folds/sections) and calculating the RMSE (root mean squared error) on the validation set (for the first fold). Herein, the optimization is performed by utilizing a Differential Evolution optimization technique.
[040] The Differential Evolution (DE) optimization has the ability to generate a large variety of trial solutions for optimization while still keeping a record of the global best solutions at every step in the process. The DE optimization technique optimizes a task/problem by iteratively trying to improve a candidate solution with regard to a given measure of quality.
[041] The method for creating the optimized ensemble of models is explained further with reference to steps 202-210. For example, at 202, a dataset
may be split into a plurality of sections (or folds) based on a user input. Each section of the plurality of sections includes a training dataset and a validation dataset. In an embodiment, the user input may include certain attributes, also referred to as ‘features’. In an example scenario, the user dataset may include tabular data with rows and columns, such that each column may be a feature. A plurality of candidate optimization models may be trained on the training set (for a first section from amongst the plurality of sections), and further an RMSE (root mean squared error) is calculated on the validation set (for the first section).
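As a non-limiting illustration of the splitting at step 202, the sections may be formed with scikit-learn's KFold utility; the number of sections k and the seed below are assumptions standing in for the user input.

from sklearn.model_selection import KFold

def split_into_sections(X, y, k=5, seed=42):
    # X and y are assumed to be NumPy arrays; each section holds a training
    # part and a validation part of the dataset.
    sections = []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        sections.append({"X_train": X[train_idx], "y_train": y[train_idx],
                         "X_val": X[val_idx], "y_val": y[val_idx]})
    return sections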
[042] At 204, the method 200 includes determining, by using an optimization model that leverages configuration parameters of the differential evolution technique, a plurality of optimized models from amongst a plurality of candidate models for fitting on the plurality of sections of the dataset. Examples of the plurality of candidate models may include, but are not limited to, combinations of models selected from xgboost, lightgbm, sklearn GBM, sklearn Random Forest, sklearn ExtraTrees, k-Nearest Neighbors, Support Vector Machines, Elastic Net, Deep Neural Network, and so on.
[043] The optimization model randomly selects a number of trial solutions (a population) and evaluates the performance of the model for the generated trial solutions. It then perturbs the trial solutions in a process called mutation. For instance, here, x1, x2 and x3 may be trial solutions. Mutation is necessary to avoid local minima and to introduce a larger variety into the solutions. A revised set of trial solutions is then obtained in a process called crossover by randomly interchanging some values in a trial solution and a mutated trial solution to obtain candidate solutions. The aforementioned process of optimization ensures that a new solution is formed which retains a part of the candidate solution that performed well in the previous iteration while still introducing an element of randomness into the revised set of trial solutions. The model performance is measured by appropriate metrics; for example, root mean square error (RMSE) values are evaluated for the revised set of trial solutions. The solutions from amongst the revised set of trial solutions which have an RMSE lower than a predetermined threshold value may
be utilized for subsequent iterations. The search for solutions is continued till a termination criterion is met.
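By way of a non-limiting sketch, a single candidate model may be tuned on the first section with SciPy's differential_evolution routine, using the validation RMSE as the objective function. The choice of GradientBoostingRegressor as the candidate model, the search space in BOUNDS, and the mapping of the shallow/deep settings onto SciPy's popsize, maxiter, tol and atol arguments are assumptions made for illustration only.

import numpy as np
from scipy.optimize import differential_evolution
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical search space: (learning_rate, n_estimators, max_depth).
BOUNDS = [(0.01, 0.3), (50, 500), (2, 8)]

def rmse_objective(params, section):
    # Validation RMSE of a model fitted on the section with the trial hyper-parameters.
    learning_rate, n_estimators, max_depth = params
    model = GradientBoostingRegressor(learning_rate=learning_rate,
                                      n_estimators=int(n_estimators),
                                      max_depth=int(max_depth))
    model.fit(section["X_train"], section["y_train"])
    pred = model.predict(section["X_val"])
    return float(np.sqrt(mean_squared_error(section["y_val"], pred)))

def tune_on_section(section, popsize, maxiter, tol, atol=0.0):
    # Differential evolution search over the hyper-parameter space until a
    # termination criterion (tolerance or iteration limit) is met.
    result = differential_evolution(rmse_objective, BOUNDS, args=(section,),
                                    popsize=popsize, maxiter=maxiter,
                                    tol=tol, atol=atol, seed=0, polish=False)
    return result.x, result.fun  # best hyper-parameters and their validation RMSE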
[044] Using the optimization model has a technical advantage due to its ability to generate a large variety of trial solutions for optimization while still keeping a record of the global best solutions at every step in the process. The crossover process of the DE optimization technique is efficient in creating successful solutions from previous successful solutions. The mutation process introduces enough randomness into the solution to ensure that the solution is not stuck in local minima.
[045] In an embodiment, the terminating criterion consists of satisfying criteria based on two variables, namely gtol and ftol, in the following manner. In an embodiment, the termination criterion facilitates converging the optimization when a best solution output by the model changes by at most an ftol amount in magnitude in gtol consecutive iterations. Thus, the terminating criterion can be made strict by decreasing ftol and increasing gtol.
[046] In various embodiments disclosed herein, the differential evolution technique includes iteratively performing a first phase and a second phase of hyper-parameter tuning for a section of the dataset such that the first phase and the second phase differ in the termination criteria thereof, as described below with reference to steps 206-210.
[047] At 206, the method 200 includes performing the first phase of the hyper-parameter tuning. As previously described, the first phase of hyper-parameter tuning is performed to get a rough idea of which models should be selected for further deep tuning. During the first pass, the plurality of candidate models are fitted with different hyper-parameter settings on the first section of the data set and the performance is evaluated (on the validation set of the first section) repeatedly till the optimization has converged for all of the plurality of candidate models. The model performances for the plurality of candidate models may be measured on the validation set of the first section of the dataset. A small population size is selected for the differential evolution with the first termination criterion, which is lenient. During the first phase, a shallow search is performed for hyper-parameter tuning of the plurality of candidate models on the first section of the data set to obtain a first set of models. The first set of models is selected from amongst the plurality of candidate models based on the shallow search. In an embodiment, the first set of models may be a predetermined number of top-performing models selected from amongst the plurality of candidate models. The first phase of the hyper-parameter tuning is performed till a first termination criteria is met. The first phase of the hyper-parameter tuning is time efficient since the first termination criterion is very lenient. Hence, the optimization process in the first phase terminates quickly.
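A non-limiting sketch of this first phase, assuming a tuner callable per candidate model of the kind sketched above (tune_on_section), follows; the shallow settings (small population, loose tolerance, low iteration cap) and the value of n_top are illustrative assumptions.

def shallow_select(candidate_tuners, first_section, n_top=3):
    # candidate_tuners maps a model name to a callable that tunes that model on one
    # section and returns (best_hyper_parameters, validation_rmse); this mapping is
    # an assumed placeholder, not the claimed implementation.
    shallow_results = {}
    for name, tuner in candidate_tuners.items():
        # Lenient first termination criteria: small population, loose tolerance, few iterations.
        params, rmse = tuner(first_section, popsize=5, maxiter=10, tol=0.1)
        shallow_results[name] = (params, rmse)
    # Performance-based voting: keep the n_top candidates with the lowest validation RMSE.
    ranked = sorted(shallow_results, key=lambda name: shallow_results[name][1])
    return {name: shallow_results[name] for name in ranked[:n_top]}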
[048] At 208, during the second phase, i.e., during the deep hyper-parameter tuning, a deep search is performed for the first set of models on the first section of the dataset to obtain a second set of models. During the second phase, the population size for the differential evolution is set to a higher value (as compared to the population size of the shallow hyper-parameter tuning) and the terminating criterion, i.e., the second termination criteria, is made strict. The first set of models selected in the first phase is thus further tuned in the second phase to perform a more comprehensive search of hyper-parameters till the second termination criterion is met.
[049] As described previously, the first termination criteria can be made more lenient as compared to the second termination criteria based on the values of gtol and ftol. In another embodiment, the first termination criteria can be made more lenient as compared to the second termination criteria based on the determination of the standard deviation of the population energies, atol, tol and the mean of the population energies (equation (2)).
[050] At 210, the method 200 includes fitting the second set of models on the plurality of sections of the dataset to obtain the optimized ensemble of models. The process of tuning models on one fold and fitting them on the other folds is time efficient since the models are not tuned over all the folds. Also, the surface elimination stage contributes to the time efficiency of the pipeline. This is a major contributor to reducing the time complexity of pipeline generation for ensemble models. In an embodiment, SHapley Additive exPlanations (SHAP) may be utilized to evaluate the contribution of the features.
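A non-limiting sketch of step 210 follows, assuming the deeply tuned hyper-parameters produced by the second phase and a hypothetical make_model factory that builds an untrained model from a name and a hyper-parameter vector.

def fit_ensemble(tuned_hyperparams, sections, make_model):
    # Fit each of the n deeply tuned models on every one of the k sections,
    # yielding n*k fitted models (the optimized ensemble).
    ensemble = []
    for name, (params, _rmse) in tuned_hyperparams.items():
        for section in sections:
            model = make_model(name, params)  # make_model is a hypothetical factory
            model.fit(section["X_train"], section["y_train"])
            ensemble.append(model)
    return ensemble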
[051] In an embodiment, the ensemble of models may be utilized for predicting an outcome of a task. For example, an average of a set of predictions by the ensemble of models may be taken to obtain the prediction for the task. For instance, in case the ensemble includes 15 models, the predictions may be obtained by obtaining the individual model predictions of the said 15 models and then averaging out those predictions to obtain the final prediction.
[052] In an embodiment, a measure of uncertainty present in the predictions is generated using the optimized ensemble of models. The measure of uncertainty is generated by observing the variation in the predictions at a given point of time. For example, if the ensemble includes 15 models, and on prediction it is determined that the individual models of the ensemble give vastly different predictions, it can be concluded that the models of the ensemble are uncertain. If the individual predictions are relatively close, it can be concluded that the predictions have less uncertainty. In an embodiment, the uncertainty is taken to be proportional to the maximum prediction value minus the minimum prediction value.
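The averaging of paragraph [051] and the uncertainty measure of paragraph [052] may be sketched as follows; this is illustrative only and assumes the fitted ensemble is a list of models exposing a predict method, as in the earlier sketches.

import numpy as np

def ensemble_predict(ensemble, X):
    # Stack the individual predictions: shape (n_models, n_samples).
    predictions = np.stack([model.predict(X) for model in ensemble])
    # Final prediction: the average of the individual model predictions.
    mean_prediction = predictions.mean(axis=0)
    # Uncertainty: proportional to maximum prediction minus minimum prediction.
    uncertainty = predictions.max(axis=0) - predictions.min(axis=0)
    return mean_prediction, uncertainty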
[053] FIG. 3 is a block diagram of an exemplary computer system 301 for implementing embodiments consistent with the present disclosure. The computer system 301 may be implemented alone or in combination with components of the system 102 (FIG. 1). Variations of computer system 301 may be used for implementing the devices included in this disclosure. Computer system 301 may comprise a central processing unit (“CPU” or “hardware processor”) 302. The hardware processor 302 may comprise at least one data processor for executing program components for executing user- or system-generated requests. The processor may include specialized processing units such as integrated system (bus) controllers, memory management control units, floating point units, graphics processing units, digital signal processing units, etc. The processor may include a microprocessor, such as AMD Athlon™, Duron™ or Opteron™, ARM’s application, embedded or secure processors, IBM PowerPC™, Intel’s Core, Itanium™, Xeon™, Celeron™ or other line of processors, etc. The processor 302 may be implemented using mainframe, distributed processor, multi-core, parallel, grid, or other architectures. Some embodiments may utilize embedded technologies
like application specific integrated circuits (ASICs), digital signal processors (DSPs), Field Programmable Gate Arrays (FPGAs), etc. The processor 302 may be a multi-core multi-threaded processor.
[054] Processor 302 may be disposed in communication with one or more input/output (I/O) devices via I/O interface 303. The I/O interface 303 may employ communication protocols/methods such as, without limitation, audio, analog, digital, monoaural, RCA, stereo, IEEE-1394, serial bus, universal serial bus (USB), infrared, PS/2, BNC, coaxial, component, composite, digital visual interface (DVI), high-definition multimedia interface (HDMI), RF antennas, S-Video, VGA, IEEE 802.11 a/b/g/n/x, Bluetooth, cellular (e.g., code-division multiple access (CDMA), high-speed packet access (HSPA+), global system for mobile communications (GSM), long-term evolution (LTE), WiMax, or the like), etc.
[055] Using the I/O interface 303, the computer system 301 may communicate with one or more I/O devices. For example, the input device 304 may be an antenna, keyboard, mouse, joystick, (infrared) remote control, camera, card reader, fax machine, dongle, biometric reader, microphone, touch screen, touchpad, trackball, sensor (e.g., accelerometer, light sensor, GPS, gyroscope, proximity sensor, or the like), stylus, scanner, storage device, transceiver, video device/source, visors, etc.
[056] Output device 305 may be a printer, fax machine, video display (e.g., cathode ray tube (CRT), liquid crystal display (LCD), light-emitting diode (LED), plasma, or the like), audio speaker, etc. In some embodiments, a transceiver 306 may be disposed in connection with the processor 302. The transceiver may facilitate various types of wireless transmission or reception. For example, the transceiver may include an antenna operatively connected to a transceiver chip (e.g., Texas Instruments WiLink WL1283, Broadcom BCM4750IUB8, Infineon Technologies X-Gold 618-PMB9800, or the like), providing IEEE 802.11a/b/g/n, Bluetooth, FM, global positioning system (GPS), 2G/3G HSDPA/HSUPA communications, etc.
[057] In some embodiments, the processor 302 may be disposed in communication with a communication network 308 via a network interface 307.
The network interface 307 may communicate with the communication network 308. The network interface may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/100/1000 Base T), transmission control protocol/internet protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc. The communication network 308 may include, without limitation, a direct interconnection, local area network (LAN), wide area network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, etc. Using the network interface 307 and the communication network 308, the computer system 301 may communicate with devices 309 and 310. These devices may include, without limitation, personal computer(s), server(s), fax machines, printers, scanners, various mobile devices such as cellular telephones, smartphones (e.g., Apple iPhone, Blackberry, Android-based phones, etc.), tablet computers, eBook readers (Amazon Kindle, Nook, etc.), laptop computers, notebooks, gaming consoles (Microsoft Xbox, Nintendo DS, Sony PlayStation, etc.), or the like. In some embodiments, the computer system 301 may itself embody one or more of these devices.
[058] In some embodiments, the processor 302 may be disposed in communication with one or more memory devices (e.g., RAM 513, ROM 514, etc.) via a storage interface 312. The storage interface may connect to memory devices including, without limitation, memory drives, removable disc drives, etc., employing connection protocols such as serial advanced technology attachment (SATA), integrated drive electronics (IDE), IEEE-1394, universal serial bus (USB), fiber channel, small computer systems interface (SCSI), etc. The memory drives may further include a drum, magnetic disc drive, magneto-optical drive, optical drive, redundant array of independent discs (RAID), solid-state memory devices, solid-state drives, etc. Variations of memory devices may be used for implementing, for example, any databases utilized in this disclosure.
[059] The memory devices may store a collection of program or database components, including, without limitation, an operating system 316, user interface application 317, user/application data 318 (e.g., any data variables or data records discussed in this disclosure), etc. The operating system 316 may facilitate resource
management and operation of the computer system 301. Examples of operating systems include, without limitation, Apple Macintosh OS X, Unix, Unix-like system distributions (e.g., Berkeley Software Distribution (BSD), FreeBSD, NetBSD, OpenBSD, etc.), Linux distributions (e.g., Red Hat, Ubuntu, Kubuntu, etc.), IBM OS/2, Microsoft Windows (XP, Vista/7/8, etc.), Apple iOS, Google Android, Blackberry OS, or the like. User interface 317 may facilitate display, execution, interaction, manipulation, or operation of program components through textual or graphical facilities. For example, user interfaces may provide computer interaction interface elements on a display system operatively connected to the computer system 301, such as cursors, icons, check boxes, menus, scrollers, windows, widgets, etc. Graphical user interfaces (GUIs) may be employed, including, without limitation, Apple Macintosh operating systems’ Aqua, IBM OS/2, Microsoft Windows (e.g., Aero, Metro, etc.), Unix X-Windows, web interface libraries (e.g., ActiveX, Java, Javascript, AJAX, HTML, Adobe Flash, etc.), or the like.
[060] In some embodiments, computer system 301 may store user/application data 318, such as the data, variables, records, etc. as described in this disclosure. Such databases may be implemented as fault-tolerant, relational, scalable, secure databases such as Oracle or Sybase. Alternatively, such databases may be implemented using standardized data structures, such as an array, hash, linked list, structured text file (e.g., XML), table, or as object-oriented databases (e.g., using ObjectStore, Poet, Zope, etc.). Such databases may be consolidated or distributed, sometimes among various computer systems discussed above. It is to be understood that the structure and operation of any computer or database component may be combined, consolidated, or distributed in any working combination.
[061] Additionally, in some embodiments, the server, messaging and instructions transmitted or received may emanate from hardware, including operating system, and program code (i.e., application code) residing in a cloud implementation. Further, it should be noted that one or more of the systems and methods provided herein may be suitable for cloud-based implementation. For
example, in some embodiments, some or all of the data used in the disclosed methods may be sourced from or stored on any cloud computing platform.
Example Scenario:
[062] In an example scenario, experiments were performed for efficient model selection and hyper-parameter tuning. For the purpose of the experiments, a system with the following specifications was considered:
Processor: Intel Core i5-6200U CPU @ 2.30GHz 2.40GHz
RAM: 4.00 GB
System Type: 64-bit Operating System, x64 based Processor.
OS: Windows 10 64 Bit Enterprise SOEv1.3
[063] Data used: The data used for the experiments was the California Housing Data referenced in: Pace, R. Kelley, and Ronald Barry, "Sparse Spatial Autoregressions," Statistics and Probability Letters, Volume 33, Number 3, 5 May 1997, pp. 291-297. The target variable is the median house value for California districts. The dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).
[064] There were eight predictors (inputs) and one target variable in the data. The following were the predictors in the data:
MedInc: median income in block
HouseAge: median house age in block
AveRooms: average number of rooms
AveBedrms: average number of bedrooms
Population: block population
AveOccup: average house occupancy
Latitude: house block latitude
Longitude: house block longitude
[065] Herein, the target variable used for the regression task is medv (median house value for California districts).
[066] For the purpose of experiment, a standard approach was defined as follows:
1. Split the data into k folds. For the following trials, k was chosen to be 5.
2. Repeat for the k folds.
3. Tune each of the 5 candidate models using Differential Evolution on the current fold.
4. To predict, take the average of the 5*k tuned models.
[067] The key differences between the standard approach (or known methods) and the suggested approach (or the disclosed method) are:
1. The suggested approach uses a shallow tune stage to eliminate poor-performing models. This improves the quality of the resulting ensemble and reduces the overall time.
2. Each model is tuned only on the first fold. The hyperparameters obtained as a result of the optimization on the first fold are then used to train models on the remaining folds.
[068] The following experiments were performed:
1. Comparison between Shallow Search Results and Deep Search Results on RMSE (Root Mean Squared Error)
2. Comparison between time of completion of Shallow Search and Deep Search
3. Comparison between Standard Approach and suggested approach in
terms of Time and RMSE.
[069] The results of the aforementioned experiments are illustrated with reference to FIGS. 4A to 4E.
[070] Referring to FIG. 4A, a graph indicative of a root mean square error (RMSE) comparison (on the validation set) between the deep search and the shallow search is illustrated. As seen from the graph, the models tuned by the deep search achieved a lower RMSE than the models tuned by the shallow search. This shows the effectiveness of the deep search in improving model performance.
[071] FIG. 4B illustrates a graph representing a comparison between the time taken to complete the shallow search and the deep search. Despite the deep search taking longer than the shallow search, the models produced by the deep search perform better than the models selected by the shallow search, as shown in FIG. 4A.
[072] FIG. 4C illustrates a graph representing a comparison of the RMSE on the validation set between the disclosed method and the standard (known) method. As seen from FIG. 4C, the models tuned by the disclosed method achieved a lower RMSE than the ones tuned by the standard method. This shows the effectiveness of the disclosed method in improving model performance compared to the standard method.
[073] FIG. 4D illustrates a comparison of the disclosed method vis-à-vis the standard (known) method. As is seen from FIG. 4D, the disclosed method is found to perform better on the test set than the standard method.
[074] FIG. 4E illustrates a comparison of the overall time taken by the disclosed method vis-à-vis the standard (known) method. As illustrated in FIG. 4E, the disclosed method takes 19 hours, which is ~8.1x faster than the standard (known) approach, which takes 154 hours on the current data set. FIG. 4E demonstrates the effectiveness of the voting done at the shallow search stage and of tuning the model on the first fold and generating multiple models using the same tuned hyperparameters on the remaining folds.
[075] FIGS. 5A, 5B and 5C illustrate the performance of the shallow search, the deep search and the disclosed method on the validation set, in accordance with an example embodiment. For the purpose of the experiments, the data was split into k=5 folds for the shallow search, and the deep search was performed on the top n=3 models. As illustrated in FIG. 5A, the shallow search shows poor performance for Extra Trees and Random Forest on the current data. These models were not subjected to the deep search in the second pass. Referring to FIG. 5B, the deep search was performed on the top 3 models (selected during the shallow search). The performance of the 3 tuned models improves over their performance in the shallow search phase. Referring to FIG. 5C, the standard approach involves tuning each model on all the k folds. It can be seen that the poorest performing models after the shallow search are also the poorest performing models when subjected to the standard approach. This shows the effectiveness of the shallow search in identifying the poor models in a short time.
[076] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[077] Various embodiments disclosed herein provide a method and system for efficient hyperparameter tuning and ensembling of models in a time-efficient manner. In one embodiment, the optimization employed herein includes performing hyper-parameter tuning in two phases. The first phase of hyperparameter tuning is a shallow search phase while the second phase is a deep search phase. During the shallow search phase, a quick and time-efficient preliminary model selection is performed based on performance-based voting. During the deep search phase, instead of tuning all the candidate models, a deep search is performed only on the models selected in the first phase. Further, the models are tuned only on one fold (or section of the dataset under consideration) and fitted over the remaining folds. This two-phase learning process facilitates reducing the time complexity of the overall process.
[078] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware
means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[079] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[080] The illustrated steps are set out to explain the exemplary
embodiments shown, and it should be anticipated that ongoing technological
development will change the manner in which particular functions are performed.
These examples are presented herein for purposes of illustration, and not limitation.
Further, the boundaries of the functional building blocks have been arbitrarily
defined herein for the convenience of the description. Alternative boundaries can
be defined so long as the specified functions and relationships thereof are
appropriately performed. Alternatives (including equivalents, extensions,
variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[081] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A
computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[082] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
We Claim:
1. A processor implemented method (200), comprising:
splitting (202), via one or more hardware processors, a dataset into a plurality of sections based on a user input, each section of the plurality of sections comprises a training dataset and a validation dataset; and
determining (204), by using an optimization model, an optimized ensemble of models comprising a plurality of optimized models from amongst a plurality of candidate models for fitting on the plurality of sections of the dataset, via the one or more hardware processors, wherein the optimization model comprises:
iteratively performing a first phase and a second phase of hyper-parameter tuning for the plurality of sections of the dataset, wherein:
performing (206) the first phase of the hyper-parameter tuning comprises performing a shallow hyper-parameter tuning of the plurality of candidate models on a first section of the data set, and selecting a first set of models from amongst the plurality of candidate models based on the shallow hyper-parameter tuning, the first phase of the hyper-parameter tuning performed till a first termination criteria is met;
performing (208) the second phase of the hyper-parameter tuning comprises performing a deep search for the first set of models on the first section of the dataset to obtain a second set of models, the second phase of the hyper-parameter tuning performed till a second termination criteria is met, wherein the first termination criteria is more lenient than the second termination criteria; and
fitting (210) the second set of models on the plurality of sections of the dataset to obtain the optimized ensemble of models.
2. The processor implemented method of claim 1, further comprising
predicting an outcome of a task based on the optimized ensemble of models.
3. The processor implemented method of claim 2, wherein the predicting
comprises:
obtaining, using each of the optimized ensemble of models, a set of
predictions; and
averaging the set of predictions to predict the outcome of the task.
4. The processor implemented method of claim 2, further comprises generating a measure of uncertainty present in predictions using the optimized ensemble of models.
5. The processor implemented method of claim 1, wherein the plurality of candidate models comprises combinations of models selected from xgboost, lightgbm, sklearn GBM, sklearn Random Forest, and sklearn ExtraTrees, k-Nearest Neighbors, Support Vector Machines, and Elastic Net.
6. The processor implemented method of claim 1, wherein the first termination criteria and the second termination criteria are associated with values of gtol and ftol, and wherein the second termination criteria is made stricter than the first termination criteria by decreasing the value of ftol and increasing the value of gtol from the values of ftol and gtol associated with the first termination criteria.
7. The processor implemented method of claim 1, further comprises obtaining a plurality of candidate solutions in a plurality of iterations of the optimization model, wherein the plurality of candidate solutions defines a population, further wherein the first termination criteria and the second termination criteria are associated with a standard deviation of population energies associated with the population, and atol, tol, and mean of population energies based on a criteria as below:
standard deviation of (Population Energies) <= atol + tol * |Mean (Population Energies)|,
wherein atol is the absolute tolerance and tol is the relative tolerance, and wherein population energies comprise a plurality of objective function values calculated over a plurality of members of the population in the plurality of iterations.
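The termination test of claim 7 can be written directly as a small check; it mirrors the convergence criterion used by differential-evolution style optimizers (for example scipy.optimize.differential_evolution), although the claim itself does not name a specific library, and the function name has_converged is hypothetical.

```python
import numpy as np


def has_converged(population_energies, tol, atol):
    """Claim 7's test: stop once the spread of the population's objective values
    falls within the absolute tolerance plus the relative tolerance scaled by the
    magnitude of their mean."""
    energies = np.asarray(population_energies, dtype=float)
    return np.std(energies) <= atol + tol * np.abs(np.mean(energies))


# Example: once the population's objective values have clustered, the test passes.
has_converged([0.98, 1.01, 1.02], tol=0.05, atol=0.0)   # True
```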
8. A system (301), comprising:
a memory (315) storing instructions;
one or more communication interfaces (303); and
one or more hardware processors (302) coupled to the memory (315) via the one or more communication interfaces (303), wherein the one or more hardware processors (302) are configured by the instructions to:
split a dataset into a plurality of sections based on a user input, each section of the plurality of sections comprises a training dataset and a validation dataset; and
determine, by using an optimization model, an optimized ensemble of models comprising a plurality of optimized models from amongst a plurality of candidate models for fitting on the plurality of sections of the dataset, wherein the optimization model configures the one or more hardware processors to:
iteratively perform a first phase and a second phase of hyper-parameter tuning for the plurality of sections of the dataset, wherein to:
perform the first phase of the hyper-parameter tuning, the one or more hardware processors are configured by the instructions to perform a shallow hyper-parameter tuning of the plurality of candidate models on a first section of the dataset, and select a first set of models from amongst the plurality of candidate models based on the shallow hyper-parameter tuning, the first phase of the hyper-parameter tuning performed till a first termination criteria is met;
perform the second phase of the hyper-parameter tuning, the one or more hardware processors are configured by the instructions to perform a deep search for the first set of models on the first section of the dataset to obtain a second set of models, the second phase of the hyper-parameter tuning performed till a second termination criteria is met, wherein the first termination criteria is more lenient than the second termination criteria; and
fit, via the one or more hardware processors, the second set of models on the plurality of sections of the dataset to obtain the optimized ensemble of models.
9. The system of claim 8, wherein the one or more hardware processors are further configured by the instructions to predict an outcome of a task based on the optimized ensemble of models.
10. The system of claim 9, wherein to predict, the one or more hardware processors are further configured by the instructions to:
obtain, using each model of the optimized ensemble of models, a set of predictions; and
average the set of predictions to predict the outcome of the task.
11. The system of claim 9, wherein the one or more hardware processors are further configured by the instructions to generate a measure of uncertainty present in predictions using the optimized ensemble of models.
12. The system of claim 8, wherein the plurality of candidate models comprises combinations of models selected from xgboost, lightgbm, sklearn GBM, sklearn Random Forest, sklearn ExtraTrees, k-Nearest Neighbors, Support Vector Machines, and Elastic Net.
13. The system of claim 8, wherein the first termination criteria and the second termination criteria are associated with values of gtol and ftol, and wherein the one or more hardware processors are configured by the instructions to make the first termination criteria lenient as compared to the second termination criteria by decreasing the value of ftol and increasing the value of gtol from the values of ftol and gtol associated with the second termination criteria.
14. The system of claim 8, wherein the one or more hardware processors are configured by the instructions to obtain a plurality of candidate solutions in a plurality of iterations of the optimization model, wherein the plurality of candidate solutions defines a population, further wherein the first termination criteria and the second termination criteria are associated with a standard deviation of population energies associated with the population, and atol, tol, and mean of population energies based on a criteria as below:
standard deviation of (Population Energies) <= atol + tol * |Mean (Population Energies)|,
wherein atol is the absolute tolerance and tol is the relative tolerance, and wherein population energies comprise a plurality of objective function values calculated over a plurality of members of the population in the plurality of iterations.
| # | Name | Date |
|---|---|---|
| 1 | 202021031790-STATEMENT OF UNDERTAKING (FORM 3) [24-07-2020(online)].pdf | 2020-07-24 |
| 2 | 202021031790-REQUEST FOR EXAMINATION (FORM-18) [24-07-2020(online)].pdf | 2020-07-24 |
| 3 | 202021031790-REQUEST FOR EXAMINATION (FORM-18) [24-07-2020(online)]-1.pdf | 2020-07-24 |
| 4 | 202021031790-FORM 18 [24-07-2020(online)].pdf | 2020-07-24 |
| 5 | 202021031790-FORM 1 [24-07-2020(online)].pdf | 2020-07-24 |
| 6 | 202021031790-FIGURE OF ABSTRACT [24-07-2020(online)].jpg | 2020-07-24 |
| 7 | 202021031790-DRAWINGS [24-07-2020(online)].pdf | 2020-07-24 |
| 8 | 202021031790-DECLARATION OF INVENTORSHIP (FORM 5) [24-07-2020(online)].pdf | 2020-07-24 |
| 9 | 202021031790-COMPLETE SPECIFICATION [24-07-2020(online)].pdf | 2020-07-24 |
| 10 | 202021031790-FORM-26 [16-10-2020(online)].pdf | 2020-10-16 |
| 11 | 202021031790-Proof of Right [08-01-2021(online)].pdf | 2021-01-08 |
| 12 | Abstract1.jpg | 2021-10-19 |
| 13 | 202021031790-FER.pdf | 2022-03-08 |
| 14 | 202021031790-FER_SER_REPLY [24-06-2022(online)].pdf | 2022-06-24 |
| 15 | 202021031790-COMPLETE SPECIFICATION [24-06-2022(online)].pdf | 2022-06-24 |
| 16 | 202021031790-CLAIMS [24-06-2022(online)].pdf | 2022-06-24 |
| 17 | 202021031790-PatentCertificate24-04-2024.pdf | 2024-04-24 |
| 18 | 202021031790-IntimationOfGrant24-04-2024.pdf | 2024-04-24 |
| 19 | SearchStrategyE_08-03-2022.pdf | |