Abstract: An optimal value of a hyperparameter is vital since the time required to train and test a machine learning model depends on the number of hyperparameters and the values associated with them. Obtaining an optimal hyperparameter value is challenging due to overfitting, convergence, and time-consumption issues. In the present disclosure, a set of hyperparameters associated with a machine learning model is received. Further, a set of gradient values is iteratively computed for each hyperparameter by utilizing a first value and a second value based on a predefined magnitude. Further, an updated set of values is calculated for each hyperparameter, and a set of local minima associated with each hyperparameter is obtained by utilizing the updated set of values and a set of validation errors. Further, a global optimum is obtained for each hyperparameter from the set of local minima.
Claims:
1. A method for hyperparameter optimization, the method comprising:
receiving, by one or more hardware processors, a set of hyperparameters associated with a machine learning model, wherein each hyperparameter from the set of hyperparameters is associated with a first value comprising an X plane value and a Y plane value, wherein the first value is a range of values;
iteratively computing, by the one or more hardware processors, a set of gradient values for each hyperparameter, based on the first value and a second value, until a minimum validation error is obtained, by:
(i) evaluating a first set of validation errors for the first value associated with each hyperparameter from the set of hyperparameters;
(ii) evaluating the second value associated with each hyperparameter by adjusting at least one of the X plane value and the Y plane value corresponding to the first value, to a predefined magnitude;
(iii) evaluating a second set of validation errors for the second value associated with each hyperparameter; and
(iv) calculating a gradient for each hyperparameter, based on the first value, the second value, the first set of validation errors, and the second set of validation errors, wherein the second value is utilized as the first value in the next iteration of computing the set of gradient values based on a predetermined gradient threshold;
calculating, by the one or more hardware processors, an updated value for each hyperparameter, based on the first value, the set of gradient values, a minimum inverse of the set of gradient values, and a learning rate, wherein the learning rate provides a speed of convergence of the machine learning model;
calculating, by the one or more hardware processors, a set of local minima associated with each hyperparameter by identifying a local minima for each random value from the range of values of the first value by constructing an X-Y plane with the set of updated values of each hyperparameter along an X plane and a set of validation errors corresponding to each hyperparameter along a Y plane, wherein the set of validation errors includes the first set of validation errors and the second set of validation errors; and
calculating, by the one or more hardware processors, a global optimum associated with each hyperparameter based on a comparison of each local minima across the set of local minima associated with each hyperparameter.
2. The method as claimed in claim 1, wherein the machine learning model comprises a set of training data, a set of validation data and the set of hyperparameters.
3. The method as claimed in claim 1, wherein each local minima from the set of local minima is obtained for each random value selected from the range of values of the first value.
4. The method as claimed in claim 1, wherein the second value is obtained corresponding to the first value and the second value is utilized as the first value in the next iteration of computing the set of gradient values, only if the gradient associated with the second value is greater than the predetermined gradient threshold.
5. A hyperparameter optimization system, the system comprising:
one or more memories comprising programmed instructions and a repository for storing a set of hyperparameters associated with a machine learning model and a set of values associated with the set of hyperparameters;
one or more hardware processors operatively coupled to the one or more memories, wherein the one or more hardware processors are capable of executing the programmed instructions stored in the one or more memories; and
an optimization unit, wherein the optimization unit is configured to:
receive, a set of hyperparameters associated with a machine learning model, wherein each hyperparameter from the set of hyperparameters is associated with a first value comprising an X plane value and a Y plane value, wherein the first value is a range of values;
iteratively compute, a set of gradient values for each hyperparameter based on the first value and a second value until a minimum validation error is obtained, by:
(i) evaluating a first set of validation errors for the first value associated with each hyperparameter from the set of hyperparameters;
(ii) evaluating the second value associated with each hyperparameter by adjusting at least one of the X plane value and the Y plane value corresponding to the first value to a predefined magnitude;
(iii) evaluating a second set of validation errors for the second value associated with each hyperparameter; and
(iv) calculating a gradient for each hyperparameter based on the first value, the second value, the first set of validation errors and the second set of validation errors, wherein the second value is utilized as the first value in the next iteration of computing the set of gradient values based on a predetermined gradient threshold;
calculate, an updated value for each hyperparameter based on the first value, the set of gradient values, a minimum inverse of the set of gradient values and a learning rate, wherein the learning rate provides a speed of convergence of the machine learning model;
calculate, a set of local minima associated with each hyperparameter by identifying a local minima for each random value from the range of values of the first value by constructing an X-Y plane with the set of updated values of each hyperparameter along an X plane and a set of validation errors corresponding to each hyperparameter along a Y plane, wherein the set of validation errors includes the first set of validation errors and the second set of validation errors; and
calculate, a global optimum associated with each hyperparameter based on a comparison of each local minima across the set of local minima associated with each hyperparameter.
6. The system as claimed in claim 5, wherein the machine learning model comprises a set of training data, a set of validation data and the set of hyperparameters.
7. The system as claimed in claim 5, wherein each local minima from the set of local minima is obtained for each random value selected from the range of values of the first value.
8. The system as claimed in claim 5, wherein the second value is obtained corresponding to the first value and the second value is utilized as the first value in the next iteration of computing the set of gradient values, only if the gradient associated with the second value is greater than the predetermined gradient threshold.
Description:
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
SYSTEM AND METHOD FOR HYPERPARAMETER OPTIMIZATION
Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The embodiments herein relate, in general, to machine learning and, in particular, to a system and method for hyperparameter optimization.
BACKGROUND
In machine learning, a hyperparameter is a parameter whose value is set prior to the beginning of the learning process. An optimal value of the hyperparameter is vital since the time required to train and test a machine learning model depends on the number of hyperparameters and the values associated with the hyperparameters.
Conventionally, the value associated with a hyperparameter of a machine learning model is selected based on a plurality of methods including a trial-and-error method, a grid search method, a random search method, Bayesian optimization, and the like. In the trial-and-error method, the value is selected manually by an expert based on prior experience and expertise with a specific machine learning model. Here, the expert repeatedly conducts experiments on validation data sets to obtain the value associated with a hyperparameter. However, the value associated with the hyperparameter obtained manually may not be an optimum value. Moreover, most of the conventional methods face challenges including overfitting, convergence, time consumption, and the like.
SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for hyperparameter optimization is provided. The method includes receiving, by the one or more hardware processors, a set of hyperparameters associated with a machine learning model, wherein each hyperparameter from the set of hyperparameters is associated with a first value comprising an X plane value and a Y plane value, wherein the first value is a range of values. Further, the method includes iteratively computing, by the one or more hardware processors, a set of gradient values for each hyperparameter, based on the first value and a second value, until a minimum validation error is obtained, by: (i) evaluating a first set of validation errors for the first value associated with each hyperparameter from the set of hyperparameters; (ii) evaluating a second value associated with each hyperparameter by adjusting at least one of the X plane value and the Y plane value corresponding to the first value, to a predefined magnitude; (iii) evaluating a second set of validation errors for the second value associated with each hyperparameter; and (iv) calculating a gradient for each hyperparameter, based on the first value, the second value, the first set of validation errors, and the second set of validation errors, wherein the second value is utilized as the first value in the next iteration of computing the set of gradient values based on a predetermined gradient threshold. Furthermore, the method includes calculating, by the one or more hardware processors, an updated value for each hyperparameter, based on the first value, the set of gradient values, a minimum inverse of the set of gradient values, and a learning rate, wherein the learning rate provides a speed of convergence of the machine learning model. Furthermore, the method includes calculating, by the one or more hardware processors, a set of local minima associated with each hyperparameter by identifying a local minima for each random value from the range of values of the first value by constructing an X-Y plane with the set of updated values of each hyperparameter along an X plane and a set of validation errors corresponding to each hyperparameter along a Y plane, wherein the set of validation errors includes the first set of validation errors and the second set of validation errors. Finally, the method includes calculating, by the one or more hardware processors, a global optimum associated with each hyperparameter based on a comparison of each local minima across the set of local minima associated with each hyperparameter.
In another aspect, a system for hyperparameter optimization is provided. The system includes one or more memories comprising programmed instructions and a repository for storing a set of hyperparameters associated with a machine learning model and a set of values associated with the set of hyperparameters; one or more hardware processors operatively coupled to the one or more memories, wherein the one or more hardware processors are capable of executing the programmed instructions stored in the one or more memories; and an optimization unit, wherein the optimization unit is configured to receive a set of hyperparameters associated with a machine learning model, wherein each hyperparameter from the set of hyperparameters is associated with a first value including an X plane value and a Y plane value, wherein the first value is a range of values. Further, the optimization unit is configured to iteratively compute a set of gradient values for each hyperparameter based on the first value and a second value until a minimum validation error is obtained, by: (i) evaluating a first set of validation errors for the first value associated with each hyperparameter from the set of hyperparameters; (ii) evaluating the second value associated with each hyperparameter by adjusting at least one of the X plane value and the Y plane value corresponding to the first value to a predefined magnitude; (iii) evaluating a second set of validation errors for the second value associated with each hyperparameter; and (iv) calculating a gradient for each hyperparameter based on the first value, the second value, the first set of validation errors and the second set of validation errors, wherein the second value is utilized as the first value in the next iteration of computing the set of gradient values based on a predetermined gradient threshold. Furthermore, the optimization unit is configured to calculate an updated value for each hyperparameter based on the first value, the set of gradient values, a minimum inverse of the set of gradient values and a learning rate, wherein the learning rate provides a speed of convergence of the machine learning model. Furthermore, the optimization unit is configured to calculate a set of local minima associated with each hyperparameter by identifying a local minima for each random value from the range of values of the first value by constructing an X-Y plane with the set of updated values of each hyperparameter along an X plane and a set of validation errors corresponding to each hyperparameter along a Y plane, wherein the set of validation errors includes the first set of validation errors and the second set of validation errors. Finally, the optimization unit is configured to calculate a global optimum associated with each hyperparameter based on a comparison of each local minima across the set of local minima associated with each hyperparameter.
In yet another aspect, a computer program product comprising a non-transitory computer-readable medium having embodied therein a computer program for hyperparameter optimization is provided. The computer readable program, when executed on a computing device, causes the computing device to receive a set of hyperparameters associated with a machine learning model, wherein each hyperparameter from the set of hyperparameters is associated with a first value including an X plane value and a Y plane value, wherein the first value is a range of values. Further, the computer readable program, when executed on the computing device, causes the computing device to iteratively compute a set of gradient values for each hyperparameter based on the first value and a second value until a minimum validation error is obtained, by: (i) evaluating a first set of validation errors for the first value associated with each hyperparameter from the set of hyperparameters; (ii) evaluating the second value associated with each hyperparameter by adjusting at least one of the X plane value and the Y plane value corresponding to the first value to a predefined magnitude; (iii) evaluating a second set of validation errors for the second value associated with each hyperparameter; and (iv) calculating a gradient for each hyperparameter based on the first value, the second value, the first set of validation errors and the second set of validation errors, wherein the second value is utilized as the first value in the next iteration of computing the set of gradient values based on a predetermined gradient threshold. Furthermore, the computer readable program, when executed on the computing device, causes the computing device to calculate an updated value for each hyperparameter based on the first value, the set of gradient values, a minimum inverse of the set of gradient values and a learning rate, wherein the learning rate provides a speed of convergence of the machine learning model. Furthermore, the computer readable program, when executed on the computing device, causes the computing device to calculate a set of local minima associated with each hyperparameter by identifying a local minima for each random value from the range of values of the first value by constructing an X-Y plane with the set of updated values of each hyperparameter along an X plane and a set of validation errors corresponding to each hyperparameter along a Y plane, wherein the set of validation errors includes the first set of validation errors and the second set of validation errors. Finally, the computer readable program, when executed on the computing device, causes the computing device to calculate a global optimum associated with each hyperparameter based on a comparison of each local minima across the set of local minima associated with each hyperparameter.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a network environment implementing a system and method for hyperparameter optimization, according to some embodiments of the present disclosure.
FIG. 2 illustrates a block diagram of the system for hyperparameter optimization, according to some embodiments of the present disclosure.
FIG. 3 depicts an example data point in the XY plane, according to some embodiments of the present disclosure.
FIG. 4 depicts an example gradient calculation, according to some embodiments of the present disclosure.
FIG. 5 depicts an example updated value, according to some embodiments of the present disclosure.
FIG. 6 depicts an example global optimum, according to some embodiments of the present disclosure.
FIG. 7 depicts an example global optimum associated with a combination of hyperparameters, according to some embodiments of the present disclosure.
FIG. 8 illustrates a flow diagram of a method 800 for the hyperparameter optimization, according to some embodiments of the present disclosure.
FIG. 9 illustrates a flow diagram of a method 900 for iteratively computing a set of gradient values, according to some embodiments of the present disclosure.
It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems and devices embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope and spirit being indicated by the following claims.
The present subject matter overcomes the limitations of the conventional hyperparameter optimization methods by receiving a set of hyperparameters associated with a machine learning model. Further, a set of gradient values associated with each hyperparameter is iteratively computed based on a first value, a second value, and a predefined magnitude, wherein the second value is calculated corresponding to the first value. Further, an updated set of values is calculated for each hyperparameter, and a set of local minima associated with each hyperparameter is obtained by utilizing the updated set of values and a set of validation errors. Further, a global optimum is obtained for each hyperparameter from the set of local minima. An implementation of the system and method for hyperparameter optimization is described further in detail with reference to FIGS. 1 through 9.
Referring now to the drawings, and more particularly to FIGS. 1 through 9, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 illustrates a network environment 100 implementing a system 102 for hyperparameter optimization, according to an example embodiment of the present subject matter. The system for hyperparameter optimization 102, hereinafter referred to as the system 102, is configured for receiving a set of hyperparameters associated with a machine learning model. The system 102 may be embodied in a computing device, for instance a computing device 104.
Although the present disclosure is explained considering that the system 102 is implemented on a server, it may be understood that the system 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a cloud-based computing environment and the like. In one implementation, the system 102 may be implemented in a cloud-based environment. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 106-1, 106-2... 106-N, collectively referred to as user devices 106 hereinafter, or applications residing on the user devices 106. Examples of the user devices 106 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, a Smartphone, a Tablet Computer, a workstation and the like. The user devices 106 are communicatively coupled to the system 102 through a network 108.
In an embodiment, the network 108 may be a wireless or a wired network, or a combination thereof. In an example, the network 108 can be implemented as a computer network, as one of the different types of networks, such as virtual private network (VPN), intranet, local area network (LAN), wide area network (WAN), the internet, and such. The network 108 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), and Wireless Application Protocol (WAP), to communicate with each other. Further, the network 108 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices. The network devices within the network 108 may interact with the system 102 through communication links.
As discussed above, the system 102 may be implemented in a computing device 104, such as a hand-held device, a laptop or other portable computer, a tablet computer, a mobile phone, a PDA, a smartphone, and a desktop computer. The system 102 may also be implemented in a workstation, a mainframe computer, a server, and a network server. In an embodiment, the system 102 may be coupled to a data repository, for example, a repository 112. The repository 112 may store data processed, received, and generated by the system 102. In an alternate embodiment, the system 102 may include the data repository 112. The components and functionalities of the system 102 are described further in detail with reference to FIG. 2.
FIG. 2 illustrates a block diagram of the system for hyperparameter optimization, according to some embodiments of the present disclosure. The hyperparameter optimization system 200 (hereinafter referred to as system 200) may be an example of the system 102 (FIG. 1). In an example embodiment, the system 200 may be embodied in, or is in direct communication with the system, for example the system 102 (FIG. 1). The system 200 includes or is otherwise in communication with one or more hardware processors such as a processor 202, at least one memory such as a memory 204, an I/O interface 206 and an optimization unit 250. In an embodiment, the optimization unit 250 can be implemented as a standalone unit in the system 200 comprising a gradient calculation module (not shown in FIG. 2), an updated values calculation module (not shown in FIG. 2), a local minima calculation module (not shown in FIG. 2), and a global optimum calculation module (not shown in FIG. 2). In another embodiment, the optimization unit 250 can be implemented as a module in the memory 204 comprising the gradient calculation module (not shown in FIG. 2), the updated values calculation module (not shown in FIG. 2), the local minima calculation module (not shown in FIG. 2), and the global optimum calculation module (not shown in FIG. 2). The processor 202, memory 204, and the I/O interface 206 may be coupled by a system bus such as a system bus 208 or a similar mechanism.
The I/O interface 206 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The interfaces 206 may include a variety of software and hardware interfaces, for example, interfaces for peripheral device(s), such as a keyboard, a mouse, an external memory, a camera device, and a printer. Further, the interfaces 206 may enable the system 102 to communicate with other devices, such as web servers and external databases. The interfaces 206 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. For the purpose, the interfaces 206 may include one or more ports for connecting a number of computing systems with one another or to another server computer. The I/O interface 206 may include one or more ports for connecting a number of devices to one another or to another server.
The hardware processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the hardware processor 202 is configured to fetch and execute computer-readable instructions stored in the memory 204.
The memory 204 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 204 includes a plurality of modules 220 and a repository 240 for storing data processed, received, and generated by one or more of the modules 220 and the optimization unit 250. The modules 220 may include routines, programs, objects, components, data structures, and so on, which perform particular tasks or implement particular abstract data types.
The memory 204 also includes module(s) 220 and a data repository 240. The module(s) 220 include programs or coded instructions that supplement applications or functions performed by the hyperparameter optimization system 200. The modules 220, amongst other things, can include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types. The modules 220 may also be used as, signal processor(s), state machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the modules 220 can be used by hardware, by computer-readable instructions executed by a processing unit, or by a combination thereof. The modules 220 can include various sub-modules (not shown). The modules 220 may include computer-readable instructions that supplement applications or functions performed by the hyperparameter optimization system 200.
The data repository 240 may include a received set of hyperparameters associated with a machine learning model 242, a hyperparameter value database 244, a machine learning model database 246 and other data 248. Further, the other data 248, amongst other things, may serve as a repository for storing data that is processed, received, or generated as a result of the execution of one or more modules in the module(s) 220 and the modules associated with the optimization unit 250. The repository 240 is further configured to maintain one or more values associated with each hyperparameter from the set of hyperparameters stored in the data repository 240.
Although the data repository 240 is shown internal to the hyperparameter optimization system 200, it will be noted that, in alternate embodiments, the data repository 240 can also be implemented external to the hyperparameter optimization system 200, where the data repository 240 may be stored within a database (not shown in FIG. 2) communicatively coupled to the hyperparameter optimization system 200. The data contained within such external database may be periodically updated. For example, new data may be added into the database (not shown in FIG. 2) and/or existing data may be modified and/or non-useful data may be deleted from the database (not shown in FIG. 2). In one example, the data may be stored in an external system, such as a Lightweight Directory Access Protocol (LDAP) directory and a Relational Database Management System (RDBMS). In another embodiment, the data stored in the data repository 240 may be distributed between the hyperparameter optimization system 200 and the external database.
The optimization unit 250 of the hyperparameter optimization system 200 can be configured to receive the set of hyperparameters associated with the machine learning model. For example, the set of hyperparameters associated with the machine learning model includes a depth or number of leaves of a tree, a number of latent factors in a matrix factorization, a learning rate, a number of hidden layers in a deep neural network, a number of clusters in a k-means clustering, and the like. Additionally, the machine learning model includes a training data, a test data and a validation data. Here, each hyperparameter from the set of hyperparameters is associated with a first value. The first value includes an X plane value and a Y plane value. Further, the first value can be a single value or a range of values. In an embodiment, let the set of hyperparameters of a discrete function be represented by h = h_1, h_2, h_3, ..., h_n and the one or more values associated with each hyperparameter be λ = λ_1, λ_2, λ_3, ..., λ_m. In an embodiment, the one or more values can be a single value or a range of values. The single value or the range of values includes the X plane value and the Y plane value as shown in FIG. 3. FIG. 3 depicts an example data point in the XY plane, according to some embodiments of the present disclosure. Now, referring to FIG. 3, the discrete function 310 includes a plurality of data points and the value λ_1 associated with a data point from the plurality of data points is denoted by a black dot. Here, the data point includes the X plane value as 5 and the Y plane value as 1.35.
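By way of illustration only, one possible representation of such a set of hyperparameters, and of the Y plane value (a validation error) obtained for a given X plane value, is sketched below; the scikit-learn estimator, the synthetic data, and the helper names hyperparameter_ranges and validation_error are assumptions for illustration and not part of the disclosure.

```python
# Illustrative sketch only: the scikit-learn model, the synthetic data and the
# helper names (hyperparameter_ranges, validation_error) are assumptions.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Example set of hyperparameters, each associated with a range of candidate values.
hyperparameter_ranges = {
    "n_estimators": range(100, 1001),   # "number of trees"
    "max_depth": range(2, 21),          # "depth associated with each tree"
}

# Synthetic training/validation data standing in for the model's data sets.
X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

def validation_error(n_trees):
    """Y plane value: validation error of the model trained with the given
    X plane value (here, only the number of trees is varied)."""
    model = RandomForestRegressor(n_estimators=int(round(n_trees)), random_state=0)
    model.fit(X_train, y_train)
    return mean_squared_error(y_val, model.predict(X_val))
```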
Further, the optimization unit 250 of the hyperparameter optimization system 200 can be further configured to iteratively compute a set of gradient values for each hyperparameter based on the first value and a second value until a minimum validation error is obtained. Here, the second value λ_12 is obtained corresponding to the first value λ_11 as follows: during a first iteration, a first set of validation errors VE_1 for the first value λ_11 associated with each hyperparameter from the set of hyperparameters is evaluated. Further, the second value λ_12 associated with each hyperparameter is calculated by adjusting at least one of the X plane value and the Y plane value corresponding to the first value λ_11 to a predefined magnitude. In an embodiment, the predefined magnitude can be in a range of 9% to 15% of the X plane value or the Y plane value. Further, a second set of validation errors VE_2 for the second value λ_12 associated with each hyperparameter is calculated. Further, a gradient ∇ is calculated for each hyperparameter based on the first value λ_11, the second value λ_12, the first set of validation errors VE_1 and the second set of validation errors VE_2 at the end of the first iteration. The gradient ∇ indicates a slope of the error between the first value λ_11 and the second value λ_12 associated with each hyperparameter, as shown in FIG. 4. FIG. 4 depicts an example gradient calculation, according to some embodiments of the present disclosure. Now, referring to FIG. 4, h indicates the first value λ_11 and h_new indicates the second value λ_12. Here, the value associated with the hyperparameter is changed only if the gradient is high, and the value associated with the hyperparameter is changed based on the predefined magnitude. Hence, the gradient ∇ obtained is tested against a predefined gradient threshold, and the second value of the first iteration becomes the first value for the next iteration only if the gradient is greater than the predefined gradient threshold. The iterations are repeated as described above until a minimum validation error is obtained. In an embodiment, the formula to calculate the gradient ∇h_1 of the hyperparameter h_1 is given by equation (1).
∇h_1 = (VE_1 - VE_2) / (λ_11 - λ_12) --------------------------------- (1)
In equation (1), ∇h_1 indicates the gradient value associated with a hyperparameter from the set of hyperparameters during an iteration. Similarly, the gradient is calculated for each iteration, resulting in a set of gradient values associated with each hyperparameter for a first value selected at random from the set of one or more values.
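A minimal sketch of this iterative gradient computation, reusing the validation_error helper sketched above, is given below; the magnitude, gradient threshold and iteration cap shown are assumed illustrative settings, not values prescribed by the disclosure.

```python
def iterate_gradients(first_value, validation_error, magnitude=0.10,
                      gradient_threshold=0.5, max_iters=20):
    """Sketch of equation (1) applied iteratively. The magnitude, threshold and
    iteration cap are assumptions; only the update structure follows the text."""
    gradients = []
    lam_1 = first_value
    ve_1 = validation_error(lam_1)              # first set of validation errors
    for _ in range(max_iters):
        lam_2 = lam_1 - magnitude * lam_1       # adjust the X plane value by the magnitude
        ve_2 = validation_error(lam_2)          # second set of validation errors
        grad = (ve_1 - ve_2) / (lam_1 - lam_2)  # equation (1)
        gradients.append(grad)
        if abs(grad) <= gradient_threshold:     # gradient no longer high: stop iterating
            break
        lam_1, ve_1 = lam_2, ve_2               # second value becomes the next first value
    return gradients, lam_1, ve_1
```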
Further, the optimization unit 250 of the hyperparameter optimization system 200 can be configured to calculate an updated value for each hyperparameter based on the first value λ_11, the set of gradient values ∇, a minimum inverse of the set of gradient values associated with each hyperparameter, and a learning rate, as shown in FIG. 5. FIG. 5 depicts an example updated value, according to some embodiments of the present disclosure. The learning rate determines a speed of convergence of the gradient ∇. The formula to calculate the updated value of each hyperparameter is given by equation (2).
h_1new = h_1 - ∇h_1 * min(inverse of gradients) * learning rate ------- (2)
Where h_1new is the updated value of the hyperparameter, h_1 is the old value of the hyperparameter, and ∇h_1 is the gradient associated with the hyperparameter. Now referring to FIG. 5, the data point ‘h’ indicates an initial value associated with the hyperparameter, with the X plane value as 6 and the Y plane value as 1.5. Further, the updated value associated with the hyperparameter is represented as h_update, with the X plane value as 3.5 and the Y plane value as 1.34. For brevity of description, the terms h_update and h_1new are used interchangeably. The updated values are further rounded to the floor or ceiling. Further, a set of validation errors is calculated for the set of updated values associated with each hyperparameter.
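A corresponding sketch of the update of equation (2) is given below; the learning rate default, the guard for all-zero gradients, and the floor/ceiling rounding rule shown are assumptions for illustration.

```python
import math

def updated_value(h_old, gradients, learning_rate=0.1):
    """Sketch of equation (2): the hyperparameter moves against its gradient,
    scaled by the minimum inverse of the collected gradients and a learning rate
    (assumed default). The result is rounded to the floor or ceiling."""
    nonzero = [g for g in gradients if g != 0]
    if not nonzero:                              # flat region: keep the old value
        return h_old
    grad = gradients[-1]
    min_inverse = min(1.0 / g for g in nonzero)  # minimum inverse of the gradients
    h_new = h_old - grad * min_inverse * learning_rate
    # Round to floor or ceiling, whichever is nearer.
    return math.floor(h_new) if h_new - math.floor(h_new) < 0.5 else math.ceil(h_new)
```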
Further, the optimization unit 250 of the hyperparameter optimization system 200 can be configured to calculate a set of local minima associated with each hyperparameter. Here, an XY plane is constructed with the set of updated values along the X plane and the set of validation errors associated with the set of updated values along the Y plane. Here, a data point with a minimum validation error is selected as the local minima. Similarly, a set of local minima is calculated for each hyperparameter, for each first value selected at random from the set of one or more values.
Further, the optimization unit 250 of the hyperparameter optimization system 200 can be configured to calculate a global optimum associated with each hyperparameter based on a comparison of each local minima across the set of local minima associated with each hyperparameter. FIG. 6 depicts an example global optimum 606, according to some embodiments of the present disclosure. Now, referring to FIG. 6, the data points 602, 604, 606 and 608 represent an example set of local minima, and the data point 606 is the global optimum value associated with the hyperparameter.
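Combining the sketches above, one possible form of the episodic search for the global optimum is shown below; the number of episodes is an assumed setting, and iterate_gradients and updated_value refer to the earlier illustrative helpers.

```python
import random

def global_optimum(value_range, validation_error, n_episodes=5):
    """Sketch of the episodic search: each episode starts from a random first
    value in the range, follows the gradient and update steps to a local
    minimum, and the local minimum with the smallest validation error is
    returned as the global optimum. n_episodes is an assumed setting."""
    local_minima = []
    for _ in range(n_episodes):
        start = random.choice(list(value_range))                 # random first value
        gradients, h_final, _ = iterate_gradients(start, validation_error)
        h_upd = updated_value(h_final, gradients)
        # X-Y plane: updated value along X, its validation error along Y.
        local_minima.append((h_upd, validation_error(h_upd)))
    return min(local_minima, key=lambda point: point[1])         # global optimum

# Example usage with the helpers sketched above:
# best_n_trees, best_error = global_optimum(hyperparameter_ranges["n_estimators"],
#                                           validation_error)
```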
In an embodiment, two or more hyperparameters are combined and the set of updated values associated with the combined hyperparameters is calculated. Further, a set of validation errors is calculated for the combined hyperparameters and a global optimum can be obtained. FIG. 7 depicts an example global optimum associated with a combination of hyperparameters, according to some embodiments of the present disclosure. Now referring to FIG. 7, an error surface of the machine learning model is depicted as a function of two hyperparameters on a 3-dimensional surface. Here, the validation error is plotted on a Z plane and the two hyperparameters are plotted on an X plane and a Y plane, respectively. Here, “Episodic iter1” depicts a path traced on the error surface for calculating a first local minimum. Each episodic iteration begins with a different starting point on the error surface and continues until a local minimum is obtained. Similarly, “Episodic iter2” depicts the path traced on the error surface to calculate a second local minimum. Here, the data point 702 is the local minima obtained in “Episodic iter1” and the data point 704 is the local minima obtained in “Episodic iter2”. Similarly, the set of local minima is obtained from a plurality of iterations and a minimum among the set of local minima is selected as the global optimum.
FIG. 8 illustrates a flow diagram of a method 800 for the hyperparameter optimization, according to some embodiments of the present disclosure. The method 800 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. The method 800 may also be practiced in a distributed computing environment where functions are performed by remote processing devices that are linked through a communication network. The order in which the method 800 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 800, or an alternative method. Furthermore, the method 800 can be implemented in any suitable hardware, software, firmware, or combination thereof.
At 802, the system 200 receives, by the one or more hardware processors, a set of hyperparameters associated with a machine learning model, wherein each hyperparameter from the set of hyperparameters is associated with a first value including an X plane value and a Y plane value, wherein the first value is a range of values. At 804, the system 200 iteratively computes, by the one or more hardware processors, a set of gradient values for each hyperparameter based on the first value and a second value until a minimum validation error is obtained, by: (i) evaluating a first set of validation errors for the first value associated with each hyperparameter from the set of hyperparameters; (ii) evaluating a second value associated with each hyperparameter by adjusting at least one of the X plane value and the Y plane value corresponding to the first value to a predefined magnitude; (iii) evaluating a second set of validation errors for the second value associated with each hyperparameter; and (iv) calculating a gradient for each hyperparameter based on the first value, the second value, the first set of validation errors and the second set of validation errors, wherein the second value is utilized as the first value in the next iteration of computing the set of gradient values based on a predetermined gradient threshold. At 806, the system 200 calculates, by the one or more hardware processors, an updated value for each hyperparameter based on the first value, the set of gradient values, a minimum inverse of the set of gradient values and a learning rate, wherein the learning rate provides a speed of convergence of the machine learning model. At 808, the system 200 calculates, by the one or more hardware processors, a set of local minima associated with each hyperparameter by identifying a local minima for each random value from the range of values by constructing the XY plane with the set of updated values of each hyperparameter along an X plane and a set of validation errors corresponding to each hyperparameter along a Y plane, wherein the set of validation errors includes the first set of validation errors and the second set of validation errors. At 810, the system 200 calculates, by the one or more hardware processors, a global optimum associated with each hyperparameter based on a comparison of each local minima across the set of local minima associated with each hyperparameter.
FIG. 9 illustrates a flow diagram of a method 900 for iteratively computing a set of gradient values, according to some embodiments of the present disclosure. At 902, the system 200 evaluates, by the one or more hardware processors, a first set of validation errors for the first value associated with each hyperparameter from the set of hyperparameters. At 904, the system 200 evaluates, by the one or more hardware processors, a second value associated with each hyperparameter by adjusting at least one of the X plane value and the Y plane value corresponding to the first value to a predefined magnitude. At 906, the system 200 evaluates, by the one or more hardware processors, a second set of validation errors for the second value associated with each hyperparameter. At 908, the system 200 calculates, by the one or more hardware processors, a gradient for each hyperparameter based on the first value, the second value, the first set of validation errors and the second set of validation errors. Further, the system 200 tests, by the one or more hardware processors, whether a minimum validation error is obtained and, if the minimum validation error is not obtained, the second value is utilized as the first value in the next iteration.
In an embodiment, a random forest machine learning model is experimented with as follows: the set of hyperparameters associated with the random forest machine learning model includes “a number of trees”, “a depth associated with each tree”, “a number of random features” and “a randomized seed”. Here, the values associated with said hyperparameters are one dimensional. For example, the value associated with the hyperparameter “the number of trees” is within a range from 100 to 1000 and needs to be optimized. The hyperparameters “the depth associated with each tree”, “the number of random features” and “the randomized seed” are each associated with a single individual value. For example, in the present disclosure, the hyperparameter “the number of trees” is taken as the X plane value and a set of validation errors is taken as the Y plane value. Further, the system 200 starts from a random value of 800 on the X plane with a magnitude of 10%. Here, the first validation error associated with the X plane value of 800 is calculated as 1321.6. Further, 10% of 800 is deducted from 800, resulting in 720. Further, the second validation error for the X plane value of 720 is computed by running the machine learning model on the original data, and the second validation error is calculated as 1026.5. Further, the gradient is calculated based on equation (1) as (1026.5 - 1321.6)/(720 - 800) ≈ 3.688. In a similar manner, the set of gradients is calculated at each point separately to obtain a local minima. Further, a plurality of values associated with the hyperparameter “the number of trees” can be selected at random and the above computation can be repeated to obtain the set of local minima.
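For reference, the arithmetic of this example can be reproduced with equation (1) as sketched below.

```python
# Reproducing the numeric example for the "number of trees" hyperparameter.
lam_11, ve_1 = 800, 1321.6                    # first value and its validation error
lam_12 = lam_11 - 0.10 * lam_11               # 10% magnitude gives 720
ve_2 = 1026.5                                 # validation error at 720
gradient = (ve_1 - ve_2) / (lam_11 - lam_12)  # equation (1)
print(round(gradient, 3))                     # 3.689 (295.1/80 = 3.68875, ~3.688 in the example)
```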
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
Various embodiments of the disclosed method and system for hyperparameter optimization provide an end-to-end solution for optimizing the values associated with the set of hyperparameters with minimal convergence time and lower validation error, thereby increasing the accuracy of the system 200. Here, the method of selecting a next value based on the magnitude leads to faster convergence and lower validation errors.
It is, however, to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including, e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be, e.g., hardware means like an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope and spirit of disclosed embodiments being indicated by the following claims.