
Method And System For Computing Model Representation Of Machine Learning Development Process

Abstract: Current approaches focus on experiment tracking by storing different hyperparameters and their outcomes alongside conventionally tracked metrics. However, with the increasing use of more complex architectures, other parameters, such as weights, architecture, and training, which define the learning capabilities of a model, are missed. A few available techniques keep track of metadata associated with a machine learning model; however, no technique has yet captured all the properties of the ML model. The present disclosure provides a method and system for computing a model representation of the machine learning development process. The system encodes the model performance as a unique vector representation. In particular, the system keeps track of all the properties of the training process through a vector representation, i.e., a mathematical representation. The generated vector representation can be easily stored in relational databases and thus can be easily consumed by downstream applications. [To be published with FIG. 3]


Patent Information

Filing Date
27 April 2023
Publication Number
44/2024
Publication Type
INA
Invention Field
COMPUTER SCIENCE

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Inventors

1. BIHANI, Ayush
Tata Consultancy Services Limited, Ecospace - 1B STP, Action Area II, Newtown, Kolkata 700135, West Bengal, India
2. KATHURIA, Divyanshi
Tata Consultancy Services Limited, 4th & 5th Floor, PTI Building, 4 Parliament Street, New Delhi 110001, Delhi, India
3. KALELE, Amit
Tata Consultancy Services Limited, S2 Sahyadri park, Rajiv Gandhi infotech park, phase 3, Hinjewadi, Pune 411057, Maharashtra, India

Specification

Description: FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:

METHOD AND SYSTEM FOR COMPUTING MODEL REPRESENTATION OF MACHINE LEARNING DEVELOPMENT PROCESS

Applicant

Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

Preamble to the description:

The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
[001] The disclosure herein generally relates to model representation computation, and, more particularly, to a method and a system for computing a model representation of a machine learning development process.

BACKGROUND
[002] The use of machine learning (ML) and artificial intelligence (AI) powered software services has grown exponentially in the past few years and may continue to grow in the foreseeable future. Unlike the traditional software development process, ML powered software development requires rigorous experimentation in every phase of the development process, as hyperparameters, data, model architecture, etc., play a crucial role and have an impact on performance. Even calculating the right set of hyperparameters for optimizing a single metric in the ML development process requires a large number of experiments, thereby making it different from the traditional software development process. Further, in ML, certain additional components, such as the model being trained, the data being used, training and computational resource metrics, and the model architecture, also define the learning capabilities of a machine learning model. So, the ML development process is much more dynamic than the traditional process, and its performance does not behave in a conventional way.
[003] Traditionally, software services have used monitoring solutions to keep track of their performance and metrics like latency, throughput, hardware utilization details, etc. This way of keeping track of historical information is popular since it helps with other downstream tasks, such as building of performance models, right sizing of services, selection of appropriate hardware, software optimization, etc. However, improved ways of tracking experiments are required in the case of ML since a machine learning component is not static and has a dynamic nature due to the use of additional components in the development process.
[004] Further, several monitoring systems are available for keeping track of the metadata generated across different phases of an ML model. The available monitoring systems majorly focus on model management by capturing metadata like dataset version, pointers to actual storage, location of the dataset and the model, hyperparameters list, etc., as it leads to greater visibility into the ML models by storing historical experiment information. However, they miss other important factors, such as weights, architecture, and the data on which the model is trained, thereby making the stored experiment data ineligible for certain downstream tasks.
[005] Additionally, current solutions do not focus on defining an individual representation for data and model in mathematical form which can be easily consumed by downstream applications and can be stored in relational databases like MySQL®, PostgreSQL®, etc.

SUMMARY
[006] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one aspect, there is provided a processor implemented method for computing a model representation of a machine learning development process. The method comprises receiving, by a model representation computation system (MRCS) via one or more hardware processors, a model object, a dataset and one or more user inputs from a user device, wherein the model object is a machine learning model, and wherein the dataset represents data used for training the machine learning model, and wherein the one or more user inputs comprises one or more of: a memory requirement, a processor type, a machine learning problem, one or more model tags, and one or more user recommended practices; estimating, by the MRCS via the one or more hardware processors, a dataset profile summary using one or more predefined functions; estimating, by the MRCS via the one or more hardware processors, floating-point operations per second (FLOPS) performed for each pass of one or more passes performed for the model object using a customized flop estimation technique; creating, by the MRCS via the one or more hardware processors, a model configuration file using the model object, wherein the model configuration file comprises architectural information of the model object, and wherein the architectural information comprises details of types of one or more layers used in the model object, and details of one or more layer parameters used in each layer of the one or more layers of the model object; parsing, by the MRCS via the one or more hardware processors, the model configuration file using a parsing technique to extract layer wise information of each layer of the one or more layers of the model object, wherein the layer wise information comprises a type of a layer and a count of the 
layer; generating, by the MRCS via the one or more hardware processors, one or more vector embeddings for the model configuration file; combining, by the MRCS via the one or more hardware processors, the dataset profile summary, the estimated FLOPS, the extracted layer wise information and the generated one or more vector embeddings to generate a model training signature, wherein the model training signature is a mathematical representation of the machine learning model, the dataset and computational information associated with the machine learning model; capturing, by the MRCS via the one or more hardware processors, hardware utilization metrics and hardware information from the model object and the dataset using an experiment information tracking tool; and encoding, by the MRCS via the one or more hardware processors, the received one or more user inputs, the model training signature, the hardware utilization metrics and the hardware information to compute a model representation of the machine learning model, wherein the model representation represents one or more properties of the machine learning model, the dataset, the computational information and hardware-related information.
[007] In another aspect, there is provided a model representation computation system for computing a model representation of a machine learning development process. The system comprises a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: receive a model object, a dataset and one or more user inputs from a user device, wherein the model object is a machine learning model, and wherein the dataset represents data used for training the machine learning model, and wherein the one or more user inputs comprises one or more of: memory requirement, processor type, machine learning problem, one or more model tags, and one or more user recommended practices; estimate a dataset profile summary using one or more predefined functions; estimate floating-point operations per second (FLOPS) performed for each pass of one or more passes performed for the model object using a customized flop estimation technique; create a model configuration file using the model object, wherein the model configuration file comprises architectural information of the model object, and wherein the architectural information comprises details of types of one or more layers used in the model object, and details of one or more layer parameters used in each layer of the one or more layers of the model object; parse the model configuration file using a parsing technique to extract layer wise information of each layer of the one or more layers of the model object, wherein the layer wise information comprises a type of a layer and a count of the layer; generate one or more vector embeddings for the model configuration file; combine the dataset profile summary, the estimated FLOPS, the extracted layer wise information and the generated one or more vector embeddings to generate a model training signature, 
wherein the model training signature is a mathematical representation of the machine learning model, the dataset and computational information associated with the machine learning model; capture hardware utilization metrics and hardware information from the model object and the dataset using an experiment information tracking tool; and encode the received one or more user inputs, the model training signature, the hardware utilization metrics and the hardware information to compute a model representation of the machine learning model, wherein the model representation represents one or more properties of the machine learning model, the dataset, the computational information and hardware-related information.
[008] In yet another aspect, there are provided one or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause a method for computing a model representation of a machine learning development process. The method comprises receiving, by a model representation computation system (MRCS) via one or more hardware processors, a model object, a dataset and one or more user inputs from a user device, wherein the model object is a machine learning model, and wherein the dataset represents data used for training the machine learning model, and wherein the one or more user inputs comprises one or more of: a memory requirement, a processor type, a machine learning problem, one or more model tags, and one or more user recommended practices; estimating, by the MRCS via the one or more hardware processors, a dataset profile summary using one or more predefined functions; estimating, by the MRCS via the one or more hardware processors, floating-point operations per second (FLOPS) performed for each pass of one or more passes performed for the model object using a customized flop estimation technique; creating, by the MRCS via the one or more hardware processors, a model configuration file using the model object, wherein the model configuration file comprises architectural information of the model object, and wherein the architectural information comprises details of types of one or more layers used in the model object, and details of one or more layer parameters used in each layer of the one or more layers of the model object; parsing, by the MRCS via the one or more hardware processors, the model configuration file using a parsing technique to extract layer wise information of each layer of the one or more layers of the model object, wherein the layer wise information comprises a type of a layer and a count of the layer; generating, by the MRCS via the one or more 
hardware processors, one or more vector embeddings for the model configuration file; combining, by the MRCS via the one or more hardware processors, the dataset profile summary, the estimated FLOPS, the extracted layer wise information and the generated one or more vector embeddings to generate a model training signature, wherein the model training signature is a mathematical representation of the machine learning model, the dataset and computational information associated with the machine learning model; capturing, by the MRCS via the one or more hardware processors, hardware utilization metrics and hardware information from the model object and the dataset using an experiment information tracking tool; and encoding, by the MRCS via the one or more hardware processors, the received one or more user inputs, the model training signature, the hardware utilization metrics and the hardware information to compute a model representation of the machine learning model, wherein the model representation represents one or more properties of the machine learning model, the dataset, the computational information and hardware-related information.
[009] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS
[010] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[011] FIG. 1 is an example representation of an environment, related to at least some example embodiments of the present disclosure.
[012] FIG. 2 illustrates an exemplary block diagram of a system for computing a model representation of a machine learning development process, in accordance with an embodiment of the present disclosure.
[013] FIG. 3 illustrates a schematic block diagram representation of a model representation computation process for computing a model representation of a machine learning model, in accordance with an embodiment of the present disclosure.
[014] FIGS. 4A and 4B, collectively, illustrate an exemplary flow diagram of a method for computing the model representation of the machine learning development process, in accordance with an embodiment of the present disclosure.
[015] FIG. 5 illustrates a schematic representation depicting process of building a knowledge repository, in accordance with an embodiment of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS
[016] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
[017] As discussed earlier, adoption of artificial intelligence is increasing at an exponential rate. However, unlike the traditional software development process, machine learning (ML) development poses multiple challenges as ML development is much more dynamic and its performance does not behave in a conventional way.
[018] A machine learning metadata store is a place where the metadata generated from an end-to-end machine learning pipeline is stored for future reference. Metadata in ML can be generated from any stage of the pipeline, including details such as model parameters, metrics, model configurations, and data versions. These details are essential for experiment comparison, quick rollbacks, reduction of model downtime, retraining, and many more critical functions. The parameters involved in the model development process, such as hyperparameters, data, model architecture, etc., play a crucial role as they impact the performance of the model, and finding an optimal configuration of the parameters requires extensive experimentation.
[019] This growth in the number of experiments has led to building solutions that can keep track of the vast number of inputs, hyperparameters, and input data related to the ML development process, which can further be stored in the ML metadata store.
[020] Some solutions are focused on using experiment tracking by storing the different hyperparameters and their outcomes alongside conventional metrics, but they do not fully leverage the architectural representation of the model, which is a key component of the model metadata. The impact of the architectural representation on the final solution is also not considered by these solutions, which further limits the use of already-run experiments for further downstream tasks.
[021] Further, some solutions generate vector embeddings for code for tracking purposes. However, these solutions cannot be used for deep learning (DL) model architectures, as they do not generate correct and meaningful tokens for such models since DL model architecture code is different from general Python language code. Additionally, some available techniques explore the computation times of the training process. However, these systems do not combine this information with other properties of the training process, such as the model being trained and the dataset used.
[022] So, a technique that can compute a model representation which captures all the metadata, metrics, resources, architectural information, and computation requirements of the ML model, and which can be easily consumed by downstream applications and easily stored in relational databases, is still to be explored.
[023] Embodiments of the present disclosure overcome the above-mentioned disadvantages by providing a method and a system for computing a model representation of a machine learning development process. The system of the present disclosure provides a novel way of representing the model architecture and model performance by encoding them as a unique vector representation. The vector representation stores the runtime experiment details, architecture details, computational details, outcomes, and the model training signatures. The system then encodes the unique vector representation along with the user best practices and the resource metrics before storing it in a training knowledge repository. This training knowledge repository can then be used for further downstream tasks, such as model similarity checking, building performance models, etc. The complete process of computing the unique vector representation is explained in detail with reference to FIGS. 4A and 4B.
[024] In the present disclosure, the system and the method track the properties of the training process through a numerical process, i.e., the properties are represented in the form of vectors, thereby ensuring easy storage and retrieval of the properties in relational databases, as the properties can be analyzed using general structured query language (SQL) queries. Further, the easy storage and retrieval also increase the usability of the metadata for downstream tasks. Additionally, the system also captures the semantic information of the model architecture along with other usual information, thereby improving the accuracy of the tracked information, which further helps in saving time and cost as the number of experiments performed is drastically reduced.
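For illustration, storing and querying such a vector representation with plain SQL might look as follows. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL® or PostgreSQL®; the table name, model identifier, and vector values are hypothetical and do not form part of the disclosure.

```python
import json
import sqlite3

# In-memory database as a stand-in for a relational database such as
# MySQL or PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model_representations (model_id TEXT PRIMARY KEY, vector TEXT)"
)

# A hypothetical model representation vector, serialized to text so it
# fits a conventional relational column.
vector = [0.12, 3.5, 0.0, 1.7]
conn.execute(
    "INSERT INTO model_representations VALUES (?, ?)",
    ("example_run_1", json.dumps(vector)),
)

# Retrieve and decode the vector with a general SQL query.
row = conn.execute(
    "SELECT vector FROM model_representations WHERE model_id = ?",
    ("example_run_1",),
).fetchone()
restored = json.loads(row[0])
print(restored)
```

Serializing the vector as JSON text keeps the schema compatible with any relational database, at the cost of decoding on retrieval.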
[025] Referring now to the drawings, and more particularly to FIGS. 1 through 5, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[026] FIG. 1 illustrates an exemplary representation of an environment 100 related to at least some example embodiments of the present disclosure. Although the environment 100 is presented in one arrangement, other embodiments may include the parts of the environment 100 (or other parts) arranged otherwise depending on, for example, estimating the dataset profile summary, creating the model configuration file, etc. The environment 100 generally includes a model representation computation system (hereinafter referred to as ‘MRCS’) 102, and an electronic device, such as an electronic device 106 (hereinafter also referred to as user device 106), each coupled to, and in communication with (and/or with access to) a network 104. It should be noted that one user device is shown for the sake of explanation; there can be any number of user devices.
[027] The network 104 may include, without limitation, a light fidelity (Li-Fi) network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a satellite network, the Internet, a fiber optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, a virtual network, and/or another suitable public and/or private network capable of supporting communication among two or more of the parts or users illustrated in FIG. 1, or any combination thereof.
[028] Various entities in the environment 100 may connect to the network 104 in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), 2nd Generation (2G), 3rd Generation (3G), 4th Generation (4G), 5th Generation (5G) communication protocols, Long Term Evolution (LTE) communication protocols, or any combination thereof.
[029] The user device 106 is associated with a user (e.g., a user of the model representation computation system) who wants to compute a model representation for a machine learning model. Examples of the user device 106 include, but are not limited to, a personal computer (PC), a mobile phone, a tablet device, a Personal Digital Assistant (PDA), a server, a voice activated assistant, a smartphone, and a laptop.
[030] The model representation computation system (hereinafter referred to as ‘MRCS’) 102 includes one or more hardware processors and a memory. The MRCS 102 is configured to receive a model object, i.e., a trained machine learning (ML) model, a dataset, and one or more user inputs from the user device 106 via the network 104. The MRCS 102 then generates a model training signature for the model object based on the received information. It should be noted that the model training signature is a mathematical representation of the model, its dataset, and computational information. In particular, the model training signature captures the essential properties that define the training of the ML model. The process of generating the model training signature is explained in detail with reference to FIGS. 4A and 4B.
[031] Once the model training signature for the model object is generated, the MRCS 102 captures hardware utilization metrics and hardware information from the model object and the dataset using an experiment information tracking tool. Examples of experiment information tracking tools that can be used for this purpose include, but are not limited to, OpSense, MLflow, and the like.
[032] Thereafter, the MRCS 102 encodes the received one or more user inputs, the model training signature, the hardware utilization metrics, and the hardware information to compute a model representation of the trained ML model i.e., the model object. The model representation is representative of the entire training process of the ML model and also captures one or more properties of the trained ML model, the dataset, the computational information, and hardware related information. In an embodiment, the MRCS 102 stores the model representation of the trained ML model in a training knowledge repository from where it can be used for further downstream tasks.
[033] The number and arrangement of systems, devices, and/or networks shown in FIG. 1 are provided as an example. There may be additional systems, devices, and/or networks; fewer systems, devices, and/or networks; different systems, devices, and/or networks; and/or differently arranged systems, devices, and/or networks than those shown in FIG. 1. Furthermore, two or more systems or devices shown in FIG. 1 may be implemented within a single system or device, or a single system or device shown in FIG. 1 may be implemented as multiple, distributed systems or devices. Additionally, or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of the environment 100 may perform one or more functions described as being performed by another set of systems or another set of devices of the environment 100 (e.g., refer scenarios described above).
[034] FIG. 2 illustrates an exemplary block diagram of a model representation computation system 200 for computing a model representation of a machine learning development process in accordance with an embodiment of the present disclosure. In an embodiment, the model representation computation system 200 may also be referred as system 200 and may be interchangeably used herein. The model representation computation system 200 is similar to the MRCS 102 explained with reference to FIG. 1. In some embodiments, the system 200 is embodied as a cloud-based and/or SaaS-based (software as a service) architecture. In some embodiments, the system 200 may be implemented in a server system. In some embodiments, the system 200 may be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, and the like.
[035] In an embodiment, the system 200 includes one or more processors 204, communication interface device(s) or input/output (I/O) interface(s) 206, and one or more data storage devices or memory 202 operatively coupled to the one or more processors 204. The one or more processors 204 may be one or more software processing modules and/or hardware processors. In an embodiment, the hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 200 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[036] The I/O interface device(s) 206 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
[037] The memory 202 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, a database 208 can be stored in the memory 202, wherein the database 208 may comprise, but is not limited to, a training knowledge repository, a dataset, and the like. The memory 202 further comprises (or may further comprise) information pertaining to input(s)/output(s) of each step performed by the systems and methods of the present disclosure. In other words, input(s) fed at each step and output(s) generated at each step are comprised in the memory 202 and can be utilized in further processing and analysis.
[038] It is noted that the system 200 as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the present disclosure and, therefore, should not be taken to limit the scope of the present disclosure. It is noted that the system 200 may include fewer or more components than those depicted in FIG. 2.
[039] FIG. 3, with reference to FIGS. 1-2, illustrates a schematic block diagram representation 300 of a model representation computation process performed by the system 200 of FIG. 2 for computing a model representation of a machine learning model, in accordance with an embodiment of the present disclosure.
[040] As seen in FIG. 3, a user provides user inputs ‘C’, dataset ‘D’, and model object ‘M’ to the system 200 using a user device, such as the user device 106. In an embodiment, the user inputs ‘C’ include, but are not limited to, a set of values that have been pre-filled as default values, such as memory required, processor type, machine learning problem, model tags, one or more user choices, and one or more user recommended practices. The system 200, upon receiving the input, vectorizes the received C using a dictionary vectorizer to obtain vectorized user inputs. In an embodiment, the dictionary vectorizer is a Python function developed for vectorizing user inputs. The dictionary vectorizer represents the user choices and the user recommended practices as a vector.
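For illustration, a minimal dictionary vectorizer of the kind described above might be sketched as follows. The input keys and the one-hot encoding of categorical values are assumptions made for the example and do not reproduce the actual function of the disclosure.

```python
def dict_vectorize(user_inputs, vocabulary=None):
    """Minimal dictionary vectorizer: numeric values pass through,
    categorical values become one-hot 'key=value' features."""
    features = {}
    for key, value in user_inputs.items():
        if isinstance(value, (int, float)):
            features[key] = float(value)
        else:
            features[f"{key}={value}"] = 1.0
    if vocabulary is None:
        vocabulary = sorted(features)  # fixed feature order
    return [features.get(name, 0.0) for name in vocabulary], vocabulary

# Hypothetical user inputs 'C' with pre-filled default values.
C = {"memory_gb": 16, "processor_type": "gpu", "ml_problem": "classification"}
vec, vocab = dict_vectorize(C)
print(vocab)
print(vec)  # [16.0, 1.0, 1.0]
```

Reusing the returned vocabulary for later inputs keeps all vectors aligned to the same feature positions.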
[041] Thereafter, the system 200 performs dataset feature extraction on the dataset ‘D’. In particular, the dataset features (hereinafter also referred to as dataset profile summary or dataset statistics) are estimated based on ‘D’ using one or more predefined functions. The dataset features include, but are not limited to, class entropy, dataset ratio, log number of classes, class probability, log number of instances, etc. In an embodiment, auto-sklearn is used for performing dataset feature extraction.
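For illustration, a few of the listed dataset statistics can be computed from the class labels alone. The following is a minimal sketch with hypothetical labels; it does not reproduce the predefined functions of the disclosure.

```python
import math
from collections import Counter

def dataset_profile_summary(labels):
    """Compute a few dataset statistics: class entropy, log number of
    classes, log number of instances, and per-class probabilities."""
    n = len(labels)
    counts = Counter(labels)
    probs = {cls: k / n for cls, k in counts.items()}
    entropy = -sum(p * math.log2(p) for p in probs.values())
    return {
        "class_entropy": entropy,
        "log_num_classes": math.log(len(counts)),
        "log_num_instances": math.log(n),
        "class_probabilities": probs,
    }

# Hypothetical balanced binary dataset: entropy should be 1 bit.
labels = ["cat", "dog", "cat", "dog", "cat", "dog", "cat", "dog"]
summary = dataset_profile_summary(labels)
print(summary["class_entropy"])  # 1.0
```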
[042] In at least one example embodiment, the system 200 estimates the floating-point operations per second (FLOPS) performed for a single pass through the model object ‘M’ using a customized flop estimation technique. In an embodiment, general libraries that are available for flop calculation for a single framework are modified to obtain the customized flop estimation technique. It should be noted that the FLOPS estimation plays an important role in the model representation computation process as it directly impacts the runtime of the model object ‘M’.
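For illustration, a per-pass flop count for a simple fully-connected model might be estimated as follows. The layer description format and the dense-layer-only counting rule are assumptions for the example; the actual customized flop estimation technique is not reproduced here.

```python
def estimate_flops(layers):
    """Rough per-pass flop estimate from layer shapes.
    Each dense layer contributes 2 * in_features * out_features
    (one multiply and one add per weight)."""
    total = 0
    for layer in layers:
        if layer["type"] == "dense":
            total += 2 * layer["in_features"] * layer["out_features"]
    return total

# Hypothetical two-layer MLP (e.g., 784 -> 128 -> 10).
model_layers = [
    {"type": "dense", "in_features": 784, "out_features": 128},
    {"type": "dense", "in_features": 128, "out_features": 10},
]
print(estimate_flops(model_layers))  # 203264
```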
[043] Further, the system 200 creates a model configuration file using the model object M. In an embodiment, the model configuration file includes an architectural information of the model object. The architectural information includes details of types of one or more layers used in the model object, and details of one or more layer parameters used in each layer of the one or more layers of the model object.
[044] Once the model configuration file is available, the system 200 parses the model configuration file using a parsing technique to extract a layer wise information of each layer of the one or more layers of the model object. Examples of the parsing technique include, but are not limited to, abstract syntax tree (AST) and regular expression (RegEx). Thereafter, the system 200 generates one or more vector embeddings from the model configuration file. The process of generating vector embeddings from the model configuration file is explained in detail with reference to FIG. 4.
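For the RegEx variant of the parsing technique, the layer-wise information (layer type and count) can be recovered from a textual model dump as follows; the layer-name list is illustrative, not exhaustive:

```python
import re
from collections import Counter

def extract_layer_info(config_text):
    """Extract layer types and their counts from a model configuration dump.

    Matches a (hypothetical) set of common layer names; a production parser
    would instead walk an AST of the model definition.
    """
    layer_types = re.findall(
        r"\b(Conv2d|Linear|ReLU|BatchNorm2d|Dropout|LSTM)\b", config_text
    )
    return Counter(layer_types)  # e.g. {'Conv2d': 2, 'ReLU': 1}
```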
[045] The dataset features, the estimated FLOPS, the extracted layer wise information and the generated one or more vector embeddings are then combined by the system 200 to generate a model training signature for the model object.
[046] FIGS. 4A and 4B, with reference to FIGS. 1 through 3, collectively, represent an exemplary flow diagram of a method 400 for computing a model representation of the machine learning development process, in accordance with an embodiment of the present disclosure. The method 400 may use the system 200 of FIG. 2 and MRCS 102 of FIG. 1 for execution. In an embodiment, the system 200 comprises one or more data storage devices or the memory 202 operatively coupled to the one or more hardware processors 204 and is configured to store instructions for execution of steps of the method 400 by the one or more hardware processors 204. The sequence of steps of the flow diagram may not be necessarily executed in the same order as they are presented. Further, one or more steps may be grouped together and performed in form of a single step, or one step may have several sub-steps that may be performed in parallel or in sequential manner. The steps of the method of the present disclosure will now be explained with reference to the components of the system 200 as depicted in FIG. 2, and the MRCS 102 of FIG. 1.
[047] At step 402 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 receive a model object, a dataset and one or more user inputs from a user device (e.g., the user device 106). The model object is a machine learning model for which the model representation is to be computed. The dataset represents data used for training the machine learning model, and the one or more user inputs includes one or more of a memory requirement, a processor type, a machine learning problem, the one or more model tags, and the one or more user recommended practices.
[048] At step 404 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 estimate the dataset profile summary using one or more predefined functions. In particular, the dataset features are extracted from the dataset at this step.
[049] At step 406 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 estimate the floating-point operations per second (FLOPS) performed for each pass of one or more passes performed for the model object using the customized flop estimation technique.
[050] At step 408 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 create the model configuration file using the model object. The model configuration file includes the architectural information of the model object which includes details of types of one or more layers used in the model object, and details of one or more layer parameters used in each layer of the one or more layers of the model object.
[051] At step 410 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 parse the created model configuration file using the parsing technique to extract a layer wise information of each layer of the one or more layers of the model object. In one example embodiment, the system 200 uses AST to parse the model configuration file for extracting the layer wise information. The layer wise information includes a type of a layer and a count of the layer.
[052] At step 412 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 generate one or more vector embeddings for the model configuration file.
[053] In an embodiment, for generating the vector embeddings, the system 200 first creates a superword token for each layer present in the model configuration file. The superword token of each layer includes the layer wise information and a hyperparameter information of a respective layer. Thereafter, the system builds a vocabulary using the created one or more superword tokens. The vocabulary represents a corpus of superword tokens that is used to fine-tune a pre-trained language model encoder to obtain a fine-tuned pre-trained language model encoder. In particular, the vocabulary is the large corpus of superwords on which the pre-trained language model encoder is fine tuned. In an embodiment, a global vector (GloVe) encoder is used as the pre-trained language model encoder which is fine-tuned on superword tokens to ensure semantic information is maintained. It should be noted that any existing language model encoder can be used as the pre-trained language model encoder.
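The superword-token construction can be sketched as a deterministic fusion of the layer type and its hyperparameters into a single vocabulary item; the token format below (`type_param-value_…`) is an illustrative assumption, not the format of the disclosure:

```python
def make_superword_token(layer):
    """Build a 'superword' token fusing layer-wise and hyperparameter
    information of one layer into a single vocabulary item.

    `layer` is a dict such as {'type': 'Conv2d', 'params': {'kernel': 3}}.
    Sorting the parameter keys keeps the token deterministic, so identical
    layers always map to identical vocabulary entries.
    """
    hyper = "_".join(f"{k}-{v}" for k, v in sorted(layer["params"].items()))
    return f"{layer['type']}_{hyper}" if hyper else layer["type"]

def build_vocabulary(layers):
    """Collect the distinct superword tokens of a model as the corpus on
    which a language model encoder (e.g., GloVe) would be fine-tuned."""
    return sorted({make_superword_token(layer) for layer in layers})
```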
[054] Once the fine-tuned pre-trained language model encoder is available, the system 200 converts the created one or more superword tokens into one or more embeddings using the fine-tuned pre-trained language model encoder. Then, the one or more embeddings are aggregated using a sentence embedding encoder to generate the one or more vector embeddings.
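The aggregation step can be illustrated with mean pooling, a minimal stand-in for the sentence embedding encoder (which, per claim 4, additionally preserves layer ordering, a property plain mean pooling does not capture):

```python
def aggregate_embeddings(token_embeddings):
    """Mean-pool per-superword-token embeddings into one fixed-size vector.

    `token_embeddings` is a non-empty list of equal-length numeric lists,
    one per superword token of the model configuration file.
    """
    dim = len(token_embeddings[0])
    n = len(token_embeddings)
    return [sum(vec[i] for vec in token_embeddings) / n for i in range(dim)]
```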
[055] At step 414 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 combine the dataset profile summary, the estimated FLOPS, the extracted layer wise information and the generated one or more vector embeddings to generate the model training signature. The model training signature is a mathematical representation of the machine learning model, the dataset and computational information associated with the machine learning model.
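One simple way to realize this combination is flat concatenation of the four components into a single numeric vector; the ordering below is an assumption for illustration:

```python
def build_training_signature(dataset_profile, flops, layer_counts, embeddings):
    """Concatenate dataset statistics, the FLOP estimate, layer-wise counts,
    and the vector embeddings into one flat model training signature."""
    signature = list(dataset_profile)      # e.g., [class_entropy, ratio, ...]
    signature.append(float(flops))         # scalar FLOP estimate
    signature.extend(layer_counts)         # e.g., counts per layer type
    for emb in embeddings:                 # one or more vector embeddings
        signature.extend(emb)
    return signature
```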
[056] At step 416 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 capture hardware utilization metrics and a hardware information from the model object and the dataset using the experiment information tracking tool.
[057] In an embodiment, the system 200 also receives a user dictionary along with other inputs from the user device. The system 200 vectorizes the one or more user inputs based on the received user dictionary using a dictionary vectorizer to obtain vectorized user inputs.
[058] At step 418 of the method of the present disclosure, the one or more hardware processors 204 of the system 200 encode the vectorized one or more user inputs, the model training signature, the hardware utilization metrics, and the hardware information to compute the model representation of the machine learning model. The model representation represents one or more properties, such as metadata, metrics, resources, and architectural information of the machine learning model, the dataset, the computational information, and hardware related information.
[059] Finally, the system 200 performs a vector conversion of the computed model representation and then stores the vector conversion of the model representation in the training knowledge repository.
[060] FIG. 5, with reference to FIGS. 1 through 4B, illustrates a schematic representation depicting a process of building the training knowledge repository, in accordance with an embodiment of the present disclosure.
[061] As seen in FIG. 5, once the model training signature is available, the system 200 performs warm start on a hardware used for training the model object M to capture runtime details of the model object M using an experiment information tracking tool. Examples of the experiment information tracking tool include, but are not limited to, OpSense, mlflow and the like. In an embodiment, the runtime details include hardware utilization metrics and a hardware information. In at least one example embodiment, the hardware utilization metrics include a central processing unit utilization, a graphics processing unit utilization, and a random access memory utilization. In an embodiment, the hardware information includes information of the underlying hardware on which the warm start is performed.
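A standard-library sketch of the warm-start capture of runtime details follows; it is a minimal stand-in for tools such as mlflow, and the metric set (wall time, CPU time, peak RSS) is illustrative rather than the exact metrics of the disclosure (it uses the Unix-only `resource` module):

```python
import platform
import resource
import time

def capture_runtime_details(train_step, n_warm_start_iters=3):
    """Run a few warm-start iterations of `train_step` (a zero-argument
    callable) and capture hardware utilization metrics and hardware
    information of the underlying machine."""
    start = time.perf_counter()
    for _ in range(n_warm_start_iters):
        train_step()
    elapsed = time.perf_counter() - start
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "wall_time_s": elapsed,
        "cpu_time_s": usage.ru_utime + usage.ru_stime,   # user + system time
        "peak_rss_kb": usage.ru_maxrss,                  # kilobytes on Linux
        "hardware": {
            "machine": platform.machine(),
            "processor": platform.processor(),
        },
    }
```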
[062] Finally, the system 200 encodes the received one or more user inputs, the model training signature, the hardware utilization metrics, and the hardware information to compute a model representation of the model object M. In an embodiment, the generated encoding/model representation is stored in a training knowledge repository. In at least one example embodiment, the training knowledge repository is continuously updated with the vectorized encodings from future trainings so that the stored information can be used for downstream tasks.
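The abstract's claim that the vector representation "can be easily stored in relational databases" can be demonstrated with SQLite; the table and column names below are hypothetical:

```python
import json
import sqlite3

def store_model_representation(db_path, model_name, representation):
    """Persist a model-representation vector in a relational table and read
    it back, illustrating storage in the training knowledge repository."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS training_knowledge "
        "(model_name TEXT, representation TEXT)"
    )
    conn.execute(
        "INSERT INTO training_knowledge VALUES (?, ?)",
        (model_name, json.dumps(representation)),
    )
    conn.commit()
    row = conn.execute(
        "SELECT representation FROM training_knowledge WHERE model_name = ?",
        (model_name,),
    ).fetchone()
    conn.close()
    return json.loads(row[0])
```

Because the representation is a plain vector, later trainings append further rows and downstream tasks can query the repository with ordinary SQL.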
[063] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[064] As discussed earlier, the growing use of machine learning in traditional software has led to an increasing need for improved ways of tracking experiments, since a machine learning component is not static and has a dynamic nature due to its hyperparameters. Available techniques focus on experiment tracking by storing the different hyperparameters and their outcomes alongside the conventional metrics that are tracked. This provides greater visibility into ML models by storing historical experiment information. However, with the increasing use of more complex architectures, this does not capture the entire picture, as other important factors, such as the weights, the architecture, and the data on which the model is trained, which define the learning capabilities of the model, are missed. So, to overcome these disadvantages, embodiments of the present disclosure provide the method and the system for computing a model representation of the machine learning development process. More specifically, the system and the method track the properties of the training process through a numerical process, i.e., the properties are represented in the form of vectors, thereby ensuring easy storage and retrieval of the properties in relational databases, as the properties can be analyzed using general structured query language (SQL) queries. Further, the easy storage and retrieval also increase the usability of the metadata for downstream tasks. Additionally, the system also captures the semantic information of the model architecture along with other usual information, thereby improving the accuracy of the tracked information, which further helps in saving time and cost as the number of experiments performed will drastically reduce.
[065] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means, and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
[066] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[067] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[068] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[069] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
Claims:

We Claim:
1. A processor implemented method, comprising:
receiving, by a model representation computation system (MRCS) via one or more hardware processors, a model object, a dataset and one or more user inputs from a user device, wherein the model object is a machine learning model, and wherein the dataset represents data used for training the machine learning model, and wherein the one or more user inputs comprises one or more of: a memory requirement, a processor type, a machine learning problem, one or more model tags, and one or more user recommended practices (402);
estimating, by the MRCS via the one or more hardware processors, a dataset profile summary using one or more predefined functions (404);
estimating, by the MRCS via the one or more hardware processors, a floating-point operations per second (FLOPS) performed for each pass of one or more passes performed for the model object using a customized flop estimation technique (406);
creating, by the MRCS via the one or more hardware processors, a model configuration file using the model object, wherein the model configuration file comprises an architectural information of the model object, and wherein the architectural information comprises details of types of one or more layers used in the model object, and details of one or more layer parameters used in each layer of the one or more layers of the model object (408);
parsing, by the MRCS via the one or more hardware processors, the model configuration file using a parsing technique to extract a layer wise information of each layer of the one or more layers of the model object, wherein the layer wise information comprises a type of a layer and a count of the layer (410);
generating, by the MRCS via the one or more hardware processors, one or more vector embeddings for the model configuration file (412);
combining, by the MRCS via the one or more hardware processors, the dataset profile summary, the estimated FLOPS, the extracted layer wise information and the generated one or more vector embedding to generate a model training signature, wherein the model training signature is a mathematical representation of the machine learning model, the dataset and computational information associated with the machine learning model (414);
capturing, by the MRCS via the one or more hardware processors, hardware utilization metrics and a hardware information from the model object and the dataset using an experiment information tracking tool (416); and
encoding, by the MRCS via the one or more hardware processors, the received one or more user inputs, the model training signature, the hardware utilization metrics and the hardware information to compute a model representation of the machine learning model, wherein the model representation represents one or more properties of the machine learning model, the dataset, the computational information and hardware related information (418).

2. The processor implemented method as claimed in claim 1, wherein the experiment information tracking tool runs the machine learning model for one or more iterations for capturing the hardware utilization metrics, and wherein the hardware utilization metrics comprises one or more of: a central processing unit utilization, a graphics processing unit utilization, and a random access memory utilization.

3. The processor implemented method as claimed in claim 1, comprising:
performing, by the MRCS via the one or more hardware processors, a vector conversion of the model representation; and
storing, by the MRCS via the one or more hardware processors, the vector conversion of the model representation in a training knowledge repository.

4. The processor implemented method as claimed in claim 1, wherein the step of generating, by the MRCS via the one or more hardware processors, the one or more vector embeddings for the model configuration file comprises:
creating, by the MRCS via the one or more hardware processors, a superword token for each layer present in the model configuration file, wherein the superword token of each layer comprises the layer wise information and a hyperparameter information of a respective layer;
building, by the MRCS via the one or more hardware processors, a vocabulary using created one or more superword tokens, wherein the vocabulary represents a corpus of superword tokens that is used to fine-tune a pre-trained language model encoder to obtain a fine-tuned pre-trained language model encoder;
converting, by the MRCS via the one or more hardware processors, the created one or more superword tokens into one or more embeddings using the fine-tuned pre-trained language model encoder; and
generating, by the MRCS via the one or more hardware processors, the one or more vector embeddings by aggregating the one or more embeddings using a sentence embedding encoder, wherein the sentence embedding encoder is configured to capture ordering of the model object in the generated one or more vector embeddings.

5. The processor implemented method as claimed in claim 4, wherein the step of encoding, by the MRCS via the one or more hardware processors, the received one or more user inputs, the model training signature, the hardware utilization metrics and the hardware information to compute the model representation of the machine learning model is preceded by:
receiving, by the MRCS via the one or more hardware processors, a user dictionary;
vectorizing, by the MRCS via the one or more hardware processors, the one or more user inputs based on the user dictionary using a dictionary vectorizer to obtain vectorized user inputs.

6. A model representation computation system (200), comprising:
a memory (202) storing instructions;
one or more communication interfaces (206); and
one or more hardware processors (204) coupled to the memory (202) via the one or more communication interfaces (206), wherein the one or more hardware processors (204) are configured by the instructions to:
receive a model object, a dataset and one or more user inputs from a user device, wherein the model object is a machine learning model, and wherein the dataset represents data used for training the machine learning model, and wherein the one or more user inputs comprises one or more of: memory requirement, processor type, machine learning problem, one or more model tags, and one or more user recommended practices;
estimate a dataset profile summary using one or more predefined functions;
estimate a floating-point operations per second (FLOPS) performed for each pass of one or more passes performed for the model object using a customized flop estimation technique;
create a model configuration file using the model object, wherein the model configuration file comprises an architectural information of the model object, and wherein the architectural information comprises details of types of one or more layers used in the model object, and details of one or more layer parameters used in each layer of the one or more layers of the model object;
parse the model configuration file using a parsing technique to extract a layer wise information of each layer of the one or more layers of the model object, wherein the layer wise information comprises a type of a layer and a count of the layer;
generate one or more vector embeddings for the model configuration file;
combine the dataset profile summary, the estimated FLOPS, the extracted layer wise information and the generated one or more vector embedding to generate a model training signature, wherein the model training signature is a mathematical representation of the machine learning model, the dataset and computational information associated with the machine learning model;
capture hardware utilization metrics and a hardware information from the model object and the dataset using an experiment information tracking tool; and
encode the received one or more user inputs, the model training signature, the hardware utilization metrics and the hardware information to compute a model representation of the machine learning model, wherein the model representation represents one or more properties of the machine learning model, the dataset, the computational information and hardware related information.

7. The system as claimed in claim 6, wherein the experiment information tracking tool runs the machine learning model for one or more iterations for capturing the hardware utilization metrics, and wherein the hardware utilization metrics comprises one or more of: a central processing unit utilization, a graphics processing unit utilization, and a random access memory utilization.

8. The system as claimed in claim 6, wherein the one or more hardware processors (204) are configured by the instructions to:
perform a vector conversion of the model representation; and
store the vector conversion of the model representation in a training knowledge repository.

9. The system as claimed in claim 6, wherein for generating the one or more vector embeddings for the model configuration file, the one or more hardware processors (204) are configured by the instructions to:
create a superword token for each layer present in the model configuration file, wherein the superword token of each layer comprises the layer wise information and a hyperparameter information of a respective layer;
build a vocabulary using created one or more superword tokens, wherein the vocabulary represents a corpus of superword tokens that is used to fine-tune a pre-trained language model encoder to obtain a fine-tuned pre-trained language model encoder;
convert the created one or more superword tokens into one or more embeddings using the fine-tuned pre-trained language model encoder; and
generate the one or more vector embeddings by aggregating the one or more embeddings using a sentence embedding encoder, wherein the sentence embedding encoder is configured to capture ordering of the model object in the generated one or more vector embeddings.

10. The system as claimed in claim 9, wherein prior to encoding the received one or more user inputs, the model training signature, the hardware utilization metrics and the hardware information to compute the model representation of the machine learning model, the one or more hardware processors (204) are configured by the instructions to:
receive a user dictionary; and
vectorize the one or more user inputs based on the user dictionary using a dictionary vectorizer to obtain vectorized user inputs.

Dated this 27th Day of April 2023

Tata Consultancy Services Limited
By their Agent & Attorney

(Adheesh Nargolkar)
of Khaitan & Co
Reg No IN-PA-1086
