Abstract: SYSTEM AND METHOD FOR TRAINING ONE OR MORE MODELS
The present disclosure relates to a method for training a model by processors (202). The method includes retrieving data from at least one of, an existing data source or a new data source. Further, the method includes categorizing the data as at least one of, historic data and current data. Further, the method includes automatically pre-processing the historic data and the current data. Further, the method includes autotuning one or more hyperparameters for the pre-processed historic data and the current data based on one or more parameters. Further, the method includes training the model with the historic data and the current data. The historic data and the current data are pre-processed and autotuned with the one or more hyperparameters. Further, the method includes notifying, at least one of a user, or one of, a service, a microservice, a component and an application, a status of training the model. Ref. FIG. 5
DESC: FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10 and rule 13)
1. TITLE OF THE INVENTION
SYSTEM AND METHOD FOR TRAINING ONE OR MORE MODELS
2. APPLICANT(S)
NAME NATIONALITY ADDRESS
JIO PLATFORMS LIMITED INDIAN OFFICE-101, SAFFRON, NR. CENTRE POINT, PANCHWATI 5 RASTA, AMBAWADI, AHMEDABAD 380006, GUJARAT, INDIA
3. PREAMBLE TO THE DESCRIPTION
THE FOLLOWING SPECIFICATION PARTICULARLY DESCRIBES THE NATURE OF THIS INVENTION AND THE MANNER IN WHICH IT IS TO BE PERFORMED.
FIELD OF THE INVENTION
[0001] The present invention relates to the field of network data analytics for predictive network management and, more specifically, to a system and a method to auto-select hyper-parameters for machine learning (ML) model training based on parameters such as data size, model size and model weight.
BACKGROUND OF THE INVENTION
[0002] With the increase in the number of users, network service providers have been upgrading their network service quality so as to keep pace with such high demand. With the advancement of technology, there is a demand for telecommunication services to incorporate up-to-date features into their scope of provision. To enhance user experience and implement advanced monitoring mechanisms, prediction methodologies are being incorporated into network management services. An advanced prediction system integrated with an Artificial Intelligence (AI)/Machine Learning (ML) system excels in executing a wide array of algorithms and predictive/forecasting tasks. The advanced prediction system is underpinned by the capabilities of Large Language Models (LLMs). Its primary mission centers on the comprehensive analysis of both network data and operational data, capitalizing on advanced machine learning (ML) techniques to glean profound insights.
[0003] There are various data processing steps for preparing data input suitable for machine learning model training. The steps may involve data standardization, hyper-parameter selection and the like. Conventionally, the user sets the hyper-parameters manually. To set the hyper-parameters manually, the user must understand which hyper-parameters are required for the model training. Understanding and tuning the numerous hyper-parameters involved in model training can be challenging and difficult for users, and typically requires a deep understanding of machine learning principles and the intricacies of hyper-parameter tuning. There is a need for an automated mechanism that can perform the selection of hyper-parameters.
[0004] In view of the above, there is a need for a system and method thereof to auto-select hyper-parameters for ML model training.
SUMMARY OF THE INVENTION
[0005] One or more embodiments of the present disclosure provide a system and a method for training one or more models.
[0006] In one aspect of the present invention, the method for training the one or more models is disclosed. The method includes retrieving, by one or more processors, data from at least one of: an existing data source or a new data source. Further, the method includes categorizing, by the one or more processors, the data as at least one of: historic data and current data. Further, the method includes automatically pre-processing, by the one or more processors, the historic data and the current data. Further, the method includes autotuning, by the one or more processors, one or more hyperparameters for the pre-processed historic data and the current data based on one or more parameters. Further, the method includes training, by the one or more processors, the model with the historic data and the current data. The historic data and the current data are pre-processed and autotuned with the one or more hyperparameters. Further, the method includes notifying, by the one or more processors, at least one of, a user or one of, a service, a microservice, a component and an application, of a status of training the model.
[0007] In an embodiment, retrieving the data from the existing data source includes selecting the existing data source by a user via a user interface or a Command Line Interface (CLI), or one of, a service, a microservice, a component and an application transmitting a command to the one or more processors to select the existing data source.
[0008] In an embodiment, retrieving, the data from the new data source, includes creating, by the one or more processors, the new data source when the user selects one or more data sources from a list of data sources via the user interface or a Command Line Interface (CLI), or one of, the service, the microservice, the component and the application transmits a command to the one or more processors to select the one or more data sources, pulling, by the one or more processors, the data from one or more data sources and storing the data at the new data source, and retrieving, by the one or more processors, the data from the new data source.
[0009] In an embodiment, the data is categorized as at least one of, the historic data and the current data based on a time of generation of the data.
[0010] In an embodiment, the one or more parameters include at least one of: size of the data, size of the model and weight of the model.
[0011] In an embodiment, further, the method includes creating, by the one or more processors, a training name and a model name for training the model based on receiving the training name and the user selecting the model’s name from a list of model names via the user interface. Further, the method includes setting, by the one or more processors, a version based on the created training name. Further, the method includes allocating, by the one or more processors, one or more network elements for training the model based on one of, the user selecting the one or more network elements for training the model via the user interface or one of, the service, the microservice, the application or the component selecting the one or more network elements based on transmitting a request to the one or more processors.
[0012] In an embodiment, autotuning the one or more hyperparameters for the pre-processed historic data and the current data includes the steps of: identifying, by the one or more processors, the one or more parameters from the historic data and the current data, selecting, by the one or more processors, from one or more sources, the one or more hyperparameters based on the identified one or more parameters, and autotuning, by the one or more processors, the one or more hyperparameters selected for the historic data and the current data based on the selected one or more hyperparameters. The one or more sources include the one or more hyperparameters which are at least one of, used previously representing similar types of the one or more parameters, or pre-defined for the one or more parameters.
[0013] In an embodiment, the status of training the model notified to the user includes at least one of, a status of completion of training the model utilizing one or more identifiers including at least one of, training name, model name, version, type and/or name of the data source used, and one or more actions including at least one of, retrain or delete the trained model. The user is notified of training the model by at least one of, alerts or notifications.
[0014] In an embodiment, the method includes retraining, by the one or more processors, the one or more trained models based on receiving configurations from at least one of, the user via the user interface or the CLI, or receiving configurations from one of, the service, the microservice, the component and the application. Further, the method includes collecting, by the one or more processors, results of retraining the one or more trained models. Further, the method includes notifying, by the one or more processors, the results of retraining to the user by displaying the results on the user interface, or notifying the results to one of, the service, the microservice, the component and the application. The one or more models are retrained using different sets of the historic and current data and one or more hyperparameters which are autotuned, the different sets of the historic and the current data are allocated by at least one of, the user or one of, the service, the microservice, the component and the application.
[0015] In an embodiment, the method further includes, transmitting, by the one or more processors, an acknowledgement to at least one of, the service, the microservice, the component and the application pertaining to the status of the training of the one or more models. In one aspect of the present invention, the system for training the one or more models is disclosed. The system includes a retrieving unit, a categorizing unit, a pre-processing unit, a tuning unit, a training unit and a notifying unit. The retrieving unit is configured to retrieve, data from at least one of, an existing data source or a new data source. The categorizing unit is configured to categorize the data as at least one of, historic data and current data. The pre-processing unit is configured to automatically pre-process, the historic data and the current data. The tuning unit is configured to autotune, one or more hyperparameters for the pre-processed historic data and the current data based on one or more parameters. The training unit is configured to train the model with the historic data and the current data. The historic data and the current data are pre-processed and autotuned with the one or more hyperparameters. The notifying unit is configured to notify, at least one of a user or one of, a service, a microservice, a component and an application, a status of training the one or more models.
[0016] In one aspect, a User Equipment (UE) is provided. The UE comprises one or more primary processors communicatively coupled to one or more processors in a network. The one or more primary processors are coupled with a memory storing instructions which, when executed by the one or more primary processors, cause the UE to transmit a command to select at least one of, the existing data source or the one or more data sources, to the one or more processors, wherein the one or more processors are configured to perform the steps of claim 1.
[0017] In one aspect of the present invention, a non-transitory computer-readable medium having stored thereon computer-readable instructions is provided. The computer-readable instructions cause the processor to retrieve data from at least one of: an existing data source or a new data source. Further, the processor categorizes the data as at least one of, historic data and current data. Further, the processor pre-processes the historic data and the current data. Further, the processor autotunes one or more hyperparameters for the pre-processed historic data and the current data based on one or more parameters. Further, the processor trains the model with the historic data and the current data. The historic data and the current data are pre-processed and autotuned with the one or more hyperparameters. Further, the processor notifies a user of a status of training the model.
[0018] Other features and aspects of this invention will be apparent from the following description and the accompanying drawings. The features and advantages described in this summary and in the following detailed description are not all-inclusive, and particularly, many additional features and advantages will be apparent to one of ordinary skill in the relevant art, in view of the drawings, specification, and claims hereof. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] The accompanying drawings, which are incorporated herein, and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes disclosure of electrical components, electronic components or circuitry commonly used to implement such components.
[0020] FIG. 1 is an exemplary block diagram of an environment for training one or more models, according to various embodiments of the present disclosure.
[0021] FIG. 2 is a block diagram of a system of FIG. 1, according to various embodiments of the present disclosure.
[0022] FIG. 3 is an example schematic representation of the system of FIG. 1 in which operations of various entities are explained, according to various embodiments of the present system.
[0023] FIG. 4 illustrates a system architecture for training the one or more models, according to various embodiments of the present system.
[0024] FIG. 5 is an exemplary flow diagram illustrating the method for training the one or more models, according to various embodiments of the present disclosure.
[0025] FIG. 6 is an example flow diagram illustrating the method for training the one or more models, according to various embodiments of the present disclosure.
[0026] Further, skilled artisans will appreciate that elements in the drawings are illustrated for simplicity and may not have necessarily been drawn to scale. For example, the flow charts illustrate the method in terms of the most prominent steps involved to help to improve understanding of aspects of the present invention. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present invention so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having benefit of the description herein.
[0027] The foregoing shall be more apparent from the following detailed description of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0028] Some embodiments of the present disclosure, illustrating all its features, will now be discussed in detail. It must also be noted that as used herein and in the appended claims, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise.
[0029] Various modifications to the embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. However, one of ordinary skill in the art will readily recognize that the present disclosure including the definitions listed here below are not intended to be limited to the embodiments illustrated but is to be accorded the widest scope consistent with the principles and features described herein.
[0030] A person of ordinary skill in the art will readily ascertain that the illustrated steps detailed in the figures and here below are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments.
[0031] Before discussing example embodiments in more detail, it is to be noted that the drawings are to be regarded as being schematic representations and elements that are not necessarily shown to scale. Rather, the various elements are represented such that their function and general purpose become apparent to a person skilled in the art. Any connection or coupling between functional blocks, devices, components, or other physical or functional units shown in the drawings or described herein may also be implemented by an indirect connection or coupling. A coupling between components may also be established over a wireless connection. Functional blocks may be implemented in hardware, firmware, software or a combination thereof.
[0032] Further, the flowcharts provided herein describe the operations as sequential processes. Many of the operations may be performed in parallel, concurrently or simultaneously. In addition, the order of operations may be re-arranged. The processes may be terminated when their operations are completed, but may also have additional steps not included in the figures. It should be noted that, in some alternative implementations, the functions/acts/steps noted may occur out of the order noted in the figures. For example, two steps shown in succession may, in fact, be executed substantially concurrently, or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
[0033] Further, although the terms first, second, etc., may be used herein to describe various elements, components, regions, layers and/or sections, it should be understood that these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are used only to distinguish one element, component, region, layer or section from another region, layer, or section. Thus, a first element, component, region, layer, or section discussed below could be termed a second element, component, region, layer, or section without departing from the scope of the example embodiments.
[0034] Spatial and functional relationships between elements (for example, between modules) are described using various terms, including “connected,” “engaged,” “interfaced,” and “coupled.” Unless explicitly described as being “direct,” when a relationship between first and second elements is described in the description below, that relationship encompasses a direct relationship where no other intervening elements are present between the first and second elements, and also an indirect relationship where one or more intervening elements are present (either spatially or functionally) between the first and second elements. In contrast, when an element is referred to as being "directly” connected, engaged, interfaced, or coupled to another element, there are no intervening elements present. Other words used to describe the relationship between elements should be interpreted in a like fashion (e.g., "between," versus "directly between," "adjacent," versus "directly adjacent," etc.).
[0035] The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, e.g., those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
[0036] As used herein, the singular forms “a,” “an,” and “the,” are intended to include the plural forms as well, unless the context clearly indicates otherwise. As used herein, the terms “and/or” and “at least one of” include any and all combinations of one or more of the associated listed items. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[0037] Unless specifically stated otherwise, or as is apparent from the description, terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device/hardware, that manipulates and transforms data represented as physical, electronic quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
[0038] Various embodiments of the present invention provide a system and a method to auto-select hyper-parameters for ML model training based on data size and model weight size. The system is also configured for retraining of existing trained model with the processed existing data source or new data source so as to improve accuracy of the trained model.
[0039] To simplify the selection of hyper-parameter, the system deploys LLM as service which auto-selects hyper-parameters for model training based on data size and model weight size. The LLM as a service utilizes AI/ML techniques to automate the hyper-parameter tuning during model training. Hence, the users no longer need to acquire extensive knowledge of machine learning or worry about the complexities of hyper-parameter tuning. The system having LLM as a service automatically evaluates the user's data, identifies the most suitable hyper-parameters to the algorithm that is most likely to yield the best results. This provides user flexibility to skip the step of manual tuning hyper-parameters and focus on obtained results. This approach enhances the efficiency and effectiveness of model training, making LLM as a service accessible to a wider range of users.
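As an illustrative, non-limiting sketch of such parameter-based selection (the function name, thresholds and hyperparameter values below are hypothetical and chosen only for explanation), the hyper-parameter search space may be narrowed from the data size and model weight size before any training run:
def select_hyperparameters(data_size_mb, model_weight_mb):
    """Pick a hyperparameter search space from the data size and model weight size (illustrative thresholds)."""
    if data_size_mb < 100 and model_weight_mb < 50:
        # Small data and a light model: a wider, exhaustive search is affordable
        return {"learning_rate": [0.01, 0.005], "n_estimators": [100, 150, 200]}
    if data_size_mb < 1000:
        # Medium-sized data: a narrower grid keeps training time bounded
        return {"learning_rate": [0.01], "n_estimators": [100, 150]}
    # Large data or a heavy model: conservative defaults; random search may be preferred over a full grid
    return {"learning_rate": [0.005], "n_estimators": [100]}

search_space = select_hyperparameters(data_size_mb=250, model_weight_mb=80)  # {'learning_rate': [0.01], 'n_estimators': [100, 150]}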
[0040] FIG. 1 illustrates an exemplary block diagram of an environment (100) for training a model (e.g., ML model, AI model, supervised learning, deep learning, LLM or the like), according to various embodiments of the present disclosure. The environment (100) comprises a plurality of user equipments (UEs) (102-1, 102-2, ……, 102-n). At least one UE (102-n) from the plurality of the UEs (102-1, 102-2, ……, 102-n) is configured to connect to a system (108) via a communication network (106). Hereafter, the plurality of UEs or the one or more UEs are labelled 102.
[0041] In accordance with yet another aspect of the exemplary embodiment, the plurality of UEs (102) may be a wireless device or a communication device that may be a part of the system (108). The wireless device or the UE (102) may include, but are not limited to, a handheld wireless communication device (e.g., a mobile phone, a smart phone, a phablet device, and so on), a wearable computer device (e.g., a head-mounted display computer device, a head-mounted camera device, a wristwatch, a computer device, and so on), a laptop computer, a tablet computer, or another type of portable computer, a media playing device, a portable gaming system, and/or any other type of computer device with wireless communication or Voice Over Internet Protocol (VoIP) capabilities. In an embodiment, the UEs (102) may include, but are not limited to, any electrical, electronic, electro-mechanical or an equipment or a combination of one or more of the above devices such as virtual reality (VR) devices, augmented reality (AR) devices, laptop, a general-purpose computer, desktop, personal digital assistant, tablet computer, mainframe computer, or any other computing device, where the computing device may include one or more in-built or externally coupled accessories including, but not limited to, a visual aid device such as camera, audio aid, a microphone, a keyboard, input devices for receiving input from a user such as touch pad, touch enabled screen, electronic pen and the like. It may be appreciated that the UEs (102) may not be restricted to the mentioned devices and various other devices may be used. A person skilled in the art will appreciate that the plurality of UEs (102) may include a fixed landline, and a landline with assigned extension within the communication network (106).
[0042] The communication network (106), may use one or more communication interfaces/protocols such as, for example, Voice Over Internet Protocol (VoIP), 802.11 (Wi-Fi), 802.15 (including Bluetooth™), 802.16 (Wi-Max), 802.22, Cellular standards such as Code Division Multiple Access (CDMA), CDMA2000, Wideband CDMA (WCDMA), Radio Frequency Identification (e.g., RFID), Infrared, laser, Near Field Magnetics, etc.
[0043] The communication network (106) includes, by way of example but not limitation, one or more of a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, or some combination thereof. The communication network (106) may include, but is not limited to, a Third Generation (3G) network, a Fourth Generation (4G) network, a Fifth Generation (5G) network, a Sixth Generation (6G) network, a New Radio (NR) network, a Narrow Band Internet of Things (NB-IoT) network, an Open Radio Access Network (O-RAN), and the like.
[0044] The communication network (106) may also include, by way of example but not limitation, at least a portion of one or more networks having one or more nodes that transmit, receive, forward, generate, buffer, store, route, switch, process, or a combination thereof, etc. one or more messages, packets, signals, waves, voltage or current levels, some combination thereof, or so forth. The communication network (106) may also include, by way of example but not limitation, one or more of a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, a VOIP or some combination thereof.
[0045] One or more network elements can be, for example, but not limited to a base station that is located in the fixed or stationary part of the communication network (106). The base station may correspond to a remote radio head, a transmission point, an access point or access node, a macro cell, a small cell, a micro cell, a femto cell, a metro cell. The base station enables transmission of radio signals to the UE (102) or a mobile transceiver. Such a radio signal may comply with radio signals as, for example, standardized by a 3rd Generation Partnership Project (3GPP) or, generally, in line with one or more of the above listed systems. Thus, a base station may correspond to a NodeB, an eNodeB, a Base Transceiver Station (BTS), an access point, a remote radio head, a transmission point, which may be further divided into a remote unit and a central unit. The 3GPP specifications cover cellular telecommunications technologies, including radio access, core network, and service capabilities, which provide a complete system description for mobile telecommunications.
[0046] The system (108) is communicatively coupled to a server (104) via the communication network (106). The server (104) can be, for example, but not limited to a standalone server, a server blade, a server rack, an application server, a bank of servers, a business telephony application server (BTAS), a server farm, a cloud server, an edge server, home server, a virtualized server, one or more processors executing code to function as a server, or the like. In an implementation, the server (104) may operate at various entities or a single entity (include, but is not limited to, a vendor side, a service provider side, a network operator side, a company side, an organization side, a university side, a lab facility side, a business enterprise side, a defense facility side, or any other facility) that provides service.
[0047] The environment (100) further includes the system (108) communicably coupled to the server (e.g., remote server or the like) (104) and each UE of the plurality of UEs (102) via the communication network (106). The remote server (104) is configured to execute the requests in the communication network (106).
[0048] The system (108) is adapted to be embedded within the remote server (104) or is embedded as an individual entity. The system (108) is designed to provide a centralized and unified view of data and facilitate efficient business operations. The system (108) is authorized to update/create/delete one or more parameters of the relationship between the requests for training the model, which gets reflected in real-time independent of the complexity of the network.
[0049] In another embodiment, the system (108) may include an enterprise provisioning server (for example), which may connect with the remote server (104). The enterprise provisioning server provides flexibility for enterprises, ecommerce and finance to update/create/delete information related to the requests for training the model in real time as per their business needs.
[0050] The system (108) may include, by way of example but not limitation, one or more of a standalone server, a server blade, a server rack, a bank of servers, a business telephony application server (BTAS), a server farm, hardware supporting a part of a cloud service or system, a home server, hardware running a virtualized server, one or more processors executing code to function as a server, one or more machines performing server-side functionality as described herein, at least a portion of any of the above, some combination thereof. In an implementation, system (108) may operate at various entities or single entity (for example include, but is not limited to, a vendor side, service provider side, a network operator side, a company side, an organization side, a university side, a lab facility side, a business enterprise side, ecommerce side, finance side, a defense facility side, or any other facility) that provides service.
[0051] However, for the purpose of description, the system (108) is described as an integral part of the remote server (104), without deviating from the scope of the present disclosure. Operational and construction features of the system (108) will be explained in detail with respect to the following figures.
[0052] FIG. 2 illustrates a block diagram of the system (108) provided for training the model (e.g., AI model, ML model (such as LLMs), or the like), according to one or more embodiments of the present invention. As per the illustrated embodiment, the system (108) includes the one or more processors (202), the memory (204), an input/output interface unit (206), a display (208), an input device (210), and the database (214). Further the system (108) may comprise one or more processors (202). The one or more processors (202), hereinafter referred to as the processor (202) may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, single board computers, and/or any devices that manipulate signals based on operational instructions. As per the illustrated embodiment, the system (108) includes one processor. However, it is to be noted that the system (108) may include multiple processors as per the requirement and without deviating from the scope of the present disclosure.
[0053] Information related to the trained model may be provided or stored in the memory (204) of the system (108). Among other capabilities, the processor (202) is configured to fetch and execute computer-readable instructions stored in the memory (204). The memory (204) may be configured to store one or more computer-readable instructions or routines in a non-transitory computer-readable storage medium, which may be fetched and executed to create or share data packets over a network service. The memory (204) may include any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as disk memory, EPROMs, FLASH memory, unalterable memory, and the like.
[0054] The memory (204) may comprise any non-transitory storage device including, for example, volatile memory such as Random-Access Memory (RAM), or non-volatile memory such as Erasable Programmable Read-Only Memory (EPROM), flash memory, and the like. In an embodiment, the system (108) may include an interface(s). The interface(s) may comprise a variety of interfaces, for example, interfaces for data input and output devices, referred to as input/output (I/O) devices, storage devices, and the like. The interface(s) may facilitate communication for the system. The interface(s) may also provide a communication pathway for one or more components of the system. Examples of such components include, but are not limited to, processing unit/engine(s) and the database (214). The processing unit/engine(s) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s).
[0055] The information related to the trained model may further be configured to render on the user interface (206). The user interface (206) may include functionality similar to at least a portion of functionality implemented by one or more computer system interfaces such as those described herein and/or generally known to one having ordinary skill in the art. The user interface (206) may be rendered on the display (208), implemented using Liquid Crystal Display (LCD) display technology, Organic Light-Emitting Diode (OLED) display technology, and/or other types of conventional display technology. The display (208) may be integrated within the system (108) or connected externally. Further the input device(s) (210) may include, but not limited to, keyboard, buttons, scroll wheels, cursors, touchscreen sensors, audio command interfaces, magnetic strip reader, optical scanner, etc.
[0056] The database (214) may be communicably connected to the processor (202) and the memory (204). The database (214) may be configured to store and retrieve the request pertaining to features, or services or workflow of the system (108), access rights, attributes, approved list, and authentication data provided by an administrator. In another embodiment, the database (214) may be outside the system (108) and communicated through a wired medium and a wireless medium.
[0057] Further, the processor (202), in an embodiment, may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processor (202). In the examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processor (202) may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processor (202) may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the memory (204) may store instructions that, when executed by the processing resource, implement the processor (202). In such examples, the system (108) may comprise the memory (204) storing the instructions and the processing resource to execute the instructions, or the memory (204) may be separate but accessible to the system (108) and the processing resource. In other examples, the processor (202) may be implemented by an electronic circuitry.
[0058] In order for the system (108) to train the model, the processor (202) includes a retrieving unit (216), a categorizing unit (218), a pre-processing unit (220), a tuning unit (222), a training unit (224), a notifying unit (226), a creating unit (228), a version setting unit (230), and an allocating unit (232). The retrieving unit (216), the categorizing unit (218), the pre-processing unit (220), the tuning unit (222), the training unit (224), the notifying unit (226), the creating unit (228), the version setting unit (230), and the allocating unit (232) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processor (202). In the examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processor (202) may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processor (202) may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the memory (204) may store instructions that, when executed by the processing resource, implement the processor. In such examples, the system (108) may comprise the memory (204) storing the instructions and the processing resource to execute the instructions, or the memory (204) may be separate but accessible to the system (108) and the processing resource. In other examples, the processor (202) may be implemented by the electronic circuitry.
[0059] In order for the system (108) to train the model, the retrieving unit (216), the categorizing unit (218), the pre-processing unit (220), the tuning unit (222), the training unit (224), the notifying unit (226), the creating unit (228), the version setting unit (230), and the allocating unit (232) are communicably coupled to each other.
[0060] The retrieving unit (216) retrieves data from an existing data source or a new data source. The existing data source or one or more data sources may be, for example, but not limited to file input, source path, input stream, Hypertext Transfer Protocol version 2 (HTTP2), Hadoop Distributed File System (HDFS) and Network Attached Storage.
[0061] In an embodiment, the retrieving unit (216) retrieves the data from the existing data source by the user selecting the existing data source via the user interface (206). In an example, if the user selects the existing data source, then the user of the system (108) will select the data source from the list of existing data sources. In another embodiment, the retrieving unit (216) retrieves the data from the existing data source by selecting the existing data source via the Command Line Interface (CLI). In another embodiment, one of: a service, a microservice, a component and an application transmits a command to the system (108) to select one of the existing data source or the one or more data sources via at least one of, but not limited to, a Hypertext Transfer Protocol (HTTP) REST Application Programming Interface (API) using JavaScript Object Notation (JSON)/eXtensible Markup Language (XML) based communication. In particular, a handling unit (234) of the system (108) is configured to receive the command from one of, the service, the microservice, the component and the application. Thereafter, the handling unit (234) provides information derived from the command, pertaining to selection of the existing data source, to the retrieving unit (216). Based on this information, the retrieving unit (216) retrieves the data from the existing data source. In an implementation, the data source is selected/transferred or received (possibly via an automatic request) by executing a command at the user interface (206) or the CLI. In an example, a customer support manager logs into the application’s dashboard (UI 206) and selects a new dataset containing recent customer inquiries. They specify parameters for retraining, such as increasing the focus on specific intents (e.g., billing questions). In another example, the developer runs a command using the CLI to initiate retraining with a script that pulls in historical chat logs and current interaction data. In another example, the analytics microservice detects a spike in inquiries about a new feature and automatically sends a request to the training unit (224) to retrain the model with this specific context.
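By way of illustration only, such a command may be transmitted as an HTTP REST request carrying a JSON payload; the endpoint path and payload fields below are hypothetical placeholders and not part of any defined API:
import requests

# Hypothetical command asking the system (108) to select an existing data source
command = {
    "action": "select_data_source",
    "data_source_type": "existing",
    "data_source_name": "Customer_Dataset_2024",
}
# The endpoint URL is a placeholder for the system's REST API
response = requests.post("http://system-108.example/api/v1/data-sources/select",
                         json=command, timeout=10)
print(response.status_code, response.json())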
[0062] In another embodiment, the retrieving unit (216) creates the new data source when the user selects one or more data sources from a list of data sources via the user interface (206) or the CLI, or at least one of: the service, the microservice, the component and the application transmits the command to the retrieving unit (216) to select the one or more data sources. Further, the retrieving unit (216) pulls the data from the one or more data sources and stores the data at the new data source. Further, the retrieving unit (216) retrieves the data from the new data source. In simple terms, if the user of the system (108) selects the new data source, then the user of the system (108) creates the new data source by retrieving the data as a file input, or the data from a source path, an input stream, a Hypertext Transfer Protocol (HTTP) source, a Hadoop Distributed File System (HDFS) and a Network Attached Storage (NAS).
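As a minimal sketch, assuming the selected sources are files on a network-attached path (the paths and file names below are hypothetical), pulling the data and storing it at a new data source may look as follows:
import pandas as pd

# Hypothetical source paths selected by the user or received in a command
source_paths = ["/mnt/nas/sales/2024_q1.csv", "/mnt/nas/sales/2024_q2.csv"]

frames = [pd.read_csv(path) for path in source_paths]         # pull the data from the selected sources
new_source = pd.concat(frames, ignore_index=True)             # combine into a single new data source
new_source.to_csv("/data/new_source/sales.csv", index=False)  # store the data at the new data source

data = pd.read_csv("/data/new_source/sales.csv")              # retrieve the data from the new data source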
[0063] Further, the categorizing unit (218) categorizes the data as at least one of historic data and current data. In an embodiment, the data is categorized as at least one of: the historic data and the current data based on a time of generation of the data.
[0064] In an example, consider a retail shop company that uses the machine learning model to predict sales trends based on the historical data and the current data. The historic data means sales data from the last five years (e.g., daily sales figures, customer details or the like). This data is categorized based on the time it was generated, so any data prior to the last six months is labeled as historic. The current data means the sales data from the last six months, which reflects recent trends and changes in consumer behavior. The current data includes daily sales figures, promotional events, seasonal influences or the like. When the new sales data is generated, the categorizing unit (218) analyzes the timestamp of this data. If the data is from today or the past six months, the categorizing unit (218) categorizes the data as current data. If the data is older than six months, the categorizing unit (218) categorizes the data as the historic data.
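A minimal sketch of such timestamp-based categorization, assuming the data carries a hypothetical 'timestamp' column and using the six-month cut-off from the example above, is shown below:
import pandas as pd

# Hypothetical sales dataset with a 'timestamp' column recording the time of generation
sales = pd.read_csv("sales.csv", parse_dates=["timestamp"])

cutoff = pd.Timestamp.now() - pd.DateOffset(months=6)   # six-month boundary
current_data = sales[sales["timestamp"] >= cutoff]      # generated within the last six months
historic_data = sales[sales["timestamp"] < cutoff]      # generated earlier than six months ago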
[0065] The pre-processing unit (220) automatically pre-processes the historic data and the current data. In an embodiment, the pre-processing unit (220) cleans the historic data and the current data by at least one of, removing duplicates and correcting data types.
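A minimal sketch of such cleaning, assuming hypothetical 'sales' and 'timestamp' columns, is as follows:
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.drop_duplicates()                                           # remove duplicate rows
    df["sales"] = pd.to_numeric(df["sales"], errors="coerce")           # correct the numeric data type
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")  # correct the date/time data type
    return df.dropna(subset=["sales", "timestamp"])                     # drop rows that could not be corrected

historic_data = preprocess(historic_data)
current_data = preprocess(current_data)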
[0066] The tuning unit (222) autotunes one or more hyperparameters for the pre-processed historic data and the current data based on one or more parameters. The one or more parameters include at least one of: patterns and trends of the historic data and the current data, size of the data, size of the model and weight of the model. In an embodiment, the tuning unit (222) identifies the one or more parameters from the historic data and the current data. Further, the tuning unit (222) selects, from one or more sources, the one or more hyperparameters used previously which represent a similar type of the one or more parameters. Based on the selected one or more hyperparameters, the tuning unit (222) autotunes the one or more hyperparameters for the historic data and the current data. In an example, the tuning unit (222) effectively automates the hyperparameter tuning process so as to improve the ML model accuracy by utilizing both historical and current datasets. This approach helps ensure that the ML model is robust and well-suited to predict customer churn accurately.
[0067] For example, during the hyperparameter tuning, the tuning unit (222) uses GridSearchCV to autotune the hyperparameters. In an example (with the estimator assumed, for illustration, to be a random forest classifier):
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# historical_data is assumed to be a DataFrame with a 'target' column
model = RandomForestClassifier(random_state=42)
param_grid = {
    'n_estimators': [100, 200],        # number of trees
    'max_depth': [None, 10, 20],       # maximum tree depth
    'min_samples_split': [2, 5, 10],   # minimum samples required to split a node
}
grid_search = GridSearchCV(model, param_grid, cv=5)  # 5-fold cross-validated grid search
grid_search.fit(historical_data.drop('target', axis=1), historical_data['target'])
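Once the search completes, the selected combination and the corresponding fitted estimator may be read back from the search object, for example:
best_params = grid_search.best_params_    # e.g., {'max_depth': 10, 'min_samples_split': 2, 'n_estimators': 200}
best_model = grid_search.best_estimator_  # estimator refitted with the best combination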
[0068] Consider an example in which a company uses a machine learning model to predict future sales based on historic and current sales data. The tuning unit (222) is responsible for optimizing the hyperparameters to improve model performance. The company has a dataset containing sales figures over several months (as shown in Table 3):
Month      Sales   Promotions   Season
January    2000    Yes          Winter
February   2500    No           Winter
March      3000    Yes          Spring
April      3500    No           Spring
May        4000    Yes          Spring
Table 3
[0069] The tuning unit (222) analyzes the dataset and identifies key parameters, including patterns and trends observing seasonal sales fluctuations, the size of the data (for example, the dataset contains 5 months of sales data), size of the model (for example, using a regression model with a moderate number of features) and weight of the model used for adjusting for the importance of promotional effects.
[0070] Further, the tuning unit (222) reviews historical model tuning records to find previously used hyperparameters for similar datasets, such as a learning rate of about 0.01, a regularization strength of about 0.1 and a number of trees (for ensemble methods) of about 100. Based on the identified parameters and selected hyperparameters, the tuning unit (222) applies techniques such as grid search techniques for testing various combinations of learning rates and regularization strengths, and random search techniques for exploring a broader range of hyperparameters given the moderate data size. After autotuning, the tuning unit (222) determines the optimal hyperparameters such as a learning rate of about 0.005, a regularization strength of about 0.05 and a number of trees of about 150 (increased for better prediction). The optimized hyperparameters are set for the model so as to improve its ability to accurately predict future sales.
[0071] In the above example, the tuning unit (222) effectively autotunes hyperparameters for a sales forecasting model by analyzing historic and current data, selecting relevant parameters, and applying previous tuning insights. This results in improved model performance tailored to the dataset characteristics.
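A minimal sketch of selecting previously used hyperparameters for a similar parameter profile, assuming a hypothetical in-memory record of earlier tuning runs, is given below:
# Hypothetical record of hyperparameters used in earlier, similar training runs
previous_runs = [
    {"data_rows": 5, "model": "regression", "hyperparameters": {"learning_rate": 0.01, "n_trees": 100}},
    {"data_rows": 5000, "model": "regression", "hyperparameters": {"learning_rate": 0.001, "n_trees": 300}},
]

def select_previous_hyperparameters(data_rows, model_type):
    """Return hyperparameters previously used for the most similar (data size, model type) profile."""
    candidates = [run for run in previous_runs if run["model"] == model_type]
    if not candidates:
        return None
    closest = min(candidates, key=lambda run: abs(run["data_rows"] - data_rows))
    return closest["hyperparameters"]

starting_point = select_previous_hyperparameters(data_rows=5, model_type="regression")  # {'learning_rate': 0.01, 'n_trees': 100}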
[0072] The training unit (224) trains the model with the historic data and the current data. The historic data and the current data are pre-processed and autotuned with the one or more hyperparameters. After the pre-processing and autotuning, the cleaned dataset (comprising both historic and current data) can be fed into the machine learning model. This improves the model's accuracy and effectiveness in understanding sentiment by reducing variability and noise in the data.
[0073] The training unit (224) retrains the one or more trained ML/AI models based on receiving configurations from at least one of, the user via the user interface (206) or the CLI, or receiving configurations from one of, the service, the microservice, the component and the application. In an embodiment, the one or more trained models include at least one of, but not limited to, Generative Pre-Trained Transformer-J (GPT-J), Large Language Model Meta AI 2 (LLaMA 2), Bloom, GPT-Neo and Falcon. In an example, a customer support manager logs into the application’s dashboard (UI 206) and selects a new dataset containing recent customer inquiries. They specify parameters for retraining, such as increasing the focus on specific intents (e.g., billing questions). In another example, the developer runs a command using the CLI to initiate retraining with a script that pulls in historical chat logs and current interaction data. In another example, the analytics microservice detects a spike in inquiries about a new feature and automatically sends a request to the training unit (224) to retrain the model with this specific context.
[0074] In an example, the training unit (224) receives the configurations from one or more sources (for example, from the user interface (UI) for the specified dataset, the CLI command for an updated model version, and the microservice for recent data trends or the like). Further, the training unit (224) then combines the historical data (e.g., past 6 months of chat logs) with the current data (e.g., latest interactions) to prepare for retraining.
[0075] Further, the training unit (224) collects results of retraining the one or more trained models. Further, the training unit (224) notifies the results of retraining to the user by displaying the results on the user interface (206), or notifies the results to one of, the service, the microservice, the component and the application using the handling unit (234). The one or more models are retrained using different sets of the historic and current data; the different sets of the historic and the current data are allocated by at least one of, the user or one of, the service, the microservice, the component and the application. In short, the training unit (224) effectively manages retraining by collecting user configurations, gathering data, retraining models, and notifying users and services of the results, ensuring that the model remains accurate and relevant based on the latest data. In other words, the training unit (224) retrains the machine learning model using the specified configurations and data sets. The retraining may involve fine-tuning parameters to improve accuracy for the identified intents. Once retraining is complete, the training unit (224) evaluates the model’s performance on validation data to ensure improvements. The updated model is deployed to the chatbot (for example).
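A minimal sketch of such a retrain-evaluate-notify flow, assuming hypothetical data frames with a 'target' column and a hypothetical notify() callback, is shown below:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def retrain_model(historic_df, current_df, validation_df, notify):
    # Combine the allocated historic and current data sets for retraining
    train_df = pd.concat([historic_df, current_df], ignore_index=True)
    model = RandomForestClassifier(n_estimators=150, random_state=42)
    model.fit(train_df.drop("target", axis=1), train_df["target"])                # retrain the model
    score = accuracy_score(validation_df["target"],
                           model.predict(validation_df.drop("target", axis=1)))   # evaluate on validation data
    notify({"status": "Retraining completed", "validation_accuracy": round(score, 3)})  # report the result
    return model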
[0076] The notifying unit (226) notifies a status of training the model to the user. In another example, the notifying unit (226) notifies the status of training the model to one of, the service, the microservice, the component and the application. In an embodiment, the notifying unit (226) notifies a status of completion of training the trained model utilizing one or more identifiers. The one or more identifiers can be, for example, but not limited to, training name, model name, version, type and/or name of the data source used. The notifying unit (226) performs the one or more actions. The one or more actions can be, for example, but not limited to, retraining the trained model or deleting the trained model. The user is notified of training the model by at least one of, alerts or notifications (e.g., short message, email, push notification or the like).
[0077] In an example, a company regularly updates its machine learning model for predicting customer behaviour based on the new data or the existing data. The training process is automated, and the team needs to be notified once the training is completed. The notifying unit (226) is responsible for sending the notifications to the users about the status of the model training. The notifying unit (226) is used to specify which model has been trained and relevant details (e.g., Model Name: CustomerChurnPredictor, Version: 2.1, Training Name: Analysis_2024, Data Source: Customer_Dataset_2024, Status: Completed Successfully).
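By way of illustration only, a training-status notification carrying these identifiers may be assembled and dispatched as follows (the field values mirror the example above; the notify_user() helper and its transport are hypothetical):
# Hypothetical notification payload built from the training identifiers
notification = {
    "training_name": "Analysis_2024",
    "model_name": "CustomerChurnPredictor",
    "version": "2.1",
    "data_source": "Customer_Dataset_2024",
    "status": "Completed Successfully",
    "actions": ["retrain", "delete"],
}

def notify_user(payload, channel="email"):
    """Send the payload over the chosen channel (e.g., short message, email, push notification)."""
    print(f"[{channel}] {payload}")  # placeholder for the actual alert/notification transport

notify_user(notification)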
[0078] In an alternate embodiment, the notifying unit (226) is configured to notify one of, the service, the microservice, the application and the component pertaining to the status of training the model. For example, the notifying unit (226) is configured to notify the status of training the model by transmitting an acknowledgment to one of, the service, the microservice and the application using a handling unit (234). The handling unit (234) is configured to keep a record of mappings of interactions of the entities (such as the service, the microservice, the application and the component) with the system (108). The mappings of the interactions of the entities with the system (108) pertain to at least one of, the entities transmitting commands and/or requests to the system (108) to perform one or more actions such as, for example, selection of one of, the existing data source or the one or more data sources, or allocating the one or more network elements to train the model. Based on the mapping, the handling unit (234) informs the notifying unit (226) to which entity the acknowledgment has to be transmitted pertaining to the status of the training of the model. For example, let us consider that microservice 1 had transmitted the command at the outset to select one of the existing data source or the one or more data sources; the handling unit (234) keeps track of this event. Based on this, the handling unit (234) informs the notifying unit (226) to transmit the acknowledgment (response) to microservice 1 pertaining to the status of training the model.
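A minimal sketch of such command-to-entity mapping and acknowledgment routing, with a hypothetical in-memory map and entity names, is shown below:
# Hypothetical in-memory record kept by the handling unit (234)
interaction_map = {}

def record_command(training_name, entity):
    """Remember which entity (service, microservice, component, application) issued the command."""
    interaction_map[training_name] = entity

def route_acknowledgement(training_name, status):
    """Return the acknowledgment addressed to the entity recorded for this training run."""
    entity = interaction_map.get(training_name)
    return {"to": entity, "training_name": training_name, "status": status}

record_command("Analysis_2024", "microservice-1")
ack = route_acknowledgement("Analysis_2024", "Training completed")  # {'to': 'microservice-1', ...}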
[0079] Further, the creating unit (228) creates a training name and a model name (e.g., CustomerChurnPredictor or the like) for training the model based on receiving the training name and the user selecting the model’s name from the list of model names via the user interface (206). In an example, the creating unit (228) selects different sets of combined historic data and current data. Each set can be allocated to a different model selected by the creating unit (228). The re-training execution may happen over the one or more processors (202) and/or over distributed computing. Further, the version setting unit (230) sets a version based on the created training name. In an example, the version setting unit (230) sets the version 2.1 based on the created training name.
[0080] Further, the allocating unit (232) allocates one or more network elements (e.g., a server, network functions (e.g., an Access and Mobility Management Function (AMF) entity or the like)) for training the trained model based on the user selecting the one or more network elements for training the trained model via the user interface (206). In an alternate embodiment, the allocating unit (232) allocates the one or more network elements for training the model based on one of, the service, the microservice, the application, or the component transmitting a request to the allocating unit (232) to select the one or more network elements for retraining the existing trained model.
[0081] The example for training the model is explained in FIG. 4 to FIG. 6.
[0082] FIG. 3 is an example schematic representation of the system (300) of FIG. 1 in which operations of various entities are explained, according to various embodiments of the present system. It is to be noted that the embodiment with respect to FIG. 3 will be explained with respect to the first UE (102-1) and the system (108) for the purpose of description and illustration and should nowhere be construed as limiting the scope of the present disclosure.
[0083] As mentioned earlier, the first UE (102-1) includes one or more primary processors (305) communicably coupled to the one or more processors (202) of the system (108). The one or more primary processors (305) are coupled with a memory (310) storing instructions which are executed by the one or more primary processors (305). Execution of the stored instructions by the one or more primary processors (305) causes the UE (102-1) to transmit a command to select at least one of, the existing data source or the one or more data sources, to the one or more processors (202). The execution of the stored instructions by the one or more primary processors (305) further enables the UE (102-1) to execute the requests in the communication network (106).
[0084] As mentioned earlier, the one or more processors (202) are configured to transmit a response content related to the trained model to the UE (102-1). More specifically, the one or more processors (202) of the system (108) are configured to transmit the response content to at least the UE (102-1). A kernel (315) is a core component serving as the primary interface between hardware components of the UE (102-1) and the system (108). The kernel (315) is configured to enable the plurality of response contents hosted on the system (108) to access resources available in the communication network (106). The resources include at least one of a Central Processing Unit (CPU) and memory components such as Random Access Memory (RAM) and Read Only Memory (ROM).
[0085] As per the illustrated embodiment, the system (108) includes the one or more processors (202), the memory (204), the input/output interface unit (206), the display (208), and the input device (210). The operations and functions of the one or more processors (202), the memory (204), the input/output interface unit (206), the display (208), and the input device (210) are already explained in FIG. 2 and, for the sake of brevity, are not repeated here. Further, the processor (202) includes the retrieving unit (216), the categorizing unit (218), the pre-processing unit (220), the tuning unit (222), the training unit (224), the notifying unit (226), the creating unit (228), the version setting unit (230), and the allocating unit (232). The operations and functions of these units are already explained in FIG. 2 and, for the sake of brevity, are not repeated here.
[0086] FIG. 4 illustrates a system architecture (400) for training of the model, according to various embodiments of the present system. The system architecture (400) includes the one or more processors (202), the memory (204), the input/output interface unit (206), the display (208), and the input device (210), whose operations and functions are already explained in FIG. 2 and, for the sake of brevity, are not repeated here. Further, the processor (202) includes the retrieving unit (216), the categorizing unit (218), the pre-processing unit (220), the tuning unit (222), the training unit (224), the notifying unit (226), the creating unit (228), the version setting unit (230), and the allocating unit (232), whose operations and functions are already explained in FIG. 2 and, for the sake of brevity, are not repeated here.
[0087] Further, the system architecture (400) includes the system (108) configured to interact with an integrated system (402) via a load balancer (404) and a data-lake (406). The system (108) is integrated with a Large Language Model (LLM) to provide provisions for optimal retraining.
[0088] The integrated system (402) collects raw data from different data sources. The load balancer (404) distributes the data source request traffic across the system architecture (400). The input device (210) receives inputs from the user. By means of the input device (210), the user provides the training name and selects the model name from a list of models such as, for example, GPT-J, LAMA2, Bloom, GPT-Neo and Falcon. GPT-J is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt, and is used for various natural language processing tasks including text generation, translation, and question-answering. LAMA2 (Large Language Model Meta AI) is a series of state-of-the-art language models developed by Meta®. The LAMA2 models are trained on a diverse dataset, allowing them to generate coherent and contextually relevant text, and are available for both research and commercial use, promoting open access and collaboration in the AI community. Bloom, also known as the BigScience Large Open-science Open-access Multilingual Language Model, is an open-access multilingual language model developed by the BigScience research collective. Bloom aims to provide a powerful and accessible tool for natural language processing, promoting multilingual applications and research in the machine learning field. GPT-Neo is an open-source language model developed by EleutherAI®, designed to replicate the capabilities of OpenAI's GPT-3. Falcon is a series of open-source language models developed by the Technology Innovation Institute (TII), which aims to provide high-performance language models that are freely accessible to the research community and industry, supporting innovation in the field of artificial intelligence. The LLM as a service sets the training version by default for a specified training name. The user selects, from a list provided by the LLM as a service, the execution group on which the training is going to execute. Further, the system (108) allows entry of a new data source. In simple terms, if the user selects a new data source, the system (108) creates the new data source by retrieving the data from at least one of, data as file input, data from a source path, data as an input stream, data from HTTP2, data from HDFS (Hadoop Distributed File System), and data from NAS (Network Attached Storage).
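By way of a non-limiting illustration, the following sketch shows one way a new data source could be created by dispatching on the selected input type; the reader functions are hypothetical placeholders (the HTTP2, HDFS and NAS readers in particular are deployment specific and are omitted), and nothing here is mandated by the specification.

```python
# Hypothetical dispatcher for creating a new data source from the supported input types.
from typing import Callable, Dict, Iterable


def _read_file(location: str) -> Iterable[str]:
    """Read data supplied as a plain file input."""
    with open(location, "r", encoding="utf-8") as handle:
        return handle.readlines()


def _read_source_path(location: str) -> Iterable[str]:
    """Placeholder: walk a directory or mounted source path and read its files."""
    raise NotImplementedError("source-path reader is deployment specific")


# HTTP2, HDFS and NAS readers would typically wrap an HTTP/2 client, an HDFS client and
# a mounted network share respectively; they are intentionally left out of this sketch.
READERS: Dict[str, Callable[[str], Iterable[str]]] = {
    "file": _read_file,
    "source_path": _read_source_path,
}


def create_new_data_source(source_type: str, location: str) -> Iterable[str]:
    """Retrieve raw data for a new data source based on the selected input type."""
    try:
        reader = READERS[source_type]
    except KeyError:
        raise ValueError(f"Unsupported data source type: {source_type}")
    return reader(location)
```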
[0089] If the user selects an existing data source, the user selects the data source from the list of existing data sources. Further, the system (108) pre-processes the received data to normalize and clean it. On the basis of the historic data and the current data, the LLM as a service performs the preprocessing, such as cleaning and normalizing the text content while removing formatting tags and irrelevant elements from the data source, which may contain various formatting elements such as headings, tables, footnotes, page numbers, other structural components and images. The cleaning and normalizing of the data is performed by removing noise from extra rows which contain invalid column values such as NaN, None, 0, null, or empty strings. By using the tuning unit (222), the system (108) automatically evaluates the user's data and identifies the hyper-parameters most suitable for the ML algorithm that is most likely to yield the best results. Further, the system (108) auto-tunes the hyper-parameters based on the data size and the model weight size, suited to the provided input. The execution of the ML training is performed in the ML training unit. The hyper-parameters are set by auto-tuning, by studying the data trends and patterns for the given data, and the ML training is then performed on the given data.
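A minimal preprocessing sketch follows, assuming the retrieved data is tabular and handled with pandas; it mirrors the cleaning described above by stripping simple formatting tags from text columns and dropping rows that contain invalid column values (NaN, None, 0, null, or empty strings). The column names and the library choice are illustrative assumptions, not the claimed implementation.

```python
import pandas as pd


def preprocess(frame: pd.DataFrame) -> pd.DataFrame:
    """Clean and normalize a tabular data source as described in the disclosure."""
    cleaned = frame.copy()

    # Strip simple HTML-style formatting tags and surrounding whitespace from text columns.
    for column in cleaned.select_dtypes(include="object").columns:
        cleaned[column] = (
            cleaned[column]
            .astype(str)
            .str.replace(r"<[^>]+>", " ", regex=True)
            .str.strip()
        )

    # Treat "null", empty strings and 0 as invalid values, then drop rows containing them
    # (together with any NaN/None already present).
    cleaned = cleaned.replace({"null": pd.NA, "": pd.NA, 0: pd.NA})
    return cleaned.dropna()


# Example usage with a tiny illustrative frame.
raw = pd.DataFrame({"text": ["<b>ok</b>", "null", "fine"], "count": [3, 0, 5]})
print(preprocess(raw))  # rows with "null" / 0 are removed, tags are stripped
```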
[0090] The display (208) displays a training status list which contains a tabular view of the training name, the model name, the version, the data source type, the status, and actions such as retrain and delete.
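For illustration only, one possible (assumed, not specified) representation of a row of such a training status list is sketched below.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrainingStatusRow:
    """One row of the training status list shown on the display (208)."""
    training_name: str
    model_name: str
    version: str
    data_source_type: str   # e.g. "new" or "existing"
    status: str             # e.g. "RUNNING", "COMPLETED", "FAILED"
    actions: List[str] = field(default_factory=lambda: ["retrain", "delete"])


row = TrainingStatusRow("TrainingName1", "GPT-J", "version1.1", "existing", "COMPLETED")
print(row)
```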
[0091] Further, the system (108) is connected to the data-lake (406), which is a distributed database used to store the processed data and algorithm outputs. The LLM as a service stores the trained model obtained by retraining an existing model on a new data source or an existing data source, and the stored model can also be used by other users for further retraining and inference.
[0092] The system (108) is configured to interact with external and internal data sources (not shown). The system (108) may further include one or more databases and is capable of interacting with one or more application servers in the network (106).
[0093] Further, the system (108) may be configured to interact with various components of the network and external networks by means of various Application Programming Interfaces (APIs), databases and servers, or any other compatible element. The databases/data-lakes (406) are configured to store past data, dynamic data, and trained models for future necessity.
[0094] The system (108) may further be configured to incorporate additional data into the pre-processing steps, if required, to refine the data analysis. The pre-processing step involves extracting and normalizing the data by applying suitable operations such as filtering, normalization, cleaning and standardization of the data.
[0095] In an example, the user, by means of the present system (108), first creates a training name and selects the model name. The user provides the training name, and the model name is selected by the user from a list of models such as, for example, GPT-J, LAMA2, Bloom, GPT-Neo and Falcon. The system (108) sets the version by default; that is, the LLM as a service sets the training version by default for the specified training name. For example, if the user gives the training name as TrainingName1, the first time the version is set to version1.1 by default, and when the training name TrainingName1 is given a second time, the version is set to version1.2 by default.
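A small sketch of this default version-setting behaviour is given below, assuming an in-memory counter per training name; treating training names case-insensitively is an assumption made only to match the example above.

```python
from collections import defaultdict


class VersionSettingUnit:
    """Assigns version1.1, version1.2, ... per training name by default."""

    def __init__(self):
        self._minor_by_name = defaultdict(int)

    def set_version(self, training_name: str) -> str:
        # Assumption: training names are matched case-insensitively.
        key = training_name.lower()
        self._minor_by_name[key] += 1
        return f"version1.{self._minor_by_name[key]}"


versions = VersionSettingUnit()
print(versions.set_version("TrainingName1"))  # version1.1
print(versions.set_version("TrainingName1"))  # version1.2
```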
[0096] Further, the user selects, from the list provided by the LLM as a service, the execution group on which the training is going to execute. The execution group is a group of different servers. Whichever execution groups are accessible to that particular user are displayed in the list, and the user has to select an execution group from the list. The ML training is performed on the selected execution group. After that, the user has to select the data source. If the user selects a new data source, the system (108) creates the new data source by retrieving the data from at least one of, data as file input, data from a source path, data as an input stream, data from HTTP2, data from HDFS, and data from NAS.
[0097] If the user selects an existing data source, the user selects the data source from the list of existing data sources. The user has to select the source path; the data path is determined by providing valid credentials for the new data source and for the existing data source. Further, the user has to select the data source from the drop-down list of the LLM. On the basis of the historic and current data, the LLM as a service performs the preprocessing, such as cleaning and normalizing the text content while removing formatting tags and irrelevant elements from the data source, which may contain various formatting elements such as headings, tables, footnotes, page numbers, other structural components and images.
[0098] The system auto-tunes the hyper-parameters based on the data size and the model weight size, suited to the provided input. The system (108) then executes the ML training, which is performed on the given data. The LLM as a service stores the trained model obtained by retraining an existing model on a new data source or an existing data source, which can also be used by other users for further retraining and inference. The LLM as a service displays the training status list, which contains a tabular view of the training name, the model name, the version, the data source type, the status, and actions such as retrain and delete.
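By way of a non-limiting illustration, the following sketch shows one way hyper-parameters could be auto-selected from the data size and the model weight size; the thresholds and the particular hyper-parameters (batch size, learning rate, epochs) are assumptions made for illustration and are not taken from the specification.

```python
def autotune_hyperparameters(num_samples: int, model_weight_gb: float) -> dict:
    """Pick a batch size, learning rate and epoch count from coarse data/model buckets."""
    # Larger models usually need smaller per-device batches to fit in memory.
    batch_size = 8 if model_weight_gb > 10 else 32 if model_weight_gb > 1 else 128

    # More data generally tolerates a higher learning rate and fewer passes over the data.
    if num_samples > 1_000_000:
        learning_rate, epochs = 3e-4, 2
    elif num_samples > 10_000:
        learning_rate, epochs = 1e-4, 5
    else:
        learning_rate, epochs = 5e-5, 10

    return {"batch_size": batch_size, "learning_rate": learning_rate, "epochs": epochs}


print(autotune_hyperparameters(num_samples=250_000, model_weight_gb=12.5))
```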
[0099] A unique aspect of this invention is the integration of LLM as a service for AI/ML model training, wherein the system (108) is configured to select the hyper-parameters automatically based on the data size, the data volume, the model weight and the model size. The system (108) is further configured to obtain and process various types of data, including data as file input, data from a source path, data as an input stream, data from HTTP2, data from HDFS and data from NAS, to retrain the existing model with a new data source or an existing data source. The system (108) provides an option for the user to retrain an existing model such as GPT-J, LAMA2, Bloom, GPT-Neo or Falcon on a new data source as well as an existing data source, and provides the trained model after retraining the existing model on such data. The system (108) is configured to auto-select the hyper-parameters for the ML model based on the data size and the model weight size, and to efficiently find the best parameters for the training methodology. By incorporating these insights, the present system (108) can auto-tune the ML algorithm to better suit the specific requirements of the data and improve its overall effectiveness. By leveraging machine learning techniques, the present system (108) automates the process of parameter selection, reducing the burden on users and ensuring optimal performance.
[00100] The system (108) is configured to receive various data, including data as file input, data from a source path, data as an input stream, data from HTTP2, data from HDFS and data from NAS, to retrain the existing model with a new data source or an existing data source. The integration with the LLM as a service provides an option for the user to retrain an existing model such as GPT-J, LAMA2, Bloom, GPT-Neo, Falcon, or the like on the new data source as well as the existing data source. The system (108) provides the trained model after retraining the existing model on such data. As the system (108) is configured to automatically preprocess and tune the data, it becomes easier for the user to train/retrain the ML model. The system (108) also provides a training status list which contains a tabular view of the training name, model name, version, data source type, status and action. In the action column, the user can delete or retrain the trained model by referring to the training name, model name and version. This helps the user to retrain the same model any number of times, on a new data source or an existing data source, with different versions.
[00101] The system (108) may further be configured to interact with the application servers, an IPM (Integrated Performance Management) system, an FMS (Fulfillment Management System), and NMS (Network Management System) modules in the network (106) via an Application Programming Interface (API) as a medium of communication, and may perform the process by means of various formats like JSON (JavaScript Object Notation), Python, or any other compatible formats.
[00102] FIG. 5 is an exemplary flow diagram (500) illustrating the method for training the model, according to various embodiments of the present disclosure.
[00103] At 502, the method includes retrieving the data from at least one of: the existing data source or the new data source. In an embodiment, the method allows the retrieving unit (216) to retrieve data from at least one of: the existing data source or the new data source.
[00104] At 504, the method includes categorizing the data as at least one of the historic data and the current data. In an embodiment, the method allows the categorizing unit (218) to categorize the data as at least one of: the historic data and the current data.
[00105] At 506, the method includes automatically pre-processing the historic data and the current data. In an embodiment, the method allows the pre-processing unit (220) to automatically pre-process the historic data and the current data.
[00106] At 508, the method includes autotuning the one or more hyperparameters for the pre-processed historic data and the current data based on the one or more parameters. In an embodiment, the method allows the tuning unit (222) to autotune the one or more hyperparameters for the pre-processed historic data and the current data based on the one or more parameters. The one or more parameters include at least one of: the patterns and trends of the historic data and the current data, the size of the data, the size of the model and the weight of the model.
[00107] At 510, the method includes training the model with the historic data and the current data. The historic data and the current data are pre-processed and autotuned with the one or more hyperparameters. In an embodiment, the method allows the training unit (224) to train the model with the historic data and the current data.
[00108] At 512, the method includes notifying at least one of: the user, the service, the microservice, the component and the application, of the status of training the model. In an embodiment, the method allows the notifying unit (226) to notify at least one of: the user, the service, the microservice, the component and the application, of the status of training the model.
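A compact end-to-end sketch of the method of FIG. 5 is given below; the unit objects and their method signatures are hypothetical stand-ins for the retrieving, categorizing, pre-processing, tuning, training and notifying units, and are not mandated by the specification.

```python
def train_model_pipeline(source, units) -> None:
    """Run steps 502-512 of FIG. 5 using a mapping of hypothetical unit objects."""
    data = units["retrieving"].retrieve(source)                                # step 502
    historic, current = units["categorizing"].categorize(data)                 # step 504
    historic, current = units["preprocessing"].preprocess(historic, current)   # step 506
    hyperparameters = units["tuning"].autotune(historic, current)              # step 508
    model = units["training"].train(historic, current, hyperparameters)        # step 510
    units["notifying"].notify(recipient="user", status="COMPLETED", model=model)  # step 512
```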
[00109] FIG. 6 is an example flow diagram (600) illustrating the method for training the model, according to various embodiments of the present disclosure.
[00110] At 602, the user of the system (108) creates the training name and selects the model name from the list of models like GPT-J, LAMA2, Bloom, GPT-Neo and Falcon, by using the user interface (206). At 604, the system (108) automatically sets the version by default for the particular training name. For example, if the user gives the training name as TrainingName1, the first time the version is set to version1.1 by default. When the training name TrainingName1 is given a second time, the version is set to version1.2 by default.
[00111] At 606, the user of the system (108) selects the execution group. The execution group is a group of different servers. Whichever execution groups are accessible to that particular user are displayed in the list, and the user has to select an execution group from the list. The ML training is performed on the selected execution group.
[00112] At 608, the user of the system (108) selects the data source as existing or new. If the user selects a new data source, the system (108) creates the new data source by retrieving the data from at least one of, data as file input, data from a source path, data as an input stream, data from HTTP2, data from HDFS, and data from NAS. If the user selects an existing data source, the user selects the data source from the list of existing data sources.
[00113] At 610, the system (108), with the LLM as a service, preprocesses the data source. The preprocessing is performed on the basis of the historic data and the current data. The text content is cleaned and normalized while removing formatting tags and irrelevant elements from the data source, which may contain various formatting elements such as headings, tables, footnotes, page numbers, other structural components and images. The cleaning and normalizing of the data is performed by removing noise from extra rows which contain invalid column values such as NaN, None, 0, null, or empty strings.
[00114] At 612, the system and the integrated LLM automatically tune the hyper-parameters based on the data size and the model weight size, suited to the provided input. At 614, the system (108) executes the ML training. At 616, the LLM as a service stores the trained model obtained by retraining an existing model on a new data source or an existing data source. At 618, the LLM as a service displays the training status list, which contains a tabular view of the training name, model name, version, data source type, status, and actions such as retrain and delete.
[00115] In preferred embodiments, the method may also include various steps to collect information from network elements such as servers and other network functions, trigger consecutive operational procedures, improve the learning methodology for retraining the machine learning models, and the like, and is not to be considered strictly limited to the above method steps.
[00116] The technical advancements of the present invention are set out below:
[00117] The system (108) can be used to select the hyper-parameters automatically based on specific criteria such as data size, model weight and model size. The system (108) utilizes large language models (LLMs) and advanced AI/ML techniques to automate the hyper-parameter tuning during model training by evaluating the input data and identifying the hyper-parameters most suitable for the selected methodology to yield the best results, thereby reducing the requirement for manual intervention and minimizing resource consumption. The system (108) can be used to enhance the efficiency and effectiveness of model training. The system (108) and method are versatile and compatible with a wide range of data sources.
[00118] The system (108) has the capability to train the ML model using various data sources and the ability to preprocess different types of data sources. In an example, the preprocessing is done by the system having LLM as a service on various data, including data as file input, data from a source path, data as an input stream, data from HTTP2, data from HDFS and data from NAS, for seamless and smooth ML training. The system (108) reduces the time for hyper-parameter selection. In an embodiment, the time required for configuring the best-suited hyper-parameters for different algorithms is significantly reduced.
[00119] The system (108) automates the hyper-parameter selection. In other words, the system having LLM as a service automates the evaluation of various hyper-parameters within a specified value range, enabling efficient optimization of algorithm performance and enhancing the accuracy of the analysis results.
[00120] The system (108) eliminates the need for manual intervention. In other words, by automating the hyper-parameter evaluation process, the system (108) having LLM as a service eliminates the need for manual and time-consuming exploration of different hyper-parameter combinations. The system (108) having LLM as a service intelligently assesses the impact of various hyper-parameters on algorithm performance, allowing the optimal settings to be reached more quickly and accurately.
[00121] The system (108) improves efficiency and enables faster operation. This efficiency improvement not only saves time but also enhances the overall performance of the algorithms. By auto-tuning the hyper-parameters, the system having LLM as a service achieves better accuracy, robustness, and adaptability to different data sources.
[00122] A person of ordinary skill in the art will readily ascertain that the illustrated embodiments and steps in description and drawings (FIGS. 1-6) are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments.
[00124] The present invention offers multiple advantages over the prior art, and the above-listed are a few examples that emphasize some of the advantageous features. The listed advantages are to be read in a non-limiting manner.
REFERENCE NUMERALS
[00125] Environment – 100
[00126] UEs – 102, 102-1 to 102-n
[00127] Server – 104
[00128] Communication network – 106
[00129] System – 108
[00130] Processor – 202
[00131] Memory – 204
[00132] User Interface – 206
[00133] Display – 208
[00134] Input device – 210
[00135] Database – 214
[00136] Retrieving unit – 216
[00137] Categorizing unit – 218
[00138] Pre-processing unit – 220
[00139] Tuning unit – 222
[00140] Training unit – 224
[00141] Notifying unit – 226
[00142] Creating unit – 228
[00143] Version setting unit – 230
[00144] Allocating unit – 232
[00145] System – 300
[00146] Primary processors – 305
[00147] Memory – 310
[00148] Kernel – 315
[00149] System architecture – 400
[00150] Integrated system – 402
[00151] Load balancer – 404
[00152] Data-lake – 406
[00153] Handling unit – 234
CLAIMS:
We Claim:
1. A method for training one or more models, the method comprising the steps of:
retrieving, by one or more processors (202), data from at least one of, an existing data source or a new data source;
categorizing, by the one or more processors (202), the data as at least one of, historic data and current data;
pre-processing, by the one or more processors (202), the historic data and the current data;
autotuning, by the one or more processors (202), one or more hyperparameters for the pre-processed historic data and the current data based on one or more parameters;
training, by the one or more processors (202), the one or more models with the historic data and the current data, wherein the historic data and the current data are pre-processed and autotuned with the one or more hyperparameters; and
notifying, by the one or more processors (202), at least one of, a user or one of, a service, a microservice, a component and an application, a status of training the one or more models.
2. The method as claimed in claim 1, wherein the step of, retrieving, data from an existing data source, includes the step of:
retrieving, by the one or more processors (202), the data from the existing data source based on at least one of:
the user selecting the existing data source via at least one of, a user interface (206) or a Command Line Interface (CLI); or
one of, a service, a microservice, a component and an application transmitting a command to the one or more processors to select the existing data source.
3. The method as claimed in claim 1, wherein the step of, retrieving, data from a new data source, includes the steps of:
creating, by the one or more processors (202), the new data source when at least one of:
the user selects one or more data sources from a list of data sources via at least one of, the user interface (206) or a Command Line Interface (CLI); or
one of, the service, the microservice, the component and the application transmits a command to the one or more processors to select the one or more data sources;
pulling, by the one or more processors (202), the data from one or more data sources and storing the data at the new data source; and
retrieving, by the one or more processors (202), the data from the new data source.
4. The method as claimed in claim 1, wherein the data is categorized as at least one of, the historic data and the current data based on a time of generation of the data.
5. The method as claimed in claim 1, wherein the one or more parameters include at least one of, size of the data, size of the model and weight of the model.
6. The method as claimed in claim 1, wherein the step of, autotuning, one or more hyperparameters for the pre-processed historic data and the current data, includes the steps of:
identifying, by the one or more processors (202), the one or more parameters from the historic data and the current data;
selecting, by the one or more processors (202), the one or more hyperparameters from one or more sources based on the identified one or more parameters, wherein the one or more sources includes the one or more hyperparameters which are at least one of, used previously representing similar types of the one or more parameters, or pre-defined for the one or more parameters; and
autotuning, by the one or more processors (202), the one or more hyperparameters selected for the historic data and the current data based on the selected one or more hyperparameters.
7. The method as claimed in claim 1, wherein the method further comprising the steps of:
creating, by the one or more processors (202), a training name and a model name for training the model based on receiving the training name and the user selecting the model’s name from a list of model names via a user interface (206);
setting, by the one or more processors (202), a version based on the created training name; and
allocating, by the one or more processors (202), one or more network elements for training the model based on at least one of, the user selecting the one or more network elements for training the model via the user interface (206), or one of, the service, the microservice, the application or the component transmitting a request to the one or more processors to select the one or more network elements for retraining the existing trained model.
8. The method as claimed in claim 1, wherein the status of notifying the user of training the model includes at least one of, status of completion of training the model utilizing one or more identifiers including at least one of, training name, model name, version, type and/or name of the data source used, and one or more actions including at least one of, retrain or delete the trained model, wherein the user is notified of training the model by at least one of, alerts or notifications.
9. The method as claimed in claim 1, wherein the method further comprising the steps of:
retraining, by the one or more processors, the one or more trained models based on receiving configurations from at least one of:
the user via the user interface or the CLI; or
receiving configurations from one of, the service, the microservice, the component and the application;
collecting, by the one or more processors, results of retraining the one or more trained models;
notifying, by the one or more processors, the results of retraining to the user by displaying the results on the user interface, or notifying the results to one of, the service, the microservice, the component and the application,
wherein, the one or more models are retrained using different sets of the historic and current data and one or more hyperparameters which are autotuned, the different sets of the historic and the current data are allocated by at least one of, the user or one of, the service, the microservice, the component and the application.
10. The method as claimed in claim 1, wherein the one or more processors notify one of, the service, the microservice, the component and the application by, transmitting, an acknowledgement to at least one of, the service, the microservice, the component and the application pertaining to the status of the training of the one or more models.
11. A system (108) for training one or more models, the system (108) comprising:
a retrieving unit (216), configured to, retrieve, data from at least one of, an existing data source or a new data source;
a categorizing unit (218), configured to, categorize, the data as at least one of, historic data and current data;
a pre-processing unit (220), configured to, automatically pre-process, the historic data and the current data;
a tuning unit (222), configured to, autotune, one or more hyperparameters for the pre-processed historic data and the current data based on one or more parameters;
a training unit (224), configured to, train, the model with the historic data and the current data, wherein the historic data and the current data are pre-processed and autotuned with the one or more hyperparameters; and
a notifying unit (226), configured to, notify, at least one of, a user or one of, a service, a microservice, a component and an application, a status of training the one or more models.
12. The system (108) as claimed in claim 11, wherein the retrieving unit (216), retrieves, the data from the existing data source, by:
retrieving, the data from the existing data source based on at least one of:
the user selecting the existing data source via at least one of, a user interface (206) or a Command Line Interface (CLI); or
one of, a service, a microservice, a component and an application transmitting a command to the one or more processors to select the existing data source.
13. The system (108) as claimed in claim 11, wherein the retrieving unit (216), retrieves the data from the new data source, by:
creating, the new data source when at least one of:
the user selects one or more data sources from a list of data sources via at least one of, the user interface (206) or a Command Line Interface (CLI); or
one of, the service, the microservice, the component and the application transmits a command to the retrieving unit to select the one or more data sources;
pulling, the data from one or more data sources and storing the data at the new data source; and
retrieving, the data from the new data source.
14. The system (108) as claimed in claim 11, wherein the data is categorized as at least one of, the historic data and the current data based on a time of generation of the data.
15. The system (108) as claimed in claim 11, wherein the one or more parameters include at least one of, size of the data, size of the model and weight of the model.
16. The system (108) as claimed in claim 11, wherein the tuning unit, autotunes, the one or more hyperparameters for the pre-processed historic data and the current data, by:
identifying, the one or more parameters from the historic data and the current data;
selecting, the one or more hyperparameters from one or more sources based on the identified one or more parameters, wherein the one or more sources includes the one or more hyperparameters which are at least one of, used previously representing similar types of the one or more parameters, or pre-defined for the one or more parameters; and
autotuning, the one or more hyperparameters selected for the historic data and the current data based on the selected one or more hyperparameters.
17. The system (108) as claimed in claim 11, wherein the system (108) further comprising:
a creating unit (228), configured to, create, a training name and a model name for training the model based on receiving the training name and the user selecting the model’s name from a list of model names via the user interface (206);
a version setting unit (230), configured to, set, a version based on the created training name; and
an allocating unit (232), configured to, allocate, one or more network elements for training the model based on at least one of, the user selecting the one or more network elements for training the model via the user interface (206) or, one of, the service, the microservice, the application, or the component transmitting a request to the allocating unit (232) to select the one or more network elements for retraining the existing trained model.
18. The system (108) as claimed in claim 11, wherein the status of notifying the user of training the model includes at least one of, status of completion of training the model utilizing one or more identifiers including at least one of, training name, model name, version, type and/or name of the data source used, and one or more actions including at least one of, retrain or delete the trained model, wherein the user is notified of training the model by at least one of, alerts or notifications.
19. The system (108) as claimed in claim 11, wherein the training unit (224) is further configured to:
retrain, the one or more trained models based on receiving configurations from at least one of, the user via the user interface or the CLI, or receiving configurations from one of, the service, the microservice, the component and the application;
collect, results of retraining the one or more trained models;
notify, the results of retraining to the user by displaying the results on the user interface, or notifying the results to one of, the service, the microservice, the component and the application,
wherein, the one or more models are retrained using different sets of the historic and current data and one or more hyperparameters which are autotuned, the different sets of the historic and the current data are allocated by at least one of, the user or one of, the service, the microservice, the component and the application.
20. The system (108) as claimed in claim 11, wherein the notifying unit (226) is further configured to:
transmit, an acknowledgement to at least one of, the service, the microservice, the component and the application pertaining to the status of the training of the one or more models based on information of mapping received from a handling unit (234).
21. A User Equipment (UE) (102), comprising:
one or more primary processors (305), communicatively coupled to one or more processors (202) in a network (106), wherein the one or more primary processors (305) are coupled with a memory (310) storing instructions which, when executed by the one or more primary processors (305), cause the UE (102) to:
transmit, a command to select at least one of, the existing data sources or the one or more data sources, to the one or more processors (202), wherein the one or more processors (202) are configured to perform the steps of claim 1.