Abstract: SYSTEM AND METHOD FOR DEPLOYING LARGE LANGUAGE MODEL (LLM) ON ONE OR MORE EDGE DEVICES
The present disclosure relates to a method for deploying an LLM on one or more edge devices by one or more processors (202). The method includes selecting at least one of, a server and one or more resources from a server pool. Further, the method includes collecting textual data from one or more data sources upon selecting the at least one of, the server and the one or more resources. Further, the method includes pre-training the LLM with a corpus of textual data to enable the LLM to learn general language representation. Further, the method includes training the pre-trained model with the collected textual data using at least one of, the server and the one or more resources. Further, the method includes compressing the trained model. Further, the method includes deploying the compressed trained model onto one or more edge devices. Ref. FIG. 5
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See section 10 and rule 13)
1. TITLE OF THE INVENTION
SYSTEM AND METHOD FOR DEPLOYING LARGE LANGUAGE MODEL (LLM) ON ONE OR MORE EDGE DEVICES
2. APPLICANT(S)
NAME NATIONALITY ADDRESS
JIO PLATFORMS LIMITED INDIAN OFFICE-101, SAFFRON, NR. CENTRE POINT, PANCHWATI 5 RASTA, AMBAWADI, AHMEDABAD 380006, GUJARAT, INDIA
3.PREAMBLE TO THE DESCRIPTION
THE FOLLOWING SPECIFICATION PARTICULARLY DESCRIBES THE NATURE OF THIS INVENTION AND THE MANNER IN WHICH IT IS TO BE PERFORMED.
FIELD OF THE INVENTION
[0001] The present invention relates generally to an artificial intelligence/machine learning (AI/ML) system, and in particular, to a system and a method to deploy Large Language Model (LLM) on one or more edge devices.
BACKGROUND OF THE INVENTION
[0002] With the increase in the number of users, network service provisions have been upgraded to enhance service quality and keep pace with the high demand. With the advancement of technology, there is a demand for telecommunication services to bring up-to-date features into their scope of provision. To enhance user experience and implement advanced monitoring mechanisms, prediction methodologies are being incorporated into network management. An advanced prediction system integrated with an Artificial Intelligence/Machine Learning (AI/ML) system excels in executing a wide array of algorithms and predictive tasks. Edge-level inference hosting, also known as on-device inference hosting or edge deployment of machine learning models, refers to the practice of deploying and running machine learning models directly on edge devices or at the edge of a telecommunication network.
[0003] Traditionally, Machine Learning (ML) with edge-level hosting caters only to forecasting and anomaly-detection models that are usually applied to temporal data. Large Language Models (LLMs) need to be trained on textual data, which may consume a lot of time and may utilize more resources. This may lead to latency in the telecommunication network that may degrade the performance of the telecommunication network. The data employed is mainly in tabular format (such as excel format or the like). There is no available mechanism to process textual data of large size at edge devices. There is a need to convert data across all formats, such as PDFs and DOCs, into a text format that leads to efficient model training.
[0004] There is a requirement for a system and a method thereof to enable the predictive system to analyze large files, such as textual files, and to generate human-like text when questioned about trained information on edge devices.
SUMMARY OF THE INVENTION
[0005] One or more embodiments of the present disclosure provide a system and a method for deploying a Large Language Model (LLM) on one or more edge devices.
[0006] In one aspect of the present invention, the method for deploying the LLM on one or more edge devices is disclosed. The method includes selecting, by one or more processors, at least one of, a server and associated one or more resources from a server pool. Further, the method includes collecting, by the one or more processors, textual data from one or more data sources upon selecting the at least one of, the server and the associated one or more resources. Further, the method includes pre-training, by the one or more processors, the LLM with a corpus of textual data to enable the LLM to learn general language representation. Further, the method includes training, by the one or more processors, the pre-trained model with the collected textual data using at least one of, the server and the associated one or more resources. Further, the method includes compressing, by the one or more processors, the trained model. Further, the method includes deploying, by the one or more processors, the compressed trained model onto one or more edge devices.
[0007] In an embodiment, the at least one of, the server and the associated one or more resources are selected based on receiving an input from a user via a user interface pertaining to selection of the at least one of, the server and the associated one or more resources.
[0008] In an embodiment, the server is at least one of, an edge server.
[0009] In an embodiment, the step of, training, the pre-trained model with the collected textual data, includes the step of fine tuning, by the one or more processors, the pre-trained model to perform one or more tasks at an edge level. The one or more tasks pertain to at least one of, text classification, language modelling, named entity recognition (NER), part of speech tagging and text summarization.
[0010] In an embodiment, the step of collecting textual data from one or more data sources, further includes the step of: pre-processing, by the one or more processors, the collected textual data using the selected at least one of, the server and the associated one or more resources.
[0011] In an embodiment, the method further includes the step of synchronizing, by the one or more processors, the one or more trained models deployed onto the one or more edge devices with a centralized system. Further, the method includes updating, by the one or more processors, the one or more trained models with updated historic data which is retrieved from the centralized system.
[0012] In one aspect of the present invention, the system for deploying a LLM on one or more edge devices is disclosed. The system includes a selecting unit, a collecting unit, a training unit, a compressing engine and a deploying unit. The selecting unit is configured to select at least one of, a server and associated one or more resources from a server pool. The collecting unit is configured to, collect, textual data from one or more data sources upon selecting the at least one of, the server and the associated one or more resources. The training unit is configured to, pre-train, the LLM with a corpus of textual data to enable the LLM to learn general language representation. The training unit is configured to, train, the pre-trained model with the collected textual data using at least one of, the server and the associated one or more resources. The compressing engine is configured to, compress, the trained model. The deploying unit is configured to deploy the compressed trained model onto one or more edge devices.
[0013] In one aspect of the present invention, a non-transitory computer-readable medium having stored thereon computer-readable instructions is disclosed. The computer-readable instructions cause the processor to select, at least one of, a server and one or more resources from a server pool. The processor collects textual data from one or more data sources upon selecting the at least one of, the server and the associated one or more resources. Further, the processor pre-trains an LLM with a corpus of textual data to enable the LLM to learn general language representation. Further, the processor trains the pre-trained model with the collected textual data using at least one of: the server and the associated one or more resources. Further, the processor compresses the trained model. Further, the processor deploys the compressed trained model onto one or more edge devices.
[0014] Other features and aspects of this invention will be apparent from the following description and the accompanying drawings. The features and advantages described in this summary and in the following detailed description are not all-inclusive, and particularly, many additional features and advantages will be apparent to one of ordinary skill in the relevant art, in view of the drawings, specification, and claims hereof. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] The accompanying drawings, which are incorporated herein, and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Some drawings may indicate the components using block diagrams and may not represent the internal circuitry of each component. It will be appreciated by those skilled in the art that disclosure of such drawings includes disclosure of electrical components, electronic components or circuitry commonly used to implement such components.
[0016] FIG. 1 is an exemplary block diagram of an environment for deploying a Large Language Model (LLM) on one or more edge devices, according to various embodiments of the present disclosure.
[0017] FIG. 2 is a block diagram of a system of FIG. 1, according to various embodiments of the present disclosure.
[0018] FIG. 3 is an example schematic representation of the system of FIG. 1 in which operations of various entities are explained, according to various embodiments of the present disclosure.
[0019] FIG. 4 illustrates a system architecture for deploying the LLM on the one or more edge devices, in accordance with some embodiments.
[0020] FIG. 5 is an exemplary flow diagram illustrating the method for deploying the LLM on the one or more edge devices, according to various embodiments of the present disclosure.
[0021] FIG. 6 is a flow diagram illustrating an internal call flow for deploying the LLM on the one or more edge devices, in accordance with some embodiments.
[0022] Further, skilled artisans will appreciate that elements in the drawings are illustrated for simplicity and may not have necessarily been drawn to scale. For example, the flow charts illustrate the method in terms of the most prominent steps involved to help to improve understanding of aspects of the present invention. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present invention so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having benefit of the description herein.
[0023] The foregoing shall be more apparent from the following detailed description of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0024] Some embodiments of the present disclosure, illustrating all its features, will now be discussed in detail. It must also be noted that as used herein and in the appended claims, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise.
[0025] Various modifications to the embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. However, one of the ordinary skill in the art will readily recognize that the present disclosure including the definitions listed here below are not intended to be limited to the embodiments illustrated but is to be accorded the widest scope consistent with the principles and features described herein.
[0026] A person of ordinary skill in the art will readily ascertain that the illustrated steps detailed in the figures and here below are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments.
[0027] Before discussing example, embodiments in more detail, it is to be noted that the drawings are to be regarded as being schematic representations and elements that are not necessarily shown to scale. Rather, the various elements are represented such that their function and general purpose becomes apparent to a person skilled in the art. Any connection or coupling between functional blocks, devices, components, or other physical or functional units shown in the drawings or described herein may also be implemented by an indirect connection or coupling. A coupling between components may also be established over a wireless connection. Functional blocks may be implemented in hardware, firmware, software or a combination thereof.
[0028] Further, the flowcharts provided herein describe the operations as sequential processes. Many of the operations may be performed in parallel, concurrently, or simultaneously. In addition, the order of operations may be re-arranged. The processes may be terminated when their operations are completed but may also have additional steps not included in the figures. It should be noted that, in some alternative implementations, the functions/acts/steps noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
[0029] Further, although the terms first, second, etc. may be used herein to describe various elements, components, regions, layers and/or sections, it should be understood that these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are used only to distinguish one element, component, region, layer, or section from another region, layer, or section. Thus, a first element, component, region, layer, or section discussed below could be termed a second element, component, region, layer, or section without departing from the scope of the example embodiments.
[0030] Spatial and functional relationships between elements (for example, between modules) are described using various terms, including “connected,” “engaged,” “interfaced,” and “coupled.” Unless explicitly described as being “direct,” when a relationship between first and second elements is described in the description below, that relationship encompasses a direct relationship where no other intervening elements are present between the first and second elements, and also an indirect relationship where one or more intervening elements are present (either spatially or functionally) between the first and second elements. In contrast, when an element is referred to as being "directly” connected, engaged, interfaced, or coupled to another element, there are no intervening elements present. Other words used to describe the relationship between elements should be interpreted in a like fashion (e.g., "between," versus "directly between," "adjacent," versus "directly adjacent," etc.).
[0031] The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which example embodiments belong. It will be further understood that terms, e.g., those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
[0032] As used herein, the singular forms “a,” “an,” and “the,” are intended to include the plural forms as well, unless the context clearly indicates otherwise. As used herein, the terms “and/or” and “at least one of” include any and all combinations of one or more of the associated listed items. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[0033] Unless specifically stated otherwise, or as is apparent from the description, terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device/hardware, that manipulates and transforms data represented as physical, electronic quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
[0034] Various embodiments of the invention provide a system that is an Artificial Intelligence/Machine Learning (AI/ML)-based platform designed for executing a diverse range of techniques/models and predictive tasks. The proposed system may be powered by the large language model (LLM). Its primary role revolves around the comprehensive analysis of network data using machine learning models.
[0035] To this end, the present subject matter provides the ability to easily fine-tune an LLM instead of training the LLM from scratch. By using transfer learning, the initial training of the LLM may be avoided, and the LLM may undergo a quick and less compute-intensive fine-tuning process on domain-specific data at the network edge. In transfer learning, a standard model that has already been trained on a large amount of data is available, and the proposed system uses that same standard model to retrain/fine-tune the LLM using the particular textual data. The techniques of the present subject matter, therefore, provide for:
I. Fine-tuning the LLM by using transfer learning, which avoids the initial training of the LLM (a minimal sketch of this flow is provided after this list).
II. Compressing the trained model using techniques like model quantization in order to reduce the size of the model. This is done due to the limited storage and memory in edge devices.
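The following is a minimal, illustrative Python sketch of the fine-tuning flow described above. It assumes the availability of the Hugging Face transformers and datasets libraries, a hypothetical base checkpoint, and a hypothetical CSV of domain-specific edge data; the present disclosure does not mandate any particular library, model, or file name.

# Illustrative sketch only (assumed libraries, model, and file names): a
# pre-trained checkpoint is loaded and fine-tuned on domain-specific text,
# avoiding the initial large-scale training of the LLM.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

base_model = "distilbert-base-uncased"                    # hypothetical base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=3)

# Hypothetical CSV with a "text" column and an integer "label" column (0-2).
data = load_dataset("csv", data_files={"train": "edge_domain_data.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

args = TrainingArguments(output_dir="finetuned-edge-llm", num_train_epochs=2,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=data["train"]).train()

Because only the fine-tuning pass runs at the network edge, the compute-intensive initial pre-training step is skipped entirely, which is the advantage listed in item I above.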
[0036] FIG. 1 illustrates an exemplary block diagram of an environment (100) for deploying an LLM on one or more edge devices, according to various embodiments of the present disclosure. The environment (100) comprises a plurality of user equipments (UEs) or edge device(s) (102-1, 102-2, ……, 102-n). At least one UE (102-n) from the plurality of UEs (102-1, 102-2, ……, 102-n) is configured to connect to a system (108) via a communication network (106). Hereafter, the plurality of UEs, the one or more UEs, or the edge device(s) are collectively labelled 102.
[0037] In accordance with yet another aspect of the exemplary embodiment, the plurality of UEs or edge device(s) (102) may be a wireless device or a communication device that may be a part of the system (108). The wireless device or the UE or edge device(s) (102) may include, but are not limited to, a handheld wireless communication device (e.g., a mobile phone, a smart phone, a phablet device, and so on), a wearable computer device (e.g., a head-mounted display computer device, a head-mounted camera device, a wristwatch, a computer device, and so on), a laptop computer, a tablet computer, or another type of portable computer, a media playing device, a portable gaming system, and/or any other type of computer device with wireless communication or Voice Over Internet Protocol (VoIP) capabilities. In an embodiment, the UEs or edge device(s) (102) may include, but are not limited to, any electrical, electronic, electro-mechanical or an equipment or a combination of one or more of the above devices such as virtual reality (VR) devices, augmented reality (AR) devices, laptop, a general-purpose computer, desktop, personal digital assistant, tablet computer, mainframe computer, or any other computing device, where the computing device may include one or more in-built or externally coupled accessories including, but not limited to, a visual aid device such as camera, audio aid, a microphone, a keyboard, input devices for receiving input from a user such as touch pad, touch enabled screen, electronic pen and the like. It may be appreciated that the UEs or edge device(s) (102) may not be restricted to the mentioned devices and various other devices may be used. A person skilled in the art will appreciate that the plurality of UEs or edge device(s) (102) may include a fixed landline, and a landline with assigned extension within the communication network (106).
[0038] The communication network (106), may use one or more communication interfaces/protocols such as, for example, Voice Over Internet Protocol (VoIP), 802.11 (Wi-Fi), 802.15 (including Bluetooth™), 802.16 (Wi-Max), 802.22, Cellular standards such as Code Division Multiple Access (CDMA), CDMA2000, Wideband CDMA (WCDMA), Radio Frequency Identification (e.g., RFID), Infrared, laser, Near Field Magnetics, etc.
[0039] The communication network (106) includes, by way of example but not limitation, one or more of a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, or some combination thereof. The communication network (106) may include, but is not limited to, a Third Generation (3G) network, a Fourth Generation (4G) network, a Fifth Generation (5G) network, a Sixth Generation (6G) network, a New Radio (NR) network, a Narrow Band Internet of Things (NB-IoT) network, an Open Radio Access Network (O-RAN), and the like.
[0040] The communication network (106) may also include, by way of example but not limitation, at least a portion of one or more networks having one or more nodes that transmit, receive, forward, generate, buffer, store, route, switch, process, or a combination thereof, etc. one or more messages, packets, signals, waves, voltage or current levels, some combination thereof, or so forth. The communication network (106) may also include, by way of example but not limitation, one or more of a wireless network, a wired network, an internet, an intranet, a public network, a private network, a packet-switched network, a circuit-switched network, an ad hoc network, an infrastructure network, a Public-Switched Telephone Network (PSTN), a cable network, a cellular network, a satellite network, a fiber optic network, a VOIP or some combination thereof.
[0041] One or more network elements can be, for example, but not limited to a base station that is located in the fixed or stationary part of the communication network (106). The base station may correspond to a remote radio head, a transmission point, an access point or access node, a macro cell, a small cell, a micro cell, a femto cell, a metro cell. The base station enables transmission of radio signals to the UE (102) or a mobile transceiver. Such a radio signal may comply with radio signals as, for example, standardized by a 3rd Generation Partnership Project (3GPP) or, generally, in line with one or more of the above listed systems. Thus, a base station may correspond to a NodeB, an eNodeB, a Base Transceiver Station (BTS), an access point, a remote radio head, a transmission point, which may be further divided into a remote unit and a central unit. The 3GPP specifications cover cellular telecommunications technologies, including radio access, core network, and service capabilities, which provide a complete system description for mobile telecommunications.
[0042] The system (108) is communicatively coupled to a server (104) via the communication network (106). The server (104) can be, for example, but not limited to a standalone server, a server blade, a server rack, an application server, a bank of servers, a business telephony application server (BTAS), a server farm, a cloud server, an edge server, home server, a virtualized server, one or more processors executing code to function as a server, or the like. In an implementation, the server (104) may operate at various entities or a single entity (include, but is not limited to, a vendor side, a service provider side, a network operator side, a company side, an organization side, a university side, a lab facility side, a business enterprise side, a defense facility side, or any other facility) that provides service.
[0043] The environment (100) further includes the system (108) communicably coupled to the server (e.g., remote server or the like) (104) and each UE of the plurality of UEs (102) via the communication network (106). The remote server (104) is configured to execute the requests in the communication network (106).
[0044] The system (108) is adapted to be embedded within the remote server (104) or is embedded as an individual entity. The system (108) is designed to provide a centralized and unified view of data and to facilitate efficient business operations. The system (108) is authorized to update/create/delete one or more parameters of the relationship between the requests for deploying the LLM, which gets reflected in real time independent of the complexity of the network.
[0045] In another embodiment, the system (108) may include an enterprise provisioning server (for example), which may connect with the remote server (104). The enterprise provisioning server provides flexibility for enterprises, ecommerce, finance to update/create/delete information related to the requests for deploying the LLM in real time as per their business needs. A user with administrator rights can access and retrieve the requests for the deploying the LLM and perform real-time analysis in the system (108).
[0046] The system (108) may include, by way of example but not limitation, one or more of a standalone server, a server blade, a server rack, a bank of servers, a business telephony application server (BTAS), a server farm, hardware supporting a part of a cloud service or system, a home server, hardware running a virtualized server, one or more processors executing code to function as a server, one or more machines performing server-side functionality as described herein, at least a portion of any of the above, some combination thereof. In an implementation, system (108) may operate at various entities or single entity (for example include, but is not limited to, a vendor side, service provider side, a network operator side, a company side, an organization side, a university side, a lab facility side, a business enterprise side, ecommerce side, finance side, a defense facility side, or any other facility) that provides service.
[0047] However, for the purpose of description, the system (108) is described as an integral part of the remote server (104), without deviating from the scope of the present disclosure. Operational and construction features of the system (108) will be explained in detail with respect to the following figures.
[0048] FIG. 2 illustrates a block diagram of the system (108) provided for deploying the LLM on the one or more edge devices, according to one or more embodiments of the present invention. As per the illustrated embodiment, the system (108) includes one or more processors (202), a memory (204), a user interface (206), a display (208), an input device (210), and the database (214). The one or more processors (202), hereinafter referred to as the processor (202), may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, single board computers, and/or any devices that manipulate signals based on operational instructions. As per the illustrated embodiment, the system (108) includes one processor. However, it is to be noted that the system (108) may include multiple processors as per the requirement and without deviating from the scope of the present disclosure.
[0049] Information related to deploying the LLM may be provided or stored in the memory (204) of the system (108). Among other capabilities, the processor (202) is configured to fetch and execute computer-readable instructions stored in the memory (204). The memory (204) may be configured to store one or more computer-readable instructions or routines in a non-transitory computer-readable storage medium, which may be fetched and executed to create or share data packets over a network service. The memory (204) may include any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as disk memory, EPROMs, FLASH memory, unalterable memory, and the like.
[0050] The memory (204) may comprise any non-transitory storage device including, for example, volatile memory such as Random-Access Memory (RAM), or non-volatile memory such as Erasable Programmable Read-Only Memory (EPROM), flash memory, and the like. In an embodiment, the system (108) may include an interface(s). The interface(s) may comprise a variety of interfaces, for example, interfaces for data input and output devices, referred to as input/output (I/O) devices, storage devices, and the like. The interface(s) may facilitate communication for the system. The interface(s) may also provide a communication pathway for one or more components of the system. Examples of such components include, but are not limited to, processing unit/engine(s) and the database (214). The processing unit/engine(s) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s).
[0051] The information related to deploying the LLM may be rendered on the user interface (206). The user interface (206) may include functionality similar to at least a portion of functionality implemented by one or more computer system interfaces such as those described herein and/or generally known to one having ordinary skill in the art. The user interface (206) may be rendered on the display (208), implemented using Liquid Crystal Display (LCD) display technology, Organic Light-Emitting Diode (OLED) display technology, and/or other types of conventional display technology. The display (208) may be integrated within the system (108) or connected externally. Further, the input device(s) (210) may include, but are not limited to, keyboard, buttons, scroll wheels, cursors, touchscreen sensors, audio command interfaces, magnetic strip reader, optical scanner, etc.
[0052] The database (214) may be communicably connected to the processor (202) and the memory (204). The database (214) may be configured to store and retrieve the request pertaining to features, or services or workflow of the system (108), access rights, attributes, approved list, and authentication data provided by an administrator. In another embodiment, the database (214) may be outside the system (108) and communicated through a wired medium and a wireless medium.
[0053] Further, the processor (202), in an embodiment, may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processor (202). In the examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processor (202) may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processor (202) may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the memory (204) may store instructions that, when executed by the processing resource, implement the processor (202). In such examples, the system (108) may comprise the memory (204) storing the instructions and the processing resource to execute the instructions, or the memory (204) may be separate but accessible to the system (108) and the processing resource. In other examples, the processor (202) may be implemented by an electronic circuitry.
[0054] In order for the system (108) to deploy the LLM on the one or more edge devices, the processor (202) includes a selecting unit (216), a collecting unit (218), a training unit (220), a compressing engine (222), a deploying unit (224), and a synchronizing unit (226). The selecting unit (216), the collecting unit (218), the training unit (220), the compressing engine (222), the deploying unit (224), and the synchronizing unit (226) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processor (202). In the examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processor (202) may be processor-executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processor (202) may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the memory (204) may store instructions that, when executed by the processing resource, implement the processor. In such examples, the system (108) may comprise the memory (204) storing the instructions and the processing resource to execute the instructions, or the memory (204) may be separate but accessible to the system (108) and the processing resource. In other examples, the processor (202) may be implemented by the electronic circuitry.
[0055] In order for the system (108) to deploy the LLM on the one or more edge devices, the selecting unit (216), the collecting unit (218), the training unit (220), the compressing engine (222), the deploying unit (224), and the synchronizing unit (226) are communicably coupled to each other. The selecting unit (216) selects at least one of, the server (104) (e.g., edge server or the like) and one or more resources (e.g., memory, CPU, bandwidth or the like) from a server pool (also known as the “server bundle” (404) as shown in FIG. 4). The server (104) and the one or more resources are selected based on receiving the input from the user via the user interface (206) pertaining to selection of the at least one of, the server and the one or more resources. In an example, a consumer or a service provider needs to select or set up the server (104) that may act as the edge server. The server (104) may have the capability to receive model updates from the edge devices (102), aggregate the data, and then perform model training or fine-tuning.
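Purely as an illustration, and under assumed data structures that are not part of the disclosure, the selection of a server and its resources from the server pool (404) could be sketched as follows in Python:

# Illustrative sketch (assumed data structures): select a server from the
# pool that satisfies the user's resource request, preferring an edge server.
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    cpu_cores: int
    memory_gb: int
    is_edge: bool

# Hypothetical pool; in practice this corresponds to the server bundle (404).
server_pool = [
    Server("edge-srv-1", cpu_cores=8, memory_gb=16, is_edge=True),
    Server("core-srv-1", cpu_cores=64, memory_gb=256, is_edge=False),
]

def select_server(pool, min_cpu, min_mem, prefer_edge=True):
    candidates = [s for s in pool if s.cpu_cores >= min_cpu and s.memory_gb >= min_mem]
    candidates.sort(key=lambda s: (prefer_edge and not s.is_edge))  # edge servers first
    return candidates[0] if candidates else None

selected = select_server(server_pool, min_cpu=4, min_mem=8)
print("Selected:", selected)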
[0056] The collecting unit (218) collects textual data from one or more data sources (410) (e.g., a website, an online source, a chat message, or the like) upon selecting the at least one of, the server (104) and the associated one or more resources. The collecting unit (218) pre-processes the collected textual data using the selected at least one of, the server (104) and the one or more resources. In an example, once a new server bundle is ready, domain-specific textual data from the edge devices (102) is collected in large amounts and pre-processed to make it training ready. The textual data may be collected from multiple sources (410). The textual data may include various special characters and stop words that may be unsuitable for training. Therefore, the system (108) converts raw input text into a format suitable for training. For example, consider the use case of a chat bot for customer service deployed on the server (104), where one of the servers is the edge server. In real time, there may be many customers interacting with the chat bot. A lot of textual data may be generated, and that data is fed to the system (108). The pre-processing of the data may be performed by the system (108). The collecting unit (218) is configured to convert all data formats to text format and then perform the required training. For example, formats such as PDFs and DOCs, including large data, may be converted into a suitable text format in order to perform the training. Data sources with images, tables, etc. are extracted separately, and a link for the data is generated and stored in the database (214), ensuring that data points refer to a particular image, table, etc.
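A minimal sketch of this collection and conversion step is given below. It assumes the pypdf and python-docx libraries and a hypothetical "edge_data" directory; any extraction mechanism could be used instead, and the cleaning rules shown are illustrative only.

# Illustrative sketch: convert PDF/DOCX sources to plain text and clean it.
import re
from pathlib import Path
from pypdf import PdfReader          # assumed PDF extractor (pip install pypdf)
from docx import Document            # assumed DOCX extractor (pip install python-docx)

def to_text(path: Path) -> str:
    # Convert a collected file into plain text, whatever its original format.
    if path.suffix.lower() == ".pdf":
        return "\n".join(page.extract_text() or "" for page in PdfReader(str(path)).pages)
    if path.suffix.lower() == ".docx":
        return "\n".join(p.text for p in Document(str(path)).paragraphs)
    return path.read_text(encoding="utf-8", errors="ignore")

def clean(text: str) -> str:
    # Drop special characters and normalise whitespace/case before training.
    text = re.sub(r"[^A-Za-z0-9\s.,?!]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()

corpus = [clean(to_text(p)) for p in Path("edge_data").glob("*.*")]   # hypothetical directory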
[0057] The training unit (220) pre-trains the LLM with a corpus of textual data to enable the LLM to learn general language representation. In an example, by using the training unit (220), model initialization and pre-training may be a transfer learning process. The LLM may be chosen by the consumer and may be initialized with pre-trained weights, if available. The pre-trained weights are set by the customer. The pre-training on the large corpus of text data facilitates the LLM model to learn general language representations, which can be fine-tuned for specific tasks. For example, multiple standard models may already be trained utilizing a large corpus of text data. These trained models may be utilized depending on the requirements of the use cases. Here, the initial training of the LLM may be avoided and the LLM may undergo a quick and less compute-intensive fine-tuning process.
[0058] In an example, the large corpus of text data is gathered, which includes diverse sources such as books, articles, and websites. This corpus contains a wide variety of topics, styles, and formats to ensure comprehensive language representation. The consumer selects a specific LLM architecture (e.g., GPT, BERT) and initializes the model with pre-trained weights. These weights could be from a previously trained model on similar data, allowing the new model to leverage existing knowledge. The training unit (220) processes the large corpus, using techniques like masked language modelling or next-token prediction. For instance, in masked language modelling, some words in a sentence are masked, and the model learns to predict them based on surrounding context. During this phase, the model learns general language patterns, grammar, semantics, and common knowledge across various topics. After pre-training, the LLM has a set of weights that captures a rich understanding of language. These pre-trained weights are stored and can be accessed by the consumer for further fine-tuning.
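As a conceptual illustration of the masked language modelling objective mentioned above, the toy sketch below hides random tokens that the model would then be trained to predict from context; the masking probability and mask token are assumptions, not requirements of the disclosure.

# Conceptual sketch of masked language modelling used during pre-training.
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    # Randomly hide tokens; the model must predict the hidden ones from context.
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)       # token the model is trained to recover
        else:
            masked.append(tok)
            targets.append(None)      # position not scored
    return masked, targets

sentence = "the edge server trains the language model on local textual data".split()
print(mask_tokens(sentence))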
[0059] Further, the training unit (220) trains the pre-trained model with the collected textual data using at least one of, the server (104) and the one or more resources. In an embodiment, the training unit (220) trains the pre-trained model with the collected textual data, by fine tuning, the pre-trained model to perform one or more tasks at an edge level. The one or more tasks pertain to at least one of, text classification, language modelling, named entity recognition (NER), part of speech tagging and text summarization.
[0060] In an example, the consumer has a specific task, such as text classification for sentiment analysis. They gather a labelled dataset containing sentences labelled as positive, negative, or neutral. The consumer initializes the LLM with the pre-trained weights from the previous operations. The model is then fine-tuned on the sentiment analysis dataset, adjusting its weights based on this specific task. This involves training the model to minimize the error in predicting sentiment labels.
[0061] After fine-tuning, the LLM is now specialized in sentiment analysis. It can accurately classify new sentences as positive, negative, or neutral based on the general language understanding it acquired during pre-training.
[0062] The compressing engine (222) compresses the trained model. In an embodiment, the textual data may be large in size and may be in PDF, DOC, or TXT formats. The trained LLM model utilizing the textual data is compressed using techniques like model quantization in order to reduce the size of the model. This is due to the limited storage and memory in edge devices. For example, techniques like model compression, quantization, and knowledge distillation are applied to create smaller, more lightweight versions of LLMs, which further ensures their smooth edge deployment.
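A hedged sketch of one such compression option, post-training dynamic quantization with PyTorch, is shown below; the network used is a small stand-in, not the claimed LLM, and the file name is a placeholder.

# Illustrative only: dynamic quantization stores Linear weights as int8,
# typically shrinking the model roughly 4x for deployment on edge devices.
import torch
import torch.nn as nn

# Stand-in for a trained model; the actual trained LLM would be loaded here.
model = nn.Sequential(nn.Linear(768, 1024), nn.ReLU(), nn.Linear(1024, 3))

quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
torch.save(quantized.state_dict(), "edge_model_int8.pt")   # compressed artifact for the edge device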
[0063] The deploying unit (224) deploys the compressed trained model onto one or more edge devices (102). In an embodiment, the LLM model is then deployed locally on individual edge devices (102) or the edge servers (104).
[0064] Further, the synchronizing unit (226) synchronizes the one or more trained models deployed onto the one or more edge devices with a centralized system (e.g., a central server) (not shown). Further, the synchronizing unit (226) updates the one or more trained models with updated historic data which is retrieved from the centralized system. These local models can be periodically synchronized with the central server or with each other in order to be updated with the model characteristics and improve the performance for precise predictions. The example for deploying the LLM on the one or more edge devices (102) is explained in FIG. 4 to FIG. 6.
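One possible shape of such a synchronization loop is sketched below; the REST endpoints, version scheme, and file names are assumptions made purely for illustration, as the disclosure does not prescribe a synchronization protocol.

# Illustrative synchronization sketch: the edge device checks the centralized
# system for a newer model version and downloads updated weights if available.
import requests

CENTRAL_URL = "https://central.example.com"        # hypothetical centralized system endpoint

def sync_once(local_version: str) -> str:
    meta = requests.get(f"{CENTRAL_URL}/models/edge-llm/latest", timeout=10).json()
    if meta["version"] != local_version:
        weights = requests.get(meta["weights_url"], timeout=60).content
        with open("edge-llm.bin", "wb") as fh:     # replace the local model artifact
            fh.write(weights)
        return meta["version"]
    return local_version

# Could be scheduled periodically (e.g., hourly) on each edge device:
local_version = sync_once("1.0.0")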
[0065] FIG. 3 is an example schematic representation of the system (300) of FIG. 1 in which operations of various entities are explained, according to various embodiments of the present disclosure. It is to be noted that the embodiment with respect to FIG. 3 will be explained with respect to the first UE (102-1) and the system (108) for the purpose of description and illustration and should nowhere be construed as limiting the scope of the present disclosure.
[0066] As mentioned earlier, the first UE (102-1) includes one or more primary processors (305) communicably coupled to the one or more processors (202) of the system (108). The one or more primary processors (305) are coupled with a memory (310) storing instructions which are executed by the one or more primary processors (305). Execution of the stored instructions by the one or more primary processors (305) enables the UE (102-1). The execution of the stored instructions by the one or more primary processors (305) further enables the UE (102-1) to execute the requests in the communication network (106).
[0067] As mentioned earlier, the one or more processors (202) is configured to transmit a response content related to deploy the LLM to the UE (102-1). More specifically, the one or more processors (202) of the system (108) is configured to transmit the response content to at least one of the UE (102-1). A kernel (315) is a core component serving as the primary interface between hardware components of the UE (102-1) and the system (108). The kernel (315) is configured to provide the plurality of response contents hosted on the system (108) to access resources available in the communication network (106). The resources include one of a Central Processing Unit (CPU), memory components such as Random Access Memory (RAM) and Read Only Memory (ROM).
[0068] As per the illustrated embodiment, the system (108) includes the one or more processors (202), the memory (204), the user interface (206), the display (208), and the input device (210). The operations and functions of the one or more processors (202), the memory (204), the user interface (206), the display (208), and the input device (210) are already explained in FIG. 2. For the sake of brevity, we are not explaining the same operations (or repeated information) in the patent disclosure. Further, the processor (202) includes the selecting unit (216), the collecting unit (218), the training unit (220), the compressing engine (222), the deploying unit (224), and the synchronizing unit (226). The operations and functions of the selecting unit (216), the collecting unit (218), the training unit (220), the compressing engine (222), the deploying unit (224), and the synchronizing unit (226) are already explained in FIG. 2. For the sake of brevity, we are not explaining the same operations (or repeated information) in the patent disclosure.
[0069] FIG. 4 illustrates a system architecture (400) for deploying the LLM on the one or more edge devices (102), in accordance with some embodiments. The system architecture (400) includes the edge server (104), which is one of a third party/network function (NF) cluster/user server located at the edge of the network. The edge server (104) usually has limited storage and memory resources, which is why the trained models are compressed and deployed on these servers. The system architecture (400), having a cluster (402), includes a server bundle (404) (also referred to as the server pool) consisting of multiple resources that can be commissioned and decommissioned based on the workload demand. The pre-processing unit (408) cleans the data, such as by removing stop words and handling special characters, and then pre-processes it using techniques like tokenization. This results in converting the raw input text into a format suitable for training. A model initialization unit (406) initializes the LLM model with the pre-trained weights, if available. The pre-trained weights are set by the user. The pre-training on the large corpus of text data helps the model learn general language representations, which can be fine-tuned for specific tasks. The training unit (220) is responsible for the edge-level training and then the deployment of the customized trained models on the edge devices (102). The training unit (220) internally performs past data pre-processing, feature selection, hyperparameter configuration, train-test split, and then finally model training. The compressing engine (222) compresses the trained LLM model using techniques like model quantization and pruning.
[0070] In an example, a company wants to develop a text classification model to categorize customer feedback into "positive," "neutral," and "negative" sentiments. The company collects a dataset of customer feedback, which includes text reviews and their associated sentiment labels. For example, the data are [“Great service!" → Positive, "It was okay." → Neutral, and "Terrible experience." → Negative]. The training unit (220) performs the following pre-processing steps such as text cleaning, tokenization, stop word removal, and stemming/lemmatization. The text cleaning removes punctuation, special characters, and numbers. The text cleaning converts all text to lowercase to ensure uniformity. The tokenization splits the cleaned text into individual words or tokens. The stop word removal removes common words (e.g., "the," "is," "at") that do not contribute significantly. The stemming/lemmatization reduces the words to their base or root form (e.g., "running" → "run").
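Under the assumption of a simple in-memory implementation (the stop-word list and suffix stripping shown are illustrative stand-ins for full stemming/lemmatization, not part of the disclosure), the pre-processing steps above could look like:

# Illustrative text pre-processing: cleaning, tokenization, stop-word removal,
# and naive suffix stripping as a stand-in for stemming/lemmatization.
import re

STOP_WORDS = {"the", "is", "at", "it", "was", "a", "an", "of"}   # small illustrative list

def preprocess(text: str) -> list:
    text = re.sub(r"[^a-z\s]", " ", text.lower())            # cleaning: lowercase, strip punctuation/numbers
    tokens = text.split()                                     # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]       # stop word removal
    return [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]    # naive suffix stripping

print(preprocess("Great service!"))        # -> ['great', 'service']
print(preprocess("Terrible experience."))  # -> ['terrible', 'experience']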
[0071] Further, the training unit (220) identifies the most relevant features that contribute to sentiment classification. This can involve removing low-variance features or applying techniques like Chi-square tests to select the best features. Further, the training unit (220) configures hyperparameters for the chosen model (e.g., learning rate, batch size, number of epochs). Options like grid search or random search can be used to find the optimal hyperparameter values that improve model performance.
[0072] Further, the training unit (220) splits the dataset into training and testing sets, typically using an 80/20 or 70/30 ratio. Further, the training unit (220) trains the selected model using the training set and tuned hyperparameters. The model learns to classify the text based on the provided sentiment labels.
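A minimal scikit-learn sketch of these steps is given below, with a hypothetical toy dataset standing in for the collected feedback; the vectorizer, classifier, and parameter grid are illustrative choices rather than requirements of the disclosure.

# Illustrative only: TF-IDF features, chi-square feature selection, a stratified
# train/test split, and grid search over hyperparameters for a text classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical labelled feedback standing in for the collected dataset.
texts = ["great service", "very helpful staff", "excellent quick response",
         "it was okay", "nothing special", "average experience overall",
         "terrible experience", "worst support ever", "very bad and slow"]
labels = ["positive", "positive", "positive",
          "neutral", "neutral", "neutral",
          "negative", "negative", "negative"]

# Train/test split, stratified so each class appears in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=3, stratify=labels, random_state=42)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),                 # turn text into features
    ("select", SelectKBest(chi2, k=5)),           # keep the most relevant features
    ("clf", LogisticRegression(max_iter=1000)),   # the classification model
])

# Hyperparameter configuration via a simple grid search.
grid = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=2)
grid.fit(X_train, y_train)
print("held-out accuracy:", grid.score(X_test, y_test))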
[0073] FIG. 5 is an exemplary flow diagram (500) illustrating the method for deploying the LLM on the one or more edge devices (102), according to various embodiments of the present disclosure.
[0074] At 502, the method includes selecting the at least one of, the server (104) and the one or more resources from the server pool (404). In an embodiment, the method allows the selecting unit (216) to select at least one of the server (104) and the one or more resources from the server pool.
[0075] At 504, the method includes collecting the textual data from one or more data sources upon selecting the at least one of, the server and the one or more resources. In an embodiment, the method allows the collecting unit (218) to collect textual data from the one or more data sources (410) upon selecting the at least one of, the server (104) and the one or more resources.
[0076] At 506, the method includes pre-training the LLM with the corpus of textual data to enable the LLM to learn general language representation. In an embodiment, the method allows the training unit (220) to pre-train the LLM with the corpus of textual data to enable the LLM to learn general language representation.
[0077] At 508, the method includes training the pre-trained model with the collected textual data using at least one of, the server and the one or more resources. In an embodiment, the method allows the training unit (220) to train the pre-trained model with the collected textual data using at least one of, the server (104) and the associated one or more resources.
[0078] At 510, the method includes compressing the trained model. In an embodiment, the method allows the compressing engine (222) to compress the trained model.
[0079] At 512, the method includes deploying the compressed trained model onto one or more edge devices. In an embodiment, the method allows the deploying unit (224) to deploy the compressed trained model onto one or more edge devices (102).
[0080] FIG. 6 is a flow diagram (600) illustrating an internal call flow for deploying the LLM on the one or more edge devices (102), in accordance with some embodiments.
[0081] At 602, the method includes selecting the edge server. In an embodiment, the consumer needs to select or set up a server that may act as the edge server. The server may have the capability to receive model updates from the edge devices, aggregate the data, and then perform model training or fine-tuning.
[0082] At 604, the method includes performing security and authentication. In an embodiment, the security and authentication protocols may be added by the system (108). In an example, encryption, authentication, and access control mechanisms may be provided to secure communications between the edge devices (102) and the edge server (104). This ensures the integrity and confidentiality of data and model updates.
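As one hedged example of such a mechanism (a shared-secret scheme assumed for illustration; the disclosure does not prescribe a specific protocol), a model update could be signed by the edge device and verified by the edge server before being accepted:

# Minimal integrity-protection sketch: HMAC-SHA256 over each model update.
import hmac
import hashlib

SHARED_SECRET = b"per-device-provisioned-secret"     # hypothetical pre-shared key

def sign(payload: bytes) -> str:
    # Edge device signs a model update before sending it to the edge server.
    return hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, signature: str) -> bool:
    # Edge server verifies the signature before accepting the update.
    return hmac.compare_digest(sign(payload), signature)

update = b"serialized model update"
assert verify(update, sign(update))       # passes only for untampered data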
[0083] At 606, the method includes collecting the textual data and performing pre-processing. In an embodiment, once the new server bundle is ready, domain-specific textual data from the edge devices (102) is collected in large amounts and pre-processed to make it training ready. The textual data may be collected from multiple sources (410). The textual data may include various special characters and stop words that may be unsuitable for training. Therefore, the system (108) converts raw input text into a format suitable for training. For example, consider the use case of a chat bot for customer service deployed on the server (104), where one of the servers is the edge server. In real time, there may be many customers interacting with the chat bot. A lot of textual data may be generated, and that data is fed to the system (108). The pre-processing of the data may be performed by the system (108). The system (108) is configured to convert all data formats to text format and then perform the required training. For example, formats such as PDFs and DOCs, including large data, may be converted into a suitable text format in order to perform the training. Data sources with images, tables, etc. are extracted separately, and a link for the data is generated and stored in the database (214), ensuring that data points refer to a particular image, table, etc.
[0084] At step 608, the model initialization is performed. In an embodiment, model initialization may be the transfer learning process. The LLM model may be chosen by the consumer and may initialize LLM with pre-trained weights if available. The pre-training on the large corpus of text data facilitates the LLM model to learn general language representations, which can be fine-tuned for specific tasks. For example, multiple standard models may be already trained utilizing large corpus of text data. These trained models may be utilized depending on the requirements of the use cases. Here, the initial training of the LLM may be avoided and the LLM may undergo a quick and less computing intensive fine-tuning process.
[0085] At step 610, the method includes performing edge-level training. In an embodiment, the model is trained with pre-processed data using the computational resources of the customized server bundle (including the edge server). For example, considering the use case of the chat bot for customer service, as new textual data is added to the chat, the model may be trained accordingly.
[0086] At step 612, the model compression is performed. In an embodiment, the textual data may be large in size and may be in PDF, DOC, or TXT formats. The trained LLM model utilizing the textual data is compressed using techniques like model quantization in order to reduce the size of the model. This is due to the limited storage and memory in edge devices. For example, techniques like model compression, quantization, and knowledge distillation are applied to create smaller, more lightweight versions of LLMs, which further ensures their smooth edge deployment.
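For completeness, a hedged sketch of knowledge distillation, another compression option listed above, is shown below: a small student network is trained to match the softened output distribution of a larger teacher. The network shapes, batch, and temperature are illustrative stand-ins, not the claimed LLM.

# Illustrative knowledge-distillation step (single training iteration shown).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(768, 1024), nn.ReLU(), nn.Linear(1024, 3))  # stand-in teacher
student = nn.Sequential(nn.Linear(768, 128), nn.ReLU(), nn.Linear(128, 3))    # smaller student
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0                                    # distillation temperature (assumed)

features = torch.randn(32, 768)            # stand-in for a batch of training features
with torch.no_grad():
    teacher_logits = teacher(features)

optimizer.zero_grad()
student_logits = student(features)
loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                F.softmax(teacher_logits / T, dim=-1),
                reduction="batchmean") * (T * T)
loss.backward()
optimizer.step()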
[0087] At step 614, the method includes deploying the model locally. In an embodiment, the LLM model is then deployed locally on individual edge devices (102) or the edge servers (104). These local models can be periodically synchronized with the central server or with each other in order to be updated with the model characteristics and improve the performance for precise predictions.
[0088] Below is the technical advancement of the present invention:
[0089] Based on the proposed method, in order to perform efficient training, the data retrieved from various sources is converted to a suitable textual format and then used for training. The trained model is compressed using techniques like model quantization in order to reduce the size of the models. This is done due to the limited storage and memory in the edge devices (102). This enables the system to analyze large files, such as textual files, and generate human-like text when questioned about trained information. Techniques like model compression, quantization, and knowledge distillation are applied to create smaller, more lightweight versions of LLMs, which further ensures their smooth edge deployment. The edge-trained LLMs provide real-time or near-real-time responses because data processing is done locally on the device. Training LLMs at the edge device (102) allows sensitive data to remain on the local device, reducing the need to transmit it over the internet or to a cloud server. This greatly enhances data privacy and security. Personal information, confidential data, or proprietary information can be kept within the confines of the edge device. The proposed method can be implemented for personal recommendations on mobile devices, real-time language translation, chat bots for customer service, or the like.
[0090] A person of ordinary skill in the art will readily ascertain that the illustrated embodiments and steps in description and drawings (FIGS. 1-6) are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments.
[0092] The present invention offers multiple advantages over the prior art, and the above list provides a few examples to emphasize some of the advantageous features. The listed advantages are to be read in a non-limiting manner.
REFERENCE NUMERALS
[0093] Environment – 100
[0094] UEs – 102, 102-1 to 102-n
[0095] Server – 104
[0096] Communication network – 106
[0097] System – 108
[0098] Processor – 202
[0099] Memory – 204
[00100] User Interface – 206
[00101] Display – 208
[00102] Input device – 210
[00103] Database – 214
[00104] Selecting unit – 216
[00105] Collecting unit – 218
[00106] Training unit – 220
[00107] Compressing engine – 222
[00108] Deploying unit – 224
[00109] Synchronizing unit – 226
[00110] System – 300
[00111] Primary processors – 305
[00112] Memory – 310
[00113] Kernel – 315
[00114] System architecture – 400
[00115] Cluster – 402
[00116] Server bundle – 404
[00117] Model initialization unit – 406
[00118] Pre-processing unit – 408
[00119] Data source – 410
CLAIMS:
We Claim:
1. A method for deploying a Large Language Model (LLM) on one or more edge devices (102), the method comprising the steps of:
selecting, by one or more processors (202), at least one of, a server (104) and associated one or more resources from a server pool;
collecting, by the one or more processors (202), textual data from one or more data sources (410) upon selecting the at least one of, the server (104) and the associated one or more resources;
pre-training, by the one or more processors (202), the LLM with a corpus of textual data to enable the LLM to learn general language representation;
training, by the one or more processors (202), the pre-trained model with the collected textual data using at least one of, the server (104) and the associated one or more resources;
compressing, by the one or more processors (202), the trained model; and
deploying, by the one or more processors (202), the compressed trained model onto one or more edge devices (102).
2. The method as claimed in claim 1, wherein the at least one of, the server (104) and the associated one or more resources are selected by the one or more processors (202), based on receiving an input from a user via a user interface (206) pertaining to selection of the at least one of, the server (104) and the associated one or more resources.
3. The method as claimed in claim 1, wherein the server (104) is at least one of, an edge server.
4. The method as claimed in claim 1, wherein the step of, training, the pre-trained model with the collected textual data, includes the step of:
fine tuning, by the one or more processors (202), the pre-trained model to perform one or more tasks at an edge level, wherein the one or more tasks pertain to at least one of, text classification, language modelling, named entity recognition (NER), part of speech tagging and text summarization.
5. The method as claimed in claim 1, wherein the step of, collecting, textual data from one or more data sources, further includes the step of:
pre-processing, by the one or more processors (202), the collected textual data using the selected at least one of, the server (104) and the associated one or more resources.
6. The method as claimed in claim 1, wherein the method further comprises the steps of:
synchronizing, by the one or more processors (202), the one or more trained models deployed onto the one or more edge devices (102) with a centralized system; and
updating, by the one or more processors (202), the one or more trained models with updated historic data which is retrieved from the centralized system.
7. A system (108) for deploying a Large Language Model (LLM) on one or more edge devices (102), the system (108) comprising:
a selecting unit (216), configured to, select, at least one of, a server (104) and associated one or more resources from a server pool;
a collecting unit (218), configured to, collect, textual data from one or more data sources (410) upon selecting the at least one of, the server (104) and the associated one or more resources;
a training unit (220), configured to, pre-train, the LLM with a corpus of textual data to enable the LLM to learn general language representation;
the training unit (220), configured to, train, the pre-trained model with the collected textual data using at least one of, the server (104) and the associated one or more resources;
a compressing engine (222), configured to, compress, the trained model; and
a deploying unit (224), configured to, deploy, the compressed trained model onto one or more edge devices (102).
8. The system (108) as claimed in claim 7, wherein the at least one of, the server (104) and the associated one or more resources are selected based on receiving an input from a user via a user interface (206) pertaining to selection of the at least one of, the server (104) and the associated one or more resources.
9. The system (108) as claimed in claim 7, wherein the server (104) is at least one of, an edge server.
10. The system (108) as claimed in claim 7, wherein the training unit (220), trains, the pre-trained model with the collected textual data, by:
fine tuning, the pre-trained model to perform one or more tasks at an edge level, wherein the one or more tasks pertain to at least one of, text classification, language modelling, named entity recognition (NER), part of speech tagging and text summarization.
11. The system (108) as claimed in claim 7, wherein the collecting unit is further configured to:
pre-process, the collected textual data using the selected at least one of, the server and the associated one or more resources.
12. The system (108) as claimed in claim 7, wherein a synchronizing unit (226) is configured to:
synchronize, the one or more trained models deployed onto the one or more edge devices with a centralized system; and
update, the one or more trained models with updated historic data which is retrieved from the centralized system.