
A System And Method To Optimize Hardware Resources For Pods In A Container Orchestration Platform

Abstract: Disclosed is a method (500) for optimizing resource utilization for a plurality of pods (103A-N) in a plurality of nodes (102A-N) of a container orchestration platform (100). The method (500) includes receiving (502) resource utilization data from each of the plurality of pods (103A-N) of the plurality of nodes (102A-N) and pre-processing (504) the received resource utilization data. The method includes classifying the pre-processed data into a training data set and a testing data set. The method further includes training (506) a Long Short-Term Memory (LSTM)-Temporal Convolutional Networks (TCN) hybrid machine learning model based on the training data set. The method further includes predicting (508) an optimized resource utilization for the plurality of pods (103A-N) using the trained LSTM-TCN hybrid machine learning model.


Patent Information

Application #: 202441025716
Filing Date: 28 March 2024
Publication Number: 40/2025
Publication Type: INA
Invention Field: COMPUTER SCIENCE

Applicants

BHARAT ELECTRONICS LIMITED
Outer Ring Road, Nagavara, Bangalore 560045, Karnataka, India

Inventors

1. Vanambathina Raj Kumar
Central Research Laboratory, Bharat Electronics Limited, Jalahalli P.O., Bangalore 560013, Karnataka, India
2. Nitin Kumar
Central Research Laboratory, Bharat Electronics Limited, Jalahalli P.O., Bangalore 560013, Karnataka, India
3. Manjunatha D R
Central Research Laboratory, Bharat Electronics Limited, Jalahalli P.O., Bangalore 560013, Karnataka, India
4. Rajashekhar N H
Central Research Laboratory, Bharat Electronics Limited, Jalahalli P.O., Bangalore 560013, Karnataka, India

Specification

TECHNICAL FIELD
[0001] The present disclosure relates to a system and method to optimize hardware resources for pods in a Container Orchestration Platform using AI Methods.
BACKGROUND
[0002] Microservice architecture is a type of application architecture in which the application is developed as a collection of independent services. These independent services are called microservices. Microservices are bundled together with their dependencies and are run as containers on any container platform. Docker is one of the popular existing platforms used for creating, migrating, managing, and deploying containers. To manage multiple containers, a container orchestration platform such as Kubernetes or Docker Swarm is used. The smallest deployable unit in a container orchestration platform is called a Pod. Pods contain one or more containers and configuration on how to run them.

[0003] A docker container, unlike a virtual machine, does not require or utilize a separate Operating System (OS). Rather, the container relies on the kernel's functionality and uses hardware resources (e.g., central processing unit (CPU), memory, I/O, network bandwidth). Each independent docker container utilizes a different amount of hardware resources during its execution. The resources for the containers inside the pod may be over-provisioned or under-provisioned, which can result in resource wastage and can degrade the performance of the pods. So, a method to predict the future resource requirements for the pods is needed.
[0004] US10719363B2 discloses a technique in which resource utilization data associated with at least one container may be obtained for a period. A set of forecasting models may be trained based on the resource utilization data, and resource utilization of the at least one container may be predicted for a remaining portion of the period. The predicted resource utilization may be compared with the obtained resource utilization data. A forecasting model may be determined from the set of trained forecasting models based on the comparison to optimize resource claims for the at least one container.
[0005] CN114490049A provides a method and a system for automatically allocating resources in containerized edge computing, which comprise the following steps: step 1: the monitor collects data and publishes container resource utilization rate and application program performance state statistics on the message broker; step 2: the analysis planner receives the information sent by the monitor through the message broker, establishes a model through machine learning, and generates scaling operations based on model inference; step 3: the executor receives the data transmitted by the analysis planner and generates scaling instructions for allocating resources.
[0006] Therefore, there is felt a need for an invention which can optimize hardware resources for pods in a Container Orchestration Platform using AI methods.
SUMMARY
[0007] This summary is provided to introduce a selection of concepts, in a simplified format, that are further described in the detailed description of the invention. This summary is neither intended to identify key or essential inventive concepts of the invention nor is it intended for determining the scope of the invention.
[0008] According to an embodiment of the present disclosure, disclosed herein is a method for optimizing resource utilization for a plurality of pods in a plurality of nodes of a container orchestration platform. The method includes receiving resource utilization data from each of the plurality of pods of the plurality of nodes and pre-processing the received resource utilization data. The method further includes classifying the pre-processed data into a training data set and a testing data set. The method also includes training a Long Short-Term Memory (LSTM)-Temporal Convolutional Networks (TCN) hybrid machine learning model based on the training data set. Thereafter, the method includes predicting an optimized resource utilization for the plurality of pods using the trained LSTM-TCN hybrid machine learning model.
[0009] According to an embodiment of the present disclosure, disclosed herein is a system for optimizing resource utilization in a plurality of pods of a plurality of nodes. The system includes a memory and at least one processor coupled to the memory. The at least one processor is configured to receive resource utilization data from each of the plurality of pods of the plurality of nodes and to pre-process the received resource utilization data. The at least one processor is also configured to classify the pre-processed data into a training data set and a testing data set. The at least one processor is further configured to train a Long Short-Term Memory (LSTM)-Temporal Convolutional Networks (TCN) hybrid machine learning model based on the training data set. Thereafter, the at least one processor is also configured to predict an optimized resource utilization for the plurality of pods through the trained LSTM-TCN hybrid machine learning model.
[0010] To further clarify the advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and modules.
[0012] Figure 1 illustrates architecture of a container orchestration platform having a plurality of nodes, in accordance with an exemplary implementation of the present disclosure;
[0013] Figure 2 illustrates a schematic block diagram depicting an operational flow of resource utilization data from the container orchestration platform to a system for hardware resource optimization, in accordance with an exemplary implementation of the present disclosure;
[0014] Figure 3 illustrates a schematic block diagram of a system for optimizing resource utilization in a plurality of pods in the container orchestration platform, in accordance with an exemplary implementation of the present disclosure;
[0015] Figure 4 illustrates a schematic block diagram depicting the LSTM-TCN hybrid model implementation in the system, in accordance with an exemplary implementation of the present disclosure; and
[0016] Figure 5 illustrates a flowchart of a method for optimizing resource utilization in a plurality of pods in the container orchestration platform, in accordance with an exemplary implementation of the present disclosure.
[0017] Further, skilled artisans will appreciate that elements in the drawings are illustrated for simplicity and may not have necessarily been drawn to scale. For example, the flow charts illustrate the method in terms of the most prominent steps involved to help improve understanding of aspects of the present invention. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present invention so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
DETAILED DESCRIPTION
[0018] For the purpose of promoting an understanding of the principles of the present disclosure, reference will now be made to the various embodiments and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the present disclosure is thereby intended, such alterations and further modifications in the illustrated system, and such further applications of the principles of the present disclosure as illustrated therein being contemplated as would normally occur to one skilled in the art to which the present disclosure relates.
[0019] It will be understood by those skilled in the art that the foregoing general description and the following detailed description are explanatory of the present disclosure and are not intended to be restrictive thereof.
[0020] Whether or not a certain feature or element was limited to being used only once, it may still be referred to as “one or more features” or “one or more elements” or “at least one feature” or “at least one element.” Furthermore, the use of the terms “one or more” or “at least one” feature or element do not preclude there being none of that feature or element, unless otherwise specified by limiting language including, but not limited to, “there needs to be one or more…” or “one or more elements is required.”
[0021] Reference is made herein to some “embodiments.” It should be understood that an embodiment is an example of a possible implementation of any features and/or elements of the present disclosure. Some embodiments have been described for the purpose of explaining one or more of the potential ways in which the specific features and/or elements of the proposed disclosure fulfil the requirements of uniqueness, utility, and non-obviousness.
[0022] Use of the phrases and/or terms including, but not limited to, “a first embodiment,” “a further embodiment,” “an alternate embodiment,” “one embodiment,” “an embodiment,” “multiple embodiments,” “some embodiments,” “other embodiments,” “further embodiment”, “furthermore embodiment”, “additional embodiment” or other variants thereof do not necessarily refer to the same embodiments. Unless otherwise specified, one or more particular features and/or elements described in connection with one or more embodiments may be found in one embodiment, or may be found in more than one embodiment, or may be found in all embodiments, or may be found in no embodiments. Although one or more features and/or elements may be described herein in the context of only a single embodiment, or in the context of more than one embodiment, or in the context of all embodiments, the features and/or elements may instead be provided separately or in any appropriate combination or not at all. Conversely, any features and/or elements described in the context of separate embodiments may alternatively be realized as existing together in the context of a single embodiment.
[0023] Any particular and all details set forth herein are used in the context of some embodiments and therefore should not necessarily be taken as limiting factors to the proposed disclosure.
[0024] The terms “comprises”, “comprising”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such process or method. Similarly, one or more devices or sub-systems or elements or structures or components proceeded by “comprises... a” does not, without more constraints, preclude the existence of other devices or other sub-systems or other elements or other structures or other components or additional devices or additional sub-systems or additional elements or additional structures or additional components.
[0025] The various embodiments of the present disclosure relate to a system and a method to optimize hardware resources for a plurality of pods in a Container Orchestration Platform using Artificial Intelligence Methods. The implemented system uses Long Short-Term Memory (LSTM) fusion with Temporal Convolutional Networks (TCN) machine learning model to predict the optimized resource utilization for a plurality of pods in a container orchestration platform. The system continuously trains the LSTM-TCN hybrid machine learning model with resource utilization data collected through a monitoring pod running on the container orchestration platform at periodic intervals.
[0026] In the following description, for the purpose of explanation, specific details are set forth in order to provide an understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure may be practiced without these details. One skilled in the art will recognize that embodiments of the present disclosure, some of which are described below, may be incorporated into a number of systems.
[0027] However, the systems and methods are not limited to the specific embodiments described herein. Further, structures and devices shown in the figures are illustrative of exemplary embodiments of the present disclosure and are meant to avoid obscuring of the present disclosure.
[0028] Containers transformed application deployment by encapsulating applications and their dependencies, so that different application services like front-end services, back-end services, databases, message queues, etc. can be encapsulated in separate containers. The application is divided into independent executable units called microservices, and the containers play the role of deploying and executing each microservice. Each container runs independently, so it is easy to scale a particular service and update versions of each service in isolation.
[0029] However, as the number of containers increases, it becomes difficult to manage them manually, so a container orchestration platform is needed that automates the deployment, management, scaling, and networking of containers. The most used container orchestration platforms are Kubernetes and Docker Swarm. The container orchestration platform includes concepts like pods, clusters, and services.
[0030] To deploy a pod, a Yet Another Markup Language (YAML) file describing the specification of the pod is created. This YAML file contains hardware resource parameters like CPU, memory, and storage which are required by the containers to execute. The pod resources are an aggregation of all the container resources running inside a pod. For example, if a pod is assigned two containers, where each container specifies a request of 0.5 CPU and 32 MB of memory and a limit of 1 CPU and 64 MB of memory, then the total request for that pod is 1 CPU and 64 MB of memory; similarly, the limit for that pod is 2 CPU and 128 MB of memory.
[0031] The container orchestration platform uses this information to decide on which node to place the pod. For each container, it is required to specify requests and limits for the resources. The request is the minimum amount of resources that a container can be guaranteed during its execution, and the limit is the maximum amount of resources that a container can acquire from the node on which the pod is running. If no request and limit are specified, the container orchestration platform will not reserve any specific amount of resources for that pod, and the pod will potentially be allowed to use all available resources on that node. This can impact the performance of other co-located pods on the node itself. So, it is important that the parameters specified for the containers in the YAML file are accurate.
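By way of illustration only, the aggregation arithmetic described above may be sketched in Python as follows; the dictionary layout mirrors the YAML resource fields and the names are hypothetical, not part of the disclosed platform.

```python
# Illustrative sketch: aggregate per-container requests/limits into pod totals.
# The dict structure mirrors the YAML resource fields and is hypothetical.

def aggregate_pod_resources(containers):
    """Sum per-container requests and limits into pod-level totals."""
    totals = {"request_cpu": 0.0, "request_mem_mb": 0,
              "limit_cpu": 0.0, "limit_mem_mb": 0}
    for c in containers:
        totals["request_cpu"] += c["requests"]["cpu"]
        totals["request_mem_mb"] += c["requests"]["memory_mb"]
        totals["limit_cpu"] += c["limits"]["cpu"]
        totals["limit_mem_mb"] += c["limits"]["memory_mb"]
    return totals

# Two containers, each requesting 0.5 CPU / 32 MB with a 1 CPU / 64 MB limit:
containers = [
    {"requests": {"cpu": 0.5, "memory_mb": 32},
     "limits": {"cpu": 1.0, "memory_mb": 64}},
    {"requests": {"cpu": 0.5, "memory_mb": 32},
     "limits": {"cpu": 1.0, "memory_mb": 64}},
]
print(aggregate_pod_resources(containers))
# -> {'request_cpu': 1.0, 'request_mem_mb': 64, 'limit_cpu': 2.0, 'limit_mem_mb': 128}
```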
[0032] In production environments, a pod's resource request may be over-provisioned. The pod may under-utilize the resources that were originally requested. In one example, if a pod containing n containers has requested 70% of CPU and 256 MB of memory, but the pod is only utilizing 40% of CPU and 64 MB of memory, then the pod is considered to be over-provisioned. Even if resources are idle on a node, provisioning of new pods may be rejected on that node, and resources will be exhausted substantially earlier than the actual capacity.
[0033] In the other case, a pod's resource limit may be under-provisioned. If resource limits for containers in the pod are set too low, the containers may not get the necessary resources to perform efficiently. And if auto-scaling is not configured properly, the container orchestration platform may not correctly adjust resources based on demand. This can lead to under-provisioning during peak loads when additional resources are required.
[0034] So, to overcome these situations, a system and a method to optimize hardware resource utilization for a plurality of pods in a plurality of nodes of a Container Orchestration Platform is described below in the forthcoming paragraphs with reference to the accompanying drawings.
[0035] The present disclosure focuses on implementing a machine learning-based system that can predict the optimized resource requirements for the plurality of pods based on the historical usage pattern of the respective pods. Machine learning can be leveraged to address both over-provisioning and under-provisioning issues in the container orchestration platform by providing intelligent and adaptive resource management.
[0036] The present system may use smart monitoring pods to collect the resource utilization data. Figure 1 illustrates the architecture of a container orchestration platform having a plurality of nodes, in accordance with an exemplary implementation of the present disclosure. The smart monitoring pods may be integrated with the container orchestration platform 100. A smart monitoring pod 104A-N may be an independent pod that runs on each node 102A-N in the container orchestration cluster. A feeder client 105A-N, a lightweight process, runs on all pods running in the container orchestration cluster. The resource data collection process running in the smart monitoring pod 104A-N of a node 102A communicates with the feeder program on all pods of that node and collects the resource utilization data. The smart monitoring pods 104A-N running on all the nodes in turn communicate with a smart monitoring pod 101 running on the master node. These pods are deployed on all the nodes to reduce the network congestion in collecting the resource utilization data, so that the collection of the resource utilization data may not become an overloading factor on the single smart monitoring pod running on the master node.
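As a non-limiting example, a feeder client of the kind described above could be sketched in Python as shown below, assuming the psutil library for local metrics and a plain HTTP POST to the node-local smart monitoring pod. The endpoint URL, port, reporting interval, and payload fields are assumptions for illustration only.

```python
# Hypothetical feeder-client sketch: report this pod's CPU/memory usage to the
# node-local smart monitoring pod at a fixed interval. The endpoint URL and
# payload fields are assumptions, not part of the disclosed implementation.
import json
import time
import urllib.request

import psutil  # pip install psutil

MONITOR_URL = "http://smart-monitoring-pod.local:9100/metrics"  # hypothetical
INTERVAL_SECONDS = 30

def collect_metrics():
    mem = psutil.virtual_memory()
    return {
        "timestamp": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_percent": mem.percent,
        "memory_used_mb": mem.used / (1024 * 1024),
    }

def run_feeder():
    while True:
        payload = json.dumps(collect_metrics()).encode("utf-8")
        req = urllib.request.Request(
            MONITOR_URL, data=payload,
            headers={"Content-Type": "application/json"})
        try:
            urllib.request.urlopen(req, timeout=5)
        except OSError:
            pass  # monitoring pod unreachable; retry on the next interval
        time.sleep(INTERVAL_SECONDS)
```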
[0037] Figure 2 illustrates a schematic block diagram depicting an operational flow of resource utilization data from the container orchestration platform 100 to a system for hardware resource optimization, in accordance with an exemplary implementation of the present disclosure. The system may include a resource utilization data collection module 201 to gather or receive all the data collected from each of the plurality of pods 103A-N in the container orchestration platform 100. The smart monitoring pods 104A-N running on all the plurality of nodes 102A-N in the cluster feed the resource utilization data to the resource collection module 201. The resource collection module 201 may have a sub-module to pre-process the collected resource utilization data. After the pre-processing, the collected data is classified into a training data set and a test data set. The training data set is used to train the machine learning model, LSTM fused with TCN. The prediction from the machine learning model 203 may be used to optimize the resources for pods using a Dynamic Resource Allocator module 204. The Dynamic Resource Allocator module 204 is divided into three sub-modules, namely a resource manager 205, a resource scheduler 206, and a resource allocator 207. The resource manager 205 may decide the allocation of the resources based on the prediction. The resource scheduler 206 may identify a few pods from the plurality of pods 103A-N for dynamic resource allocation and a few other pods from the plurality of pods 103A-N for dynamic migration. Finally, the resource allocator 207 may dynamically allocate resources to different pods in an optimized way as predicted by the machine learning model.
[0038] Figure 3 illustrates a schematic block diagram of a system for optimizing resource utilization in a plurality of pods in the container orchestration platform, in accordance with an exemplary implementation of the present disclosure. The system 200 may include at least one processor 302, a memory 304, and one or more modules 306. The system 200 may work in conjunction with a physical server connected via a network, or the system may be implemented in a cloud network, or the system may be coupled with the container orchestration platform 100.
[0039] In an exemplary embodiment, the at least one processor 302 (hereinafter referred to as “processor 302”) may be operatively coupled to each of the memory 304, and the one or more modules 306. The processor 302 may include specialized processing units such as integrated system (bus) controllers, memory management control units, floating point units, graphics processing units, digital signal processing units, etc. In one embodiment, the processor 302 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or both. The processor 302 may be one or more general processors, Digital Signal Processors (DSPs), application-specific integrated circuits, Field-Programmable Gate Arrays (FPGAs), servers, networks, digital circuits, analog circuits, combinations thereof, or other now known or later developed devices for analyzing and processing data. The processor 302 may execute a software program, such as code generated manually (i.e., programmed) to perform the desired operation. The processor 302 may implement various techniques such as, but not limited to, data extraction, data processing, Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), and so forth to achieve the desired objective.
[0040] The memory 304 may be configured to store data and instructions executable by the processor 302. In one embodiment, the memory 304 may communicate via a bus within the system 200. The memory 304 may include, but is not limited to, a non-transitory computer-readable storage media, such as various types of volatile and non-volatile storage media including, but not limited to, random access memory, read-only memory, programmable read-only memory, electrically programmable read-only memory, electrically erasable read-only memory, flash memory, magnetic tape or disk, optical media and the like. In one example, the memory 304 may include a cache or random-access memory for the processor 302. In alternative examples, the memory 304 is separate from the processor 302, such as a cache memory of a processor, the system memory, or other memory. The memory 304 may be an external storage device or database for storing data. The memory 304 may be operable to store instructions executable by the processor 302. The functions, acts, or tasks illustrated in the figures or described may be performed by the programmed processor 302 for executing the instructions stored in the memory 304. The functions, acts, or tasks are independent of the particular type of instruction set, storage media, processor, or processing strategy and may be performed by software, hardware, integrated circuits, firmware, micro-code, and the like, operating alone or in combination. Likewise, processing strategies may include multiprocessing, multitasking, parallel processing, and the like. The memory 304 may further include a database to store the data. Further, the memory 304 may include an operating system for performing one or more tasks of the system 200, as performed by a generic operating system in the communications domain.
[0041] The one or more modules 306, amongst other things, may include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement data types. The one or more modules 306 may also be implemented as, signal processor(s), state machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the one or more modules 306 can be implemented in hardware, instructions executed by a processing unit, or by a combination thereof. The processing unit can comprise a computer, the processor 302, a state machine, a logic array, or any other suitable devices capable of processing instructions. The processing unit can be a general-purpose processor that executes instructions to cause the general-purpose processor to perform the required tasks, or the processing unit can be dedicated to performing the required functions. In another embodiment of the present disclosure, the one or more modules 306 may be machine-readable instructions (software) which, when executed by a processor/processing unit, perform any of the described functionalities. Further, the data serves, amongst other things, as a repository for storing data processed, received, and generated by one or more of the modules.
[0042] Furthermore, the one or more modules 306 may be implemented through an artificial intelligence (AI) model. A function associated with AI may be performed through the non-volatile memory, the volatile memory, and the processor.
[0043] The at least one processor 302 may include one or a plurality of processors. At this time, one or a plurality of processors may be a general purpose processor, such as a central processing unit (CPU), an application processor (AP), or the like, a graphics-only processing unit such as a graphics processing unit (GPU), a visual processing unit (VPU), and/or an AI-dedicated processor such as a neural processing unit (NPU).
[0044] The one or a plurality of processors control the processing of the input data in accordance with a predefined operating rule or artificial intelligence (AI) model stored in the non-volatile memory and the volatile memory. The predefined operating rule or artificial intelligence model is provided through training or learning.
[0045] Here, being provided through learning means that, by applying a learning algorithm to a plurality of learning data, a predefined operating rule or AI model of a desired characteristic is made. The learning may be performed in a device itself in which AI according to an embodiment is performed, and may be implemented through a separate server/system.
[0046] The AI model may consist of a plurality of neural network layers. Each layer has a plurality of weight values and performs a layer operation through the calculation of a previous layer and an operation of a plurality of weights. Examples of neural networks include but are not limited to, convolutional neural network (CNN), deep neural network (DNN), and recurrent neural network (RNN).
[0047] The one or more modules 306 may include a set of instructions that may be executed to cause the system 200 to capture the desired event. The one or more modules 306 may include the Resource Utilization Data Collection Module 201, the Machine Learning Module 203, and the Dynamic Resource Allocator Module 204. The one or more modules 306 and various operations performed by each of the one or more modules 306 in communication with each other are described in the forthcoming paragraphs in conjunction with Figures 4 and 5.
[0048] Figure 4 illustrates a schematic block diagram depicting the LSTM-TCN hybrid model implementation in the system, in accordance with an exemplary implementation of the present disclosure. In the present subject matter, an LSTM-TCN hybrid model is used to predict the resource utilization of the plurality of pods in the plurality of nodes. The present system may use the LSTM-TCN hybrid model to leverage the strengths of each architecture to enhance overall predictive performance. The Long Short-Term Memory (LSTM) machine learning model may be well-suited for capturing sequences of data with long-range dependencies. The LSTMs may find historical patterns and dependencies in resource utilization data along the sequence. On the other hand, Temporal Convolutional Networks (TCNs) excel at capturing local patterns and dependencies within a sequence. The TCNs may efficiently capture short and medium-range patterns in the resource utilization data. This allows the LSTM-TCN hybrid machine learning model to capture both long-range temporal dependencies (captured well by LSTMs) and local patterns (captured well by TCNs).
[0049] The resource utilization data of the plurality of pods 103A-N may be provided as an input to the input layer 402 that may convert the input data into a format that can be processed by the subsequent layers. The second layer of the model is the LSTM layer 404, which may learn and capture long-term dependencies in sequential resource utilization data. The LSTM output layer 406 represents the output of the LSTM layer; it serves as a feature vector that summarizes the long-range patterns, i.e., it considers information that occurred some time ago in the past. Along with the input data, this feature vector is also passed to the TCN layer 408 for further processing. The TCN layer 408 may learn and capture local patterns in the resource utilization data. The output layer 410 represents the final layer for predicting the output related to predictive scaling. The output layer 410 is the last layer of the LSTM-TCN hybrid model; it gives the final prediction 412 for the resource utilization of the plurality of pods 103A-N.
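By way of a non-limiting illustration, the hybrid arrangement described above may be sketched with the Keras functional API as follows. The window length, the four input features (e.g., CPU, memory, storage, and GPU utilization), the layer widths, and the dilation rates are assumptions made for the example, and the TCN layer 408 is approximated here by a stack of dilated causal convolutions rather than a dedicated TCN library.

```python
# Minimal sketch of the LSTM-TCN hybrid of Figure 4; all sizes illustrative.
import tensorflow as tf
from tensorflow.keras import layers

TIMESTEPS, FEATURES = 60, 4   # assumed: 60 past samples of 4 metrics

inputs = layers.Input(shape=(TIMESTEPS, FEATURES))        # input layer (402)

# LSTM branch capturing long-range dependencies (layers 404/406)
lstm_features = layers.LSTM(64, return_sequences=True)(inputs)

# The LSTM feature vector is passed to the TCN together with the input data
x = layers.Concatenate(axis=-1)([inputs, lstm_features])

# TCN layer (408) approximated by dilated causal convolutions (local patterns)
for dilation_rate in (1, 2, 4, 8):
    x = layers.Conv1D(filters=64, kernel_size=3, padding="causal",
                      dilation_rate=dilation_rate, activation="relu")(x)

x = layers.Lambda(lambda t: t[:, -1, :])(x)               # keep last timestep
outputs = layers.Dense(FEATURES)(x)                       # output layer (410)

model = tf.keras.Model(inputs, outputs)                   # prediction (412)
model.compile(optimizer="adam", loss="mse")
```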
[0050] Referring to Figure 3, the system 200 may include the Resource Utilization Data Collection Module 201. The Resource Utilization Data Collection Module 201 may be communicatively coupled with the monitoring pod 104A-N of each of the plurality of nodes 102A-N. The resource utilization data collection module 201 may be adapted to receive resource utilization data from each of the plurality of pods of the plurality of nodes. The resource utilization data collection module 201 may be adapted to pre-process the received resource utilization data by removal of outliers, deletion of rows and columns that contain NULL values, normalization of values, and removal of duplicate entries. After pre-processing, the clean data is separated into a training data set and a test data set. The training data set is the data set that may be used to train the LSTM-TCN hybrid machine learning model. The test data set is the data set that may be used to test/validate the predictions of the trained LSTM-TCN hybrid machine learning model.
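For illustration, the pre-processing chain enumerated above and the separation into training and test sets could be sketched with pandas as follows. The three-sigma outlier rule, min-max normalization, and the 80/20 chronological split are assumptions; the disclosure does not fix particular techniques for these steps.

```python
# Illustrative pre-processing sketch: de-duplication, NULL removal, outlier
# removal, and min-max normalization, followed by a chronological split.
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.drop_duplicates()                    # removal of duplicate entries
    df = df.dropna(axis=1, how="all")            # delete all-NULL columns
    df = df.dropna(axis=0, how="any")            # delete rows containing NULLs
    numeric = df.select_dtypes("number")
    # Outlier removal (assumed rule): keep rows within 3 std devs of the mean
    mask = ((numeric - numeric.mean()).abs() <= 3 * numeric.std()).all(axis=1)
    df = df[mask].copy()
    # Min-max normalization of numeric columns to the [0, 1] range
    numeric = df.select_dtypes("number")
    df[numeric.columns] = (numeric - numeric.min()) / (numeric.max() - numeric.min())
    return df

def chronological_split(df: pd.DataFrame, train_fraction: float = 0.8):
    """Time-series data is split by time rather than shuffled."""
    cut = int(len(df) * train_fraction)
    return df.iloc[:cut], df.iloc[cut:]          # training set, test set
```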
[0051] The machine learning module 203 may be adapted to train the Long Short-Term Memory (LSTM)-Temporal Convolutional Networks (TCN) hybrid machine learning model based on the training data set. The machine learning module 203 may be adapted to predict an optimized resource utilization for the plurality of pods using the trained LSTM-TCN hybrid machine learning model.
[0052] The dynamic resource allocator module 204 may be adapted to allocate resources to the plurality of pods based on the predicted optimized resource utilization. Further, the dynamic resource allocator module 204 may be adapted to migrate the pods between the plurality of nodes based on the predicted optimized resource utilization to optimize the use of available resources.
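The cooperation of the three sub-modules may be sketched structurally as follows. The headroom factor, the migration threshold, and the stubbed apply step are illustrative assumptions; in a deployment, the allocator would invoke the orchestration platform's API rather than printing the decision.

```python
# Structural sketch of the dynamic resource allocator (204) and its
# sub-modules (205, 206, 207). Thresholds and mechanics are placeholders.
from dataclasses import dataclass

@dataclass
class PodForecast:
    pod: str
    node: str
    predicted_cpu: float      # predicted CPU usage, in cores
    requested_cpu: float      # currently requested CPU, in cores

class ResourceManager:
    """Turns model predictions into per-pod allocation decisions (205)."""
    def decide(self, f: PodForecast) -> float:
        headroom = 1.15                       # hypothetical 15% safety margin
        return f.predicted_cpu * headroom

class ResourceScheduler:
    """Selects pods for in-place resizing vs. migration (206)."""
    def classify(self, f: PodForecast, new_request: float) -> str:
        if new_request > f.requested_cpu * 1.5:
            return "migrate"                  # large growth: move to roomier node
        return "resize"

class ResourceAllocator:
    """Applies the decision (207); stubbed with print instead of API calls."""
    def apply(self, f: PodForecast, new_request: float, action: str):
        print(f"{action} pod {f.pod} on {f.node}: "
              f"cpu request {f.requested_cpu:.2f} -> {new_request:.2f}")

forecast = PodForecast(pod="db-0", node="node-2",
                       predicted_cpu=0.40, requested_cpu=0.70)
manager, scheduler, allocator = ResourceManager(), ResourceScheduler(), ResourceAllocator()
new_req = manager.decide(forecast)
allocator.apply(forecast, new_req, scheduler.classify(forecast, new_req))
```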
[0053] Figure 5 illustrates a flowchart of a method for optimizing resource utilization in a plurality of pods in the container orchestration platform, in accordance with an exemplary implementation of the present disclosure. The method 500 may be implemented by the system 200.
[0054] At operation 502, the method may include receiving/collecting resource utilization data through the smart monitoring pods 104A-N. The resource utilization data collected by the smart monitoring pods 104A-N includes resource utilization statistics of CPU, memory, storage, and GPU from each pod.
[0055] At operation 504, the method may include pre-processing the received resource utilization data. Once the resource utilization data is collected, pre-processing may be done that involves steps like removal of outliers, deletion of rows and columns that contain NULL values, normalization of values, and removal of duplicate entries.
[0056] At operation 506, the method may include classifying the pre-processed data into a training data set and a test data set. The test data set may be used to evaluate and verify the performance of the machine learning model.
[0057] At operation 508, the method may include training the Long Short-Term Memory (LSTM)-Temporal Convolutional Networks (TCN) hybrid machine learning model based on the training data set. The training data set may be used to train the machine learning model and may contain previous time periods during which the resources of the plurality of pods 103A-N were over-provisioned or under-provisioned.
[0058] At operation 510, the method may include predicting future resource needs based on historical usage patterns of the pods using the trained machine learning model. The prediction by the machine learning model may help in deciding which pods can be scaled up or down during which time period. The trained machine learning model may identify abnormal spikes in resource usage beforehand so that the implemented system can dynamically scale the resources for the pods according to the prediction of the resource usage.
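As a sketch of this operation, the trained model may be applied to the most recent utilization window to forecast the next interval, and the forecast compared against the pod's current request. The 50% and 90% thresholds below are illustrative assumptions, and model refers to a trained Keras model such as the hybrid sketched earlier.

```python
# Illustrative use of the trained model for operation 510: forecast the next
# interval from a recent window and flag likely over-/under-provisioning.
import numpy as np

def forecast_next(model, recent_window: np.ndarray) -> np.ndarray:
    """recent_window: (TIMESTEPS, FEATURES) array of normalized utilization."""
    return model.predict(recent_window[np.newaxis, ...], verbose=0)[0]

def provisioning_status(predicted_cpu: float, requested_cpu: float) -> str:
    ratio = predicted_cpu / requested_cpu
    if ratio < 0.5:                     # assumed threshold
        return "over-provisioned: request can be scaled down"
    if ratio > 0.9:                     # assumed threshold
        return "under-provisioned: request should be scaled up"
    return "adequately provisioned"
```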
[0059] At operation 512, the method may include dynamically adjusting resource allocation for the plurality of pods based on the predicted optimized resource utilization. The machine learning model is integrated with the Resource Allocator Module to allow dynamic migration of the pods, allocation and freeing up of the resources to the pods in an optimized way.
[0060] At operation 514, the method may include periodically collecting updated resource utilization data for continuous training of the machine learning model. With the present operation, the system continuously learns from the actual resource usage patterns and adjusts the machine learning model accordingly.
[0061] At operation 516, the method may include performing batch training of the LSTM-TCN hybrid machine learning model for continuous optimization of the resource utilization. When a significant amount of new data is available, the training of the existing machine learning model may be performed using batch training to continuously update the resource utilization predictions based on real-time values.
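A possible sketch of this batch re-training step is given below: new samples are buffered until a threshold is reached, after which a short fit pass updates the deployed model. The buffer threshold, epoch count, and batch size are assumptions for illustration.

```python
# Sketch of operation 516: accumulate new samples and periodically run a
# short batch-training pass so the model tracks recent usage patterns.
import numpy as np

RETRAIN_THRESHOLD = 5000   # hypothetical: retrain after this many new samples

def maybe_retrain(model, new_x: np.ndarray, new_y: np.ndarray, buffer: list) -> bool:
    buffer.append((new_x, new_y))
    total = sum(len(x) for x, _ in buffer)
    if total < RETRAIN_THRESHOLD:
        return False                           # not enough new data yet
    x = np.concatenate([x for x, _ in buffer])
    y = np.concatenate([y for _, y in buffer])
    model.fit(x, y, epochs=3, batch_size=64, verbose=0)  # incremental pass
    buffer.clear()
    return True
```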
[0062] According to the method as disclosed herein, the prediction model is used in migration of pods, resource allocation, and freeing up of the resources between the nodes.
[0063] According to the method as disclosed herein, a smart monitoring pod in the container orchestration platform collects the resources utilized by the pods in the platform.
[0064] According to the method as disclosed herein, a dynamic resource allocator further comprises: a resource manager to manage the prediction models from the Machine Learning module; a resource scheduler to schedule the migration of the pods between the nodes, and to schedule the dynamic adjusting of the resource allocation and de-allocation to the pods; and a resource allocator to allocate the resources to the pods.

[0065] Thus, the present disclosure focuses on a system and method to optimize the resource utilization for pods in a container orchestration platform. The resource utilization data is collected from various individual pods running on the nodes in the container orchestration platform using a smart monitoring pod. The collected resource utilization data is pre-processed and further used to train the LSTM (Long Short-Term Memory) fused with TCN (Temporal Convolutional Network) machine learning model. The prediction made by the machine learning model is used to dynamically adjust the resources among the pods running on the nodes. This prediction is further utilized in migrating the pods between the nodes for optimum utilization of the resources available on the node to which the pods are migrated. The resource utilization data is collected periodically at regular intervals to further re-train the model to achieve further optimization in resource utilization.

[0066] In this application, unless specifically stated otherwise, the use of the singular includes the plural, and the use of “or” means “and/or.” Furthermore, the use of the terms “including” or “having” is not limiting. Any range described herein will be understood to include the endpoints and all values between the endpoints. Features of the disclosed embodiments may be combined, rearranged, omitted, etc., within the scope of the invention to produce additional embodiments. Furthermore, certain features may sometimes be used to advantage without a corresponding use of other features.
[0067] While at least one exemplary embodiment has been presented in the foregoing detailed description, it should be appreciated that a vast number of variations exist.

CLAIMS

1. A method (500) for optimizing resource utilization for a plurality of pods (103A-N) in a plurality of nodes (102A-N) of a container orchestration platform (100), the method (500) comprising:
receiving (502) resource utilization data from each of the plurality of pods (103A-N) of the plurality of nodes (102A-N);
pre-processing (504) the received resource utilization data;
classifying the pre-processed data into a training data set and a testing data set;
training (506) a Long Short-Term Memory (LSTM)-Temporal Convolutional Networks (TCN) hybrid machine learning model based on the training data set; and
predicting (508) an optimized resource utilization for the plurality of pods (103A-N) using the trained LSTM-TCN hybrid machine learning model.

2. The method (500) as claimed in claim 1, further comprising dynamically adjusting resource allocation for the plurality of pods (103A-N) based on the predicted optimized resource utilization.

3. The method (500) as claimed in claim 1, further comprising periodically collecting updated resource utilization data and performing batch training of the LSTM-TCN hybrid machine learning model for continuous optimization of the resource utilization.

4. The method (500) as claimed in claim 1, wherein predicting (508) the optimized resource utilization through the LSTM-TCN hybrid machine learning model comprises:
capturing long-range temporal dependencies in the training data to identify historical patterns in the resource utilization; and
capturing local patterns and short and medium-range temporal dependencies in the training data to identify local patterns in the resource utilization.

5. The method (500) as claimed in claim 1, further comprising:
utilizing the predicted optimized resource utilization for migration of pods (103A-N) between the plurality of nodes (102A-N), resource allocation and deallocation to the plurality of pods (103A-N), and freeing up of the resources between the plurality of pods (103A-N).

6. A system (200) for optimizing resource utilization in a plurality of pods (103A-N) of a plurality of nodes (102A-N), the system (200) comprising:
a memory (304); and
at least one processor (302) coupled to the memory (304), the at least one processor (302) configured to:
receive resource utilization data from each of the plurality of pods (103A-N) of the plurality of nodes (102A-N);
pre-process the received resource utilization data;
classify the pre-processed data into a training data set and a testing data set;
train a Long Short-Term Memory (LSTM)-Temporal Convolutional Networks (TCN) hybrid machine learning model based on the training data set; and
predict an optimized resource utilization for the plurality of pods (103A-N) using the trained LSTM-TCN hybrid machine learning model.

7. The system (200) as claimed in claim 6, wherein the at least one processor (302) is further configured to:
allocate resources to the plurality of pods (103A-N) based on the predicted optimized resource utilization; and
migrate the pods between the plurality of nodes (102A-N) based on the predicted optimized resource utilization to optimize the use of available resources.

8. The system (200) as claimed in claim 6, wherein the at least one processor (302) is further configured to: dynamically adjust resource allocation for the plurality of pods (103A-N) based on the predicted optimized resource utilization.
9. The system (200) as claimed in claim 6, wherein the at least one processor (302) is further configured to periodically collect updated resource utilization data and perform batch training of the LSTM-TCN hybrid machine learning model for continuous optimization of the resource utilization.
10. The system (200) as claimed in claim 6, wherein the at least one processor (302) is further configured to perform batch training periodically to refine the LSTM-TCN model using updated resource utilization data.

11. The system (200) as claimed in claim 6, wherein, prior to predicting the optimized resource utilization through the LSTM-TCN hybrid machine learning model, the LSTM-TCN hybrid machine learning model is coupled with the at least one processor (302) to:
capture long-range temporal dependencies in the training data to identify historical patterns in the resource utilization; and
capture local patterns and short and medium-range temporal dependencies in the training data to identify local patterns in the resource utilization.

Documents

Application Documents

# Name Date
1 202441025716-PROVISIONAL SPECIFICATION [28-03-2024(online)].pdf 2024-03-28
2 202441025716-FORM 1 [28-03-2024(online)].pdf 2024-03-28
3 202441025716-DRAWINGS [28-03-2024(online)].pdf 2024-03-28
4 202441025716-FORM-26 [07-06-2024(online)].pdf 2024-06-07
5 202441025716-Proof of Right [16-09-2024(online)].pdf 2024-09-16
6 202441025716-POA [04-10-2024(online)].pdf 2024-10-04
7 202441025716-FORM 13 [04-10-2024(online)].pdf 2024-10-04
8 202441025716-AMENDED DOCUMENTS [04-10-2024(online)].pdf 2024-10-04
9 202441025716-Response to office action [01-11-2024(online)].pdf 2024-11-01
10 202441025716-DRAWING [28-03-2025(online)].pdf 2025-03-28
11 202441025716-CORRESPONDENCE-OTHERS [28-03-2025(online)].pdf 2025-03-28
12 202441025716-COMPLETE SPECIFICATION [28-03-2025(online)].pdf 2025-03-28