Abstract: A method and system for latency-based load balancing is disclosed. The method includes receiving a plurality of requests at a load balancer (104) from an application (102). The load balancer (104) distributes network or application traffic across a plurality of servers (106) capable of fulfilling the plurality of requests. Further, the method includes assigning a predefined category from a plurality of predefined categories to each of the plurality of requests based on a predefined category associated with each of a plurality of application programming interfaces (APIs) associated with each of the plurality of requests, wherein the plurality of predefined categories is based on an execution time. Furthermore, the method includes prioritizing execution of the plurality of requests with a low execution time.
FIELD OF THE DISCLOSURE
[0001] The present disclosure relates to load balancing, and more specifically, to a method and a system for latency-based load balancing in a dynamic manner.
BACKGROUND
[0002] Load balancers are one of the vital entities in information technology and networking infrastructure. In general, the load balancers distribute traffic to various servers present in a resource pool to ensure no single server is overloaded, thereby minimizing server response time and maximizing throughput. The load balancers typically route requests to suitable destinations to prevent bottlenecks that may hamper the information technology and networking infrastructure.
[0003] Conventionally, load balancers distribute load based on target machine (e.g., virtual machine) health and queue size using a pre-defined algorithm. As a result, an Application Programming Interface (API) call or a request with a smaller execution time must wait for an API call or a request with a longer execution time to finish.
[0004] For example, a prior art reference "US9459904B2" teaches systems and methods for preferentially assigning virtual machines (VMs) on a particular NUMA node with network queues on the same NUMA node. A load balancer process on a host assigns multiple VMs to network queues. The assignment of the VMs to network queues is performed with a bias toward assigning VMs using a particular NUMA node to network queues on the same NUMA node. A scheduler on the host assigns VMs to NUMA nodes.
[0005] Further, a prior art reference "US6853642B1" relates to load balancing service requests between at least two service component instances associated with a network access point. Similarly, a prior art reference "US20040117794A1" relates to methods, system and framework for task scheduling while reducing the number of requests dropped in a computing system and enforcement of Quality of Service (QoS) and Service Level Agreements (SLA).
[0006] Further, a prior art reference "US8873424B2" discloses methods for performing load balancing within an Ethernet network. On a packet-by-packet basis or on a per flow basis, the first component dynamically selects a particular path of the multiple of paths by selecting a virtual network of the set of virtual networks for transporting the received packet that tends to balance traffic load across the set of virtual networks.
[0007] Further, a prior art reference "Dynamic load balancing based on latency prediction" describes a novel load balancing algorithm based on a modified accrual failure detector that exploits request-reply latency as an indirect measure of the load on individual backends. The choice of request-reply latency as a load indicator is justified by empirical evidence that, in a context with a known reliable load index such as CPU utilization, latency is correlated to that index.
[0008] Furthermore, a prior art reference "A new load balancing strategy by task allocation in edge computing based on intermediary nodes" proposes a load balancing strategy for task allocation in edge computing based on intermediary nodes. The intermediary node is used to monitor global information to obtain the real-time attributes of the edge nodes and complete the classification evaluation.
[0009] However, the problem remains the same: the load balancers assign virtual machines to queues based on availability and load, thereby delaying the server response time.
[0010] In light of the above discussion and in consideration of the prior art, there remains a need for a load balancing method and system that assign virtual machines for processing APIs and requests dynamically.
[0011] Any references to methods, apparatus or documents of the prior art are not to be taken as constituting any evidence or admission that they formed, or form part of the common general knowledge.
OBJECT OF THE DISCLOSURE
[0012] A principal object of the present disclosure is to provide a method and a system for latency-based load balancing in a dynamic manner.
[0013] Another object of the present disclosure is to provide a dynamic load balancing method and system that assign virtual machines for processing Application Programming Interfaces (APIs) and/or requests based on API categories and dynamically update the API categories based on the run-time average latency of each API in the system.
[0014] Another object of the present disclosure is to improve categorization of the application programming interfaces (APIs) and/or requests.
SUMMARY
[0015] In order to achieve the above objects, the present disclosure provides a method and a system for latency-based load balancing in a dynamic manner.
[0016] In an aspect, the present disclosure provides the method for load balancing. The method includes receiving a plurality of requests at a load balancer from an application, wherein the load balancer distributes network or application traffic across a plurality of servers capable of fulfilling the plurality of requests. Further, the method includes assigning a predefined category from a plurality of predefined categories to each of the plurality of requests based on a predefined category associated with each of a plurality of application programming interfaces (APIs) associated with each of the plurality of requests, wherein the plurality of predefined categories is based on an execution time. Furthermore, the method includes prioritizing execution of the plurality of requests with a low execution time, wherein prioritizing execution of the plurality of requests comprises routing each of the plurality of requests based on the plurality of predefined categories, wherein each of the plurality of requests is routed to a server of the plurality of servers. Additionally, the method includes calculating a run-time average latency for each API, wherein the run-time average latency determines the processing time taken by any of the plurality of APIs over a period to execute, and dynamically updating the predefined category within the plurality of predefined categories based on the run-time average latency.
[0017] In another aspect, the present disclosure provides a method of dynamically updating a plurality of predefined categories that can be associated with a plurality of requests. The method includes dynamically updating the plurality of predefined categories of a plurality of application programming interfaces (APIs) on the basis of latency for traffic management in a network, and dynamically allocating the plurality of predefined categories of the plurality of APIs to facilitate transport of a corresponding request from the plurality of requests flowing across the network. Further, the method includes managing the plurality of APIs in a latency-based category on a request flow basis for the traffic management and determining a server from a plurality of servers that is executing the category of the plurality of requests. The method further includes feeding a load balancer with a response of the average time taken by the plurality of APIs during execution at the server. The method further comprises monitoring the network or application traffic across the plurality of servers capable of fulfilling the plurality of requests based on latency. The method dynamically updates the plurality of predefined categories by updating the plurality of predefined categories corresponding to the plurality of APIs for executing each of the plurality of APIs at the server based on the latency, and dynamically selects a request from the plurality of requests that has a lowest latency and routes the request to a specific server. A latency for each API comprises the processing time taken by each of the plurality of APIs over a period.
[0018] In yet another aspect, the present disclosure provides the system for load balancing. The system comprises an application, a load balancer and a plurality of servers. The load balancer is configured to receive a plurality of requests from the application, wherein the load balancer distributes network or application traffic across the plurality of servers capable of fulfilling the plurality of requests. The load balancer assigns a predefined category from a plurality of predefined categories to each of the plurality of requests based on a predefined category associated with each of a plurality of application programming interfaces
(APIs) associated with each of the plurality of requests, wherein the plurality of predefined categories is based on an execution time. Further, the load balancer prioritizes execution of the plurality of requests with a low execution time. The load balancer routes each of the plurality of requests based on the plurality of predefined categories, wherein each of the plurality of requests is routed to a server of the plurality of servers.
[0019] These and other aspects herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
BRIEF DESCRIPTION OF FIGURES
[0020] The method and system are illustrated in the accompanying drawings, throughout which like reference letters indicate corresponding parts in the drawings. The embodiments herein will be better understood from the following description with reference to the drawings, in which:
[0021] FIG. 1 is an example system having a load balancer.
[0022] FIG. 2 illustrates an architecture of an application of the system of FIG. 1.
[0023] FIG. 3 is an example process flow for load balancing in accordance with the present disclosure.
[0024] FIG. 4 is a flow-chart illustrating a method of load balancing in accordance with the present disclosure.
[0025] FIG. 5 is a flow-chart illustrating a method of dynamically updating predefined categories.
DETAILED DESCRIPTION
[0026] In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. However, it will be obvious to a person skilled in the art that the embodiments of the invention may be practiced with or without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the invention.
[0027] Furthermore, it will be clear that the invention is not limited to these embodiments only. Numerous modifications, changes, variations, substitutions and equivalents will be apparent to those skilled in the art without departing from the scope of the invention.
[0028] The accompanying drawings are used to help easily understand various technical features and it should be understood that the embodiments presented herein are not limited by the accompanying drawings. As such, the present disclosure should be construed to extend to any alterations, equivalents and substitutes in addition to those which are particularly set out in the accompanying drawings. Although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are generally only used to distinguish one element from another.
[0029] Load balancing is one of the most scalable methodologies for handling the profusion of requests from workflows to provide a consistent and dependable end-user experience. The present disclosure proposes a load balancer and a method of load balancing that dynamically switch the target machine based on the average latency of application programming interfaces (APIs) and/or requests, such that an API or a request taking a lower time does not wait in a queue. Further, the present disclosure provides a way to improve categorization of the APIs and/or the requests. Advantageously, the load balancer of the present disclosure manages load efficiently, where faster APIs and/or requests get priority.
[0030] The present disclosure provides a method and system for providing load balancing technique based on latency of the APIs or the requests to be
processed. The latency for each API and/or request is calculated over a period and compared with threshold values for categorizing the APIs and/or the requests into different latency categories. The APIs and/or requests belonging to each latency category are assigned a specific machine, such as a virtual machine or processor, for processing, thereby providing efficient use of machine resources within the system and low-latency load balancing. Further, the run-time average latency for each API is calculated and used for dynamically updating the API latency category. The method and system provide an intelligent load balancing technique focused on reducing latency that occurs due to different queue lengths of the APIs or the requests.
[0031] FIG. 1 is an example system (100) having a load balancer (104). FIG. 2 illustrates an architecture of an application of the system of FIG. 1. FIG. 3 is an example process flow for load balancing in accordance with the present disclosure.
[0032] Referring to FIG. 1, the system (100) comprises an application (102), the load balancer (104) and a plurality of servers (106). The application (102) may be a framework that operates and hosts applications for end users. The application (102) may be related to email, e-commerce, web application or the like. The applications reside in a device or a user equipment (170) present with an end user. The device may be any type of computing device, such as, but not limited to, mobile phone, laptop, personal computer. The applications are accessible via a user interface of the user equipment (170). The user equipment (170) may allow connectivity, such as via internet, to the application (102) via a communication network. The communication network provides network connectivity to the system (100). The communication network may be based on, but not limited to, 2G, 3G, 4G, 5G, Wi-Fi, BLE, LAN (local area network), VPN (virtual private network), WAN (wide area network), or the like.
[0033] FIG. 2 illustrates the architecture of the application (102) of the system (100) of FIG. 1. The application (102) corresponds to a Policy and Charging Control (PCC). The Policy and Charging Control (PCC), also known as an integrated PCC, is a policy management function that enables operators to dynamically control network resources with real-time policies based on service, subscriber or usage context.
[0034] As shown in FIG. 2, an Application Function (AF) (110) is an element implementing applications that require dynamic policy and/or charging control of traffic plane resources. A Policy and Charging Enforcement Function (PCEF) (120) provides service data flow detection, charging, and policy enforcement of the user plane traffic. Further, a Policy and Charging Rules Function (PCRF) (130) is a separate logical node and sits in between the Application layer (e.g., service-offering sources/applications on a subscriber device/user equipment), where services (e.g., streaming a live match, telecommunication-based services, a data-file request, and the like) are initiated and service characteristics are negotiated, and the user plane, where the actual service is being provided. The PCRF (130) provides policy and flow-based charging control functions, using subscriber data stored in a Subscription Profile Repository (SPR) (150) (defining the subscription details that the user has subscribed to, for example, a Gold Package subscription for viewing Direct-to-Home (DTH) television services). The PCRF (130) receives service information (e.g., application identifier, type of media, bandwidth, IP address and port number) from the AF (110) over the Rx interface, and uses this to install PCC rules into the PCEF (120), which in turn ensures that only authorized media flows associated with the requested services are allowed, and that the correct bandwidth, charging and priority are applied. The PCEF (120) provides real-time charging information (e.g., for prepaid services) to an Online Charging System (OCS) (140) over a Gy interface and generates reports for an Offline Charging System (OFCS) regarding resource usage (e.g., for postpaid services) over a Gz interface. In general, the Gy interface functions as a DCCA (Diameter Credit-Control Application) proxy between the PCEF (120) and the OCS (140) to allow online credit control, and the Gz interface is used as an offline charging interface.
[0035] The AF (110) may modify session information at any time, for example due to an AF session modification or an internal AF trigger. Modification is achieved by the AF (110) sending an AA-Request command to the PCRF (130) over the Rx reference point containing the Media-Component-Description Attribute-Value Pairs (AVPs), with the updated service information as defined in communication standards. The PCRF (130) processes the received service information according to the operator policy and may decide whether the request is accepted or not. If the request is accepted, the PCRF (130) updates the pre-existing service information with the new information. The updated service information may require the PCRF (130) to create, modify or delete the related PCC rules and provide the updated information towards the PCEF (120) over the Gx reference point as specified in communication standards. Typically, the Gx interface is used to provision service data flows as per charging rules between the PCRF (130) and the PCEF (120). The procedures used to update the Authorized QoS for the affected IP-CAN bearer are also specified in communication standards. Currently specified procedures for modification of the service information for PCC provide for the immediate activation, replacement and removal of filter description information at the PCEF (120).
[0036] As can be derived from the aforementioned description, the PCEF (120) can be an access gateway, or an interface or gateway, configured to receive a request such as, but not limited to, a subscription request from the user equipment (170). The request may be related to and/or associated with an application programming interface. The request may include a service-based request. In an example, there may be one request and/or API. In another example, there may be more than one request and/or API.
[0037] Referring to FIG. 1 and FIG. 3, the load balancer (104) receives a plurality of requests and/or APIs from the application (102) for processing, which are further routed to a server of the plurality of servers (106). In general, the load balancer (104) accepts incoming traffic from clients, such as from the user equipment (170), and routes requests to its registered targets (target machines). The load balancer (104) distributes network or application traffic across the plurality of servers (106) capable of fulfilling the requests in a manner that maximizes speed and capacity utilization. The load balancer (104) may be hardware or software or a combination thereof. The plurality of servers (106) may be instances, virtual machines (or machines), processors or the like. The server may correspond to a collection of properties and attributes that define a virtual machine that has run, will run, or is running in a cloud. The server to which the load balancer (104) routes the plurality of requests and/or APIs may be a target (or a target machine). The target machine may be identified based on execution time. That is, the target machine may be identified based on the total time taken to process a request and/or API from the plurality of requests and/or APIs.
[0038] The load balancer (104) is configured to include a plurality of predefined categories corresponding to each API of the plurality of APIs. Once the load balancer (104) receives the plurality of requests from the application (102), the load balancer (104) assigns a predefined category from the plurality of predefined categories to each of the plurality of requests based on the plurality of predefined categories associated with the plurality of APIs that are associated with the plurality of requests. That is, the load balancer (104) is configured to have the plurality of predefined categories with respect to the plurality of APIs. Once the plurality of requests is received at the load balancer (104), the load balancer (104) assigns each of the plurality of requests to the predefined category associated with the corresponding API. In an example, one API may be associated with one request.
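By way of illustration only, the following is a minimal sketch of such a per-API category lookup, assuming a simple in-memory mapping; the API names, category labels and function names are hypothetical and not taken from the present disclosure:

```python
# Hypothetical sketch: predefined categories held per API, assigned to requests.
from dataclasses import dataclass

# Assumed mapping of each API name to a predefined latency category.
API_CATEGORIES = {
    "check_balance": "A",    # low execution time
    "update_profile": "B",   # medium execution time
    "add_subscriber": "C",   # high execution time
}

@dataclass
class Request:
    api: str       # API associated with the request
    payload: dict  # request body (contents irrelevant to categorization)

def assign_category(request: Request) -> str:
    """Assign the predefined category of the API associated with the request."""
    # Unknown APIs fall back to the slowest bucket (an assumption, not specified).
    return API_CATEGORIES.get(request.api, "C")

print(assign_category(Request(api="check_balance", payload={})))  # -> "A"
```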
[0039] The plurality of predefined categories may be based on, but not limited to, response time/latency. For example, referring to FIG. 3, upon reception of the plurality of requests, the load balancer (104) identifies whether each of the plurality of requests belongs to category A, category B, ..., category Z. For instance, if a request from the plurality of requests matches the threshold of category A, then the request is assigned to server NN (i.e., application instance NN) corresponding to category A; if not, it will be matched/checked against the threshold of category B. If the request matches the threshold of category B, then the request will be allocated to server 2 (i.e., application instance 2); if not, the request will be matched/checked against the other categories that are available. The threshold may be defined on the basis of the latency/execution time. In this way, a corresponding category may be assigned to each request from the plurality of requests, and the load balancer prioritizes the execution of the request having the low execution time. For example, the request may be any service-based request, such as adding a new subscriber or checking the current balance of a subscriber. In this scenario, the request related to adding the new subscriber may take more execution time than the request related to checking the current balance of the subscriber. The load balancer checks these requests in the plurality of predefined categories, category A, category B, ..., category Z, and prioritizes execution of the requests in order of a low execution time to a high execution time. In the above example, checking the current balance of the subscriber may take less execution time, so this request will be executed first.
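As an illustrative sketch only, such prioritization can be realized with a min-heap keyed on each category's nominal execution-time bound; the bounds and names below are assumptions, since the disclosure does not fix concrete values:

```python
# Hypothetical sketch: execute requests in order of low to high execution time.
import heapq

# Assumed nominal execution-time upper bound (ms) for each predefined category.
CATEGORY_LATENCY_MS = {"A": 5, "B": 10, "C": 50}

def prioritize(requests):
    """Yield (api, latency bound) pairs in increasing order of execution time."""
    heap = [(CATEGORY_LATENCY_MS[category], api) for api, category in requests]
    heapq.heapify(heap)
    while heap:
        latency, api = heapq.heappop(heap)
        yield api, latency

# Checking the current balance (category A) is executed before
# adding a new subscriber (category C), as in the example above.
for api, latency in prioritize([("add_subscriber", "C"), ("check_balance", "A")]):
    print(api, latency)
```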
[0040] Based on the prioritization, the request is routed to the server (i.e., the target server) of the plurality of servers (106). Thus, the load balancer routes the request to a specific server (application instance/machine/processor) based on the category configuration.
[0041] The load balancer (104) may implement artificial intelligence/machine learning techniques to structure the plurality of predefined categories. Alternatively, the load balancer (104) may implement any other suitable analytical technique to structure the plurality of predefined categories. The plurality of predefined categories may be structured based on average response time for any of the API over a period of time, where the load balancer classifies/categorizes the plurality of requests and/or APIs in various categories from the beginning and changes according to ongoing results to assign the plurality of servers. For example, if an API associated with a request has latency greater than 10ms, then the load balancer may allocate server/machine 1 for processing of the request. Similarly, if the API associated with the request has latency between 5ms and 10ms, then the load balancer may allocate server/machine 2 for processing of the request. Similarly, if the API associated with the request has latency between 0ms and 5ms, then the load balancer may allocate server/machine 3 for processing of the request.
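The threshold-to-server mapping in the above example can be sketched as follows; the function name and boundary handling are assumptions, while the thresholds mirror the example values in the text:

```python
# Hypothetical sketch: map an API's average latency to a server/machine.
def select_server(avg_latency_ms: float) -> int:
    """Route an API/request to a server based on its average latency."""
    if avg_latency_ms > 10:
        return 1   # server/machine 1: latency greater than 10 ms
    if avg_latency_ms > 5:
        return 2   # server/machine 2: latency between 5 ms and 10 ms
    return 3       # server/machine 3: latency between 0 ms and 5 ms

assert select_server(12.0) == 1
assert select_server(7.5) == 2
assert select_server(2.0) == 3
```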
[0042] That is, after categorization of the request on the basis of latency, a server from the plurality of servers (106) will be allocated to the plurality of
predefined categories for processing, so that requests taking a lower time (i.e., having a lower latency/execution time) do not have to wait in a queue.
[0043] The load balancer (104) is further configured to calculate a run-time average latency for each API associated with a corresponding request. The run-time average latency determines the processing time taken by any of the plurality of APIs over a period to execute. Based on this, the load balancer dynamically updates the predefined category within the plurality of predefined categories. The load balancer dynamically updates the plurality of predefined categories of the plurality of APIs on the basis of latency for traffic management in a network in the system (100) and dynamically allocates the plurality of predefined categories of the plurality of APIs to facilitate transport of the respective requests flowing across the network.
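A hedged sketch of this run-time tracking and dynamic recategorization follows; the incremental-mean update and the class, method and threshold names are assumptions, as the disclosure does not prescribe a particular averaging formula:

```python
# Hypothetical sketch: track run-time average latency per API and
# re-derive the API's category from it dynamically.
class LatencyTracker:
    def __init__(self):
        self.count = {}  # number of latency samples observed per API
        self.mean = {}   # running average latency per API (ms)

    def record(self, api: str, latency_ms: float) -> None:
        """Update the run-time average latency with one observed execution."""
        n = self.count.get(api, 0) + 1
        m = self.mean.get(api, 0.0)
        self.count[api] = n
        self.mean[api] = m + (latency_ms - m) / n  # incremental mean

    def category(self, api: str) -> str:
        """Dynamically derive the API's category from its average latency."""
        avg = self.mean.get(api, float("inf"))
        if avg > 10:
            return "C"
        if avg > 5:
            return "B"
        return "A"

tracker = LatencyTracker()
for sample_ms in (3.0, 4.0, 12.0):   # latencies fed back from the server
    tracker.record("check_balance", sample_ms)
print(round(tracker.mean["check_balance"], 2))  # -> 6.33
print(tracker.category("check_balance"))        # -> "B" (updated at run time)
```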
[0044] As mentioned previously, the load balancer may balance the load based on latency. Thus, the plurality of predefined categories may be latency-based. For example, suppose requests A and B have corresponding APIs with the same latency. Then, these two requests shall fall under a single category, and the server capable of executing the two requests will be assigned to the category. That is, the load balancer has the capability to manage the plurality of APIs in a latency-based category on a request flow basis for the traffic management, and determines which server from the plurality of servers (106) is executing the request. The load balancer is configured to receive consistent feedback/responses of the average time taken by the plurality of APIs during execution at the server.
[0045] The load balancer (104) is configured to monitor traffic across the plurality of servers (106) capable of fulfilling the requests based on latency and accordingly, distributes network or application traffic across the plurality of servers (106).
[0046] The present disclosure enables measuring the average latency of each request/API over a period of time and, if it is less than the threshold value, redirecting such APIs/requests to a specific machine or machines. This makes the system (100) faster, as faster/smaller requests do not have to wait for slower/bigger requests to finish.
[0047] FIG. 4 is a flow-chart (400) illustrating a method of load balancing in accordance with the present disclosure. The method has been explained in conjunction with FIG. 1 to FIG. 3.
[0048] At step (402), the method includes receiving the plurality of requests at the load balancer (104) from the application (102). The load balancer (104) distributes network or application traffic across the plurality of servers (106) capable of fulfilling the plurality of requests.
[0049] At step (404), the method includes assigning the predefined category from the plurality of predefined categories to each of the plurality of requests based on the predefined category associated with each of the plurality of application programming interfaces (APIs) associated with each of the plurality of requests. The plurality of predefined categories is based on an execution time.
[0050] At step (406), the method includes prioritizing execution of the plurality of requests with a low execution time.
[0051] FIG. 5 is a flow-chart (500) illustrating a method of dynamically updating the plurality of predefined categories. The method has been explained in conjunction with FIG. 1 to FIG. 3. The plurality of predefined categories may be associated with the plurality of requests.
[0052] At step (502), the method includes dynamically updating the plurality of predefined categories of the plurality of application programming interfaces (APIs) on the basis of latency for traffic management in the network.
[0053] At step (504), the method includes dynamically allocating the plurality of predefined categories of the plurality of APIs to facilitate transport of corresponding request from the plurality of requests flowing across the network.
[0054] At step (506), the method includes managing the plurality of APIs in a latency-based category on a request flow basis for the traffic management.
[0055] At step (508), the method includes determining a server from the plurality of servers (106) that is executing the category of the plurality of requests.
[0056] At step (510), the method includes feeding the load balancer (104) with a response of the average time taken by the plurality of APIs during execution at the server.
[0057] The methods and processes (400) and (500) described herein may have fewer or additional steps or states and the steps or states may be performed in a different order. Not all steps or states need to be reached. The methods and processes described herein may be embodied in, and fully or partially automated via, software code modules executed by one or more general purpose computers. The code modules may be stored in any type of computer-readable medium or other computer storage device. Some or all of the methods may alternatively be embodied in whole or in part in specialized computer hardware.
[0058] The embodiments disclosed herein can be implemented using at least one software program running on at least one hardware device and performing network management functions to control the elements.
[0059] Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention. While the foregoing written description of the invention enables one of ordinary skill to make and use what is considered presently to be the best mode thereof, those of ordinary skill will understand and appreciate the existence of variations, combinations, and equivalents of the specific embodiment, method, and examples herein. The invention should therefore not be limited by the above-described embodiment, method, and examples, but by all embodiments and methods within the scope of the invention. It is intended that the specification and examples be considered as exemplary, with the true scope of the invention being indicated by the claims.
[0060] The results of the disclosed methods may be stored in any type of computer data repository, such as relational databases and flat file systems that use volatile and/or non-volatile memory (e.g., magnetic disk storage, optical storage, EEPROM and/or solid state RAM).
[0061] The various illustrative logical blocks, modules, routines, and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software,
various illustrative components, blocks, modules, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. The described functionality can be implemented in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosure.
[0062] Moreover, the various illustrative logical blocks and modules described in connection with the embodiments disclosed herein can be implemented or performed by a machine, such as a general purpose processor device, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components or any combination thereof designed to perform the functions described herein. A general purpose processor device can be a microprocessor, but in the alternative, the processor device can be a controller, microcontroller, or state machine, combinations of the same, or the like. A processor device can include electrical circuitry configured to process computer-executable instructions. In another embodiment, a processor device includes an FPGA or other programmable device that performs logic operations without processing computer-executable instructions. A processor device can also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Although described herein primarily with respect to digital technology, a processor device may also include primarily analog components. A computing environment can include any type of computer system, including, but not limited to, a computer system based on a microprocessor, a mainframe computer, a digital signal processor, a portable computing device, a device controller, or a computational engine within an appliance, to name a few.
[0063] The elements of a method, process, routine, or algorithm described in connection with the embodiments disclosed herein can be embodied directly in
hardware, in a software module executed by a processor device, or in a combination of the two. A software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of a non-transitory computer-readable storage medium. An exemplary storage medium can be coupled to the processor device such that the processor device can read information from, and write information to, the storage medium. In the alternative, the storage medium can be integral to the processor device. The processor device and the storage medium can reside in an ASIC. The ASIC can reside in a user terminal. In the alternative, the processor device and the storage medium can reside as discrete components in a user terminal.
[0064] Conditional language used herein, such as, among others, "can," "may," "might," "e.g.," and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without other input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment. The terms "comprising," "including," "having," and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term "or" is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term "or" means one, some, or all of the elements in the list.
[0065] Disjunctive language such as the phrase "at least one of X, Y, Z," unless specifically stated otherwise, is otherwise understood with the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not
generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present.
[0066] While the above detailed description has shown, described, and pointed out novel features as applied to various embodiments, it can be understood that various omissions, substitutions, and changes in the form and details of the devices or algorithms illustrated can be made without departing from the scope of the disclosure. As can be recognized, certain embodiments described herein can be embodied within a form that does not provide all of the features and benefits set forth herein, as some features can be used or practiced separately from others.
CLAIMS
We Claim:
1. A method for load balancing, comprising:
receiving a plurality of requests at a load balancer (104) from an application (102), wherein the load balancer (104) distributes network or application traffic across a plurality of servers (106) capable of fulfilling the plurality of requests; and
prioritizing execution of the plurality of requests that have a low execution time.
2. The method as claimed in claim 1 further comprising assigning a predefined category from a plurality of predefined categories to each of the plurality of requests based on a predefined category associated with each of a plurality of application programming interfaces (APIs) associated with each of the plurality of requests, wherein the plurality of predefined categories is based on an execution time.
3. The method as claimed in claim 1, wherein prioritizing execution of the plurality of requests comprises routing each of the plurality of requests based on the plurality of predefined categories, wherein each of the plurality of requests is routed to a server of the plurality of servers (106).
4. The method as claimed in claim 1 further comprising:
calculating a run-time average latency for each API, wherein the run-time average latency determines processing time taken by any of the plurality of APIs over a period to execute; and
dynamically updating the predefined category within the plurality of predefined categories based on the run-time average latency.
5. A method of dynamically updating a plurality of predefined categories that can
be associated with a plurality of requests, the method comprising:
dynamically updating the plurality of predefined categories of a plurality of application programming interfaces (APIs) on the basis of latency for traffic management in a network;
dynamically allocating the plurality of predefined categories of the plurality of APIs to facilitate transport of corresponding request from the plurality of requests flowing across the network;
managing the plurality of APIs in a latency-based category on a request flow basis for the traffic management;
determining a server from a plurality of servers (106) that is executing the category of the plurality of requests; and
feeding a load balancer with a response of the average time taken by the plurality of APIs during execution at the server.
6. The method as claimed in claim 4 further comprising monitoring the network or application traffic across the plurality of servers (106) capable of fulfilling the plurality of requests based on latency.
7. The method as claimed in claim 4, wherein dynamically updating the plurality of predefined categories comprises updating the plurality of predefined categories corresponding to the plurality of APIs for executing each of the plurality of APIs at the server based on the latency.
8. The method as claimed in claim 4 further comprising dynamically selecting a request from the plurality of requests that has a lowest latency and routing the request to a specific server.
9. The method as claimed in claim 4, wherein a latency for each API comprises processing time taken by each of the plurality of APIs over a period.
10. A system (100) for load balancing, the system comprises an application
(102), a load balancer (104) and a plurality of servers (106), the load balancer
is configured to:
receive a plurality of requests from the application (102), wherein the load balancer (104) distributes network or application traffic across the plurality of servers (106) capable of fulfilling the plurality of requests; and
prioritize execution of the plurality of requests with a low execution time.
11. The system (100) as claimed in claim 10, wherein a predefined category
from a plurality of predefined categories is assigned to each of the
plurality of requests based on a predefined category associated with each
of a plurality of application programming interfaces (APIs) associated with
each of the plurality of requests, wherein the plurality of predefined
categories is based on an execution time.
12. The system (100) as claimed in claim 10, wherein the load balancer (104)
routes each of the plurality of requests based on the plurality of predefined
categories, wherein each of the plurality of requests is routed to a server of the
plurality of servers (106).