Abstract: This disclosure relates generally to dynamic routing and scheduling of capacitated resources in a geographical area. Various conventional systems for routing and scheduling of capacitated resources do not consider the dynamic nature of supply and demand or the clustering of the geographical area, thereby rendering the system functionality static and independent of demand. The disclosed method considers the dynamic nature of supply and demand to dynamically identify multiple clusters in the geographical area, and utilizes a reinforcement learning model for optimizing the dynamic routing and scheduling of the capacitated resources for predictive and real-time operations. In an embodiment, the disclosed system fragments the optimization task into multiple fragmented tasks and processes each of the fragmented tasks by a distinct thread of a multi-core multi-threaded processor, which leads to efficient utilization of memory and processing power. To be published with FIG. 2
FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION (See Section 10 and Rule 13)
Title of invention:
SYSTEM AND METHOD FOR OPTIMIZED DYNAMIC ROUTING AND SCHEDULING OF CAPACITATED RESOURCES
Applicant
Tata Consultancy Services Limited A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India
Preamble to the description
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD [001] The disclosure herein generally relates to optimization models for predictive and real-time operations, and, more particularly, to a system and method for optimized dynamic routing and scheduling of capacitated resources, such as trucks, automated guided vehicles, and so on, for predictive and real-time operations.
BACKGROUND
[002] Supply chain operations, in various industries, require either movement of capacitated resources across a geography or within-plant operation of resources to move goods/cargo/materials from one or more sources to one or more destinations. Herein, the capacitated resources refer to vehicles having the capacity to accommodate goods/cargo/material, and so on.
[003] While traditional approaches provide various ways to address such logistic needs, certain specific issues and challenges remain unaddressed by such methodologies. These challenges stem from business requirements, working systems, country-specific logistics environments and other influencing factors. A few of the factors that can be generalized in the form of constraints are: country-specific norms applicable to trucks and drivers; road regulations such as speed limitations; time windows of availability of docks for loading and unloading goods; consideration of the onward and return journeys, which fundamentally necessitates minimizing return journey distances while maximizing onward journey distances; and the requirement of completing all the orders scheduled for a certain period.
SUMMARY [004] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a processor-implemented method for optimized dynamic routing and scheduling of capacitated resources is provided. The method includes training, via
one or more hardware processors, a reinforcement learning (RL) model using a training data associated with the dynamic routing and scheduling of capacitated resources at a geographical area, the training data comprising customer order data, inventory data and capacitated resource data. Further, the method includes obtaining, via the one or more hardware processors, a real-time operation data along with operation constraints for optimization of routing and scheduling of the capacitated resources characterized by an optimization task. Also, the method includes decomposing, via the one or more hardware processors, by the trained RL model, the optimization task into a plurality of fragmented tasks of distinct sizes, each fragmented task of the plurality of fragmented tasks associated with a possible operational scenario associated with various levels of operational values for the capacitated resource data. Moreover, the method includes processing, via the one or more hardware processors, each of the plurality of fragmented tasks using a multi-core multi-thread processor based on a prioritization logic, wherein the prioritization logic is capable of assigning a prioritization score to each of the plurality of fragmented tasks.
[005] In another aspect, a system for optimized dynamic routing and scheduling of capacitated resources is provided. The system includes one or more memories; and one or more hardware processors, the one or more memories coupled to the one or more hardware processors, wherein the one or more hardware processors are configured to execute programmed instructions stored in the one or more memories to train a reinforcement learning (RL) model using a training data associated with the dynamic routing and scheduling of capacitated resources at a geographical area, the training data comprising customer order data, inventory data and capacitated resource data. The one or more hardware processors are further configured by the instructions to obtain a real-time operation data along with operation constraints for optimization of routing and scheduling of the capacitated resources characterized by an optimization task. The one or more hardware processors are further configured by the instructions to decompose by the trained RL model, the optimization task into a plurality of fragmented tasks of distinct sizes, each fragmented task of the plurality of fragmented tasks associated with a
possible operational scenario associated with various levels of operational values for the capacitated resource data. The one or more hardware processors are further configured by the instructions to process each of the plurality of fragmented tasks using a multi-core multi-thread processor based on a prioritization logic, wherein the prioritization logic is capable of assigning a prioritization score to each of the plurality of fragmented tasks.
[006] In yet another aspect, a non-transitory computer readable medium for a method for optimized dynamic routing and scheduling of capacitated resources is provided. The method includes training, via one or more hardware processors, a reinforcement learning (RL) model using a training data associated with the dynamic routing and scheduling of capacitated resources at a geographical area, the training data comprising customer order data, inventory data and capacitated resource data. Further, the method includes obtaining, via the one or more hardware processors, a real-time operation data along with operation constraints for optimization of routing and scheduling of the capacitated resources characterized by an optimization task. Also, the method includes decomposing, via the one or more hardware processors, by the trained RL model, the optimization task into a plurality of fragmented tasks of distinct sizes, each fragmented task of the plurality of fragmented tasks associated with a possible operational scenario associated with various levels of operational values for the capacitated resource data. Moreover, the method includes processing, via the one or more hardware processors, each of the plurality of fragmented tasks using a multi-core multi-thread processor based on a prioritization logic, wherein the prioritization logic is capable of assigning a prioritization score to each of the plurality of fragmented tasks.
[007] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[008] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[009] FIG. 1 illustrates an example network implementation of a system for optimized dynamic routing and scheduling of capacitated resources in accordance with an example embodiment.
[010] FIG. 2 is a flow diagram of a method for optimized dynamic routing and scheduling of capacitated resources according to some embodiments of the present disclosure.
[011] FIG. 3 is a block diagram of an exemplary computer system for implementing embodiments consistent with the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS [012] A logistic network may represent either movement of capacitated resources across a geography or within-plant operation of resources to move goods/cargo/materials from one or more sources to one or more destinations. Planning of said movement of the capacitated resources offers certain challenges owing to various factors including, but not limited to, dynamism and uncertainties in customer order placements, supply availability, vehicle availability or status, and so on. The task of route planning and optimization for the capacitated resources requires both predictive planning and reactive planning. The task requiring predictive planning includes planning the schedules with the routing definitions before the planning horizon (or period of operations). The task requiring reactive planning includes generation of revised schedules with routing plans during the period of operations due to reasons related to, for example, breakdown of vehicles, climatic conditions such as snowfall or heavy rains, changes in customer or supply preferences, and so on.
[013] The reactive schedules need to be prepared dynamically, and most of the conventional solutions in the market have limited capability to provide scalable approaches and mechanisms to solve the aforementioned problems effectively. Various conventional systems are available for route planning for movement of the
capacitated resources across a geography. One such system clusters the locations within the geography based on geo-coordinates and customer locations. Such clustering of the geographies does not take into consideration the dynamic nature of supply and demand as well as clustering needs, thereby rendering the system functionality static which is independent of demand. Certain other conventional systems operate by formulating the movement planning (or routing of the capacitated resources) as a cost minimization problem or profit maximization and extend the same to the extent of demand fulfillment as a final criterion for measurement external to the scope of the problem. Many of such conventional systems leverage traditional solution approaches as well as geographical and location-based clustering approaches to solve the problem. However, the applications of such systems are limited to definitions of routes and solving smaller problem instances with limited applications to reactive planning during uncertainties as well as supporting real time decision making during logistic operations.
[014] Various embodiments disclosed herein provide a method and a system for dynamic routing and scheduling of capacitated resources in an optimized manner. The disclosed system utilizes data and decision support models to foresee and predict uncertainties, capture the nature of the uncertainties, and provide model-driven decisions using intelligent learning models to obtain adaptable answers that respond to practical needs. For example, in one embodiment, the disclosed system enables clustering of the geographical locations based on supply and demand in the supply chain network, which is dynamic in nature and dependent on the market conditions. Moreover, rather than formulating the routing of the capacitated resources as a cost minimization problem, the disclosed method considers the problem as a profit maximization and demand fulfillment problem. The disclosed method takes into consideration not only demand fulfillment but also minimization or elimination of empty trips, and balancing supply and demand across multiple clusters and their behavior over time. In addition, the disclosed system embodies a reinforcement learning model that utilizes multiple factors to define the learning process. Such factors include, but are not limited to, full truck load movement
frequency, empty truck movement frequency, margins and costs, historical route definitions captured on an ongoing basis, and so on. An important contribution of the disclosed embodiments is that the disclosed system embodies multi-core multi-thread processors for processing multiple fragmented tasks (obtained by fragmenting the routing and optimization task) based on a prioritization logic. Fragmenting the task into multiple fragmented tasks and processing each of the fragmented tasks utilizing distinct threads of the multi-core multi-thread processors facilitates in faster processing and effective memory utilization.
[015] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope being indicated by the following claims.
[016] Referring now to the drawings, and more particularly to FIG. 1 through 3, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[017] FIG. 1 illustrates an example network implementation 100 of a system 102 for optimized dynamic routing and scheduling of capacitated resources in accordance with an example embodiment. In an embodiment, the system 102 facilitates route determination and resource assignment for capacitated resources based on training and use of a machine learning process to obtain solutions to the logistics problem. For example, the disclosed system trains a reinforcement learning (RL) model using training data associated with the dynamic routing and scheduling of capacitated resources at a geographical area. The system 102 further obtains real-time operation data along with operation constraints for optimization
of routing and scheduling of the capacitated resources characterized by an optimization task. The trained RL model decomposes the optimization task into multiple fragmented tasks of distinct sizes. Each of the fragmented tasks is processed by the system 102 using a multi-core multi-thread processor based on a prioritization logic, as will be described further in the description. The system 102 leverages parallelization of computing threads, considering memory requirements, to solve the decomposed/fragmented tasks of the original optimization task as quickly as possible and to obtain meaningful solutions for predictive planning as well as quick solutions for reactive needs during real-time operations.
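The prioritized, parallel processing of fragmented tasks can be sketched in code. The following is a minimal illustration only, assuming hypothetical task callables and a hypothetical `score_fn` standing in for the disclosed prioritization logic; it is not the actual implementation of the system 102.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def process_fragmented_tasks(tasks, score_fn, max_workers=4):
    """Order fragmented tasks by prioritization score, then run each on a
    worker thread of a thread pool.

    `tasks` is a list of callables (one per fragmented task); `score_fn`
    assigns a prioritization score to a task. Both are illustrative.
    """
    # Max-heap on the prioritization score: higher score runs earlier.
    heap = [(-score_fn(t), i, t) for i, t in enumerate(tasks)]
    heapq.heapify(heap)
    ordered = [heapq.heappop(heap)[2] for _ in range(len(heap))]
    # Distinct threads of a multi-core multi-thread processor pick up
    # the fragmented tasks in priority order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda task: task(), ordered))
```

Submitting the highest-scoring fragments first lets solutions for the most critical operational scenarios become available earliest, while the remaining fragments occupy idle threads.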
[018] Although the present disclosure is explained considering that the system 102 is implemented on a server, it may be understood that the system 102 may also be implemented in a variety of computing systems 104, such as a laptop computer, a desktop computer, a notebook, a workstation, a cloud-based computing environment and the like. It will be understood that the system 102 may be accessed through one or more devices 106-1, 106-2... 106-N, collectively referred to as devices 106 hereinafter, or applications residing on the devices 106. Examples of the devices 106 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, a Smartphone, a tablet computer, a workstation and the like. The devices 106 are communicatively coupled to the system 102 through a network 108.
[019] In an embodiment, the network 108 may be a wireless or a wired network, or a combination thereof. In an example, the network 108 can be implemented as a computer network, as one of the different types of networks, such as virtual private network (VPN), intranet, local area network (LAN), wide area network (WAN), the internet, and such. The network 108 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), and Wireless Application Protocol (WAP), to communicate with each other. Further, the network 108 may include a variety of network devices, including routers,
bridges, servers, computing devices, storage devices. The network devices within the network 108 may interact with the system 102 through communication links.
[020] As discussed above, the system 102 may be implemented in a computing device 104, such as a hand-held device, a laptop or other portable computer, a tablet computer, a mobile phone, a PDA, a smartphone, and a desktop computer. The system 102 may also be implemented in a workstation, a mainframe computer, a server, and a network server. In an embodiment, the system 102 may be coupled to a data repository, for example, a repository 112. The repository 112 may store data processed, received, and generated by the system 102. In an alternate embodiment, the system 102 may include the data repository 112.
[021] The network environment 100 supports various connectivity options such as BLUETOOTH®, USB, ZigBee and other cellular services. The network environment enables connection of the devices 106, such as a smartphone, with the server 104, and accordingly with the database 112, using any communication link including Internet, WAN, MAN, and so on. In an exemplary embodiment, the system 102 is implemented to operate as a stand-alone device. In another embodiment, the system 102 may be implemented to work as a loosely coupled device to a smart computing environment. The components and functionalities of the system 102 are described further in detail with reference to FIGS. 2-3.
[022] FIG. 2 illustrates a flow diagram of a method 200 for optimized dynamic routing and scheduling of capacitated resources in accordance with an example embodiment. The method 200 depicted in the flow chart may be executed by a system, for example, the system 102 of FIG. 1. In an example embodiment, the system 102 may be embodied in a computing device.
[023] Operations of the flowchart, and combinations of operation in the flowchart, may be implemented by various means, such as hardware, firmware, processor, circuitry and/or other device associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described in various embodiments may be embodied by computer program instructions. In an example embodiment, the computer program instructions, which embody the procedures, described in various embodiments may
be stored by at least one memory device of a system and executed by at least one processor in the system. Any such computer program instructions may be loaded onto a computer or other programmable system (for example, hardware) to produce a machine, such that the resulting computer or other programmable system embody means for implementing the operations specified in the flowchart. It will be noted herein that the operations of the method 200 are described with help of system 102. However, the operations of the method 200 can be described and/or practiced by using any other system.
[024] The disclosed method 200 relates to dynamic routing and scheduling of capacitated resources such as trucks, automated guided vehicles, etc. and optimization of the number of transportation resources required to handle simultaneous delivery and pickup operations in a logistic network. The logistic network may represent either movement of capacitated resources across a geography or within-plant operation of resources to move goods/cargo/materials from one or more source locations to one or more destination locations. More specifically, the disclosed method relates to route determination and resource assignment by characterizing the optimization task against the set of models and solution approaches. The selected model along with the solution approach may execute on a predefined set of orders to be fulfilled on a specific set of requirements from the logistic operations. The resource movement addresses fully loaded or ‘onward’ and completely unloaded or ‘return’ journeys, which, as a complete process or operation, achieves a higher level of overall operating efficiency while meeting all the constraints based on business requirements and prevalent specific logistics norms.
[025] The method includes training an RL model for optimization of a task associated with routing and scheduling of capacitated resources. In an embodiment, the RL model may be a Q-learning model. The disclosed RL model utilizes training data comprising historical data to define and understand the supply-demand characteristics in a geographical area/location, the logistics/within-plant network configuration and the capacitated resource types available for the short-term planning horizon. The RL model is trained on factors including, but not limited to, customer order/supply categorization, material mapping to transportation resource types, loading and unloading windows, the cost of loaded and unloaded trips along with a possible set of route definitions, and other operational constraints such as running distances with maximum/minimum travel times, operating hours of sourcing locations, and so on, to build intelligence in operations. The RL model may further characterize the definition of the task and decompose the task based on the historical data. Characterization of the definition of the task includes, for example, a pickup-and-delivery-only task, a simultaneous pickup and delivery task, a routing task with split deliveries, and so forth.
[026] The trained RL model is utilized for determining a possible set of operational scenarios considering various levels of operational values for the selected factors. The objective of the task formulation and training of the RL model may be either to minimize the cost of the logistics operations, to maximize operating profit and revenues, or to ensure on-time delivery or pickup, as applicable based on the nature of the entity. The method utilizes the identified task formulation and the RL model to solve the operational daily or weekly logistics task for a future planning horizon, considering the objective in operation (cost minimization, profit/revenue maximization, or any defined objective), to define the number of capacitated resources to be considered, the details (for example, where, when and choice) of routes on which to schedule the capacitated resources, and so on. The RL model ensures that all orders associated with the task are fulfilled at expected service levels, which may be defined as the performance of the system.
[027] The RL model is configured to update the route planning based on any revision in forecast data (or operational data) with the progress of time (based on real-time or near-real-time feeds). The RL model evaluates the updated optimized route/resource allocation to anticipate revisions and/or distortions considering disturbances and disruptions associated with events, and responds accordingly. The output of the RL model includes a complete delivery and pickup schedule along with route definitions. The method 200 for optimized dynamic routing and scheduling of capacitated resources is described further in detail with reference to the description below.
[028] As previously described, the RL model is trained for optimized dynamic routing and scheduling of capacitated resources. The RL model is trained using a plurality of input parameters defined hereunder:
the customer location index (l), the supply location index (l”), the product index (p), the time index (t), and the truck index (m)
demand[l][p] representing the demand for product p at customer location l
distanceMatrix[l”][l] representing the distance between any two locations l” and l
supply[l][p] representing the maximum available supply of product p at plant location l
start_time[l] and delivery_time[l] representing the start and end time of the time window at location l within which either supply of any product can be picked up by the trucks at supply location l or demand can be delivered at customer location l
truck_docking[m][l] representing the number of capacitated resources (such as trucks) of type m that can be docked simultaneously at location l for loading or unloading of load
docking_time[m][l] representing the time duration for loading or unloading of truck type m at location l
[029] The variables considered by the disclosed RL model are as defined hereunder:
Full_FreqTable[l”][l][m] representing the average daily frequency of loaded trips of truck type m between supply location l” and demand location l
Empty_FreqTable[l][l”][m] representing the average daily frequency of empty trips of truck type m between customer location l and supply location l”
Freq_Current[l”][l][m] representing the number of full load trips made by truck type m between supply location l” and demand location l on a given day
Empty_Current[l”][l][m] representing the number of empty trips made by truck type m between customer location l and supply location l” on a given day
Full_route_list[r][m][k] containing the list of complete routes of a truck trip which are frequently used when demand and supply are defined for the day for the customer locations and supply locations. A complete route is defined as an alternating sequence of supply and customer locations (defined using index k, with k ∈ {l, l”}) to complete the trip for a day by a truck of type m when supply and demand are non-zero at the respective locations. The list keeps the r route definitions used over the last N days, where N may be defined between 30 and 90, representing a period of a month or a quarter
Full_route_Freq[r][m] representing the frequency measured in terms of the number of trips made by truck type m over the defined period length
Full_MarginTable[l”][l][m] representing the profit made by truck type m while carrying a full load between supply location l” and customer location l
Empty_CostTable[l][l”][m] representing the remaining profit made by truck type m while moving an empty truck between customer location l and supply location l”.
[030] The method 200 initiates by training a reinforcement learning (RL) model for dynamic routing and scheduling of capacitated resources at a geographical area. The geographical area is partitioned into a plurality of location clusters. Each location cluster of the plurality of location clusters includes a set of supply locations and a set of demand locations. In an embodiment, the plurality of supply and customer demand locations in the geography may be grouped into the plurality of location clusters based on factors including, but not limited to, maximal distance travelled and time taken to travel. For example, the plurality of location clusters may be dynamically defined based at least on a demand and supply ratio (DSR) and an adjacency parameter indicative of proximity within clusters of the plurality of location clusters. The DSR is used to classify the market scenarios into four categories, viz., high, balanced, low and sporadic (or intermittent). The DSR may be represented as:
Demand to Supply Ratio (DSR) = Total demand across all customer locations / Total supply from all supply locations (e.g., plants)
DSR(p) = Σ_l Demand[l][p] / Σ_l” Supply[l”][p], where l” indicates the supply locations.
[031] In an example embodiment, a DSR value of 1 with a variation of ±5% to ±10% may be considered a balanced scenario. A DSR value of more than 1.1 may be indicative of a scenario where demand exceeds supply, while a DSR value of less than 0.9 may be indicative of a scenario where supply exceeds demand. A DSR value of less than 0.5, or lower still, indicates a very low demand scenario. A consistently low DSR value in a cluster over time indicates sporadic or intermittent demand behavior in that cluster from amongst the plurality of clusters.
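As a sketch, the DSR categorization above may be expressed as a small classification routine. The threshold values, the `history` argument and the function name are illustrative assumptions drawn from the examples in this paragraph, not fixed parameters of the disclosure.

```python
def classify_dsr(dsr, history=None, low_threshold=0.5, band=0.10):
    """Classify a cluster's market scenario from its demand-to-supply
    ratio (DSR). A DSR near 1 (within `band`) is balanced; a consistently
    low DSR over time (`history`) indicates sporadic/intermittent demand.
    """
    if history is not None and all(v < low_threshold for v in history):
        return "sporadic"   # consistently low demand over time
    if abs(dsr - 1.0) <= band:
        return "balanced"   # DSR of 1 within a +/-5% to +/-10% variation
    return "high" if dsr > 1.0 else "low"
```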
[032] Herein, the plurality of clusters associated with the geographical area may be predefined prior to the training of the RL model. Said clusters may be rank ordered based on the DSR. For example, a geographical area may have multiple supply locations (for instance, between 10 and 25) and hundreds of customer locations. Such clusters may be grouped based on the DSR and the adjacency parameter, where the adjacency parameter is indicative of the adjacent highest demand cluster. In an embodiment, a maximum of Ø clusters may be grouped together. In various scenarios, Ø may be set to a value of 2 or 3. An example of clustering the geographical area is as below:
__
Let C be the total number of clusters
Let Ω = C
For every cluster c, do the following (while c < C):
    If (DSR < 0.5):
        Find the adjacent cluster with the highest demand and group the two together
        Ω = Ω + 1
        Define the cluster index of the new group as Ω
End For (c = c + 1)
A new index, say ĉ, is used to identify the updated list of clusters
__
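The grouping procedure above may be sketched as follows, assuming illustrative inputs: `dsr[c]` as the DSR of cluster c, `adjacency[c]` as its list of neighbouring cluster indices, and `demand[c]` as its total demand. These names are hypothetical stand-ins for the disclosure's cluster data.

```python
def group_low_demand_clusters(dsr, adjacency, demand):
    """Group each low-demand cluster (DSR < 0.5) with its adjacent
    highest-demand cluster, assigning the new group the index Ω."""
    C = len(dsr)
    omega = C                       # Ω starts at the total cluster count
    group_of = {c: c for c in range(C)}
    for c in range(C):
        if dsr[c] < 0.5:
            # adjacent cluster with the highest demand
            partner = max(adjacency[c], key=lambda a: demand[a])
            omega += 1              # Ω = Ω + 1
            group_of[c] = omega     # new cluster index for the group
            group_of[partner] = omega
    return group_of
```

The returned mapping plays the role of the updated cluster index ĉ: clusters left ungrouped keep their original index, while each low-demand cluster and its chosen partner share a fresh index Ω.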
[033] Herein, the plurality of clusters may be categorized, on a daily basis as well as with respect to the behavior of demand and supplies over time, as one among the following, viz., demand exceeds supply or vice versa, balanced supply and demand, and low demand locations. Clusters that experience relatively low demand over a period of time may be marked as intermittent demand clusters. In an embodiment, it may be assumed that the supply locations for the customer demand are predominantly located within the same cluster unless otherwise stated. During clustering, low demand or intermittent demand clusters may be grouped together with adjacent high demand clusters for better utilization of trucks as well as for cost optimization. In certain scenarios, adjacent high demand clusters may be preferred first in the order of priority for grouping with a low or intermittent demand cluster, followed by balanced demand clusters, and so on.
[034] In an embodiment, a trained RL model is applied to each cluster of the plurality of clusters to estimate a capacity associated with the capacitated resources for each cluster. In an embodiment, the RL model includes a Q-learning model. In an embodiment, the RL model is trained by dynamically defining the plurality of location clusters in the geographic area.
[035] At 202, the method 200 includes training the RL model using training data associated with the dynamic routing and scheduling of capacitated resources at the geographical area. The training data includes customer order data, inventory data and capacitated resource data. Herein, the customer order data comprises material type, quantity to be shipped, delivery window, service levels, delivery location and supply locations. The capacitated resource data includes the nature of the capacitated resource to enable movement of inventory from a source location to a destination location. Moreover, the operational constraints include customer order/supply categorization, material mapping to transportation resource types, loading and unloading windows, the cost of loaded and unloaded trips along with a possible set of route definitions, and other operational constraints such as running distances with maximum/minimum travel times, and operating hours of sourcing locations.
[036] In an embodiment, training the RL model further includes dynamically defining the plurality of location clusters in the geographic area, as described previously. Each location cluster of the plurality of location clusters includes a set of supply locations and a set of demand locations. The plurality of location clusters are dynamically defined based at least on a demand and supply ratio (DSR) and an adjacency parameter indicative of proximity of clusters within the plurality of location clusters, as previously described. For each DSR scenario associated with each day, the DSR index value is created and updated by using the trained Q-learning model. The Q-learning model for optimized dynamic routing and scheduling is described further in detail below.
[037] The Q-learning model includes the notions of state and action. The location of an empty truck at a location l1 after unloading, or at a depot at the beginning of the planning period (say, at the start of the day or shift), is defined as a state, and an action represents the movement of the capacitated resource (for example, the truck) from the current location l1 to a supply location l”1 to satisfy demand at a customer location l2. The total trip time includes the empty trip time from l1 to l”1 and the loading time at l”1, as well as the unloading time at l2 and the waiting time, if any, at these two locations. The Q-learning model is expected to select an action that provides minimal or no waiting time unless otherwise necessary. Using the Full_MarginTable and the Empty_CostTable, the benefit associated with making a trip from l1 to l”1 to l2 is computed as below:
benefit[l1][l”1 to l2] = Full_MarginTable[l”1][l2] - Empty_CostTable[l1][l”1]
The benefit table is defined for all possible combinations of l and l”. The benefit table can be defined for each truck type m, as required.
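The benefit computation above may be sketched as follows (an illustrative sketch only; the nested-dictionary table layout and the location names are assumptions, not taken from the specification):

```python
# Benefit of the trip l1 -> l"1 (empty leg) -> l2 (loaded leg): margin of
# the loaded trip minus the cost of the empty repositioning trip.
def compute_benefit(full_margin, empty_cost, l1, ls1, l2):
    return full_margin[ls1][l2] - empty_cost[l1][ls1]

# Hypothetical tables: "d1" is a depot, "s1" a supply location,
# "c1"/"c2" customer locations.
full_margin = {"s1": {"c1": 500.0, "c2": 420.0}}   # Full_MarginTable[l"][l]
empty_cost = {"d1": {"s1": 80.0}}                  # Empty_CostTable[l][l"]

benefit_value = compute_benefit(full_margin, empty_cost, "d1", "s1", "c1")
```

In this toy instance the benefit is the loaded margin 500.0 less the empty-trip cost 80.0; a separate table of this form could be kept per truck type m.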
[038] The Q-table, which represents the learnings from the history of past truck movements (obtained from the training data), is initialized to zero to start with. A transition rule of Q-learning is defined using the formula below:
Q[l1][l”1 to l2] = benefit[l1][l”1 to l2] + y * Max[Q(next state, all actions)]
[039] The various possible next states are the positions of the trucks after unloading at various customer locations (defined as l1, l2, l3, and so on) that can be reached after picking up supplies of products from supply locations (defined as l”1, l”2, l”3, and so on). It is assumed that a virtual agent learns through experience using the Q-learning model by an unsupervised learning process. In an embodiment, the virtual agent may be embodied in the system 102 as the Q-learning model. The training and learning process of the virtual agent comprises exploring all possible combinations of current permissible states (defined as ‘episodes’ in the Q-learning model) and their respective feasible future states till the goal is met.
[040] The Gamma parameter (y) has a range of 0 to 1 (0 <= Gamma <= 1). If Gamma is closer to zero, the virtual agent may tend to consider only immediate rewards. If Gamma is closer to one, the virtual agent may consider future rewards with greater weight, being willing to delay the reward. The reinforcement learning (RL) algorithm is defined below:
RL algorithm for optimized dynamic routing and scheduling of capacitated resources
1. Set the gamma parameter (y), and compute the rewards defined in the matrix benefit.
2. Initialize the matrix Q[li][l”s to lj] to zero.
3. For each episode e:
   a. Select a random initial state. Let the state be li.
   b. Do While {no more additional states [supply and customer locations] can be added}:
      i. Select one among all possible actions (loaded trips l”s to lj) for the current state.
      ii. Get the maximum Q value for this next state based on all possible actions.
      iii. Compute: Q(li, l”s to lj) = benefit(li, l”s to lj) + y * Max[Q(lj, l”s1 to lj), Q(lj, l”s2 to lj), …]
      iv. Using this possible action, consider going to lj.
      v. Set the next state lj as the current state li.
   c. End Do
4. End For
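The episode loop above may be sketched as follows (an illustrative sketch only: the dictionary-based tables, the action encoding, the fixed episode length, and all names are assumptions, not part of the specification):

```python
import random

def train_q(benefit, actions, next_state, gamma=0.8, episodes=200, seed=0):
    """Episodic Q-learning sketch: benefit[s][a] is the reward of taking
    loaded-trip action a from state s, actions[s] lists the permissible
    actions in state s, and next_state[a] is the location reached."""
    rng = random.Random(seed)
    states = list(actions)
    q = {s: {a: 0.0 for a in actions[s]} for s in states}
    for _ in range(episodes):
        s = rng.choice(states)            # step 3a: random initial state
        for _ in range(len(states)):      # bounded stand-in for step 3b
            if not actions.get(s):
                break                     # no further states can be added
            a = rng.choice(actions[s])    # step i: pick a loaded trip
            s2 = next_state[a]
            future = max(q[s2].values()) if q.get(s2) else 0.0
            q[s][a] = benefit[s][a] + gamma * future   # step iii
            s = s2                        # step v: move to the next state
    return q

# Toy two-location example: trucks shuttle between states l1 and l2.
actions = {"l1": ["s->l2"], "l2": ["s->l1"]}
next_state = {"s->l2": "l2", "s->l1": "l1"}
benefit = {"l1": {"s->l2": 10.0}, "l2": {"s->l1": 5.0}}
q = train_q(benefit, actions, next_state, gamma=0.5)
```

With gamma = 0.5 the Q-values of this toy problem converge to the fixed point of the transition rule, Q(l1) = 10 + 0.5·Q(l2) and Q(l2) = 5 + 0.5·Q(l1).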
[041] In an embodiment, the training of the RL model may include historical movements over the last N days, and a final Q value may be defined by a Q matrix. Using the Q matrix, the top R routes may be defined in the matrix Full_route_list[r][m][k] for every truck type m. The frequency of use of the routes over the last N days may be defined in Full_route_freq[r][m]. The routes may be dropped when the frequency of occurrence of the routes falls below a threshold probability value. The threshold probability value is defined as the ratio of the frequency of the selected route r to the maximum frequency of the routes defined in R. The threshold value may be set equal to 0.2 in an implementation.
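The route-pruning rule above may be sketched as follows (an illustrative sketch; the dictionary representation and route names are assumptions):

```python
# A route is dropped when its frequency, relative to the most frequent
# route in R, falls below the threshold (0.2 in the described
# implementation).
def prune_routes(route_freq, threshold=0.2):
    """route_freq: {route: use count over the last N days}."""
    max_freq = max(route_freq.values())
    return {r: f for r, f in route_freq.items() if f / max_freq >= threshold}

# Hypothetical counts: r3 has relative frequency 5/50 = 0.1 < 0.2.
kept = prune_routes({"r1": 50, "r2": 12, "r3": 5})
```

Here r1 and r2 survive (relative frequencies 1.0 and 0.24) while r3 is eliminated.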
[042] The Q-learning model is applied on each one of the clusters independently to define the learning pattern. In an example implementation, three possible DSR bands viz., DSR > 1.1, 0.9 <= DSR <= 1.1, and 0.5 <= DSR < 0.9 may be considered. The Q matrix is scenario dependent for each cluster, and a revision of the Q matrix is respectively defined for application on real-world decision requirements. The objective of the approach is to estimate the capacity (or total number) of capacitated resources (such as trucks) for each cluster (or grouped clusters) separately for the planning requirements, as described below:
1. Let the total number of trucks, T = 0.
2. For each cluster ĉ, do the following:
   a. For each route r (for each truck type m) in Full_route_list:
      i. Evaluate r for supply and demand.
      ii. If a route is found, fix the route. Set T = T + 1.
      iii. Update the demand and supply values.
   b. End For (r = r + 1)
   c. Use the Q-matrix to define an additional route r considering the supplies and demand at various l" and l. Increment T, i.e., T = T + 1.
   d. Update the Q-matrix, if new states are defined (l" to l) and used.
   e. Update Full_route_list with the new route r and define/increment the count by 1.
   f. Repeat steps c and d until all the demand is satisfied or the supplies become exhausted.
   g. Compute the updated Full_route_Freq, Full_FreqTable and Empty_FreqTable.
Herein,
Full_FreqTable[l”][l] = (Full_FreqTable[l”][l] + Freq_Current[l”][l])/N
Empty_FreqTable[l”][l] = (Empty_FreqTable[l”][l] + Empty_Current[l”][l])/N
where N is the number of days used for computing the average value, and is capped by a maximum number to retain a relation between history and recency. For optimum learning, the values of Empty_FreqTable may be kept at minimum levels.
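The rolling frequency update above may be sketched as follows (an illustrative sketch; the nested-dictionary table layout is an assumption):

```python
# Rolling update of a frequency table: the previous averaged value and the
# current day's count are combined and divided by N, the (capped) number of
# days used for averaging, so recent days weigh against long history.
def update_freq(freq_table, current, n):
    for src, row in current.items():
        for dst, cnt in row.items():
            prev = freq_table.get(src, {}).get(dst, 0.0)
            freq_table.setdefault(src, {})[dst] = (prev + cnt) / n
    return freq_table

# Hypothetical values: previous average 6.0 trips, 4.0 trips today, N = 2.
table = update_freq({"s1": {"c1": 6.0}}, {"s1": {"c1": 4.0}}, n=2)
```

The same update applies to both the full-trip and empty-trip tables; keeping the empty-trip table small corresponds to minimizing unprofitable repositioning.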
[043] The trained RL model is utilized for optimized routing and scheduling of the capacitated resources in the geographical area. For example, at 204, the method 200 includes obtaining a real-time operation data along with operation constraints for optimization of routing and scheduling of the capacitated resources, characterized by an optimization task. Herein, the real-time operation data may include, but is not limited to, current locations of trucks, running status of trucks such as delays or breakdowns, unexpected urgent customer demands, availability of trucks, and so on. The real-time operation data may be obtained from, for example, handheld devices (for example, the devices 106 of FIG. 1) with users, sensors/tags in capacitated resources (vehicles, trucks, etc.), feeds on climate information, and so on.
[044] Herein, an optimization task may involve routing of trucks across a day or a planning period to ship products from one or more plant locations to multiple customer locations. Clustering of plants and customer locations based on the DSR ratio may provide multiple optimization tasks to solve the problem in a decomposed framework using multi-threaded processors. Each one of the tasks (represented using clusters) in a larger problem with hundreds of clusters may take up to 1 to 2 GB of RAM memory, based on the problem size and the set of data definitions considered to solve the problem.
[045] At 206, the method 200 includes decomposing, by the trained RL model, the optimization task into a plurality of fragmented tasks of distinct sizes. Each fragmented task of the plurality of fragmented tasks may be associated with a possible operational scenario (or episode) associated with various levels of operational values for the capacitated resource data.
[046] The plurality of fragmented tasks may be determined based on a number of threads of a multi-core multi-thread processor (embodied in the system 102) that may be available in the computing processor, and the RAM memory available for solving the optimization task. The size of the optimization task may be defined by a number of orders or customers (drop-off/pick-up locations) to be covered, and the number of vehicles (or capacitated resources) available, or supply points available, or both. In an embodiment, the sizes of the plurality of fragmented tasks may be predetermined and computed by using a set of most frequent and predefined scenarios (episodes) whose computational time, RAM memory requirements, and number of threads are optimally defined for an optimization task of a predetermined size.
[047] Historical data associated with various conditional events may be populated for a set of varied problem sizes; their occurrence frequencies, as well as the computational effort and needs, are stored using a learning process which gets updated based on the relevance of the scenarios when the problem is solved, either in a predictive or a reactive manner, for business requirements. For example, historical data on conditional events (such as natural events, traffic events, and other operational events) may be captured using a multi-dimensional matrix or table, and probabilities of said conditional events may be stored in the database (for example, the database 112) as possible rules of recommendations (to the drivers of the capacitated resources). For example, in case of a conditional event such as a snow-storm, the system 102 may suggest possible alternative routes to drivers; in case of a conditional event such as a breakdown of the vehicle, the system may suggest possible availability of a spare vehicle or dispatching a crew to fix the vehicle considering the location of the breakdown, and so on. The optimization tasks for the conditional events may be formulated as combinatorial optimization tasks and may be solved by formulating them as linear/mixed integer/non-linear programming tasks. The larger the size of the optimization task and the higher the DSR ratio, the more trucks may be required to be used to obtain a better solution.
[048] At 208, the method 200 includes processing each of the plurality of fragmented tasks using a multi-core multi-thread processor based on a prioritization logic. The prioritization logic may be capable of assigning a prioritization score to each of the plurality of fragmented tasks.
[049] For example, for each demand supply scenario (or the episode), the historical data and the learning process (using the Q-Learning model) for each of the clusters is run in a single thread in parallel. Further, the computational time required to create the Q-Matrix as well as RAM memory required may be stored in a table for various optimization task sizes of number of supply and demand (or customer) location pairs along with the average number of capacitated resources (or vehicles/trucks) required for each type m for logistics operations. An algorithm for training the Q-learning model to incorporate the parallelization of learning process using the multicore-processor is described below:
Algorithm for Parallelization of the Learning Process using a Multi-core Processor
Training Methodology for the Q-Learning Algorithm
1. For each cluster ĉ, do the following:
   a. Compute the DSR ratio.
   b. If DSR > 1.1, set dsr_index_value = 1;
      else if 0.9 <= DSR <= 1.1, set dsr_index_value = 2;
      else if 0.5 <= DSR < 0.9, set dsr_index_value = 3;
      else break.
   c. For each day d in the historical data:
      i. Run the Q-learning algorithm to create and update Q-Matrix[dsr_index_value] for each DSR scenario separately.
      ii. Do while (till the demand is satisfied or no more supply is possible):
         1. Create a route r to satisfy demand locations l from supply locations l”.
   d. End For
   e. Create Full_route_list[dsr_index_value]
2. End For
3. Return Q-Matrix
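The per-cluster parallelization above may be sketched as follows (an illustrative sketch only: the cluster records, the DSR bands' index mapping, and the use of a thread pool are assumptions; the toy worker stands in for the full Q-learning run, and a process pool may be preferred for CPU-bound training):

```python
from concurrent.futures import ThreadPoolExecutor

def dsr_index(dsr):
    """Map a cluster's DSR ratio to a scenario index, per step 1b."""
    if dsr > 1.1:
        return 1
    if 0.9 <= dsr <= 1.1:
        return 2
    if 0.5 <= dsr < 0.9:
        return 3
    return None  # scenario outside the modelled DSR bands ("else break")

def train_cluster(cluster):
    idx = dsr_index(cluster["dsr"])
    if idx is None:
        return cluster["name"], None
    # Placeholder for: run Q-learning over the historical days for this
    # cluster's DSR scenario and build Q-Matrix[idx].
    return cluster["name"], idx

# Hypothetical clusters, each handled by its own worker in parallel.
clusters = [{"name": "c1", "dsr": 1.3}, {"name": "c2", "dsr": 1.0},
            {"name": "c3", "dsr": 0.7}, {"name": "c4", "dsr": 0.2}]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(train_cluster, clusters))
```

Each cluster's learning runs in a single worker, mirroring the single-thread-per-cluster arrangement described in paragraph [049].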
[050] The training data is used to create the Q-Matrix. Further, a test data is used to create and record the average computational time and average RAM memory required to define the expected number of capacitated resources to be used in a day for each one of the demand supply scenarios. An algorithm for testing and validating the Q-learning model for the learning process using the multi-core processor is described below:
[051] Testing and Validation Methodology for the Q-Learning Algorithm
1. For each cluster ĉ, do the following:
   a. Compute the DSR ratio.
   b. If DSR > 1.1, set dsr_index_value = 1;
      else if 0.9 <= DSR <= 1.1, set dsr_index_value = 2;
      else if 0.5 <= DSR < 0.9, set dsr_index_value = 3;
      else break.
   c. For each day d in the test data set:
      i. Evaluate Full_route_list[dsr_index_value].
      ii. If a route r is chosen, the schedule for the route by the identified truck of type m is defined.
      iii. Run the Q-learning algorithm to update the Q-Matrix, if new states are observed.
      iv. Store Computing_time[dsr_index_value] and RAM[dsr_index_value].
      v. Ave_Comp_Time[dsr_index_value] = Computing_time[dsr_index_value]/Count of Observations
      vi. Ave_RAM[dsr_index_value] = RAM[dsr_index_value]/Count of Observations
   d. End For
   e. Store the problem size (number of supply locations, customer locations), Number_of_Trucks[dsr_index_value], Ave_Comp_Time[dsr_index_value], and Ave_RAM[dsr_index_value].
   f. Define the updated Full_route_Freq.
2. End For
[052] The updating of the Full_route_Freq list eliminates the routes that have not appeared in the sequence list at least once in half of the period length of 30 days (to 90 days) as defined for evaluation of the test data set, or a user-defined number. New routes generated may be added to the list every day, and the routes which have an average frequency count less than 1/30 (or 1/90) may get eliminated from the list. The evaluation of routes in step c(i) of the above algorithm (or validation methodology) may ensure that the routes are fully feasible considering the waiting time and docking time of the trucks:
Departure_time[t][l] = Arrival_time[t][l] + Waiting_time[t][l”] + Docking_time[t][l]
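The feasibility relation above may be sketched as follows (an illustrative sketch; the minute-based time representation and the example values are assumptions):

```python
# Departure time of truck t at location l: arrival plus any waiting for a
# free docking station plus the docking (loading/unloading) time itself.
def departure_time(arrival, waiting, docking):
    return arrival + waiting + docking

# Hypothetical values in minutes from midnight: arrive 9:00, wait 15 min
# for a dock, dock for 30 min.
d = departure_time(arrival=540, waiting=15, docking=30)
```

A route is feasible only if every such departure still allows the truck to meet the next location's operating window.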
[053] Table 1 presents the various problem sizes and the corresponding DSR ratio, along with the computational time and RAM memory required to solve the problem.
Table 1
Problem size (in terms of supply and customer locations) | DSR ratio | Number of trucks estimated | Average computational time (secs) | Average RAM memory (MB)
(6, 30) | 1.3 | 26 | 200 | 100
[054] In an embodiment, the multi-core multi-thread processor prioritizes processing and solving larger fragmented tasks with a higher DSR ratio first, followed by fragmented tasks having lower DSR ratios within the same optimization task size. Tougher problems are prioritized first in the order, followed by easier problems. In some scenarios, during a deficit in supply to meet the demand, supplies from lower DSR ratio clusters may be utilized to meet it, if feasible. An example method employed for prioritizing the processing of the fragmented tasks leverages the available number of threads, the available random access memory (RAM), the size of the optimization task, and the complexity to prioritize and solve the fragmented tasks defined against various clusters:
1. Estimate the number of threads and RAM memory available
2. Sort the plurality of fragmented tasks in descending order based on problem size, DSR Ratio and the RAM_memory required.
a. Problem_Set = { [(m1, n1), DSR1, RAM1], [(m2, n2), DSR2, RAM2], … | m1 > m2, n1 > n2, DSR1 > DSR2, RAM1 > RAM2, and so on }
3. Assign the first fragmented task in the list to the first thread and compute the remnant CPU_RAM_Memory
4. Assign the next problem in the list to the next thread if CPU_RAM_Memory > RAMj.
5. Repeat Step 4 till the maximum number of fragmented tasks is assigned to the threads.
6. The unassigned fragmented tasks in the list become the first problems in the list.
7. Redefine the problem list in Step 2 and repeat Steps 4, 5, and 6.
[055] The problems with surplus supply, or deficit or unmet demand, are re-clustered with adjacent geographical clusters as appropriate, and the resulting fragmented tasks are dynamically added into the problem list.
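The prioritization steps above may be sketched greedily as follows (an illustrative sketch; the task records and field names are assumptions, and RAM figures are in GB):

```python
# Sort fragmented tasks by problem size, DSR ratio and RAM need
# (descending: step 2), then assign them to threads while RAM remains
# (steps 3-5); unassigned tasks head the next round (steps 6-7).
def assign_tasks(tasks, n_threads, total_ram):
    tasks = sorted(tasks, key=lambda t: (t["size"], t["dsr"], t["ram"]),
                   reverse=True)
    assigned, remaining, free_ram = [], [], total_ram
    for task in tasks:
        if len(assigned) < n_threads and task["ram"] <= free_ram:
            assigned.append(task["name"])
            free_ram -= task["ram"]      # remnant CPU_RAM_Memory
        else:
            remaining.append(task["name"])  # first problem of the next round
    return assigned, remaining

# Hypothetical fragmented tasks on a 2-thread, 4 GB budget.
assigned, remaining = assign_tasks(
    [{"name": "t1", "size": 200, "dsr": 1.3, "ram": 2.0},
     {"name": "t2", "size": 120, "dsr": 1.1, "ram": 1.5},
     {"name": "t3", "size": 80, "dsr": 0.8, "ram": 1.0}],
    n_threads=2, total_ram=4.0)
```

The two largest, highest-DSR tasks occupy the available threads first; the remainder is re-queued for the following round, as in steps 6 and 7.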
[056] A high demand and supply cluster (irrespective of DSR ratio) may require a greater number of trucks than an average demand/supply cluster, and may necessitate breaking the problem definition into smaller ones by using the Full_route_list. A relatively larger optimization task may be broken down into fragmented tasks of smaller sizes. For example, 20 supply locations with 200 customer locations may be broken down into four problems, each of size 5 supply locations with 50 customer locations. The four fragmented tasks may be solved independently as separate clusters with an integration flag, and the remnant demand and supply may be finally pooled together to solve and obtain a complete solution.
[057] In an example scenario, for the purpose of creating clusters, it may be assumed that the clusters may be categorized on a daily basis, as well as with respect to the behavior of demand and supplies over time, as one among the following viz., demand exceeds supply or vice versa, balanced supply and demand, and low demand locations. Clusters that may be experiencing relatively little demand over a period of time are marked as intermittent demand clusters. The supply locations for the customer demand may be predominantly located within the same cluster unless otherwise stated. Low demand or intermittent demand clusters may be grouped together with adjacent high demand clusters for better utilization of trucks as well as from the point of cost optimization. The adjacent high demand clusters are preferred first in the order of priority for grouping with a low or intermittent demand cluster, followed by balanced demand clusters, and so on. The travel time of all types of trucks between any two locations within a cluster may remain the same. A truck may have to be placed in a docking station for loading or unloading of cargo/goods at a supply or customer location. The number of docking stations at a given supply location or delivery location is limited and defined. The trucks may either pick up the cargo/goods at the supply location or deliver the cargo/goods at the customer location within the predefined time window of operations.
[058] The planning horizon may be predetermined, for example, a day. It may be assumed that the total demand in a day across all customer locations has to be delivered subject to predefined delivery service levels. Every customer order may have a delivery date before which the corresponding goods/cargo has to be delivered. A daily frequency table may be maintained to capture the average number of trips travelled between a supply and a customer location. The products available for delivery may be constrained by the maximal available capacity at the supply location, and the demand is determined by the market needs. The demand to supply ratio may be used to classify the market scenarios into four categories viz., high, balanced, low, and sporadic (or intermittent).
[059] A trip of a truck may be defined as the distance and duration travelled by the truck from a pick-up location (origin point) to the customer location (delivery point) and back to any one of the supply locations. It includes a loaded trip and an empty trip, and also considers the time spent waiting for the availability of a docking station, as well as the loading or unloading time.
[060] FIG. 3 is a block diagram of an exemplary computer system 301 for implementing embodiments consistent with the present disclosure. The computer system 301 may be implemented alone or in combination with components of the system 102 (FIG. 1). Variations of computer system 301 may be used for implementing the devices included in this disclosure. Computer system 301 may comprise a central processing unit (“CPU” or “hardware processor”) 302. The hardware processor 302 may comprise at least one data processor for executing program components for executing user- or system-generated requests. The
processor may include specialized processing units such as integrated system (bus) controllers, memory management control units, floating point units, graphics processing units, digital signal processing units, etc. The processor may include a microprocessor, such as AMD AthlonTM, DuronTM or OpteronTM, ARM’s application, embedded or secure processors, IBM PowerPCTM, Intel’s Core, ItaniumTM, XeonTM, CeleronTM or other line of processors, etc. The processor 302 may be implemented using mainframe, distributed processor, multi-core, parallel, grid, or other architectures. Some embodiments may utilize embedded technologies like application specific integrated circuits (ASICs), digital signal processors (DSPs), Field Programmable Gate Arrays (FPGAs), etc. The processor 302 may be a multi-core multi-threaded processor.
[061] Processor 302 may be disposed in communication with one or more input/output (I/O) devices via I/O interface 303. The I/O interface 303 may employ communication protocols/methods such as, without limitation, audio, analog, digital, monoaural, RCA, stereo, IEEE-1394, serial bus, universal serial bus (USB), infrared, PS/2, BNC, coaxial, component, composite, digital visual interface (DVI), high-definition multimedia interface (HDMI), RF antennas, S-Video, VGA, IEEE 802.11 a/b/g/n/x, Bluetooth, cellular (e.g., code-division multiple access (CDMA), high-speed packet access (HSPA+), global system for mobile communications (GSM), long-term evolution (LTE), WiMax, or the like), etc.
[062] Using the I/O interface 303, the computer system 301 may communicate with one or more I/O devices. For example, the input device 304 may be an antenna, keyboard, mouse, joystick, (infrared) remote control, camera, card reader, fax machine, dongle, biometric reader, microphone, touch screen, touchpad, trackball, sensor (e.g., accelerometer, light sensor, GPS, gyroscope, proximity sensor, or the like), stylus, scanner, storage device, transceiver, video device/source, visors, etc.
[063] Output device 305 may be a printer, fax machine, video display (e.g., cathode ray tube (CRT), liquid crystal display (LCD), light-emitting diode (LED), plasma, or the like), audio speaker, etc. In some embodiments, a transceiver 306 may be disposed in connection with the processor 302. The transceiver may
facilitate various types of wireless transmission or reception. For example, the transceiver may include an antenna operatively connected to a transceiver chip (e.g., Texas Instruments WiLink WL1283, Broadcom BCM4750IUB8, Infineon Technologies X-Gold 618-PMB9800, or the like), providing IEEE 802.11a/b/g/n, Bluetooth, FM, global positioning system (GPS), 2G/3G HSDPA/HSUPA communications, etc.
[064] In some embodiments, the processor 302 may be disposed in communication with a communication network 308 via a network interface 307. The network interface 307 may communicate with the communication network 308. The network interface may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/100/1000 Base T), transmission control protocol/internet protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc. The communication network 308 may include, without limitation, a direct interconnection, local area network (LAN), wide area network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, etc. Using the network interface 307 and the communication network 308, the computer system 301 may communicate with devices 309 and 310. These devices may include, without limitation, personal computer(s), server(s), fax machines, printers, scanners, various mobile devices such as cellular telephones, smartphones (e.g., Apple iPhone, Blackberry, Android-based phones, etc.), tablet computers, eBook readers (Amazon Kindle, Nook, etc.), laptop computers, notebooks, gaming consoles (Microsoft Xbox, Nintendo DS, Sony PlayStation, etc.), or the like. In some embodiments, the computer system 301 may itself embody one or more of these devices.
[065] In some embodiments, the processor 302 may be disposed in communication with one or more memory devices (e.g., RAM 313, ROM 314, etc.) via a storage interface 312. The storage interface may connect to memory devices including, without limitation, memory drives, removable disc drives, etc., employing connection protocols such as serial advanced technology attachment (SATA), integrated drive electronics (IDE), IEEE-1394, universal serial bus (USB), fiber channel, small computer systems interface (SCSI), etc. The memory drives
may further include a drum, magnetic disc drive, magneto-optical drive, optical drive, redundant array of independent discs (RAID), solid-state memory devices, solid-state drives, etc. Variations of memory devices may be used for implementing, for example, any databases utilized in this disclosure.
[066] The memory devices may store a collection of programs or database components, including, without limitation, an operating system 316, user interface application 317, user/application data 318 (e.g., any data variables or data records discussed in this disclosure), etc. The operating system 316 may facilitate resource management and operation of the computer system 301. Examples of operating systems include, without limitation, Apple Macintosh OS X, Unix, Unix-like system distributions (e.g., Berkeley Software Distribution (BSD), FreeBSD, NetBSD, OpenBSD, etc.), Linux distributions (e.g., Red Hat, Ubuntu, Kubuntu, etc.), IBM OS/2, Microsoft Windows (XP, Vista/7/8, etc.), Apple iOS, Google Android, Blackberry OS, or the like. User interface 317 may facilitate display, execution, interaction, manipulation, or operation of program components through textual or graphical facilities. For example, user interfaces may provide computer interaction interface elements on a display system operatively connected to the computer system 301, such as cursors, icons, check boxes, menus, scrollers, windows, widgets, etc. Graphical user interfaces (GUIs) may be employed, including, without limitation, Apple Macintosh operating systems’ Aqua, IBM OS/2, Microsoft Windows (e.g., Aero, Metro, etc.), Unix X-Windows, web interface libraries (e.g., ActiveX, Java, Javascript, AJAX, HTML, Adobe Flash, etc.), or the like.
[067] In some embodiments, computer system 301 may store user/application data 318, such as the data, variables, records, etc. as described in this disclosure. Such databases may be implemented as fault-tolerant, relational, scalable, secure databases such as Oracle or Sybase. Alternatively, such databases may be implemented using standardized data structures, such as an array, hash, linked list, structured text file (e.g., XML), table, or as object-oriented databases (e.g., using ObjectStore, Poet, Zope, etc.). Such databases may be consolidated or distributed, sometimes among various computer systems discussed above. It is to
be understood that the structure and operation of any computer or database component may be combined, consolidated, or distributed in any working combination.
[068] Additionally, in some embodiments, the server, messaging, and instructions transmitted or received may emanate from hardware, including the operating system, and program code (i.e., application code) residing in a cloud implementation. Further, it should be noted that one or more of the systems and methods provided herein may be suitable for cloud-based implementation. For example, in some embodiments, some or all of the data used in the disclosed methods may be sourced from or stored on any cloud computing platform.
[069] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[070] Various embodiments disclosed herein provide a method and system for optimized dynamic routing and scheduling of capacitated resources in a reliable and computationally efficient manner. For example, in an embodiment, a scheduling and/or routing task is split into a plurality of fragments, and each of the plurality of fragments is processed by a distinct thread of a multi-core multi-thread processor based on a prioritization logic. Due to the splitting of the task and the systematic processing of the multiple fragments of the task by the multiple threads respectively, the memory utilization of the system is efficient.
[071] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of
computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[072] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[073] The illustrated steps are set out to explain the exemplary
embodiments shown, and it should be anticipated that ongoing technological
development will change the manner in which particular functions are performed.
These examples are presented herein for purposes of illustration, and not limitation.
Further, the boundaries of the functional building blocks have been arbitrarily
defined herein for the convenience of the description. Alternative boundaries can
be defined so long as the specified functions and relationships thereof are
appropriately performed. Alternatives (including equivalents, extensions,
variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items
following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[074] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[075] It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
We Claim:
1. A processor implemented method, comprising:
training, via one or more hardware processors, a reinforcement learning (RL) model using training data associated with dynamic routing and scheduling of capacitated resources at a geographical area, the training data comprising customer order data, inventory data, and capacitated resource data;
obtaining, via the one or more hardware processors, real-time operation data along with operational constraints for optimization of routing and scheduling of the capacitated resources characterized by an optimization task;
decomposing, via the one or more hardware processors, by the RL model, the optimization task into a plurality of fragmented tasks of distinct sizes, each fragmented task of the plurality of fragmented tasks associated with a possible operational scenario corresponding to various levels of operational values for the capacitated resource data; and
processing, via the one or more hardware processors, each of the plurality of fragmented tasks using a multi-core multi-thread processor based on a prioritization logic, wherein the prioritization logic is capable of assigning a prioritization score to each of the plurality of fragmented tasks.
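An illustrative, non-claimed sketch of the flow recited in claim 1 appears below: an optimization task is decomposed into fragments of distinct sizes, a prioritization score is assigned to each fragment, and the fragments are dispatched to a thread pool in priority order. The `Fragment` fields, the scoring formula, and `solve_fragment` are all hypothetical stand-ins, not the claimed implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Fragment:
    scenario_id: int
    size: int           # e.g., number of operational values covered by the scenario
    demand_load: float  # level of the operational value for this scenario

def priority_score(frag: Fragment) -> float:
    # Hypothetical prioritization logic: larger, heavier-loaded scenarios first.
    return frag.size * frag.demand_load

def solve_fragment(frag: Fragment) -> int:
    # Placeholder for optimizing routing/scheduling of one operational scenario.
    return frag.scenario_id

# Three fragmented tasks of distinct sizes (illustrative values).
fragments = [Fragment(0, 5, 0.9), Fragment(1, 3, 0.4), Fragment(2, 8, 0.7)]

# Assign a prioritization score and process highest-priority fragments first
# on a multi-core multi-thread executor.
ordered = sorted(fragments, key=priority_score, reverse=True)
with ThreadPoolExecutor(max_workers=4) as pool:
    processed_order = list(pool.map(solve_fragment, ordered))
```

Here the fragment with score 8 × 0.7 = 5.6 is dispatched before those scoring 4.5 and 1.2; any real prioritization logic could replace `priority_score`.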
2. The processor implemented method of claim 1, further comprising dynamically defining a plurality of location clusters in the geographical area, each location cluster of the plurality of location clusters comprising a set of supply locations and a set of demand locations, wherein the plurality of location clusters are dynamically defined based at least on a demand and supply ratio (DSR) and an adjacency parameter indicative of proximity within clusters of the plurality of location clusters.
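One possible reading of claim 2's dynamic clustering is sketched below: a location joins a cluster only if it satisfies an adjacency (proximity) parameter with every member and the cluster's demand/supply ratio stays under a cap. The greedy scheme, the 50 km adjacency threshold, and the DSR cap of 2.0 are illustrative assumptions, not the claimed method.

```python
import math

def dsr(cluster):
    # Demand and supply ratio (DSR) of a candidate cluster.
    supply = sum(loc["supply"] for loc in cluster)
    return sum(loc["demand"] for loc in cluster) / supply if supply else float("inf")

def adjacent(a, b, max_km=50.0):
    # Adjacency parameter: simple Euclidean proximity (hypothetical threshold).
    return math.dist(a["xy"], b["xy"]) <= max_km

def define_clusters(locations, dsr_cap=2.0):
    clusters = []
    for loc in locations:
        for cluster in clusters:
            # Join the first cluster that stays adjacent and DSR-balanced.
            if all(adjacent(loc, m) for m in cluster) and dsr(cluster + [loc]) <= dsr_cap:
                cluster.append(loc)
                break
        else:
            clusters.append([loc])   # otherwise start a new cluster
    return clusters

locs = [
    {"xy": (0, 0),   "supply": 10, "demand": 8},
    {"xy": (10, 0),  "supply": 5,  "demand": 6},
    {"xy": (200, 0), "supply": 7,  "demand": 9},
]
clusters = define_clusters(locs)   # the distant location forms its own cluster
```

Re-running this as demand and supply data change is what makes the clusters dynamic rather than fixed.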
3. The processor implemented method of claim 1, wherein the RL model comprises a Q-learning model.
4. The processor implemented method of claim 3, further comprising applying the Q-learning model on each cluster of the plurality of clusters to estimate a capacity associated with one or more capacitated resources from amongst the plurality of capacitated resources for each cluster, wherein applying the Q-learning model on each cluster comprises:
obtaining a Q-matrix;
defining, using the Q matrix, a set of top routes in a set of full routes for each of the capacitated resources capable of moving inventory from one or more source locations to one or more delivery locations in the geographical area;
defining a frequency of use of the set of top routes over a set of days; and
eliminating one or more routes from the set of full routes on determining that the frequency of use is below a threshold probability value.
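The route-pruning steps of claim 4 can be sketched as follows: per-day Q-values rank the routes, the top routes per day are tallied into a frequency of use, and routes whose frequency falls below a threshold probability are eliminated. The Q-values, the top-2 cutoff, and the 0.5 threshold are invented for illustration.

```python
from collections import Counter

full_routes = ["A->B", "A->C", "B->C", "C->A"]

# Hypothetical Q-values per route for each day in a set of days.
daily_q = [
    {"A->B": 0.9, "A->C": 0.2, "B->C": 0.7, "C->A": 0.1},
    {"A->B": 0.8, "A->C": 0.3, "B->C": 0.6, "C->A": 0.2},
    {"A->B": 0.7, "A->C": 0.6, "B->C": 0.5, "C->A": 0.1},
]

# Define the set of top routes per day and count how often each route appears.
top_counts = Counter()
for q in daily_q:
    top_counts.update(sorted(q, key=q.get, reverse=True)[:2])

threshold = 0.5   # threshold probability value (assumed)
kept = [r for r in full_routes if top_counts[r] / len(daily_q) >= threshold]
# Routes rarely in the top set ("A->C", "C->A" here) are eliminated.
```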
5. The processor implemented method of claim 3, wherein training the Q-learning model further comprises:
computing a DSR ratio indicative of a ratio of total demand across a plurality of customer locations to the set of supply locations;
determining a DSR index value based on the DSR ratio;
for each DSR episode associated with each day, creating and updating the DSR index value by using the Q-learning model; and
creating a route to satisfy one or more demand locations from the set of supply locations.
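A minimal tabular Q-learning loop consistent with claim 5 is sketched below: each day is one DSR episode, the DSR ratio is bucketed into a DSR index (the state), an action selects a supply location, and the Q-value is updated. The reward shaping, bucket count, and one-step episode structure are all assumptions for illustration.

```python
import random

random.seed(0)
alpha = 0.1                               # learning rate (assumed)
n_dsr_states, n_supply = 3, 2             # DSR index buckets x candidate supply locations
Q = [[0.0] * n_supply for _ in range(n_dsr_states)]

def dsr_index(total_demand: float, total_supply: float) -> int:
    # Bucket the DSR ratio (total demand / total supply) into a discrete index.
    return min(int(total_demand / total_supply), n_dsr_states - 1)

for day in range(100):                    # one DSR episode per day
    demand, supply = random.uniform(5.0, 25.0), 10.0
    s = dsr_index(demand, supply)
    a = random.randrange(n_supply)        # choose a supply location to serve demand
    reward = -abs(demand - supply) / supply   # assumed reward: penalize imbalance
    # One-step episode, so the Q-learning target reduces to the reward alone.
    Q[s][a] += alpha * (reward - Q[s][a])

# A route is then created from the best-valued supply location for each DSR state.
route_choice = [Q[s].index(max(Q[s])) for s in range(n_dsr_states)]
```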
6. The processor implemented method of claim 3, further comprising:
computing a DSR ratio associated with each cluster location from amongst the plurality of cluster locations, the DSR ratio indicative of a ratio of total demand across a plurality of customer locations to the set of supply locations;
determining a DSR index value based on the DSR ratio;
defining a schedule for a route for a capacitated resource of a specific type;
identifying one or more routes by evaluating a plurality of values of the DSR index for each of a plurality of DSR episodes in a time period;
computing and updating the DSR index value by using the Q-learning model; and
determining an average time and average memory associated with the computing of the DSR index value.
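Claim 6's final step, determining the average time and average memory of the DSR index computation, can be sketched with the standard-library `time.perf_counter` and `tracemalloc` tools; the workload inside `compute_dsr_index` is a stand-in.

```python
import time
import tracemalloc

def compute_dsr_index(demands, supplies):
    # Stand-in DSR index computation: bucketed demand/supply ratio.
    return round(sum(demands) / sum(supplies))

times, mems = [], []
for episode in range(20):                 # DSR episodes in a time period
    tracemalloc.start()
    t0 = time.perf_counter()
    compute_dsr_index([7.0] * 1000, [5.0] * 1000)
    times.append(time.perf_counter() - t0)
    _, peak = tracemalloc.get_traced_memory()   # peak bytes during the computation
    mems.append(peak)
    tracemalloc.stop()

avg_time = sum(times) / len(times)        # average time per DSR index computation
avg_memory = sum(mems) / len(mems)        # average peak memory per computation
```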
7. The processor implemented method of claim 1, wherein the customer order data comprises material type, quantity to be shipped, delivery window, service levels, delivery location, and supply locations.
8. The processor implemented method of claim 1, wherein the capacitated resource data comprises a nature of each capacitated resource to enable movement of inventory from a source location to a destination location.
9. The processor implemented method of claim 1, wherein the operational constraints comprise customer order/supply categorization, material mapping to transportation resource types, loading and unloading windows, cost of loaded and unloaded trips along with a possible set of route definitions, and other operational constraints such as running distances with maximum/minimum travel times, and operating hours of sourcing locations.
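The operational constraints enumerated in claim 9 can be encoded as structured input to the optimizer; one possible shape is shown below. Every field name and value here is an illustrative assumption, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class OperationalConstraints:
    order_categories: dict[str, str]              # customer order/supply categorization
    material_to_resource: dict[str, str]          # material mapping to resource types
    loading_windows: dict[str, tuple[int, int]]   # location -> (open hour, close hour)
    trip_costs: dict[str, dict[str, float]]       # route -> {"loaded": c1, "unloaded": c2}
    max_travel_time_hr: float                     # maximum travel time per running distance
    min_travel_time_hr: float                     # minimum travel time per running distance
    operating_hours: dict[str, tuple[int, int]]   # sourcing location operating hours

constraints = OperationalConstraints(
    order_categories={"ORD-1": "priority"},
    material_to_resource={"steel-coil": "flatbed-truck"},
    loading_windows={"plant-A": (6, 18)},
    trip_costs={"plant-A->depot-B": {"loaded": 120.0, "unloaded": 45.0}},
    max_travel_time_hr=10.0,
    min_travel_time_hr=0.5,
    operating_hours={"plant-A": (6, 22)},
)
```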
10. A system (100), comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
train a reinforcement learning (RL) model using training data associated with the dynamic routing and scheduling of capacitated resources at a geographical area, the training data comprising customer order data, inventory data, and capacitated resource data;
obtain real-time operation data along with operational constraints for optimization of routing and scheduling of the capacitated resources characterized by an optimization task;
decompose, by the RL model, the optimization task into a plurality of fragmented tasks of distinct sizes, each fragmented task of the plurality of fragmented tasks associated with a possible operational scenario corresponding to various levels of operational values for the capacitated resource data; and
process each of the plurality of fragmented tasks using a multi-core multi-thread processor based on a prioritization logic, wherein the prioritization logic is capable of assigning a prioritization score to each of the plurality of fragmented tasks.
11. The system of claim 10, wherein the one or more hardware processors are further configured by the instructions to dynamically define a plurality of location clusters in the geographical area, each location cluster of the plurality of location clusters comprising a set of supply locations and a set of demand locations, the plurality of location clusters dynamically defined based at least on a demand and supply ratio (DSR) and an adjacency parameter indicative of proximity within clusters of the plurality of location clusters.
12. The system of claim 10, wherein the RL model comprises a Q-learning model.
13. The system of claim 12, wherein the one or more hardware processors are further configured by the instructions to apply the Q-learning model on each cluster of the plurality of clusters to estimate a capacity associated with one or more capacitated resources from amongst the plurality of capacitated resources for each cluster, wherein to apply the Q-learning model on each cluster, the one or more hardware processors are further configured by the instructions to:
obtain a Q-matrix;
define, using the Q matrix, a set of top routes in a set of full routes for each of the capacitated resources capable of moving inventory from one or more source locations to one or more delivery locations in the geographical area;
define a frequency of use of the set of top routes over a set of days; and
eliminate one or more routes from the set of full routes on determining that the frequency of use is below a threshold probability value.
14. The system of claim 12, wherein to train the Q-learning model, the one or more hardware processors are further configured by the instructions to:
compute a DSR ratio indicative of a ratio of total demand across a plurality of customer locations to the set of supply locations;
determine a DSR index value based on the DSR ratio;
for each DSR episode associated with each day, create and update the DSR index value by using the Q-learning model; and
create a route to satisfy one or more demand locations from the set of supply locations.
15. The system of claim 12, wherein the one or more hardware processors are
further configured by the instructions to:
compute a DSR ratio associated with each cluster location from amongst the plurality of cluster locations, the DSR ratio indicative of a ratio of total demand across a plurality of customer locations to the set of supply locations;
determine a DSR index value based on the DSR ratio;
define a schedule for a route for a capacitated resource of a specific type;
identify one or more routes by evaluating a plurality of values of the DSR index for each of a plurality of DSR episodes in a time period;
compute and update the DSR index value by using the Q-learning model; and
determine an average time and average memory associated with the computing of the DSR index value.
16. The system of claim 10, wherein the customer order data comprises material type, quantity to be shipped, delivery window, service levels, delivery location, and supply locations.
17. The system of claim 10, wherein the capacitated resource data comprises a nature of each capacitated resource to enable movement of inventory from a source location to a destination location.
18. The system of claim 10, wherein the operational constraints comprise customer order/supply categorization, material mapping to transportation resource types, loading and unloading windows, cost of loaded and unloaded trips along with a possible set of route definitions, and other operational constraints such as running distances with maximum/minimum travel times, and operating hours of sourcing locations.
| Document | Date |
|---|---|
| 202021026741-FORM 1 [24-06-2020(online)].pdf | 2020-06-24 |
| 202021026741-COMPLETE SPECIFICATION [24-06-2020(online)].pdf | 2020-06-24 |
| 202021026741-DRAWINGS [24-06-2020(online)].pdf | 2020-06-24 |
| 202021026741-FIGURE OF ABSTRACT [24-06-2020(online)].jpg | 2020-06-24 |
| 202021026741-DECLARATION OF INVENTORSHIP (FORM 5) [24-06-2020(online)].pdf | 2020-06-24 |
| 202021026741-STATEMENT OF UNDERTAKING (FORM 3) [24-06-2020(online)].pdf | 2020-06-24 |
| 202021026741-PROOF OF RIGHT [24-06-2020(online)].pdf | 2020-06-24 |
| 202021026741-FORM 18 [24-06-2020(online)].pdf | 2020-06-24 |
| 202021026741-REQUEST FOR EXAMINATION (FORM-18) [24-06-2020(online)].pdf | 2020-06-24 |
| 202021026741-FORM-26 [23-10-2020(online)].pdf | 2020-10-23 |
| Abstract1.jpg | 2021-10-19 |
| 202021026741-FER.pdf | 2022-01-25 |
| SearchHistory(1)-convertedE_25-01-2022.pdf | |
| 202021026741-CLAIMS [03-06-2022(online)].pdf | 2022-06-03 |
| 202021026741-FER_SER_REPLY [03-06-2022(online)].pdf | 2022-06-03 |
| 202021026741-US(14)-HearingNotice-(HearingDate-04-04-2024).pdf | 2024-02-28 |
| 202021026741-FORM-26 [19-03-2024(online)].pdf | 2024-03-19 |
| 202021026741-Correspondence to notify the Controller [19-03-2024(online)].pdf | 2024-03-19 |
| 202021026741-Written submissions and relevant documents [16-04-2024(online)].pdf | 2024-04-16 |
| 202021026741-IntimationOfGrant31-12-2024.pdf | 2024-12-31 |
| 202021026741-PatentCertificate31-12-2024.pdf | 2024-12-31 |