Method And System For Integrated Decision Support For Enterprise Functions
Abstract:
This disclosure relates generally to method and system for integrated decision support for enterprise functions. The method enables integration of decision-making process(es) for all transportation functions in an enterprise thereby minimizing cost-to-serve with optimal routing solution of vehicles. The method obtains an input query from a requestor and processes using a trained RL model with possible heterogeneous capacities in a network environment. Further, each active requestor is paired with active vehicle based on the at least one of the current state space parameters. Then, a step reward is assigned for every time step of the journey, and a terminal reward at the end of journey towards the destination with a fulfilment ratio based on a set of reward parameters. Optimized routing solution are determined based on the selected requestor vehicle pair, the step reward, the terminal reward, a total distance, a total number of active requestors and the fulfilment ratio.
Get Free WhatsApp Updates!
Notices, Deadlines & Correspondence
Nirmal Building, 9th Floor, Nariman Point, Mumbai - 400021, Maharashtra, India
Inventors
1. KHADILKAR, Harshad
Tata Consultancy Services Limited, Olympus - A, Opp Rodas Enclave, Hiranandani Estate, Ghodbunder Road, Patlipada, Thane West - 400607, Maharashtra, India
2. DHARA, Anulekha
Tata Consultancy Services Limited, Galaxy Business Park, Plot no. A-44 & A45, Block A Industrial Area, Sector 62, Noida - 201309, Uttar Pradesh, India
3. GHOSH, Supratim
Tata Consultancy Services Limited, Galaxy Business Park, Plot no. A-44 & A45, Block A Industrial Area, Sector 62, Noida - 201309, Uttar Pradesh, India
4. SINHA, Sudhir Kumar
Tata Consultancy Services Limited, Olympus - A, Opp Rodas Enclave, Hiranandani Estate, Ghodbunder Road, Patlipada, Thane West - 400607, Maharashtra, India
Specification
DESC:FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR INTEGRATED DECISION SUPPORT FOR ENTERPRISE FUNCTIONS
Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
The present application claims priority from Indian provisional patent application no. 202021052976, filed on December 04, 2020. The entire contents of the aforementioned application are incorporated herein by reference.
TECHNICAL FIELD
The disclosure herein generally relates to integrated decision support, and, more particularly, to method and system for integrated decision support for enterprise functions.
BACKGROUND
Logistics and Transportation networks are very complex socio technical systems which operate in uncertain environments. To handle the complexity of such systems, many enterprises strive to provide accurate real time business reports which help in deciding the right measures in the logistics and transportation network. Such generation of reports are usually supported by a Data Warehouse (DWH) technology based on analytical decision-making system(s). Conventional, Online Analytical Processing (OLAP) software provides a fast, flexible, and interactive access to the data in the DWH and enables the enterprises, aggregation, and visualization of information in real time. Nevertheless, even with state-of-the-art DWHs, OLAP technology and smart key performance indicator measurement systems (KPIMS), the outcomes of certain actions in integrating cross functional enterprise functions are very hard to predict for making decision dynamically.
Typically, enterprises operate using fixed rules that are manually defined and are updated based on a predefined period of time. On the other hand, the market scenario evolves at the time scale of a few days to a few months. One exemplary situation is the increase in popularity of e-commerce and ship-to-home services during the pandemic, and the difficulty faced by traditional businesses in adapting to this change. Such systems lack visibility into global enterprise data and fixed rule-based operations results in missed identification of opportunities to realise novel revenue schemes, such as delivery of goods to customer households from irregular sources such as warehouse-direct or third-party providers. In one existing method, integrating various parts associated with a logistics management system using a supervisory server communicates with individual servers handling operations in silos. However, this method lacks integration of various independent components which increases cost with reduced efficiency.
Conventionally, Vehicle Routing Problem (VRP) is a well-known problem in combinatorial optimization (CO) where computing optimal route for a single or multiple identical vehicles, given the nodes to be visited. If there is a single vehicle with no constraints in terms of capacity or fuel/endurance and the optimality is defined by length of the route, the problem is equivalent to the travelling salesman problem. However, there are numerous versions of vehicle routing problem that make the problem more realistic for practical use. Known in the art constraints include a limit on the volume or weight of load carried and on the distance travelled in a single trip (before the vehicle must return to its depot). Further complexities include versions with multiple vehicles, time windows for visiting each node, dynamically generated demand, and several other options. In particular, there appear to be two basic variants of the problem such as static routing where demands do not change during computation of the solution or its execution, and dynamic routing where such demands can change at any time. Capacitated vehicle routing problem with time windows and dynamic routing (CVRP-TWDR) is a real-world and time-constrained problem. Learning-based techniques provide optimal solution for the twin problems such as scale and dynamicity. Reinforcement Learning (RL) has been applied to many hard problems, such as Atari, recommender systems, robotics for optimizing vehicle route. Recently, existing literatures develop solutions for combinatorial optimization (CO) problems using machine learning. Such existing method with RL is much more complex for realistic versions of the VRP. The hypothesis is that RL provide solutions significantly faster than exact methods and meta-heuristics when enabled in real-time applications.
SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a system for integrated decision support for enterprise functions is provided. The system includes obtaining by a trained RL model, an input query from a requestor comprising a one or more service time windows and a destination to provide minimized routing solution for routing one or more vehicles with possible heterogeneous size, type and capacities in a network environment. The input query is processed by initializing atleast one of current state space parameters associated with the network environment. Further, each active requestor are paired with active vehicle in the network environment based on the atleast one of the current state space parameters. Then, the requestor vehicle pair are selected to provide routing solution based on the pairwise values. Further, a step reward is assigned for every time step of the journey, and a terminal reward at the end of journey towards the destination by computing a fulfilment ratio based on a set of reward parameters. Further, an optimized routing solution are determined with specified time windows based on the selected requestor vehicle pair, the step reward, the terminal reward, a total distance, a total number of active requestors and the fulfilment ratio.
The current state space parameters comprises includes one of (i) distance to requestor from the current vehicle location, (ii) distance from requestor to initial start point, (iii) flag for vehicle outbound and inbound, (iv) distance from vehicle to initial start point, (v) flag for requestor, (vi) current time step, (vii) remaining capacity of vehicle at current time step, (viii) service time windows, (ix) distance to next nearest reachable requestor, (x) distance from requestor to the closest vehicle, and (xi) vehicle wait time to start service. The set of reward parameters comprises atleast one of (i) distance travelled by each vehicle at every stage of journey, (ii) remaining time for servicing remaining requestors when vehicle arrives at destination, (iii) vehicle wait time at every stage, (iv) total distance to be travelled to reach closest requestor from current requestor being serviced, (v) expected time to service closest requestor from current requestor being serviced, (vi) vehicle moving away from the initial start point, and (vii) vehicle serving the requestor.
In another aspect, a method for integrated decision support for enterprise functions is provided. The method includes obtaining by a trained RL model, an input query from a requestor comprising a one or more service time windows and a destination to provide minimized routing solution for routing one or more vehicles with possible heterogeneous size, type and capacities in a network environment. The input query is processed by initializing atleast one of current state space parameters associated with the network environment. Further, each active requestor are paired with active vehicle in the network environment based on the atleast one of the current state space parameters. Then, the requestor vehicle pair are selected to provide routing solution based on the pairwise values. Further, a step reward is assigned for every time step of the journey, and a terminal reward at the end of journey towards the destination by computing a fulfilment ratio based on a set of reward parameters. Further, an optimized routing solution are determined with specified time windows based on the selected requestor vehicle pair, the step reward, the terminal reward, a total distance, a total number of active requestors and the fulfilment ratio.
The current state space parameters comprises includes one of (i) distance to requestor from the current vehicle location, (ii) distance from requestor to initial start point, (iii) flag for vehicle outbound and inbound, (iv) distance from vehicle to initial start point, (v) flag for requestor, (vi) current time step, (vii) remaining capacity of vehicle at current time step, (viii) service time windows, (ix) distance to next nearest reachable requestor, (x) distance from requestor to the closest vehicle, and (xi) vehicle wait time to start service. The set of reward parameters comprises atleast one of (i) distance travelled by each vehicle at every stage of journey, (ii) remaining time for servicing remaining requestors when vehicle arrives at destination, (iii) vehicle wait time at every stage, (iv) total distance to be travelled to reach closest requestor from current requestor being serviced, (v) expected time to service closest requestor from current requestor being serviced, (vi) vehicle moving away from the initial start point, and (vii) vehicle serving the requestor.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG. 1 illustrates an exemplary system (alternatively referred as integrated decision system) to route atleast one vehicle with minimized time windows, in accordance with some embodiments of the present disclosure.
FIG. 2A illustrates a high-level architecture of the integrated decision system for routing at least one vehicle using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.2B illustrates functional block diagram of the integrated decision system routing the vehicle to service requestor for every time step using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG. 3A and FIG.3B illustrates a flow diagram of the method for determining optimized routing solution with specified time windows using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG. 4 illustrates an example scenario of input query from a requestor processed to determine optimized routing solution using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.5 illustrates experimental data of performance graph for reward versus fulfillment ratio for all episodes using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.6A and FIG.6B illustrates experimental graph with computation time for different sized data sets for static and dynamic CVRP using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.7A and FIG.7B illustrates scatter plots of relative distance and vehicle counts in comparison with existing techniques using the system of FIG.1, in accordance with some embodiments of the present disclosure.
FIG.8A and FIG.8B illustrates experimental graph of dynamicity and data type for average distance and vehicles in comparison with existing techniques for CVRP using the system of FIG.1, in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
Embodiments herein provide a method and system for integrated decision support for enterprise functions. The system may be alternatively referred as integrated decision system (IDS) and interchangeably used herein. The method enables integration of the decision-making process(es) for all transportation and logistics functions in an enterprise functions thereby minimizing cost-to-serve, optimizing loading of parcels onto vehicles, scheduling of crews, and routes and timings of vehicles. The present disclosure provides an integrated decision-making system 100 related to transportation and logistics in the enterprise which includes identification of pertinent information across the enterprise. Further, one or more physical hardware devices are configured to the system for collecting pertinent information from at least one enterprise. The system 100 collates input query from a requestor of enterprise functions into a single data management system. The input query comprises one or more service time windows and a destination to provide minimized routing solution for routing one or more vehicles with possible heterogeneous size, type, and capacities in a network environment. Further, the input query is processed by a reinforcement learning (RL) model associated with the IDS thereby computing decisions that maximise revenue and minimise operating costs for the enterprise configured sequentially to each enterprise functions of the system within the purview of the transportation and logistics functions. Additionally, the present disclosure of the method implements complete logistics and transportation management system, with number of integrated software and hardware components which provides a complete system for scheduling transportation assets and all deliveries of outbound products with minimized cost thereby improving performance and increased time efficiency. The disclosed system 100 is further explained with the method as described in conjunction with FIG.1 to FIG.8B below.
Referring now to the drawings, and more particularly to FIG. 1 through FIG.8B, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 illustrates an exemplary system (alternatively referred as integrated decision system) to route atleast one vehicle with minimized time windows, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 includes one or more hardware processors 104, communication interface device(s) or input/output (I/O) interface(s) 106 (also referred as interface(s)), and one or more data storage devices or memory 102 operatively coupled to the one or more hardware processors 104. The one or more processors 104 may be one or more software processing components and/or hardware processors. In an embodiment, the hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is/are configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud, and the like.
The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic-random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory 102 further comprises (or may further comprise) information pertaining to input(s)/output(s) of each step performed by the systems and methods of the present disclosure. In other words, input(s) fed at each step and output(s) generated at each step are comprised in the memory 102 and can be utilized in further processing and analysis.
FIG. 2A illustrates a high-level architecture of the integrated decision system for routing at least one vehicle using the system of FIG.1, in accordance with some embodiments of the present disclosure. FIG.2A is the integrated decision system (IDS) with a plurality of components comprising of a data lake, a preprocessing component, a real time execution component, a multi model orchestration component and a model training component. The IDS is further configured to one or more enterprise functions but not limited to for example a Transport Management System (TMS), an Order Management System (OMS), an E-commerce platform (ECP), a Loading system (LS), a Crew Scheduling System (CSS), a Warehouse management system (WMS) and thereof. Here, each enterprise function is configured to at least one RL agent which further communicates sequentially with subsequent RL agents of the enterprise functions to obtain co-optimized decision in real time. Here, the RL agent refers to the vehicle in the IDS. The feeding of output from one RL agent to another RL agent configured to the enterprise function in the same decision step, is non-trivial. Each RL agent communicates and provides direct instructions to the next RL agent and subsequent RL agents of the enterprise function in sequential order for multiple RL agents communicate with each other in the enterprise functions. The setup usually destabilises the learning since the feedback of RL agent is no longer dependent only on its own decisions and is difficult to train RL agents effectively in such scenarios. The IDS may return to the one or more RL agents with a revised set of parameters, to refine the system-level decisions and thereby minimizing the time windows with cost to serve. The present disclosure handles by concatenating the output of processed input query of each RL agent with the feedback from the cost to serve and optimized time windows. The RL agents are trained with a single reward signal at time step, and multiple internal decision steps within the time step for providing routing solution to service the requestor. The amount of credit (or penalty) is assigned for each internal decision is computed by obtaining a linked set of gradients from the step reward and these gradients to improve the policy of the RL agents.
The data lake component of the IDS obtains one or more input query from one or more enterprise functions such as transportation and logistics into a single centralised IDS. This data lake component serves as collection of input query into the repository of historical data which can also be utilized for training atleast one RL agent in real time.
The preprocessing component enables to select subset of the data for decision-making using AI, machine learning, and optimisation components, based on context and other triggers. This further converts the outputs of one or more enterprise functions enabling real time decisions for each individual enterprise functions such as the TMS, the LS, the CSS, the OMS, and the WMS. In real time execution of one or more enterprise functions, a subsystem manages the available one or more hardware resources, whether physical or cloud-based, for the purpose of computing real-time decisions for transportation and logistics components.
The multi-model orchestration component selects the best algorithm to be executed for making decisions based on the input query. This enables the IDS to provide with a choice of algorithm(s) based on function, expected accuracy, and reliability based on historical performance. In the model training component, the IDS monitors all artificial intelligence (AI) and machine learning (ML) components associated with the trained RL agent component in operation and decides when to re-train one or more RL agents configured to one or more enterprise functions based on recent performance, changes to operating conditions (such as seasonal changes, demand surges in general or for specific products), or manual triggers.
FIG.2B illustrates functional block diagram of the integrated decision system routing the vehicle to service requestor for every time step using the system of FIG.1, in accordance with some embodiments of the present disclosure. FIG.2B describes decision making by at least one RL agent at each time step t for the given input query using the value based reinforcement learning to process the capacitated vehicle routing problem with time windows (CVRP-TW) when requestor demands to arrive at arbitrary time. The method of the present disclosure enables to build minimized routing solution by mapping vehicles to requestors one at a time to the vehicle, eventually leading to complete routing with arrival times. It is performed by implementing parallelized computation of values associated with each active requestor vehicle pair, followed by centralized heuristic approach to compute the final assignment for each vehicle. Parallelism is achieved by noting that state space parameters and value outputs for the given requestor vehicle pair can be computed independently of all other pairs, given the status of all vehicles and requestors at a specific time. Furthermore, the current state space parameters of the RL agent enables to (i) keep the value estimates consistent, (ii) accelerate training through pooling of experience, and (iii) scale to large problem instances without retraining.
FIG. 3A and FIG.3B illustrates a flow diagram of the method for determining optimized routing solution with specified time windows using the system of FIG.1, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 comprises one or more data storage devices or the memory 102 operatively coupled to the processor(s) 104 and is configured to store instructions for execution of steps of the method 300 by the processor(s) or one or more hardware processors 104. The steps of the method 300 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG.3A and FIG.3B and the steps of flow diagram. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps to be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
Referring now to the steps of the method 300, at step 302, the one or more hardware processors 104 obtain, via a trained RL model, an input query from a requestor comprising a one or more service time windows and a destination to provide minimized routing solution for routing one or more vehicles with possible heterogeneous size, type and capacities in a network environment. The input query includes a current location of one or more requestors and a destination to be reached. The method processes the input query for mapping dynamically each requestor with one vehicle being free to service and routed to the destination with minimized time windows. Here, each vehicle independently evaluates the value of serving each available current requestor i at each time step t. Here, the centralized allocation heuristic approach collects the generated values, and maps vehicles to each requestor. This process is repeated iteratively until all requestors are served, or no further service can be rendered.
In one embodiment, the static version of the capacitated vehicle routing problem with service time windows (CVRP-TW) assumes that a set of requestors C is known, with their locations as (x_i ,y_i), a demanded load (m_i), and a service time windows ?[T?_i,min ,T_i,max], where i ? C, x_i ,y_i ? R, T_i,min ,T_i,max ? R+, and T_i,min < T_i,max. All vehicles start at the depot o which is an initial start point of the journey towards the destination, and to travel between any two locations (fully connected graph), with the distance d_(i,j) between any two requestors (current requestor and the next requestor) ??, ?? ? C given by the usual Euclidean metric. Here, a fixed set of vehicles V, the speed v of the vehicles and the maximum load M that any vehicle can carry in one trip. Here, the objective of the vehicle routing problem is to find the total distance J is minimized as described below in equation 1,
J= ¦(min@a_(i,j,k) f_(i,k) l_(i,k) ) (?_(i,j,k)¦?d_(i,j) a_(i,j,k) ?+ ?_(i,k)¦?d_(o,i ) f_(i,k) ?+ ?_(i,k)¦?d_(o,i) ? l?_(i,k) ?) --- equation 1
Where,
d_(i,j) is the distance from current requestor i to next requestor being rendered for service j
d_(o,i ) is the distance from origin (depot) to current requestor i,
a_(i,j,k) is an indicator variable which is 1 if vehicle k goes directly from the current requestor i to the next requestor being rendered for service j,
f_(i,k) is an indicator variable which is 1 if the current requestor i is the first requestor served by vehicle k, and
? l?_(i,k) is a similar indicator variable which is 1 if the current requestor i is the last customer visited by vehicle k.
Apart from the constraints such as a_(i,j,k), f_(i,k) and? l?_(i,k) to take values from {0, 1}, the other constraints are defined as described below in equation 2. Every requestor is served exactly with one vehicle, within its specified time window. If t_(i,k) is the time at which vehicle k visits i, then we write these constraints are described below in equation 2 and equation 3,
?_k¦?f_(i,k) ?+ ?_(j,k)¦?a_(j,i,k)=1 ?i ?---------- equation 2
T_i,min = t_(i,k) = T_i,max ,if f_(i,k)=1 or ?j s.t.a_(j,i,k)=1 ---------- equation 3
If a vehicle k serves the current requestor i as defined above, then the current requestor location is skipped and travel to next requestor j , or back to the depot or the initial start point. Conversely, the vehicle that has not served the current requestor i and cannot leave the location as denoted below in equation 4,
l_(i,k)+ ?_j¦?a_(i,j,k)=? 1 if f_(i,k)=1 or ?j s.t.a_(j,i,k)=1 ; 0 otherwise
------------ equation 4
The vehicle that starts at the depot or the initial start point, and its total load carried must be at most M as denoted below in equation 5 and equation 6,
?_i¦?l_(i,k) ?= ?_i¦?f_(i,k) ?k? ---------------- equation 5
?_(i,j)¦?m_(j ) a_(j,i,k)+ ?_i¦?m_(i ) f_(i,k) ? =M ?k? --------------- equation 6
Further, the travel time constraints between any two locations, based on the distance between them and the speed ?? at which vehicles are travelled are expressed below in equation 7 and equation8,
t_(i,k)= d_(o,i)/v if f_(i,k) =1 -------------- equation 7
t_(i,k)= t_(j,k)+ d_(j,i)/v if a_(j,i,k) = 1 -------------- equation 8
In some real-world scenarios the number of vehicles needs to be minimized, where each vehicles have different distance and velocity constraints, travel times can vary based on traffic conditions, vehicles can do multiple trips, in addition to other variations. Such dynamic version of the problem allow requestor to ‘pop up’ at arbitrary times, while current routing solution is being executed.
At step 304 of the method 300 the one or more hardware processors 104 process the input query by initializing at least one of current state space parameters associated with the network environment. Referring now to FIG.2B, and the above example, the current state space parameters are initialized for the input query where the current requestor is denoted as i, the vehicle is k, and the current time is t. Here, the distance from the start point to the destination are based on the current vehicle location (at the initial start point o) and the current requestor. Here, all the inputs are normalized by relevant constants (D for distances, M for loads, and t for times) as denoted in the embodiments. In order to handle arbitrary number of vehicles information about individual requestor-vehicle pair as inputs to each RL agent, and to output a scalar value for each pair are provided to the system. Table 1 lists the features used for computation of pairwise value estimates, for a generic vehicle k, at time t, considering the current requestor i. The vehicle could presently be at the depot ?? or it could have just completed a service for the current requestor j. Table 1 describes the state space parameters for each requestor vehicle pair, assuming that the current requestor is i, the vehicle is k, and current time is t . Here, the distance inputs are based on the current vehicle location (at depot ??). It is to be noted that all inputs are normalized by relevant constants such as D for distances, M for loads, and t for times.
Table 1 – State space parameters
Input Explanation
d_(o,i) or d_(j,i) distance to requestor from the current vehicle location
m_i requestors demand quantity
d_(o,i) distance to depot from the requestor location
I_(in / out) flag for vehicle outbound and inbound
0 or d_(o,j) distance to initial start point from the current vehicle location ( 0 if at o, otherwise d_(o,j))
I_(serve,k^'?k ) Flag where requestor can be served by another vehicle k^'
t current time-step
M_k (t) remaining capacity of vehicle k at current time step t
T_(i,max)
maximum time windows of the requestor
?min?_(rchbl .j (d_(i,j))) distance to next nearest reachable requestor from the current requestor being served
?min?_(k (d_(i,loc (k))))
Distance from current requestor i to the closest vehicle
max??(T_(i,min)-t- d_(j,i) /v,0)? Time for which vehicle will have to wait to start service
All distance related inputs are normalized by D, the diagonal length of the x-y map on which requestors are located. The time related inputs are normalized by t= ?max?_i T_(i,max) the latest among all time windows for requestors. The load related inputs are normalized by M, the load carrying capacity of each vehicle. The non-trivial inputs emulate information that would be available from a graphical analysis, as described below. Further, two of the inputs are binary flags, where the first flag is
I_(in / out) is set to 1 if the current time t is greater than t/2. The RL agent starts moving vehicles back towards the depot or the initial start point. The second flag,
I_(serve,k^'?k ), is set to 1 if there is no other vehicle k'?k which also serves the current requestor i within the time window. This flag indicates to the agent whether the requestor is likely to get dropped if it is not served by k. An extension to this signal is
?min?_(k (d_(i,loc (k)))) which is the distance to the closest vehicle from the current requestor ??. Further, ?min?_(rchbl .j (d_(i,j))) is the distance from the current requestor i to the nearest requestor j which is still active (not yet picked by any vehicle), and can be reached within its time window after serving ??. Finally, the time is indicated for which vehicle k has to wait at the current requestor i if arrived before the minimum time T_(i,min).
At step 306 of the method 300 the one or more hardware processors 104 compute pairwise values by pairing each active requestor with active vehicle in the network environment based on the at least one of the current state space parameters. The inputs query as described above are used by the RL agent to produce the scalar output for each requestor vehicle pair. The trigger for these computations is either the start of an episode (t = 0), or a vehicle becoming free (completes assigned service) at time step t >0. As shown in FIG.2A, the RL agent then computes pairwise values q(k,i) for each vehicle k and the current requestor i. This computation is done regardless of the current state of each vehicle (busy or free), but the state space parameters (Table 1) reflects additional time required for finishing the current service. The pairwise values q(k,i) are generated with one pair.
At step 308 of the method 300 the one or more hardware processors 104 select a requestor vehicle pair to provide routing solution based on the pairwise values.
The pairwise values are mapped to the current requestor i to the vehicle that triggers the computation, the decision is communicated to the environment. However, if the selected pair belongs to the vehicle k that is currently busy, a ‘soft’ update is made to the state information, emulating the assignment. Here, the soft update includes deactivation of selected current requestors i and consequent changes to estimated time and distances, but these updates are local to the RL agent. The process continues until the trigger vehicle gets new assignment. Referring now to the example (FIG.2B and FIG.4) where the vehicle k becomes free at time t, after serving the requestors i. Here, the requestor i is the only available customer at this time, it generates a single value q(k,i). However, vehicle k' is scheduled to become free one time step later, and is much closer to the current requestor i. Therefore, it would be more optimal to assign k' to i. If the vehicle is assigned to the requestor whose window is not yet open (waiting is expected), then the vehicle does not leave its current location until the correct time arrives.
At step 310 of the method 300 the one or more hardware processors 104 assign a step reward for every time step of the journey, and a terminal reward at the end of journey towards the destination by computing a fulfilment ratio based on a set of reward parameters. The terminal reward is given by the fulfilment ratio F, (FIG.2B and FIG.4) which is the ratio of requestors served to the total number of requestors in the dataset computed at the end of the episode. The terminal reward function is the sum of fulfillment ratio and the step reward of time step for every vehicle at every stage, wherein the step reward of time step is computed based on atleast one of the set of reward parameters. The step reward for vehicle k at stage ?? of its journey is computed based on several terms, as defined below and explained in Table 2.
Table 2 – Step reward parameters for vehicle k of at stage n of its journey, assuming currently at requestor i heading towards the next requestor j
Weight Term Explanation
a_1 d(k,n) distance travelled by each vehicle at every stage of journey, d_(o,j) or d_(i,j)
a_2 T_(j,max )-t(k,j) remaining time for servicing j remaining requestors when vehicle k arrives at destination
a_3 t_wait (k,j,n) vehicles wait time k at every stage n given by max??(0,T_(j,min)-t(k,j))?
a_4 ? (k,k^*,j) total distance to be travelled to reach closest requestor from current requestor being serviced k^*
a_5 d_(j,j^* ) /v +t_(wait ) (k,j^*,n+1) expected time to service closest requestor from current requestor being serviced j^* after j
a_6 a (i,j,t) vehicle moving away from the initial start point
a_7 I_(serve,k^'?k )
vehicle serving the requestor
The vehicle is currently at the location of the current requestor i at time t going to the next requestor j as the reward step is denoted below in equation 9,
R_step (k,n)= - a_1 d(k,n)- a_2 T_(j,max )-t(k,j)- a_3 t_wait (k,j,n)- a_4 ? (k,k^*,j)- a_5 d_(j,j^* ) /v +t_(wait ) (k,j^*,n+1) + a_(6 ) a (i,j,t)+ a_7 I_(serve,k^'?k )
---------------- equation 9
The step reward definition as described in equation 9 is necessarily complex, because it provides feedback about movements of other vehicles. Further, most terms can be correlated by the RL agent to various combinations of inputs, which were defined in Table 1. Apart from the basic distance and time, the step reward includes the comparison ?(k,k*,j ) of the distance of the current vehicle k from j, versus the closest vehicle k* from j . A term for the expected cost incurred in the next (n + 1)th stage of the journey is also included. Finally, the indicator variables are correlated in the input with the last two terms of the step reward. The binary reward a (i,j,t ) is equal to 1 if k is moving outward from the depot (j is farther from the depot than ??) in the first half, or when Iin/out=0 and vice versa. The final term is the large positive reward for serving j if k is the only vehicle that can do so. After experimentation, it is identified that the settings are [a_1,a_2 ...,a_7] to [0.2, 0.5, 1.0, 0.25, 0.5, 0.1, 0.25] resulted in good solutions on the training data. If k serves N_k customers in its journey, the terminal reward is denoted below in equation 10,
R(k,n)= R_step (k,n)+ ?^(N_(k-n) ) ? -------------- equation 10
The fulfilment ratio is the ratio of number of requestors served from the total number of requestors at every time step.
At step 312 of the method 300 the one or more hardware processors 104 determine optimized routing solution with specified time windows based on the selected requestor vehicle pair, the step reward, the terminal reward, a total distance, a total number of active requestors and the fulfilment ratio. Here, the neural network with experience replay for approximating the value function. The neural network receives the inputs as defined in Table 1, and attempts to predict the terminal reward as defined in equation 10. It is to be considered that each vehicle has the decentralized agent with parameter sharing. This property helps us to (i) accelerate learning by pooling experience from all vehicles, (ii) improve scalability as the number of vehicles and customers can be arbitrary, and (iii) improve consistency of decision-making, as the value outputs can differ only due to inputs and not due to network parameters. The neural network has fully connected layers of dimension (12, 6, 3, 1) neurons, with the first layer and last layer being input and output respectively. All the hidden layers have tanh activation and the output has linear activation. The network is trained using the
torch library in Python version 3.6, using Adam optimizer with a learning rate of 0.001, a batch size of 32 samples, and mean squared loss. The replay memory buffer consists of 50000 samples which is used for randomly sampling the batch size for training. Further, the training procedure of the RL model is described below in Table 3.
Table 3 – Training the RL model
Initialize replay buffer B
Initialize value network parameters ? and batch size ß
For episode =1 to N_episode do
Randomly choose instance from training dataset
Resent environment and get initial state
Stage 1: Predict and sample collection
While t
Documents
Application Documents
#
Name
Date
1
202021052976-STATEMENT OF UNDERTAKING (FORM 3) [04-12-2020(online)].pdf