Abstract: METHOD AND SYSTEM FOR SOCIALLY COMPLIANT ROBOTIC PATH PLANNING. Social robotics has made significant progress in recent years, particularly in social robot navigation, which is crucial for robots to coexist with humans. However, this presents challenges such as navigating crowded and dynamic environments, understanding social cues, and ensuring safety, comfort, and efficiency. Effective social robot navigation requires planning and executing complex tasks while adapting to changing surroundings and meeting user expectations. The system and method disclosed herein provide an approach that samples a generated social motion latent space of a robot, based on a robot-human self-attention, one or more robot state features, and the social motion latent space parameters for a current time-step, and accordingly generates control signals for navigating the robot, based on a determined relative importance of the humans to the robot. [To be published with FIG. 2]
Description: FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
METHOD AND SYSTEM FOR SOCIALLY COMPLIANT ROBOTIC PATH PLANNING
Applicant
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The disclosure herein generally relates to robotic path planning, and, more particularly, to a method and system for socially compliant robotic path planning.
BACKGROUND
Social robotics has made significant progress in recent years, particularly in social robot navigation, which is crucial for robots to coexist with humans. However, this presents challenges such as navigating crowded and dynamic environments, understanding social cues, and ensuring safety, comfort, and efficiency. Effective social robot navigation requires planning and executing complex tasks while adapting to changing surroundings and meeting user expectations. To tackle the above challenges, reactive methods such as ORCA (Optimal Reciprocal Collision Avoidance) and Social Force (SF) are widely used. ORCA is a velocity-based approach that uses a set of constraints to compute the optimal velocity for each agent in a centralized multi-agent system, ensuring that the agents avoid collisions while moving towards their goals. Social Force is a force-based approach that models the interactions between agents as forces and calculates the resulting movements based on the sum of these forces. Both methods have shown promising results in avoiding collisions in complex environments. Learning-based methods have been used to model robot crowd navigation as a Markov Decision Process (MDP) and to solve it using neural networks. One such approach, based on a decentralized structural-RNN (DS-RNN) network, models crowd navigation as a spatio-temporal graph to capture the interactions between the robot and other agents through space and time. Methods involving Deep V-Learning use neural networks to approximate the state value in the MDP and choose actions based on this value. Deep V-Learning is typically initialized by ORCA, but this can result in inheriting ORCA’s limitations (e.g., the robot freezing problem). Additionally, Deep V-Learning assumes that the dynamics of humans are deterministic and can be modelled accurately, which may not be true in real-world scenarios.
SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a processor implemented method is provided. The method includes generating, via a multi-modal trajectory forecaster executed by one or more hardware processors, a social motion latent space for a robot. Further, a robot-human self-attention is modelled, by a controller module executed by the one or more hardware processors, by processing humans' awareness towards the robot and one or more quantities estimated with respect to a) a state of the robot, b) state of one or more humans. Further, the social motion latent space is sampled, by the controller module executed by the one or more hardware processors, to select a plurality of latent codes, computed using a plurality of control variables via a reparameterization operation, wherein the plurality of control variables are estimated based on one or more human features weighted by the modelled robot-human self-attention, one or more robot state features, and the social motion latent space parameters for a current time-step, and wherein the selected plurality of latent codes are goal-directed and socially compliant with respect to an environment in which the robot is located. Further, a plurality of motion vectors are generated via a decoder of the multi-modal trajectory forecaster, by transforming the plurality of latent codes. Further, the plurality of motion vectors are mapped, via the one or more hardware processors, to an action space of the robot, wherein by mapping the plurality of motion vectors to the action space of the robot, one or more control commands to navigate the robot towards a defined goal in a socially compliant way are generated. Further, the robot is navigated towards the defined goal, via the one or more hardware processors, based on the mapping of the plurality of motion vectors to the action space of the robot.
In an embodiment of the processor implemented method, modelling the robot-human self-attention comprises generating a plurality of weights via a sequence of differentiable operations, by processing a) a linearly transformed state of the robot, and b) a linearly transformed state of one or more humans and associated awareness state, and wherein the generated plurality of weights represent a relative importance of each of the one or more humans to the robot.
In another embodiment of the processor implemented method, information on the relative importance of each of the one or more humans to the robot is used to generate one or more goal-directed and socially compliant control commands by the controller module to navigate the robot towards the defined goal.
In another embodiment of the processor implemented method, sampling the social motion latent space by the controller module comprises processing a) the linearly transformed state of the robot, b) the linearly transformed states of one or more humans, weighted by the generated plurality of weights, and c) one or more linearly transformed social motion latent space parameters for a current time-step, to produce a plurality of perturbation vectors via a sequence of differentiable operations, wherein the plurality of perturbation vectors are converted to a latent code using a differentiable reparameterization operation.
In another embodiment of the processor implemented method, the controller module is modelled as a Markov Decision Process (MDP) and is trained using a Reinforcement Learning (RL) framework, wherein the training using the RL framework comprises training an RL agent based on a plurality of parameters representing a) the humans’ awareness towards the robot, b) the state of the robot, c) the state of one or more humans, and d) the social motion latent space parameters.
A system is provided. The system includes one or more hardware processors, a communication interface, and a memory storing a plurality of instructions. The plurality of instructions cause the one or more hardware processors to generate, via a multi-modal trajectory forecaster executed by the one or more hardware processors, a social motion latent space for a robot. Further, a robot-human self-attention is modelled, by a controller module executed by the one or more hardware processors, by processing humans' awareness towards the robot and one or more quantities estimated with respect to a) a state of the robot, b) state of one or more humans. Further, the social motion latent space is sampled, by the controller module executed by the one or more hardware processors, to select a plurality of latent codes, computed using a plurality of control variables via a reparameterization operation, wherein the plurality of control variables are estimated based on one or more human features weighted by the modelled robot-human self-attention, one or more robot state features, and the social motion latent space parameters for a current time-step, and wherein the selected plurality of latent codes are goal-directed and socially compliant with respect to an environment in which the robot is located. Further, a plurality of motion vectors are generated via a decoder of the multi-modal trajectory forecaster, by transforming the plurality of latent codes. Further, the plurality of motion vectors are mapped, via the one or more hardware processors, to an action space of the robot, wherein by mapping the plurality of motion vectors to the action space of the robot, one or more control commands to navigate the robot towards a defined goal in a socially compliant way are generated. Further, the robot is navigated towards the defined goal, via the one or more hardware processors, based on the mapping of the plurality of motion vectors to the action space of the robot.
In yet another embodiment of the system, modelling the robot-human self-attention comprises generating a plurality of weights via a sequence of differentiable operations, by processing a) a linearly transformed state of the robot, and b) a linearly transformed state of one or more humans and associated awareness state, and wherein the generated plurality of weights represent a relative importance of each of the one or more humans to the robot.
In yet another embodiment of the system, information on the relative importance of each of the one or more humans to the robot is used to generate one or more goal-directed and socially compliant control commands by the controller module to navigate the robot towards the defined goal.
In yet another embodiment of the system, sampling the social motion latent space by the controller module comprises processing a) the linearly transformed state of the robot, b) the linearly transformed states of one or more humans, weighted by the generated plurality of weights, and c) one or more linearly transformed social motion latent space parameters for a current time-step, to produce a plurality of perturbation vectors via a sequence of differentiable operations, wherein the plurality of perturbation vectors are converted to a latent code using a differentiable reparameterization operation.
In yet another embodiment of the system, the controller module is modelled as a Markov Decision Process (MDP) and is trained using a Reinforcement Learning (RL) framework, wherein the training using the RL framework comprises training an RL agent based on a plurality of parameters representing a) the humans’ awareness towards the robot, b) the state of the robot, c) the state of one or more humans, and d) the social motion latent space parameters.
In yet another aspect, a non-transitory computer readable medium is provided. The non-transitory computer readable medium includes a plurality of instructions, which when executed, cause one or more hardware processors to perform the following steps. Initially, via a multi-modal trajectory forecaster executed by one or more hardware processors, a social motion latent space for a robot is generated. Further, a robot-human self-attention is modelled, by a controller module executed by the one or more hardware processors, by processing humans' awareness towards the robot and one or more quantities estimated with respect to a) a state of the robot, b) state of one or more humans. Further, the social motion latent space is sampled, by the controller module executed by the one or more hardware processors, to select a plurality of latent codes, computed using a plurality of control variables via a reparameterization operation, wherein the plurality of control variables are estimated based on one or more human features weighted by the modelled robot-human self-attention, one or more robot state features, and the social motion latent space parameters for a current time-step, and wherein the selected plurality of latent codes are goal-directed and socially compliant with respect to an environment in which the robot is located. Further, a plurality of motion vectors are generated via a decoder of the multi-modal trajectory forecaster, by transforming the plurality of latent codes. Further, the plurality of motion vectors are mapped, via the one or more hardware processors, to an action space of the robot, wherein by mapping the plurality of motion vectors to the action space of the robot, one or more control commands to navigate the robot towards a defined goal in a socially compliant way are generated. Further, the robot is navigated towards the defined goal, via the one or more hardware processors, based on the mapping of the plurality of motion vectors to the action space of the robot.
In an embodiment of the non-transitory computer readable medium, modelling the robot-human self-attention comprises generating a plurality of weights via a sequence of differentiable operations, by processing a) a linearly transformed state of the robot, and b) a linearly transformed state of one or more humans and associated awareness state, and wherein the generated plurality of weights represent a relative importance of each of the one or more humans to the robot.
In another embodiment of the non-transitory computer readable medium, information on the relative importance of each of the one or more humans to the robot is used to generate one or more goal-directed and socially compliant control commands by the controller module to navigate the robot towards the defined goal.
In another embodiment of the non-transitory computer readable medium, sampling the social motion latent space by the controller module comprises processing a) the linearly transformed state of the robot, b) the linearly transformed states of one or more humans, weighted by the generated plurality of weights, and c) one or more linearly transformed social motion latent space parameters for a current time-step, to produce a plurality of perturbation vectors via a sequence of differentiable operations, wherein the plurality of perturbation vectors are converted to a latent code using a differentiable reparameterization operation.
In another embodiment of the non-transitory computer readable medium, the controller module is modelled as a Markov Decision Process (MDP) and is trained using a Reinforcement Learning (RL) framework, wherein the training using the RL framework comprises training an RL agent based on a plurality of parameters representing a) the humans’ awareness towards the robot, b) the state of the robot, c) the state of one or more humans, and d) the social motion latent space parameters.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG. 1 illustrates an exemplary system for socially compliant robotic path planning, according to some embodiments of the present disclosure.
FIG. 2 is a functional block diagram of the system of FIG. 1, according to some embodiments of the present disclosure.
FIG. 3 is a flow diagram depicting steps involved in the process of socially compliant robotic path planning using the system of FIG. 1, in accordance with some embodiments of the present disclosure.
FIGS. 4A through 4E are example diagrams associated with the socially compliant robotic path planning, according to some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments.
Social robotics is crucial for robots to coexist with humans. However, this presents challenges such as navigating crowded and dynamic environments, understanding social cues, and ensuring safety, comfort, and efficiency. Effective social robot navigation requires planning and executing complex tasks while adapting to changing surroundings and meeting user expectations. To tackle the above challenges, reactive methods such as ORCA (Optimal Reciprocal Collision Avoidance) and Social Force (SF) are widely used. ORCA is a velocity-based approach that uses a set of constraints to compute the optimal velocity for each agent in a centralized multi-agent system, ensuring that the agents avoid collisions while moving towards their goals. Social Force is a force-based approach that models the interactions between agents as forces and calculates the resulting movements based on the sum of these forces. Both methods have shown promising results in avoiding collisions in complex environments. Learning-based methods have been used to model robot crowd navigation as a Markov Decision Process (MDP) and to solve it using neural networks. One such approach, based on a decentralized structural-RNN (DS-RNN) network, models crowd navigation as a spatio-temporal graph to capture the interactions between the robot and other agents through space and time. Methods involving Deep V-Learning use neural networks to approximate the state value in the MDP and choose actions based on this value. Deep V-Learning is typically initialized by ORCA, but this can result in inheriting ORCA’s limitations (e.g., the robot freezing problem). Additionally, Deep V-Learning assumes that the dynamics of humans are deterministic and can be modelled accurately, which may not be true in real-world scenarios.
In order to address these challenges, a method and system for socially compliant robotic path planning are disclosed. The method includes generating, via a multi-modal trajectory forecaster executed by one or more hardware processors, a social motion latent space for a robot. Further, a robot-human self-attention is modelled, by a controller module executed by the one or more hardware processors, by processing humans' awareness towards the robot and one or more quantities estimated with respect to a) a state of the robot, b) state of one or more humans. Further, the social motion latent space is sampled, by the controller module executed by the one or more hardware processors, to select a plurality of latent codes, computed using a plurality of control variables via a reparameterization operation, wherein the plurality of control variables are estimated based on one or more human features weighted by the modelled robot-human self-attention, one or more robot state features, and the social motion latent space parameters for a current time-step, and wherein the selected plurality of latent codes are goal-directed and socially compliant with respect to an environment in which the robot is located. Further, a plurality of motion vectors are generated via a decoder of the multi-modal trajectory forecaster, by transforming the plurality of latent codes. Further, the plurality of motion vectors are mapped, via the one or more hardware processors, to an action space of the robot, wherein by mapping the plurality of motion vectors to the action space of the robot, one or more control commands to navigate the robot towards a defined goal in a socially compliant way are generated. Further, the robot is navigated towards the defined goal, via the one or more hardware processors, based on the mapping of the plurality of motion vectors to the action space of the robot.
Referring now to the drawings, and more particularly to FIG. 1 through FIG. 4E, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 illustrates an exemplary system for socially compliant robotic path planning, according to some embodiments of the present disclosure. The system 100 includes or is otherwise in communication with hardware processors 102, at least one memory such as a memory 104, and an I/O interface 112. The hardware processors 102, the memory 104, and the Input/Output (I/O) interface 112 may be coupled by a system bus such as a system bus 108 or a similar mechanism. In an embodiment, the hardware processors 102 can be one or more hardware processors.
The I/O interface 112 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and interfaces for peripheral device(s) such as a keyboard, a mouse, an external memory, and a printer. Further, the I/O interface 112 may enable the system 100 to communicate with other devices, such as web servers and external databases.
The I/O interface 112 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, local area network (LAN), cable, etc., and wireless networks, such as Wireless LAN (WLAN), cellular, or satellite. For this purpose, the I/O interface 112 may include one or more ports for connecting several computing systems or devices with one another or to another server.
The one or more hardware processors 102 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, node machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the one or more hardware processors 102 are configured to fetch and execute computer-readable instructions stored in the memory 104.
The memory 104 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 104 includes a plurality of modules 106.
The plurality of modules 106 include programs or coded instructions that supplement applications or functions performed by the system 100 for executing different steps involved in the process of the socially compliant robotic path planning being performed by the system 100. The plurality of modules 106, amongst other things, can include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types. The plurality of modules 106 may also be used as signal processor(s), node machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the plurality of modules 106 can be implemented in hardware, by computer-readable instructions executed by the one or more hardware processors 102, or by a combination thereof. The plurality of modules 106 can include various sub-modules (not shown). The plurality of modules 106 may include computer-readable instructions that supplement applications or functions performed by the system 100 for the socially compliant robotic path planning.
The data repository (or repository) 110 may include a plurality of abstracted pieces of code for refinement and data that is processed, received, or generated as a result of the execution of the plurality of modules in the module(s) 106.
Although the data repository 110 is shown internal to the system 100, it will be noted that, in alternate embodiments, the data repository 110 can also be implemented external to the system 100, where the data repository 110 may be stored within a database (repository 110) communicatively coupled to the system 100. The data contained within such an external database may be periodically updated. For example, new data may be added into the database (not shown in FIG. 1) and/or existing data may be modified and/or non-useful data may be deleted from the database. In one example, the data may be stored in an external system, such as a Lightweight Directory Access Protocol (LDAP) directory and a Relational Database Management System (RDBMS). Functions of the components of the system 100 are now explained with reference to the functional block diagram in FIG. 2, the steps in the flow diagram in FIG. 3, and the example diagrams in FIGS. 4A through 4E.
FIG. 2 is a functional block diagram of the system of FIG. 1, according to some embodiments of the present disclosure. In an embodiment, the system 100 comprises one or more data storage devices or the memory 104 operatively coupled to the processor(s) 102 and is configured to store instructions for execution of steps of the method 300 by the processor(s) or one or more hardware processors 102. The steps of the method 300 of the present disclosure will now be explained with reference to the components or blocks of the system 100 as depicted in FIG. 1 and the steps of the flow diagram as depicted in FIG. 3. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods, and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
The functional block diagram of the system 100, as depicted in FIG. 2, shows that the system 100 has a controller and a social latent space parameters predictor. The controller further includes a Robot-Human attention block and a state encoder. The social latent space parameters predictor includes an encoder and a decoder. The different steps involved in the process of robotic path planning, as depicted in method 300 in FIG. 3, are explained with reference to the modules/components as depicted in FIG. 1 and FIG. 2.
For explanation purposes, a scenario in which a robot has to navigate from a start pose to a goal pose with multiple dynamic humans around is considered. For this scenario, at step 302 of the method 300, the system 100 generates, via a multi-modal trajectory forecaster executed by one or more hardware processors, a social motion latent space for the robot. The social motion latent space includes multiple future trajectories predicted for the robot which may be social in nature, generated by considering past motion information of the neighbors of the robot, i.e., the humans. In an embodiment, the multi-modal trajectory forecaster may be any known and suitable trajectory forecaster, for example, Social-VAE.
Further, at step 304 of the method 300, the controller models a robot-human self-attention by processing humans' awareness towards the robot and one or more quantities estimated with respect to a) a state of the robot, and b) a state of one or more humans. Modelling of the robot-human self-attention includes generating a plurality of weights via a sequence of differentiable operations, by processing a) a linearly transformed state of the robot, and b) a linearly transformed state of the one or more humans and the associated awareness state, generated from the state of the robot and the state of the one or more humans, wherein the generated plurality of weights represent a relative importance of each of the one or more humans to the robot.
Further, at step 306 of the method 300, the controller samples the social motion latent space to select a plurality of latent codes. During the sampling of the social motion latent space, the plurality of latent codes are computed using a plurality of control variables via a reparameterization operation. The plurality of control variables are estimated based on one or more human features weighted by the modelled robot-human self-attention, one or more robot state features, and the social motion latent space parameters for a current time-step. The selected plurality of latent codes are goal-directed and socially compliant with respect to an environment in which the robot is located. In an embodiment of the processor implemented method, sampling the social motion latent space by the controller module comprises processing a) the linearly transformed state of the robot, b) the linearly transformed states of one or more humans, weighted by the generated plurality of weights, and c) one or more linearly transformed social motion latent space parameters for a current time-step, to produce a plurality of perturbation vectors via a sequence of differentiable operations, wherein the plurality of perturbation vectors are converted to a latent code using a differentiable reparameterization operation.
Further, at step 308 of the method 300, a plurality of motion vectors are generated via the decoder of the multi-modal trajectory forecaster, by transforming the plurality of latent codes. As the plurality of latent codes are goal-directed and socially compliant, the plurality of motion vectors generated from the plurality of latent codes also are goal-directed and socially compliant, and these motion vectors are in turn used to control motion of the robot to enable the robot to move towards the defined goal in the socially compliant manner.
In order to control the motion of the robot, further, at step 310 of the method 300, the system 100 maps the plurality of motion vectors to an action space of the robot. By mapping the plurality of motion vectors to the action space of the robot, one or more control commands to navigate the robot towards a defined goal in a socially compliant way are generated. Further, at step 312 of the method 300, the robot is navigated towards the defined goal, via the one or more hardware processors, based on the one or more control commands.
The modelling of the robot-human self-attention facilitates the robotic path planning by considering the relative importance of each of the one or more humans to the robot. For example, consider that two humans, at an instance, are at the same distance from the robot. However, their movement patterns, which may have been determined in terms of past positions (i.e., positions at a selected number of past instances), indicate that one of the humans is moving towards the robot, while the other human is moving away from the robot. As the human moving towards the robot has a higher chance of colliding with the robot than the human moving away from it, the system 100 may plan the route of the robot by giving more importance to the human who is moving towards the robot, and relatively less importance to the human who is moving away from the robot.
In an embodiment, the controller module is modelled as a Markov Decision Process (MDP) and is trained using a Reinforcement Learning (RL) framework, wherein the training using the RL framework comprises training an RL agent based on a plurality of parameters representing a) the humans’ awareness towards the robot, b) the state of the robot, c) the state of one or more humans, and d) the social motion latent space parameters. The modelling and training of the controller are further explained below.
Let $x$ and $n$ be the robot's and its neighboring humans' past motion information, respectively. The encoder $E$ of the social latent space parameters predictor takes $x$ and $n$ as inputs and produces the social latent space parameters $Z_\mu$ and $Z_\sigma$. From the distribution $\mathcal{N}(Z_\mu, Z_\sigma)$, the controller produces a latent vector $Z$, which the decoder uses to predict a socially plausible motion vector $(\Delta x, \Delta y)$ for the next time instance, relative to the current position of the robot.
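For illustration only, a minimal sketch of this reparameterized sampling step is given below. The names z_mu and z_sigma and the use of the torch library are assumptions for the example and not part of the disclosed system; the 32-element latent size follows the architecture details given later.

```python
import torch

def sample_latent(z_mu: torch.Tensor, z_sigma: torch.Tensor) -> torch.Tensor:
    """Draw a latent vector Z ~ N(z_mu, z_sigma) via the reparameterization trick.

    z_mu and z_sigma are the social latent space parameters produced by the
    encoder E (32-element vectors in the architecture described later).
    """
    eps = torch.randn_like(z_mu)   # eps ~ N(0, I)
    return z_mu + z_sigma * eps    # differentiable in z_mu and z_sigma
```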
As the latent vectors generated by the social latent space parameters predictor lack control over the generation process to enforce specific goals or norms, the controller learns a policy to shape the robot’s behaviour with different rewards and penalties, controlling the sampling process based on the social latent space.
The process of controlling the generation process is modelled as the MDP described by the tuple $\langle S, A, P, R, \gamma, S_0 \rangle$, where $S$ is the overall state of the environment, derived from the robot states, the humans' observable states, and the social latent space parameters $(Z_\mu, Z_\sigma)$ corresponding to the robot.
State:
At any time instance $t$, the state $s_t$ is made up of the robot's state ($R$) and all humans' observable states ($H$). Let $R$ be the robot state, comprising the position of a defined goal with respect to the robot $(px_g^r, py_g^r)$, the relative velocity of the defined goal with respect to the robot $(vx_g^r, vy_g^r)$, the sum of the radii of the robot and a human $(rad_r + rad_h)$, the angle between the robot's velocity vector and a goal vector with respect to the robot $\theta_g$, and the distance to the defined goal $d_g$. Similarly, let $h$ be a human's state vector comprising the human's position with respect to the robot $(px_h^r, py_h^r)$, relative velocity with respect to the robot $(vx_h^r, vy_h^r)$, the human's radius $rad_h$, distance to the robot $d_r$, the angle between the robot's velocity vector and the human's position vector with respect to the robot $\theta_{hr\_rv}$, the angle between the relative goal vector and the human's position vector with respect to the robot $\theta_{hr\_gr}$, and the awareness $a$ ($a = 1.0$ if the human is aware and $a = 0.0$ if unaware). Let there be $N$ humans in a scene being considered; then $H = \{h_1, h_2, \ldots, h_N\}$ is the state of all humans in the scene.
Estimating human’s state of awareness towards the robot:
Consider $O$ to be the direction of gaze/head of a human and $G$ the activity the human is engaged in, and let $\Gamma$ be a set of simple (distracting) activities, e.g., $\Gamma$ = {using cellphone, texting, reading, browsing the cellphone, picking up an object, ...}, i.e., $G \in \Gamma$ for a distracted human. Then, at any time instance $t$, the following heuristic can be used to model a human's ($h$) state of awareness towards the robot ($r$):
$$a_t^h(O^h, G^h) = \begin{cases} \text{aware-of-robot}, & \text{if } G^h \notin \Gamma \,\wedge\, \mathrm{FoV}(O^h, p_r) \\ \text{unaware-of-robot}, & \text{if } G^h \in \Gamma \end{cases} \qquad (1)$$
where $p_r$ is the robot's pose and $\mathrm{FoV}(\cdot)$ is a function that returns True if the robot ($r$) is in the field-of-view of the human $h$.
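A hedged sketch of this heuristic is shown below. The activity set, the field-of-view half-angle, and all function names are hypothetical placeholders, since Eq. (1) leaves them implementation-defined.

```python
import math

# Hypothetical set of distracting activities (Gamma in Eq. (1)); entries illustrative.
GAMMA = {"using cellphone", "texting", "reading", "picking up an object"}

def in_fov(gaze_dir, human_pos, robot_pos, half_fov=math.pi / 3):
    """Return True if the robot lies within the human's field of view.

    gaze_dir is the human's gaze/head direction O^h in radians; half_fov is an
    assumed half field-of-view angle, which Eq. (1) does not specify.
    """
    bearing = math.atan2(robot_pos[1] - human_pos[1], robot_pos[0] - human_pos[0])
    diff = (bearing - gaze_dir + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi]
    return abs(diff) <= half_fov

def awareness(activity, gaze_dir, human_pos, robot_pos):
    """Eq. (1): a human is aware (1.0) iff not distracted and the robot is in FoV."""
    if activity not in GAMMA and in_fov(gaze_dir, human_pos, robot_pos):
        return 1.0
    return 0.0
```

Usage: awareness("texting", 0.0, (0.0, 0.0), (1.0, 0.0)) returns 0.0, since texting is in the distracting set.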
The attention block learns to generate attention scores for each human based on the robot and humans’ states. This attention function computes the attention scores using a query Q and a key K, and applies the normalized score to a value V.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V \qquad (2)$$
where $Q = \mathrm{MLP}_q(H) \in \mathbb{R}^{N \times d_q}$, $K = \mathrm{MLP}_k(R) \in \mathbb{R}^{1 \times d_k}$, $V = \mathrm{MLP}_v(H) \in \mathbb{R}^{N \times d_v}$, and $d_q$, $d_k$, and $d_v$ are the dimensions of the query, key, and values, respectively.
This approach of attention modelling allows the robot to attend differently to different humans, indicated in FIG. 4D by the black circle whose radius is a proxy for the level of attention. After the attention scores are computed and applied to the human features, an overall state $S_t$ is formed as the concatenation of the weighted average of all the human features, $Attn_H = PV$, the robot features after being encoded, $R_{enc} = \mathrm{MLP}_r(R)$, and the latent distribution parameters $Z_\mu$ and $Z_\sigma$, concatenated and encoded to give
$$Z_{state} = \mathrm{MLP}_{zstate}(Z_\mu \oplus Z_\sigma)$$
From this, the overall state $S_t$ is obtained as:
$$S_t = [Attn_H \oplus R_{enc} \oplus Z_{state}]$$
where $\oplus$ denotes concatenation.
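The following sketch ties Eq. (2) and the state assembly together. It is a hypothetical PyTorch rendering, not the patented implementation: the orientation of the score product is chosen so that the weights P have shape (1, N) and Attn_H = PV is the weighted average of human features, and the dimensions follow the architecture details given later (9-d human state, 7-d robot state, 32-d latent parameters, 256-d features).

```python
import torch
import torch.nn as nn

class RobotHumanAttention(nn.Module):
    """Sketch of the robot-human attention block and the overall state S_t."""

    def __init__(self, d_h=9, d_r=7, d_z=64, d=256):
        super().__init__()
        mlp = lambda d_in: nn.Sequential(nn.Linear(d_in, d), nn.ReLU(), nn.Linear(d, d))
        self.mlp_q = mlp(d_h)        # queries from human states (incl. awareness)
        self.mlp_v = mlp(d_h)        # values from human states
        self.mlp_k = mlp(d_r)        # keys from the robot state
        self.mlp_r = mlp(d_r)        # robot feature encoder for R_enc
        self.mlp_zstate = mlp(d_z)   # encodes the concatenated [Z_mu ; Z_sigma]
        self.d_k = d

    def forward(self, H, R, z_mu, z_sigma):
        # H: (N, 9) human states, R: (1, 7) robot state, z_mu/z_sigma: (1, 32) each
        Q, K, V = self.mlp_q(H), self.mlp_k(R), self.mlp_v(H)
        scores = (K @ Q.t()) / self.d_k ** 0.5     # (1, N): robot attends to humans
        P = torch.softmax(scores, dim=-1)          # relative importance of each human
        attn_h = P @ V                             # (1, 256) weighted human features
        r_enc = self.mlp_r(R)                      # (1, 256)
        z_state = self.mlp_zstate(torch.cat([z_mu, z_sigma], dim=-1))  # (1, 256)
        return torch.cat([attn_h, r_enc, z_state], dim=-1)            # S_t: (1, 768)
```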
The overall state $S_t$ is fed to the state encoder, which may be a Multi-Layer Perceptron (MLP) network. The outcome of the encoder is then mapped to a value $V_t$ and a continuous policy $\pi(a_t|s_t)$, where the action $a_t$ is the control variable (in this case $\epsilon_t$), sampled as $\epsilon_t \sim \mathcal{N}(\pi_\mu(s_t), \pi_\sigma(s_t))$ and used to generate a latent code $Z_{control}$. This latent code is then fed to the decoder $D$ of the trajectory forecaster to produce the next objective-oriented motion vector $(\Delta x_{control}, \Delta y_{control})$. The motion vector is converted to a velocity by dividing by the time step and is mapped to the discrete action space.
$$Z_{control} = Z_\mu + Z_\sigma \cdot \epsilon_t$$
$$(\Delta x_{control}, \Delta y_{control}) = D(Z_{control} \cdot 4.0)$$
$$(v_x, v_y) = \mathrm{ToDiscreteAction}(\Delta x_{control}/\Delta t,\ \Delta y_{control}/\Delta t)$$
where $D$ denotes the decoder of the trajectory forecasting block. In an embodiment, the trajectory forecaster is frozen during training and testing.
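A minimal sketch of this control pathway follows. The callables policy_mu, policy_sigma, and decoder, and the contents of action_set, are assumptions for illustration; only the arithmetic (the reparameterized perturbation, the 4.0 scaling described in the architecture details, and the velocity conversion with a time step of 0.25 s) comes from the text.

```python
import torch

def control_step(policy_mu, policy_sigma, s_t, z_mu, z_sigma, decoder,
                 action_set, dt=0.25):
    """Policy -> eps_t -> Z_control -> frozen decoder D -> discrete action."""
    # Sample the control variable eps_t ~ N(pi_mu(s_t), pi_sigma(s_t));
    # policy_sigma is assumed to return a positive standard deviation.
    eps_t = torch.normal(policy_mu(s_t), policy_sigma(s_t))
    z_control = z_mu + z_sigma * eps_t        # perturb the social latent distribution
    delta = decoder(z_control * 4.0)          # (dx_control, dy_control), scaled per text
    v = delta / dt                            # motion vector -> velocity
    # ToDiscreteAction: pick the nearest candidate in the discrete action set (K, 2)
    dists = torch.linalg.norm(action_set - v, dim=-1)
    return action_set[torch.argmin(dists)]
```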
Reward function:
A dense reward function is used during training of the RL agent, which rewards the robot for reaching or approaching the goal and penalizes collisions or getting too close to humans at any time t.
$$r_t(s_t, a_t) = \begin{cases} 10, & \text{if } d_{goal}^t \le \rho_{robot} \\ -20, & \text{if } d_{min}^t < 0 \\ 4.0\,(d_{min}^t - 0.25), & \text{if } 0 < d_{min}^t < 0.25 \\ 2.0\,(d_{goal}^{t-1} - d_{goal}^t), & \text{otherwise} \end{cases}$$
In the above reward function $r_t(s_t, a_t)$, at any time instance $t$, $d_{min}^t$ is the distance between the robot and the closest human, $d_{goal}^t$ is the L2 norm between the robot and its goal, and $\rho_{robot}$ is the robot's radius. The value of a state at time $t$ is estimated using the n-step return scheme:
$$V_t(s_t) = \sum_{i=t}^{t+n} \gamma^{i-t}\, r_i(s_i, a_i) + \gamma^{n+1}\,\hat{V}(s_{t+n+1})$$
where $\gamma$ is the discount factor and $\hat{V}(s_{t+n+1})$ is a network-predicted value of the next state.
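As a worked illustration, the reward and the n-step return can be written as follows; rho_robot = 0.3 is an assumed value taken from the 0.3 m goal radius mentioned in the experiments, and all names are hypothetical.

```python
def reward(d_goal_t, d_goal_prev, d_min_t, rho_robot=0.3):
    """Dense reward r_t(s_t, a_t) as defined above."""
    if d_goal_t <= rho_robot:
        return 10.0                            # reached the goal
    if d_min_t < 0.0:
        return -20.0                           # collision
    if 0.0 < d_min_t < 0.25:
        return 4.0 * (d_min_t - 0.25)          # discomfort-zone penalty (negative)
    return 2.0 * (d_goal_prev - d_goal_t)      # reward progress towards the goal

def n_step_return(rewards, v_next, gamma=0.90):
    """n-step return: discounted rewards for i = t..t+n plus bootstrapped V(s_{t+n+1})."""
    g = sum(gamma ** i * r for i, r in enumerate(rewards))
    return g + gamma ** len(rewards) * v_next
```

For example, with rewards = [0.1, 0.2] (n = 1) and v_next = 5.0, the return is 0.1 + 0.9 * 0.2 + 0.9**2 * 5.0.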
In an embodiment, a policy-based model-free RL approach may be used, which may directly parameterize the policy $\pi(a|s, \theta)$. In addition, an actor-critic framework (A2C) may be used for optimizing the controller.
During end-to-end training of the RL agent, a plurality of trajectories are rolled out for n steps from a plurality of synchronized environments and are recorded as a batch to update the policy. The number of aware humans in each environment is randomly selected. Policy updates may use n steps per environment, where the value of n may be pre-configured or dynamically configured, for example, n = 5, with a discount factor $\gamma = 0.90$, a policy loss coefficient of 1.0, an entropy term coefficient of 0.001, and a critic loss coefficient of 0.25. The network may then be trained for a plurality of episodes, with each episode running a defined number of steps.
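A sketch of a corresponding A2C objective, using the coefficients quoted above, is shown below; this is a generic actor-critic loss for illustration, not the exact optimization disclosed here.

```python
import torch

def a2c_loss(log_probs, values, returns, entropies,
             policy_coef=1.0, critic_coef=0.25, entropy_coef=0.001):
    """Generic A2C objective over an n-step rollout batch.

    log_probs: log pi(a_t | s_t) of rolled-out actions; values: critic V(s_t);
    returns: n-step returns as computed above; entropies: per-step policy entropies.
    """
    advantage = returns - values
    policy_loss = -(log_probs * advantage.detach()).mean()  # actor term
    critic_loss = advantage.pow(2).mean()                   # value regression
    entropy_bonus = entropies.mean()                        # encourages exploration
    return (policy_coef * policy_loss
            + critic_coef * critic_loss
            - entropy_coef * entropy_bonus)
```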
EXPERIMENTAL DATA:
Architecture details.
The architecture used in the experimental setup, as in FIG. 2, may have the following specifications. $\mathrm{MLP}_q$ and $\mathrm{MLP}_v$ are made up of two fully-connected (FC) layers whose input and output feature dimensions are 9 (the length of the human state vector) and 256, respectively, with a ReLU activation between the two: [FC(9,256), ReLU, FC(256,256)]. $\mathrm{MLP}_k$ and $\mathrm{MLP}_r$ are [FC(7,256), ReLU, FC(256,256)], where 7 is the length of the robot's state vector. The $\mathrm{MLP}_{zstate}$ is composed as [FC(64,256), ReLU, FC(256,256)], where 64 is the length of the concatenated $Z_\mu \oplus Z_\sigma$ input (as $Z_\mu$ and $Z_\sigma$ are 32-element vectors). The state encoder is composed as [FC(768,512), ReLU, FC(512,512), ReLU, FC(512,256)]. Finally, the output of the state encoder is mapped to $V(s_t)$ using FC(256,1) and mapped to $\pi_\mu(s_t)$ and $\pi_\sigma(s_t)$ using [FC(256,32), Tanh]. As a Tanh activation, whose output range is [-1, 1], is used in the policy, the latent code $Z_{control}$ is scaled by 4.0 to ensure that almost the entire standard normal distribution is covered, as $\epsilon_t \sim \mathcal{N}(0, I)$.
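Rendered directly from the dimensions above, the state encoder and output heads could look like the following sketch (hypothetical PyTorch; two Tanh heads, one each for pi_mu and pi_sigma, are an assumption, and a practical implementation would additionally map the sigma head to a positive standard deviation).

```python
import torch.nn as nn

# State encoder: S_t (768 = 256 + 256 + 256) -> 256-d feature
state_encoder = nn.Sequential(
    nn.Linear(768, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 256),
)
value_head = nn.Linear(256, 1)                                    # V(s_t)
policy_mu_head = nn.Sequential(nn.Linear(256, 32), nn.Tanh())     # pi_mu(s_t) in [-1, 1]
policy_sigma_head = nn.Sequential(nn.Linear(256, 32), nn.Tanh())  # pi_sigma(s_t)
```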
A. Simulation environment
The CRI-Linear simulation environment was used, which controls its agents (considered as humans) using ORCA.
TABLE I
OVERALL EFFICACY
Method | Success(%)↑ | NT(s)↓ | Jerk(m/s3)↓ | TL(m)↓ | Disc↓ | Soc(m)↑
CRI-Linear | 96.8% | 12.25 | 0.58 | 9.917 | 2.53 | 1.34
Decentralized Structural RNN (DSRNN) | 94% | 14.51 | 0.17 | 13.48 | 0.46 | 0.62
Relational Graph Learning (RGL) | 97% | 12.20 | 0.31 | 10.017 | 4.35 | 0.96
CrowdNav++ | 82% | 13.07 | 0.17 | 10.95 | 0.39 | 1.00
DSRNN-D | 94.8% | 14.72 | 0.35 | 13.55 | 0.58 | 0.60
CrowdNav++-D | 95% | 13.60 | 0.43 | 12.93 | 0.49 | 1.54
METHOD 300-D (method 300) | 98.2% | 13.00 | 0.15 | 10.69 | 0.37 | 1.01
TABLE II
PERFORMANCE WITH CONTINUOUS ACTIONS
Method | Success(%)↑ | NT(s)↓ | Jerk(m/s3)↓ | TL(m)↓ | Soc(m)↑ | Disc↓
METHOD 300-C (method 300) | 98.4% | 11.55 | 0.13 | 10.80 | 1.16 | 0.29
CrowdNav++ | 82% | 13.07 | 0.17 | 10.95 | 1.00 | 0.39
DSRNN | 94% | 14.51 | 0.17 | 13.48 | 0.62 | 0.46
For fairness in comparisons and evaluation, like all the baselines, a circle crossing scenario has been used. In this scenario, there were N = 5 humans randomly positioned on a circle of radius 4.5m, and a discomfort region of radius 0.25m was considered around each human. The starting positions of the humans were randomly perturbed, and the goal was on the same circle but in the opposite direction. Each agent could move at a maximum speed defined by vpref, which was set to 1 m/s. The simulation ran at a speed of 4 steps/sec, i.e., Δt = 0.25 sec. All the evaluations and comparisons were done using the 500 random test cases of the simulation framework. In order to evaluate the method 300 for scenarios with aware and unaware humans, the human agents were appended with the awareness attribute, which allowed controlling whether a human reciprocates to the robot’s motion.
The following evaluation metrics were used to evaluate performance of the method 300 in comparison with baselines considered.
Success: The fraction of episodes (in %) in which the robot reached its goal, out of the total number of episodes.
Average Navigation Time (NT): The average time taken by the robot to reach its goal from the starting position, across all episodes.
Average Jerk: The average change in acceleration per unit time across all episodes. Lower values indicate smoother movements.
Average Trajectory Length (TL): The average distance traveled by the robot from the starting position to the goal, across all successful episodes.
Average Discomfort (Disc): The ratio of the number of timesteps in which the robot enters the discomfort region (0.25m) of at least one human to the total number of timesteps.
Sociability (Soc): The average closest distance between a human and the robot, across all episodes, while the robot is in the navigable field of view of the human. Not going too close within the human's FoV indicates the robot's ability to anticipate human motion. A sociability value of approximately 1.0m is preferable, based on the theory of proxemics [21], which states that an area within a circle of 1m radius around a human approximately corresponds to the personal space region.
<28°: The proportion of successive steps (in %) in which the angle deviation is <28°.
Mean/Std°: The mean and the standard deviation of the angular difference between successive steps. Lower values indicate legible trajectories.
C. Experimentation Results
Baselines: The method 300 was compared with baselines, including DSRNN, CRI-Linear, RGL, and CrowdNav++ (which has a future collision penalty in its reward structure), using the aforementioned metrics. The method CRI assumed that the next-step ground truth state is provided during training, which is an unrealistic assumption. A more realistic version of this, CRI-Linear, was used, which utilized a linear motion model where agents maintain the velocities from their previous state. CRI-Linear and RGL used discrete action spaces. DSRNN and CrowdNav++ used continuous actions by default. To show the results of the discrete versions of DSRNN and CrowdNav++, their continuous actions were mapped to the closest action in the discrete action space, denoted as DSRNN-D and CrowdNav++-D.
TABLE III
SMOOTHNESS STATISTICS
Method | <28°↑ | Mean/Std°↓ | NT(s)↓ | TL(m)↓
CRI-Linear | 80.3% | 24.57/41.80 | 12.25 | 9.917
RGL | 92.00% | 14.31/35.92 | 12.20 | 10.02
DSRNN | 89.80% | 20.83/56.34 | 14.51 | 13.48
CrowdNav++ | 87.90% | 20.82/59.35 | 13.07 | 10.95
DSRNN-D | 89.55% | 22.15/54.51 | 14.72 | 13.55
CrowdNav++-D | 79.46% | 32.78/62.15 | 13.60 | 12.93
METHOD 300-C | 96.24% | 9.02/25.19 | 11.55 | 10.80
METHOD 300-D | 95.99% | 8.75/27.98 | 13.00 | 10.69
Overall Efficacy: The purpose of the analysis was to determine the effectiveness of the method 300. The results, as outlined in Table I, indicated that the method 300 outperforms the other approaches across several metrics. Specifically, the method 300 achieved the best results for success rate, average jerk, and average discomfort.
The method 300 achieved a 98.2% success rate, indicating that it is safe and has strong goal-reaching ability. The method 300 resulted in an average jerk of 0.15m/s3, indicating legible trajectories compared to the other approaches. The analysis also showed a low discomfort frequency to other humans, with an average of 0.37, the best among the compared approaches.
Furthermore, despite its safe nature, the robot exhibited navigation time and trajectory length comparable to CRI-Linear and RGL, methods which exhibited jerky trajectories that were not particularly legible. The method 300 had the second-best sociability score of 1.01m, only losing out to CRI-Linear, which had a score of 1.34m. This indicates that the method 300 is able to maintain a safe distance from other agents despite not having any reward for this property.
FIG. 4A depicts qualitative results of the method 300. It is evident from FIG. 4A that the method 300 has the capability of producing trajectories that are both socially acceptable and smooth. It is also evident how the robot anticipates a human's future motion and accordingly changes its trajectory to avoid collision.
FIG. 4B depicts a qualitative comparison of the method 300 with various baselines. The results reveal that the method 300 generated trajectories that are not only smoother than the baselines, but also short in length. This shows that the method 300 has been able to strike a balance between smooth and short trajectories.
Exploiting social motion latent space of a forecasting network:
Advantages of generating actions from the social latent space of a motion forecasting network are elaborated here. The quality of robot trajectories depends on properties such as smoothness, operating in discrete/continuous action spaces, and performing well with limited speed. The method 300 was compared to the baselines on these properties and was found to outperform them quantitatively and qualitatively, demonstrating the effectiveness of generating actions from the social motion latent space.
Navigating using continuous actions is important as it allows for smooth movement through the environment, enabling precise adjustments to be made in response to environmental changes. The method 300 demonstrated the ability to navigate using continuous actions despite being trained using a discrete action space. Table II compares the method 300 with the baselines, including CrowdNav++ and DSRNN; the method 300 outperformed them on all six evaluation metrics. Specifically, the method 300 achieved a 98.4% success rate, indicating a high capability to reach the goal. The trajectories obtained by the method 300 were found to be efficient, with a low navigation time of 11.55s and shorter trajectory lengths of 10.80m. The method 300 achieved a low jerk of 0.13m/s3, demonstrating the ability to produce smooth trajectories compared to the baselines. Furthermore, the method 300 achieved trajectories that were social in nature, with a sociability of 1.16m and an average discomfort frequency of 0.29. Overall, the results demonstrated that the use of the social latent space of a motion forecasting network, combined with the ability to navigate using both continuous and discrete action spaces, enhances the performance of the method 300 compared to the baselines. FIG. 4C depicts the qualitative outcomes of two different episodes when the robot takes continuous actions.
Smooth robot trajectories are vital for efficient navigation, offering benefits such as energy efficiency, safety, precision, robustness, and human-like behavior. Table III compares the smoothness statistics of the method 300 with the baselines on four performance metrics: the fraction of robot steps with angle deviation <28°, the Mean/Std° of the angle change between time steps, the navigation time, and the average trajectory length. Two variants of the method 300, termed METHOD 300-C and METHOD 300-D, operate using continuous and discrete actions, respectively. The results indicated that the method 300 produced smoother trajectories, achieving 96.24% and 95.99% for the first metric (<28°). Furthermore, METHOD 300-D had the best mean deviation of 8.75°, while METHOD 300-C had the best standard deviation of 25.19° and also achieved a faster navigation time of 11.55s compared to the baselines.
For cases where the maximum speed was capped to 0.75m/s and 0.5m/s, a * is added next to CrowdNav++ and DSRNN in Table IV. This indicates that the goal radius was increased to 0.5m and 1m for these methods for fairness. Without this, both of these methods resulted in the robot timing out very close to the goal for many episodes (30% to 40%) when the goal radius was set to the original value of 0.3m.
Despite not incorporating a future collision reward in the method 300, the system 100 was able to achieve anticipatory robot motion, as the action is generated from the social motion latent space learned by the social trajectory forecaster, which inherently encodes this information. To validate this, a version of the model used in the method 300 was trained with a future collision check in the reward structure, similar to CrowdNav++. Table VII shows that the method 300 outperformed CrowdNav++ (both discrete and continuous). The performance is consistent even for the continuous versions.
Ablation: To understand the benefit of generating robot actions from a social motion latent space, a variant of the model that directly produces probabilities over the discrete actions was used. From the ablation results in Table V, it is clear that, with negligible degradation in success rate, the model used in the method 300 outperforms the baselines. The model used in the method 300 achieved smooth trajectories even without a future collision reward, as used in CrowdNav++. Avoiding the future collision check prevents the generated policy from being affected by the prediction errors of the trajectory forecaster.
Social Awareness: Table VI demonstrates the benefits and outcomes of integrating human awareness into the robot state. Three sets of results are shown: in the first two sets, the effect of increasing the number of aware humans is depicted; the third set covers the effect of not incorporating awareness in the robot state.
In the first set, the results of the method 300 are demonstrated using the discrete action space. It is noteworthy that an increase in the number of aware humans leads to an overall improvement in most metrics. When all humans were aware (100%), the method 300 achieved a success rate of 100%. Moreover, the robot's trajectories improved, with a navigation time of 10.38s and a trajectory length of 8.80m. Since the humans tend to yield to the robot, the robot needed to take fewer deviations, resulting in a lower jerk of 0.007m/s3 and minimal discomfort. In addition, low angular deviations were observed. Lastly, the third set covers the performance of the method 300 when all humans are aware but the robot perceives them all as unaware. In such a case, an important feature to observe is that while the success rate is not affected, the navigation time increases and the robot keeps a larger distance to humans. This shows that the robot needs to be cognizant of the humans' awareness (i.e., have it as a state element) to benefit from their positive interactions, i.e., yielding to the robot.
TABLE IV
SPEED CONSTRAINT STATISTICS
Method (max speed) | Success(%)↑ | NT(s)↓ | Jerk(m/s3)↓ | TL(m)↓ | Soc(m)↑ | Disc↓
Method 300 (1m/s) | 98.4% | 11.55 | 0.13 | 10.80 | 1.16 | 0.29
CrowdNav++ (1m/s) | 82% | 13.07 | 0.17 | 10.95 | 1.00 | 0.39
DSRNN (1m/s) | 94% | 14.51 | 0.17 | 13.48 | 0.62 | 0.46
Method 300 (0.75m/s) | 98.8% | 14.09 | 0.07 | 10.27 | 1.30 | 0.21
CrowdNav++ (0.75m/s) (0.5 goal radius)* | 96% | 15.23 | 0.09 | 10.12 | 1.44 | 0.17
DSRNN (0.75m/s) (0.5 goal radius)* | 91% | 18.68 | 0.07 | 12.70 | 0.75 | 1.17
Method 300 (0.5m/s) | 96.4% | 19.50 | 0.06 | 9.83 | 1.50 | 0.39
CrowdNav++ (0.5m/s) (0.5 goal radius)* | 93% | 20.18 | 0.03 | 9.23 | 1.41 | 0.42
DSRNN (0.5m/s) (1.0 goal radius)* | 76% | 20.41 | 0.02 | 9.54 | 1.21 | 1.91
TABLE V
ABLATION
Method | Success(%)↑ | Jerk(m/s3)↓ | <28°↑ | Mean/Std°↓
Method 300 | 98.2% | 0.152 | 95.99% | 8.75/27.98
Baseline 1 | 98.8% | 0.178 | 94.97% | 9.51/28.34
Baseline 2 | 99.4% | 0.606 | 76.41% | 34.35/72.70
TABLE VI
EFFECT OF INCORPORATING HUMAN AWARENESS INTO THE SOCIAL NAVIGATION FRAMEWORK
Method | Awareness | Success(%)↑ | NT(s)↓ | Jerk(m/s3)↓ | TL(m)↓ | Soc(m)↑ | <28°↑ | Mean/Std°↓ | Disc↓
METHOD 300-D | 0% | 98.2% | 13.0 | 0.152 | 10.69 | 1.01 | 95.99% | 8.75/27.98 | 0.37
METHOD 300-D | 60% | 98.8% | 11.77 | 0.124 | 9.79 | 1.01 | 97.55% | 5.94/20.44 | 0.27
METHOD 300-D | 100% | 100% | 10.38 | 0.007 | 8.80 | 0.93 | 99.04% | 0.22/2.30 | 0.00
METHOD 300-C | 0% | 98.4% | 11.55 | 0.13 | 10.80 | 1.16 | 96.24% | 9.02/25.19 | 0.29
METHOD 300-C | 60% | 98.4% | 10.77 | 0.10 | 9.93 | 1.00 | 98.20% | 6.41/15.86 | 0.33
METHOD 300-C | 100% | 100% | 10.43 | 0.05 | 8.92 | 0.84 | 87.89% | 2.48/2.99 | 0.01
METHOD 300-D | 100% (perceived unaware) | 100% | 12.15 | 0.12 | 10.18 | 1.06 | 96.67% | 6.93/26.67 | 0.00
METHOD 300-C | 100% (perceived unaware) | 100% | 11.13 | 0.11 | 10.44 | 0.96 | 97.10% | 7.58/21.87 | 0.01
TABLE VII
NEGLIGIBLE DEGRADATION DUE TO ABSENCE OF FUTURE COLLISION CHECK REWARD
Method | Success(%)↑ | NT(s)↓ | Jerk(m/s3)↓ | TL(m)↓ | Soc(m)↑ | <28°↑ | Mean/Std°↓ | Disc↓
METHOD 300-D | 98.2% | 13.0 | 0.15 | 10.69 | 1.01 | 95.99 | 8.75/27.98 | 0.37
Abl-D | 98.8% | 12.77 | 0.17 | 10.51 | 1.05 | 94.97 | 9.51/28.34 | 0.34
CrowdNav++-D | 95% | 13.60 | 0.43 | 12.93 | 1.54 | 79.46 | 32.78/62.15 | 0.49
METHOD 300-C | 98.4% | 11.55 | 0.13 | 10.80 | 1.16 | 96.24 | 9.02/25.19 | 0.29
Abl-C | 98.6% | 11.64 | 0.18 | 10.66 | 1.17 | 93.55 | 11.82/30.61 | 0.23
CrowdNav++-C | 82% | 13.07 | 0.17 | 10.95 | 1.00 | 87.90 | 20.82/59.35 | 0.39
FIG. 4E displays qualitative results from different episodes, with each row representing a distinct scenario. The first row pertains to the case where three humans exhibit social awareness, the second row showcases two aware humans, and the final row highlights the instance where all humans demonstrate awareness. Within the first row, the unaware and aware humans are distinguished by purple and green colours, respectively, in the first and second box. It is observed that the unaware human does not yield to the robot in the first box, continuing along its current path. In contrast, the aware human in the second box gives way to the robot, enabling a more efficient trajectory to the goal. Notably, an increase in the number of aware humans results in a corresponding improvement in the quality of the robot’s trajectory.
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
The embodiments of the present disclosure herein address the unresolved problem of robotic path planning in crowded environments. The embodiments thus provide a mechanism of sampling a social motion latent space to select a plurality of latent codes, which are then used to generate a plurality of motion vectors. Moreover, the embodiments herein further facilitate navigation of the robot using the plurality of motion vectors.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g., any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g., hardware means like e.g., an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g., using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
Claims:
We Claim:
1. A processor implemented method (300), comprising:
generating (302), via a multi-modal trajectory forecaster executed by one or more hardware processors, a social motion latent space for a robot;
modelling a robot-human self-attention (304), by a controller module executed by the one or more hardware processors, by processing humans' awareness towards the robot and one or more quantities estimated with respect to a) a state of the robot, b) a state of one or more humans;
sampling (306), by the controller module executed by the one or more hardware processors, the social motion latent space to select a plurality of latent codes, computed using a plurality of control variables via a reparameterization operation, wherein the plurality of control variables are estimated based on one or more human features weighted by the modelled robot-human self-attention, one or more robot state features, and the social motion latent space parameters for a current time-step, and wherein the selected plurality of latent codes are goal-directed and socially compliant with respect to an environment in which the robot is located;
generating (308), via a decoder of the multi-modal trajectory forecaster, a plurality of motion vectors by transforming the selected plurality of latent codes;
mapping (310), via the one or more hardware processors, the plurality of motion vectors to an action space of the robot, wherein by mapping the plurality of motion vectors to the action space of the robot, one or more control commands to navigate the robot towards a defined goal in a socially compliant way are generated; and
navigating (312), via the one or more hardware processors, the robot towards the defined goal, based on the one or more control commands.
2. The processor implemented method as claimed in claim 1, wherein modelling the robot-human self-attention comprises generating a plurality of weights via a sequence of differentiable operations, by processing a) a linearly transformed state of the robot, and b) a linearly transformed state of one or more humans and associated awareness state, and wherein the generated plurality of weights represent a relative importance of each of the one or more humans to the robot.
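Purely as a hedged sketch of the weight generation described in claim 2, the code below computes one softmax weight per human from linearly transformed robot and human states, with the awareness state appended as a binary flag; the layer sizes and the scaled dot-product form are assumptions, not the claimed implementation.

```python
import torch
import torch.nn as nn

class RobotHumanAttention(nn.Module):
    """Produces one weight per human: that human's relative importance to the robot."""
    def __init__(self, robot_dim=7, human_dim=5, embed_dim=32):
        super().__init__()
        self.robot_proj = nn.Linear(robot_dim, embed_dim)      # linear transform of robot state
        self.human_proj = nn.Linear(human_dim + 1, embed_dim)  # +1 for the awareness flag

    def forward(self, robot_state, human_states, awareness):
        # human_states: (N, human_dim); awareness: (N, 1) in {0, 1}.
        q = self.robot_proj(robot_state)                                    # (embed_dim,)
        k = self.human_proj(torch.cat([human_states, awareness], dim=-1))   # (N, embed_dim)
        scores = k @ q / q.shape[-1] ** 0.5                                 # scaled dot-product
        return torch.softmax(scores, dim=0)                                 # differentiable weights

attn = RobotHumanAttention()
weights = attn(torch.randn(7), torch.randn(4, 5), torch.randint(0, 2, (4, 1)).float())
```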
3. The processor implemented method as claimed in claim 2, wherein information on the relative importance of each of the one or more humans to the robot is used to generate one or more goal-directed and socially compliant control commands by the controller module to navigate the robot towards the defined goal.
4. The processor implemented method as claimed in claim 2, wherein sampling the social motion latent space by the controller module comprises: processing a) the linearly transformed state of the robot, b) the linearly transformed states of one or more humans, weighted by the generated plurality of weights, and c) one or more linearly transformed social motion latent space parameters for a current time-step, to produce a plurality of perturbation vectors via a sequence of differentiable operations, wherein the plurality of perturbation vectors are converted to a latent code using a differentiable reparameterization operation.
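A minimal sketch of the sampling step of claim 4, under assumed dimensions: perturbation vectors are produced from a) the transformed robot state, b) the weighted human states (pooled into one vector here), and c) the current latent-space parameters, and are then converted to a latent code by the standard differentiable reparameterization z = mu + sigma * eps. The module below is hypothetical, not the specification's network.

```python
import torch
import torch.nn as nn

class LatentSampler(nn.Module):
    def __init__(self, robot_dim=7, human_dim=5, latent_dim=32):
        super().__init__()
        in_dim = robot_dim + human_dim + 2 * latent_dim
        self.perturb = nn.Linear(in_dim, 2 * latent_dim)   # produces perturbation vectors

    def forward(self, robot, humans_weighted, mu, logvar):
        # Inputs a)-c): transformed robot state, attention-weighted human
        # states, and the latent-space parameters for the current time-step.
        x = torch.cat([robot, humans_weighted, mu, logvar], dim=-1)
        d_mu, d_logvar = self.perturb(x).chunk(2, dim=-1)  # perturbation vectors
        # Differentiable reparameterization: z = (mu + d_mu) + sigma * eps.
        std = torch.exp(0.5 * (logvar + d_logvar))
        return (mu + d_mu) + std * torch.randn_like(std)

sampler = LatentSampler()
z = sampler(torch.randn(1, 7), torch.randn(1, 5), torch.randn(1, 32), torch.randn(1, 32))
```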
5. The processor implemented method as claimed in claim 1, wherein the controller module is modelled as a Markov Decision Process (MDP), and is trained using a Reinforcement Learning (RL) framework, and wherein the training using the RL framework comprises training an RL agent based on a plurality of parameters representing a) the humans’ awareness towards the robot, b) the state of the robot, c) the state of one or more humans, and d) the social motion latent space parameters.
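Since claim 5 leaves the RL algorithm open, the skeleton below uses a REINFORCE-style update as one possible instantiation; `env` is a hypothetical Gymnasium-style environment whose observation packs parameters a) through d), and `controller` is assumed to return a torch probability distribution over actions. This is a sketch under those assumptions, not the claimed training procedure.

```python
import torch

def train(env, controller, optimizer, episodes=100, gamma=0.99):
    """REINFORCE-style loop: each observation packs humans' awareness,
    robot state, human states, and the latent-space parameters."""
    for _ in range(episodes):
        obs, _ = env.reset()
        log_probs, rewards, done = [], [], False
        while not done:
            dist = controller(torch.as_tensor(obs, dtype=torch.float32))
            action = dist.sample()
            log_probs.append(dist.log_prob(action).sum())
            obs, reward, terminated, truncated, _ = env.step(action.numpy())
            rewards.append(reward)
            done = terminated or truncated
        # Discounted returns, then one policy-gradient step per episode.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.append(g)
        returns = torch.tensor(list(reversed(returns)))
        loss = -(torch.stack(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```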
6. A system (100), comprising:
one or more hardware processors (102);
a communication interface (112); and
a memory (104) storing a plurality of instructions, wherein the plurality of instructions cause the one or more hardware processors to:
generate, via a multi-modal trajectory forecaster executed by the one or more hardware processors, a social motion latent space for a robot;
model a robot-human self-attention, by a controller module executed by the one or more hardware processors, by processing humans' awareness towards the robot and one or more quantities estimated with respect to a) a state of the robot, b) a state of one or more humans;
sample, by the controller module executed by the one or more hardware processors, the social motion latent space to select a plurality of latent codes, computed using a plurality of control variables via a reparameterization operation, wherein the plurality of control variables are estimated based on one or more human features weighted by the modelled robot-human self-attention, one or more robot state features, and the social motion latent space parameters for a current time-step, and wherein the selected plurality of latent codes are goal-directed and socially compliant with respect to an environment in which the robot is located;
generate, via a decoder of the multi-modal trajectory forecaster, a plurality of motion vectors by transforming the selected plurality of latent codes;
map the plurality of motion vectors to an action space of the robot, wherein by mapping the plurality of motion vectors to the action space of the robot, one or more control commands to navigate the robot towards a defined goal in a socially compliant way are generated; and
navigate the robot towards the defined goal, based on the one or more control commands.
7. The system as claimed in claim 6, wherein the controller module is configured to model the robot-human self-attention by generating a plurality of weights via a sequence of differentiable operations, by processing a) a linearly transformed state of the robot, and b) a linearly transformed state of one or more humans and associated awareness state, and wherein the generated plurality of weights represent a relative importance of each of the one or more humans to the robot.
8. The system as claimed in claim 7, wherein the controller module is configured to use the information on the relative importance of each of the one or more humans to the robot to generate one or more goal-directed and socially compliant control commands to navigate the robot towards the defined goal.
9. The system as claimed in claim 7, wherein the controller module is configured to sample the social motion latent space by processing a) the linearly transformed state of the robot, b) the linearly transformed states of one or more humans, weighted by the generated plurality of weights, and c) one or more linearly transformed social motion latent space parameters for a current time-step, to produce a plurality of perturbation vectors via a sequence of differentiable operations, wherein the plurality of perturbation vectors are converted to a latent code using a differentiable reparameterization operation.
10. The system as claimed in claim 6, wherein the controller module is modelled as a Markov Decision Process (MDP), and is trained using a Reinforcement Learning (RL) framework, and wherein the training using the RL framework comprises training an RL agent based on a plurality of parameters representing a) the humans’ awareness towards the robot, b) the state of the robot, c) the state of one or more humans, and d) the social motion latent space parameters.
Dated this 29th Day of August 2023
Tata Consultancy Services Limited
By their Agent & Attorney
(Adheesh Nargolkar)
of Khaitan & Co
Reg No IN-PA-1086