
Systems and Methods for Vision-Based Soft Robotic Grasping Using Reinforcement Learning

Abstract: This disclosure provides a system and method that modifies vision-based grasp plans generated for parallel-jaw grippers and adapts them to grasping with a four-fingered soft gripper using reinforcement learning. In the present disclosure, grasp plans from a vision subsystem, together with the curvature and torsion of the flexible soft fingers of a soft gripper device, are used as input to a controller unit. Further, the controller unit for the soft gripper device is trained in simulation and later transferred to the physical system using sim-to-real transfer, which requires much less time and fewer resources. The controller unit acts as an RL agent that adaptively generates one or more optimal control command actions for each finger separately, discovering grasp synergies during training. The method of the present disclosure achieves a top-grasp pick-and-place success rate of 58.4% with the RL agent, an improvement over conventional approaches where all four fingers simultaneously enclose the object. [To be published with FIG. 1]


Patent Information

Filing Date: 17 August 2022
Publication Number: 08/2024
Publication Type: INA
Invention Field: COMPUTER SCIENCE

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th floor, Nariman point, Mumbai 400021, Maharashtra, India

Inventors

1. VATSAL, Vighnesh
Tata Consultancy Services Limited, Gopalan Global Axis, SEZ "H" Block, No. 152 (Sy No. 147,157 & 158), Hoody Village, Bangalore 560066, Karnataka, India
2. GEORGE, Nijil
Tata Consultancy Services Limited, Gopalan Global Axis, SEZ "H" Block, No. 152 (Sy No. 147,157 & 158), Hoody Village, Bangalore 560066, Karnataka, India

Specification

Description: FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:
SYSTEMS AND METHODS FOR VISION-BASED SOFT ROBOTIC GRASPING USING REINFORCEMENT LEARNING

Applicant
Tata Consultancy Services Limited
A company incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

Preamble to the description:
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The disclosure herein generally relates to the field of robotic grasping, and, more particularly, to systems and methods for vision-based soft robotic grasping using reinforcement learning.

BACKGROUND
Robots find application in routine, repetitive, and labor-intensive tasks which can be automated to reduce manual intervention. Examples of such tasks include arrangement and restocking of items in a supermarket, handling of dangerous chemicals in industry and laboratories, automation of manufacturing lines, and the like. However, a robot's inability to handle delicate and deformable objects is a bottleneck for its application in some of the above-mentioned examples. A solution to this problem is the use of soft grippers, which are made of soft materials and can handle a larger spectrum of objects. While the soft nature of these grippers broadens their use cases, controlling a soft gripper effectively is expensive.
While a few conventional methods exist for control of soft grippers using vision input, they rely on end-to-end training of deep learning (DL) networks, or a combination of vision (DL) and reinforcement learning (RL) networks, on physical systems. Traditionally, controller training has been performed on multiple physical systems in parallel for multiple weeks at a time, which is resource- and computation-intensive. In addition to consuming considerable energy, such training setups can cause wear and tear or even damage the physical systems. Further, several vision-based techniques for grasp planning of end-effectors have been successfully deployed in pick-and-place tasks. However, they assume the gripper to be of a rigid parallel-jaw type or a single-point vacuum suction-based design when computing the optimal grasp.

SUMMARY
Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, vision-based robotic grasping system is provided. The vision-based robotic grasping system comprises a robotic manipulator; a soft gripper device comprising a plurality of fingers and connected with the robotic manipulator; a vision subsystem comprising: an image capturing device mounted on the robotic manipulator; and a Grasp Generative Convolutional Neural Network (GG-CNN); a controller unit operatively connected to the soft gripper device, the robotic manipulator and the vision subsystem, wherein the controller unit comprises: one or more data storage devices configured to store instructions; one or more communication interfaces; and one or more hardware processors operatively coupled to the one or more data storage devices via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: receive, (i) a plurality of successive images of a target object corresponding to one or more views of one or more scenes in a robotic environment using the image capturing device, and (ii) one or more curvature measurements and one or more torsion measurements of the plurality of fingers of the soft gripper device; input, the plurality of successive images of the target object corresponding to the one or more views of the one or more scenes in the robotic environment to the Grasp Generative Convolutional Neural Network (GG-CNN) for generating an optimal set of grasp vectors, wherein the optimal set of grasp vectors provides an optimal grasp direction for actuating the soft gripper device to grasp the target object; iteratively train, the controller unit with (i) the optimal set of grasp vectors, (ii) the one or more curvature measurements and the one or more torsion measurements of the plurality of fingers of the soft gripper device, and (iii) a reward function computed for a specific action of the soft gripper device using a Soft Actor-Critic (SAC) algorithm, wherein the controller unit is a reinforcement learning (RL) based agent; and adaptively generate, one or more optimal control command actions by continuously optimizing the reward function based on one or more feedback variables obtained from the robotic environment using the iteratively trained controller unit, wherein the one or more optimal control command actions are used to generate one or more grasp plans for executing movements of one or more parts of the soft gripper device to optimally grasp the target object.
In another aspect, a processor implemented method is provided. The method includes receiving, via a first module and a second module executed by one or more hardware processors, (i) a plurality of successive images of a target object corresponding to one or more views of one or more scenes in a robotic environment using the image capturing device, and (ii) one or more curvature measurements and one or more torsion measurements of the plurality of fingers of the soft gripper device; inputting, via the first module executed by the one or more hardware processors, the plurality of successive images of the target object corresponding to the one or more views of the one or more scenes in the robotic environment to the Grasp Generative Convolutional Neural Network (GG-CNN) for generating an optimal set of grasp vectors, wherein the optimal set of grasp vectors provides an optimal grasp direction for actuating the soft gripper device to grasp the target object; iteratively training, via a third module executed by the one or more hardware processors, the controller unit with (i) the optimal set of grasp vectors, (ii) the one or more curvature measurements and the one or more torsion measurements of the plurality of fingers of the soft gripper device, and (iii) a reward function computed for a specific action of the soft gripper device using a Soft Actor-Critic (SAC) algorithm, wherein the controller unit is a reinforcement learning (RL) based agent; and adaptively generating, via the third module implemented by the one or more hardware processors, one or more optimal control command actions by continuously optimizing the reward function based on one or more feedback variables obtained from the robotic environment using the iteratively trained controller unit, wherein the one or more optimal control command actions are used to generate one or more grasp plans for executing movements of one or more parts of the soft gripper device to optimally grasp the target object.
In yet another aspect, a non-transitory computer readable medium for receiving, via a first module and a second module executed by one or more hardware processors, (i) a plurality of successive images of a target object corresponding to one or more views of one or more scenes in a robotic environment using the image capturing device, and (ii) one or more curvature measurements and one or more torsion measurements of the plurality of fingers of the soft gripper device; inputting, via the first module executed by the one or more hardware processors, the plurality of successive images of the target object corresponding to the one or more views of the one or more scenes in the robotic environment to the Grasp Generative Convolutional Neural Network (GG-CNN) for generating an optimal set of grasp vectors, wherein the optimal set of grasp vectors provides an optimal grasp direction for actuating the soft gripper device to grasp the target object; iteratively training, via a third module executed by the one or more hardware processors, the controller unit with (i) the optimal set of grasp vectors, (ii) the one or more curvature measurements and the one or more torsion measurements of the plurality of fingers of the soft gripper device, and (iii) a reward function computed for a specific action of the soft gripper device using a Soft Actor-Critic (SAC) algorithm, wherein the controller unit is a reinforcement learning (RL) based agent; and adaptively generating, via the third module implemented by the one or more hardware processors, one or more optimal control command actions by continuously optimizing the reward function based on one or more feedback variables obtained from the robotic environment using the iteratively trained controller unit, wherein the one or more optimal control command actions are used to generate one or more grasp plans for executing movements of one or more parts of the soft gripper device to optimally grasp the target object.
In accordance with an embodiment of the present disclosure, each of the plurality of fingers of the soft gripper device is controlled by the one or more optimal control command actions adaptively generated by the controller unit.
In accordance with an embodiment of the present disclosure, the robotic manipulator is a six Degree-of-freedom (DoF) robotic manipulator.
In accordance with an embodiment of the present disclosure, the one or more curvature measurements and the one or more torsion measurements of the plurality of fingers of the soft gripper device are indicative of an abstraction of (i) deformation of the plurality of fingers of the soft gripper device in a reactive manner to the target object when a contact occurs between the plurality of the fingers and the target object and (ii) size and shape of the target object being handled.
In accordance with an embodiment of the present disclosure, the reward function is represented as: $\sum_{i=1}^{4} w_1 e^{-w_2 d_{c,i}} + d_h w_3 \left[\frac{\Delta h - h_t}{h_f - h_t}\right]^2$, wherein the first term $w_1 e^{-w_2 d_{c,i}}$ penalizes the tips of each finger from the plurality of fingers of the soft gripper device for being away from the target object and favors grasps where the tips of each finger from the plurality of fingers closely wrap around the target object, and the second term $d_h w_3 \left[\frac{\Delta h - h_t}{h_f - h_t}\right]^2$ rewards lifting the target object off a surface and provides a measure of how tightly the target object has been grasped, and wherein $d_{c,i}$ represents the distance between the spherical tip of the $i^{th}$ finger and the geometric center of the target object, $\Delta h$ represents the instantaneous difference in height of the target object above its initial Z-position on the surface at the start of the training, $h_t$ represents a threshold height above which the reward function is activated, $h_f$ represents a final height above the surface, and $w_1$, $w_2$, and $w_3$ represent parameters selected empirically through results from initial trials while building the reinforcement learning (RL) based agent.
In accordance with an embodiment of the present disclosure, the one or more feedback variables comprises (i) position of each finger from the plurality of fingers of the soft gripper with respect to the target object, and (ii) position of the target object with respect to ground.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
FIG. 1 illustrates an exemplary system for vision-based soft robotic grasping using reinforcement learning according to some embodiments of the present disclosure.
FIG. 2A and 2B depict functional diagrams of the soft gripper comprised in the vision-based robotic grasping system according to some embodiments of the present disclosure.
FIG. 3 illustrates an exemplary block diagram of a controller unit comprised in the vision-based robotic grasping system according to some embodiments of the present disclosure.
FIG. 4 is a functional block diagram of an architecture for vision-based soft robotic grasping using reinforcement learning according to some embodiments of the present disclosure.
FIG. 5 illustrates an exemplary flow diagram illustrating a method for vision-based soft robotic grasping using reinforcement learning in accordance with some embodiments of the present disclosure.
FIG. 6 illustrates a soft grasp planning simulation environment where the objective is to pick the target object and place it in a red region on a table, according to some embodiments of the present disclosure.
FIGS. 7A through 7C depict results of the vision subsystem comprised in the vision-based robotic grasping system, according to some embodiments of the present disclosure.
FIGS. 8A and 8B depict a graphical representation illustrating training curves of a reinforcement learning (RL) based agent in terms of episode reward and entropy loss over a predefined number of steps, according to some embodiments of the present disclosure.
FIG. 9 illustrates a graphical representation providing a comparison of grasp success rates for the SAC-based RL agent and the baseline grasp synergy, according to some embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS
Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope being indicated by the following embodiments described herein.
Grasping is a fundamental capability for robotic systems to be effectively deployed in real-world scenarios. Planning a grasp for a given robotic end-effector involves reasoning about geometries of both a gripper and target object. Therefore, grasp planning is typically formulated as a vision problem, where given certain assumptions on the structure and physical properties of the gripper and the target object, the goal is to determine an optimal pose of the gripper with respect to the target object based on an RGB or RGB-D image.
A few conventional approaches to robotic grasp planning that are widely used in practice rely on vision-based techniques such as convolutional neural network (CNN) models. These models typically take a single-view or multi-view image of the target object and generate the optimal pose of the gripper as output, based on a grasp quality metric. Further, these conventional approaches rest on a few key assumptions: the grippers make rigid contact with the target object, being either of a parallel-jaw or two-fingered design, or grasp through a single point of contact via vacuum suction. Additionally, the target objects are assumed to be non-deformable under typical grasping forces.
With recent advancements in soft robotics, compliant, underactuated, and deformable grippers are witnessing wider adoption. Compared to rigid designs, soft robotic grippers provide the advantages of enhanced safety in collaborative usage scenarios and, owing to their compliance, adaptability to a larger range of target objects. Further, as robot manipulators advance into unstructured environments, the types of objects they must grasp also diversify. Traditional rigid grippers and control schemes have proved insufficient for all use cases. This has led to increased use of soft robotic grippers and their learning-based control. Soft robotic grippers are grippers made of soft materials or flexible structures. They are underactuated in the sense that they perform continuous free motion controlled by a limited number of control variables. Soft robotic grippers are normally classified by drive mode, such as fluid drive, cable drive, and the like. Some examples of existing soft robotic grippers include the Pisa/IIT SoftHand, which is a cable-driven soft gripper; the RBO Hand 2, which is a pneumatically driven anthropomorphic robotic gripper capable of performing grasps similar to a human hand; a pneumatically actuated dexterous soft gripper which demonstrates in-hand manipulation; and the DRL Softhand, which is pneumatically driven and has force and bend sensors embedded in it.
There exists a reinforcement learning (RL) based approach for designing a controller for a soft gripper, where the training data comes from human demonstrations and the training of the RL agent is done directly on hardware. Further, an end-to-end deep learning (DL) based approach exists for generating grasps for a soft gripper, where an input video stream is converted by a convolutional neural network (CNN) to discrete position outputs for the gripper and wrist, and training is done on labeled data. A few conventional approaches for grasp planning of a soft gripper use grippers equipped with proprioceptive sensors for force and bend, where data from these sensors is used to generate a grasp and identify the target object being grasped.
A few conventional approaches have attempted to solve the grasping problem for a two-fingered rigid gripper using a combination of DL (vision) and RL, providing improved performance by incorporating multi-view images. However, these conventional approaches cannot be directly applied to the grasp planning of soft robotic grippers because of their underactuated nature. Further, another conventional approach suggested a mathematical way to find an effective grasp direction for a soft robotic gripper, where the effective grasp direction is then aligned with the grasp closure direction predicted by the DL (vision) network. However, this method cannot be used for all types of soft robotic grippers and hand configurations. Further, soft robotic grippers also present challenges in planning effective grasps due to difficulties in modeling contact with the target object and the deformations of the fingers. As a result, grasp plans for these types of grippers rely either on heuristics that assume an approximately correct plan will succeed due to compliance, or on training reinforcement learning (RL) models directly on physical systems.
Embodiments of the present disclosure provide systems and methods for vision-based soft robotic grasping using reinforcement learning. The present disclosure aims to develop grasp plans for soft robotic grippers in simulation, allowing for greater exploration of grasp strategies compared to direct training on physical systems. Concurrently, it was found that antipodal grasps generated by convolutional neural network (CNN) models can serve as a good starting point for an RL agent to control a four-fingered soft gripper. More specifically, the present disclosure describes a grasp planning method which is summarized as follows:
Implementation of a Grasp Generative Convolutional Neural Network (GG-CNN) in a PyBullet® environment for a table-top pick-and-place task with a six-degree-of-freedom robotic arm, generating an optimal pixel-wise grasp based on an RGB image for a top-down pick.
Using the robot wrist pose and grip width from the GG-CNN, along with computation of the three-dimensional (3D) curvature and torsion of each of the fingers of a soft gripper, as input observations for an RL agent.
Training a model-free deep RL agent using a Soft Actor-Critic (SAC) algorithm to generate optimal wrist poses and finger bending commands based on the above observations.
In the present disclosure, training and testing of the RL agent was performed on a dataset of randomly generated 3D objects. The RL agent was compared to a baseline grasp synergy where all four fingers of the soft gripper open and close together, attempting to envelop the target object. The RL agent had a grasp success rate of 58.4% on a test set, compared to 43.2% with the baseline grasp strategy.
Referring now to the drawings, and more particularly to FIGS. 1 through 9, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
FIG. 1 illustrates an exemplary system 100 (interchangeably referred to as the vision-based robotic grasping system and used hereafter throughout the description) for vision-based soft robotic grasping using reinforcement learning according to some embodiments of the present disclosure. In an embodiment, the vision-based robotic grasping system 100 includes a robotic manipulator 102, a soft gripper device 104 comprising a plurality of fingers and connected with the robotic manipulator 102, a vision subsystem 106, and a controller unit 108 operatively connected to the soft gripper device 104, the robotic manipulator 102, and the vision subsystem 106.
In the context of the present disclosure, a Techman TM5-700® robotic arm or a Universal Robots UR5® industrial robotic arm may act as the robotic manipulator 102, which is a six degree-of-freedom (DoF) robotic manipulator. This means that the robotic manipulator has six actuating motors which are controlled by six different control signals from the controller unit. However, the present disclosure can be extended to any other type of robotic manipulator (alternatively referred to as a robotic arm) as long as its inverse kinematics are available.
The soft gripper device 104 is attached as an end-effector to the robotic manipulator. In the context of the present disclosure, the expressions 'soft gripper device', 'soft gripper', and 'soft robotic gripper' may be used interchangeably in the description. FIGS. 2A and 2B depict functional diagrams of the soft gripper comprised in the vision-based robotic grasping system according to some embodiments of the present disclosure. In an embodiment, the soft gripper device may be manufactured using methods including but not limited to 3D printing, injection moulding, and the like. As shown in FIG. 2A, in the present disclosure, the soft gripper device is configured as a combination of four flexible fingers attached to a base. Each individual finger of the soft gripper device can be controlled independently by a scalar input. In other words, each of the plurality of fingers of the soft gripper device may be individually actuated by a cable-driven or pneumatically driven mechanism and is controlled by one of four control signals produced by the controller unit. In an embodiment, a CAD model of each finger from the plurality of fingers of the soft gripper device is obtained, composed of ten segments approximating a soft and flexible structure, with a spherical tip at the end. In an embodiment, the length and dimensions of the soft gripper device are shown in FIG. 2B. It must be appreciated that the design of the soft gripper device used in the present disclosure may be inspired by state-of-the-art commercial grippers such as the mGrip modular gripping system from Soft Robotics®.
The vision subsystem 106 comprises an image capturing device 106 A mounted on the robotic manipulator 102, and a Grasp Generative Convolutional Neural Network (GG-CNN) 106 B. In an embodiment, the image capturing device 106 A could be an optical camera such as an RGB camera, an RGB-D camera, an infrared camera, and the like. However, for ease of explanation, the RGB camera is used throughout the present disclosure. The RGB camera is affixed to the wrist of the robotic manipulator for perception. Visual inputs from the RGB camera are given as input to the GG-CNN. In the present disclosure, the GG-CNN is a CNN-based neural network which is trained on the Cornell Grasping Dataset. The output of the GG-CNN captures the essential details, in the form of grasp vectors, for a successful grasp with a rigid two-fingered gripper. These grasp vectors are passed as part of an environment state to a reinforcement learning (RL) based agent. In addition to the grasp vectors, the environment state also comprises additional information in the form of the curvature of the plurality of fingers of the soft gripper device and the position of the wrist of the robotic manipulator. Using these observations, the RL agent generates the required control signals for the four fingers of the soft gripper device, as well as the angle and height of the wrist of the robotic manipulator. In the context of the present disclosure, the expressions 'RL agent', 'RL model', 'RL network', and 'RL architecture' may be used interchangeably in the description.
FIG. 3 illustrates an exemplary block diagram of the controller unit 108 comprised in the vision-based robotic grasping system 100 according to some embodiments of the present disclosure. In an embodiment, the controller unit 108 includes one or more hardware processors 108 C, communication interface device(s) or input/output (I/O) interface(s) 108 B (also referred as interface(s)), and one or more data storage devices or memory 108 A operatively coupled to the one or more hardware processors 108 C.
The one or more processors 108 C may be one or more software processing components and/or hardware processors. In an embodiment, the hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is/are configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the controller unit 108 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
The I/O interface device(s) 108 B can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like, and can facilitate multiple communications within a wide variety of network (N/W) and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) 108 B can include one or more ports for connecting a number of devices to one another or to another server.
The one or more data storage devices or memory 108 A may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, a database 302 (not shown) is comprised in the memory 108 A, wherein the database 302 comprises a plurality of successive images of a target object corresponding to one or more views of one or more scenes in a robotic environment.
The database 302 further stores information on robotic manipulator, soft gripper device related information, information on reinforcement learning (RL) based agent, reward function, one or more curvature measurements and one or more torsion measurements of a plurality of fingers of the soft gripper device, one or more optimal control command actions, and optimal set of grasp vectors, and one or more grasp plans.
The database 302 further comprises one or more networks such as Convolutional Neural Networks, and one or more models such as reinforcement learning based models which when invoked and executed perform corresponding steps/actions as per the requirement by the vision-based robotic grasping system 100 to perform the methodologies described herein. The memory 108 A further comprises (or may further comprise) information pertaining to input(s)/output(s) of each step performed by the systems and methods of the present disclosure. In other words, input(s) fed at each step and output(s) generated at each step are comprised in the memory 108 A and can be utilized in further processing and analysis.
FIG. 4, with reference to FIGS. 1-3, depicts a block diagram of an architecture as implemented by the vision-based robotic grasping system 100 of FIG. 1 for vision-based soft robotic grasping using reinforcement learning, according to some embodiments of the present disclosure. As shown in FIG. 4, one or more modules are implemented by the vision-based robotic grasping system 100 of FIG. 1. The one or more modules include a vision module (also referred to as the vision subsystem), an actuation module, and a training module. In an embodiment, the robotic manipulator and the soft gripper device collectively form the actuation module.
FIG. 5, with reference to FIGS. 1-4, depicts an exemplary flow chart illustrating a method 200 for vision-based soft robotic grasping using reinforcement learning, using the vision-based robotic grasping system 100 of FIG. 1, in accordance with an embodiment of the present disclosure.
Referring to FIG. 1, in an embodiment, the vision-based robotic grasping system(s) 100 comprises one or more data storage devices or the memory 108 A operatively coupled to the one or more hardware processors 108 C and is configured to store instructions for execution of steps of the method by the one or more processors 108 C. The steps of the method 200 of the present disclosure will now be explained with reference to components of the vision-based robotic grasping system 100 of FIG. 1, the functional diagrams of FIGS. 2A and 2B, the block diagram of FIG. 3, the block diagram of FIG. 4, and the flow diagram as depicted in FIG. 5.
In an embodiment, at step 202 of the present disclosure, the one or more hardware processors 108 C receive, via a first module and a second module executed by the one or more hardware processors 108 C, (i) a plurality of successive images of a target object corresponding to one or more views of one or more scenes in a robotic environment using the image capturing device, and (ii) one or more curvature measurements and one or more torsion measurements of a plurality of fingers of the soft gripper device. As depicted in the block diagram of FIG. 4, the first module (e.g., refer to the vision module in the block diagram) receives the plurality of successive images of the target object using the image capturing device. In other words, the image capturing device streams a video or series of images of a workspace and the target object to be handled. In an embodiment, the second module (e.g., refer to the actuation module in the block diagram) receives the one or more curvature measurements and the one or more torsion measurements of the plurality of fingers of the soft gripper device. In an embodiment, the one or more curvature measurements and the one or more torsion measurements of the plurality of fingers of the soft gripper device are indicative of an abstraction of (i) the deformation of the plurality of fingers of the soft gripper device in a reactive manner to the target object when contact occurs between the plurality of fingers and the target object and (ii) the size and shape of the target object being handled.
In other words, in terms of control, each finger is composed of a series of revolute joints connecting successive segments, approximating the effect of pneumatic actuation. A single scalar command is given to each finger, with the same command applied to all the segments, resulting in a quadratic curve being traced out by the finger. However, when the fingers encounter the target object, they are bent and distorted according to contact dynamics parameters including friction and restitution coefficients. Each of the segments bends and twists from a default start state. This interaction between the fingers and the target object is captured by the geometric quantities of curvature (κ) and torsion (τ) in 3D space as provided in equation (1) and equation (2) below:

$\kappa = \frac{|v \times a|}{|v|^3}$ (1)

$\tau = \frac{-(v \times a) \cdot \dot{a}}{|v \times a|^2}$ (2)

These geometric quantities of curvature (κ) and torsion (τ) are computed in a piece-wise manner for each segment of the fingers (e.g., $\kappa_i$, $\tau_i$ for the $i^{th}$ segment). Considering a sequence of position vectors $p_i$ of the $i^{th}$ segment in 3D, $v$ and $a$ are, respectively, the first and second numerical derivatives, computed using a central difference gradient rule as provided in equation (3) and equation (4) below:

$v_i = \frac{p_{i+1} - p_{i-1}}{2\Delta}$ (3)

$a_i = \frac{v_{i+1} - v_{i-1}}{2\Delta}$ (4)

The homogeneous step size $\Delta$ can be arbitrary, as it cancels out in equation (1) and equation (2). The quantities $\{\kappa_i, \tau_i\}$ are included in an observed state of the robotic environment, based on which the RL agent decides an appropriate control action. They provide the RL agent with a notion of how the fingers distort when coming in contact with an object.
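By way of a non-limiting illustrative sketch, the piece-wise computation of equations (1) through (4) may be implemented in Python as follows; the function name, the (N × 3) array layout of segment positions, and the numerical guard against degenerate segments are assumptions for illustration rather than part of the disclosure:

```python
import numpy as np

def curvature_torsion(points, delta=1.0):
    """Piece-wise curvature and torsion of one finger from its segment positions.

    `points` is assumed to be an (N, 3) array of segment position vectors p_i
    sampled from the simulator (an illustrative layout, not the disclosure's).
    """
    p = np.asarray(points, dtype=float)
    # First and second numerical derivatives via the central difference rule of
    # equations (3) and (4); np.gradient applies exactly this rule in the interior.
    v = np.gradient(p, delta, axis=0)
    a = np.gradient(v, delta, axis=0)
    a_dot = np.gradient(a, delta, axis=0)   # derivative of acceleration, used in eq. (2)

    cross = np.cross(v, a)                  # v x a for every segment
    cross_norm = np.linalg.norm(cross, axis=1)
    v_norm = np.linalg.norm(v, axis=1)

    eps = 1e-9                              # assumed guard against straight segments
    kappa = cross_norm / np.maximum(v_norm ** 3, eps)                              # eq. (1)
    tau = -np.einsum('ij,ij->i', cross, a_dot) / np.maximum(cross_norm ** 2, eps)  # eq. (2)
    return kappa, tau
```

Since the step size $\Delta$ cancels out of equations (1) and (2), `delta` may simply be left at 1.0.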
In an embodiment, at step 204 of the present disclosure, the one or more hardware processors 108 C input, via the first module executed by the one or more hardware processors, the plurality of successive images of the target object corresponding to the one or more views of the one or more scenes in the robotic environment to the Grasp Generative Convolutional Neural Network (GG-CNN) for generating an optimal set of grasp vectors. The optimal set of grasp vectors provides an optimal grasp direction for actuating the soft gripper device to grasp the target object. As depicted in the block diagram of FIG. 4, the first module (e.g., refer to the vision module in the block diagram) inputs the video stream to the GG-CNN, which is a neural network trained to generate the proper grasp control for a two-fingered rigid gripper. The GG-CNN predicts the best grasp vectors for a two-fingered rigid gripper. In the present disclosure, the best grasp vector refers to the optimal grasp direction for a two-fingered rigid gripper for successfully grasping the target object in an image from the plurality of successive images. However, it is observed that considering the best five grasp vectors is more beneficial than considering only the single best grasp vector: when the best five grasp vectors are considered, more information about the target object is encoded. It must be appreciated that the state-of-the-art GG-CNN is modified to predict the top five grasp vectors for a two-fingered rigid gripper.
In another embodiment, the GG-CNN is referred to as a pre-trained vision-based grasp planner and is better understood by way of the following description provided as exemplary explanation.
The visual perception part of the vision-based robotic grasping system is a combination of the wrist-mounted RGB camera and the GG-CNN based network. The RGB camera produces a 640 × 480 image, which is then cropped and segmented to a 300 × 300 size and fed into the GG-CNN based network. The GG-CNN based network is trained on the annotated Cornell Grasping Dataset and generates a pixel-wise grasp vector $\tilde{g}$ for each input image. FIG. 6 illustrates a soft grasp planning simulation environment where the objective is to pick the target object and place it in a red region on a table, according to some embodiments of the present disclosure. The GG-CNN based network is trained for top-down grasps in a table-top setting as shown in FIG. 6, where the effective movement of the robot's wrist is in 2.5D (XY plus height), with orientation kept vertical. For the setup shown in FIG. 6, a single image of the target object is taken at the start of each training episode, and the pixel-wise grasp vector $\tilde{g}$ is computed by the GG-CNN. Here, $\tilde{g}$ consists of the quantities as provided in equation (5) below:

$\tilde{g} = (\tilde{s}, \tilde{\phi}, \tilde{w}, Q)$ (5)

Here $\tilde{s}$ refers to the position in XY coordinates of the RGB image, $\tilde{\phi}$ refers to the rotation angle of the robot's wrist joint in image space, $\tilde{w}$ refers to the width of the gripper for a two-fingered parallel-jaw gripper, and $Q$ refers to a pixel-wise grasp quality metric for a particular wrist placement. $Q$ is a measure of the probability of a successful grasp. The corresponding grasp vector in the world coordinate frame, $g$, can be computed through linear transformations based on internal camera parameters as provided in equation (6) below:

$g = (\bar{p}, \phi, w, Q)$ (6)

Here, $\bar{p} = (x, y, z)$ refers to the Cartesian position of the wrist of the robot, $\phi$ refers to the rotation angle of the robot's wrist joint (roll angle about a vertical axis), $w$ refers to the width of the gripper for a two-fingered parallel-jaw gripper, and $Q$ refers to the pixel-wise grasp quality metric for the particular wrist placement. In the present disclosure, $G = [(x, y), \phi, Q]$ is used as the relevant grasp vector for the soft gripper device, since an effective "grip width" is discovered by the RL agent, and the Z-position of the robot wrist is an independently observed as well as controlled quantity. In the PyBullet® simulation environment used in the present disclosure, the internal camera parameters are considered to be those of an Intel RealSense D435i®, with depth sensing replaced by directly measuring the depth map from the simulation.
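By way of a non-limiting illustrative sketch, the linear transformation from the pixel-space grasp of equation (5) to the world-frame grasp of equation (6) may be expressed as a standard pinhole back-projection; the argument names, the 4 × 4 camera-to-world pose matrix, and the width conversion are assumptions for illustration rather than the disclosure's exact interface:

```python
import numpy as np

def pixel_grasp_to_world(s_img, phi_img, w_img, Q, depth, fx, fy, cx, cy, cam_to_world):
    """Map a GG-CNN pixel-space grasp to the world coordinate frame.

    (fx, fy, cx, cy) are the camera intrinsics (the disclosure assumes those of
    an Intel RealSense D435i); `cam_to_world` is an assumed 4x4 pose matrix of
    the wrist-mounted camera; `depth` is read directly from the simulation.
    """
    u, v = s_img
    # Back-project the pixel through the intrinsics at the measured depth.
    p_cam = np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth, 1.0])
    x, y, z = (cam_to_world @ p_cam)[:3]
    # For a vertical, top-down camera the image-space rotation angle maps onto
    # the wrist roll about the vertical axis; grip width scales by depth/focal length.
    return (x, y, z), phi_img, w_img * depth / fx, Q
```

The grasp vector $G = [(x, y), \phi, Q]$ used for the soft gripper device then simply drops the width component.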
FIGS. 7A through 7C depict results of the vision subsystem or vision module comprised in the vision-based robotic grasping system 100 of FIG. 1, according to some embodiments of the present disclosure. FIG. 7A shows a raw image captured using the image capturing device according to some embodiments of the present disclosure. FIG. 7B shows a cropped and segmented image which is the output of a preprocessing stage according to some embodiments of the present disclosure. FIG. 7C shows the pixel-wise Q values which are the output of the GG-CNN according to some embodiments of the present disclosure.
In an embodiment, at step 206 of the present disclosure, the one or more hardware processors 108 C iteratively train, via a third module executed by the one or more hardware processors, using a Soft Actor-Critic (SAC) algorithm, the controller unit with (i) the optimal set of grasp vectors, (ii) the one or more curvature measurements and the one or more torsion measurements of the plurality of fingers of the soft gripper device, and (iii) a reward function computed for a specific action of the soft gripper device. As depicted in the block diagram of FIG. 4, the third module (e.g., refer to the training module of the block diagram) consists of a reward computation block and a controller training block. As shown in the block diagram of FIG. 4, the controller unit takes as input the optimal set of grasp vectors given by the vision module and the one or more curvature measurements and the one or more torsion measurements of the plurality of fingers of the soft gripper device. During the training of the controller unit, successful lifting of the target object and the closeness of each finger to the target object are considered for ascertaining the reward function for the predicted action.
In an embodiment, at step 208 of the present disclosure, the one or more hardware processors 108 C adaptively generate, via the third module implemented by the one or more hardware processors, using the iteratively trained controller unit, one or more optimal control command actions by continuously optimizing the reward function based on one or more feedback variables obtained from the robotic environment. The one or more optimal control command actions are used to generate one or more grasp plans for executing movements of one or more parts of the soft gripper device to optimally grasp the target object. In other words, the one or more optimal command actions dictate how much each individual actuator in the subsequent actuation module must actuate. In an embodiment, the one or more feedback variables comprise (i) the position of each finger from the plurality of fingers of the soft gripper with respect to the target object and (ii) the position of the target object with respect to the ground. As depicted in the block diagram of FIG. 4, the reward computation block of the third module (e.g., refer to the training module of the block diagram) takes as input environment variables such as the distance between each finger and the target object and the distance between the target object and the ground, and computes the reward function for that specific action. The controller training block takes as input the states (i.e., (i) the optimal set of grasp vectors and (ii) the one or more curvature measurements and the one or more torsion measurements of the plurality of fingers of the soft gripper device) and the reward function from the reward computation block, and tunes the parameters of the controller, optimizing them to increase the reward function. In an embodiment, the reward function is represented as provided in equation (7) below:

$\sum_{i=1}^{4} w_1 e^{-w_2 d_{c,i}} + d_h w_3 \left[\frac{\Delta h - h_t}{h_f - h_t}\right]^2$ (7)

Here, the first term $w_1 e^{-w_2 d_{c,i}}$ penalizes the tips of each finger from the plurality of fingers of the soft gripper device for being away from the target object and favors grasps where the tips of each finger from the plurality of fingers closely wrap around the target object, and the second term $d_h w_3 \left[\frac{\Delta h - h_t}{h_f - h_t}\right]^2$ rewards lifting the target object off a surface and provides a measure of how tightly the target object has been grasped. Further, $d_{c,i}$ represents the distance between the spherical tip of the $i^{th}$ finger and the geometric center of the target object, $\Delta h$ represents the instantaneous difference in height of the target object above its initial Z-position on the surface at the start of the training, $h_t$ represents a threshold height above which the reward function is activated, $h_f$ represents a final height above the surface, and $w_1$, $w_2$, and $w_3$ represent parameters selected empirically through results from initial trials while building the reinforcement learning (RL) based agent. In the present disclosure, the final height above the surface $h_f$ is 20 cm, the threshold height $h_t$ above which the reward function is activated is 2 cm, $w_1$, $w_2$, and $w_3$ are [5, 1, 20] respectively, and $d_h$ is provided in equation (8) below:

$d_h = \begin{cases} 1, & \text{if } \Delta h \geq h_t \\ 0, & \text{otherwise} \end{cases}$ (8)
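By way of a non-limiting illustrative sketch, the reward of equations (7) and (8) may be computed in Python as follows; the function name and argument layout are assumptions for illustration, while the default values follow the disclosure ($h_t$ = 2 cm, $h_f$ = 20 cm, weights [5, 1, 20]):

```python
import numpy as np

def grasp_reward(d_c, delta_h, h_t=0.02, h_f=0.20, w=(5.0, 1.0, 20.0)):
    """Reward of equations (7) and (8).

    `d_c` holds the four tip-to-object-center distances d_{c,i}; `delta_h` is
    the instantaneous lift of the object above its initial Z-position (meters).
    """
    w1, w2, w3 = w
    # First term: penalize fingertips that are far from the object center,
    # favoring grasps where the tips closely wrap around the object.
    proximity = sum(w1 * np.exp(-w2 * d) for d in d_c)
    # Gating term d_h of equation (8): the lift reward activates above h_t.
    d_h = 1.0 if delta_h >= h_t else 0.0
    lift = d_h * w3 * ((delta_h - h_t) / (h_f - h_t)) ** 2
    return proximity + lift
```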
The steps 206 and 208 are better understood by way of the following description provided as exemplary explanation.
In the present disclosure, the controller unit 108 is a reinforcement learning (RL) based agent. The controller unit 108 comprises software and the associated hardware on which it is executed. On the software side, the controller unit 108 acts as a neural network based on Soft Actor-Critic (SAC), which is a model-free RL algorithm, with the network weights shared between a policy network and a Q-function estimation network. The Soft Actor-Critic (SAC) neural network is implemented using the Stable Baselines package in Python. In an embodiment, the architecture of the policy network consists of two hidden layers of 64 multi-layer perceptron units each, with ReLU activation. A layer-normalized version of the policy network is used for better performance, an entropy regularization coefficient is learned automatically with an initial value of 0.1, and the other learning parameters are kept at their default values. In an embodiment, the other learning parameters include the discount factor, learning rate, soft update coefficient, and batch size, with default values of 0.99, 0.0003, 0.005, and 64, respectively. On the hardware side, the Soft Actor-Critic algorithm is implemented on a microprocessor with sufficient computational power. In the present disclosure, an Nvidia® Xavier board is used as the hardware.
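By way of a non-limiting illustrative sketch, an equivalent SAC configuration can be written against the modern stable-baselines3 API as follows (the disclosure used the earlier Stable Baselines package with a layer-normalized policy, which stable-baselines3 does not provide by default, so layer normalization is omitted here); `SoftGraspEnv` is a hypothetical Gym-style wrapper around the PyBullet scene:

```python
from stable_baselines3 import SAC

# Hypothetical environment exposing the state S of eq. (9) and action A of eq. (10).
env = SoftGraspEnv()

model = SAC(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[64, 64]),  # two hidden layers of 64 units, ReLU by default
    ent_coef="auto_0.1",                    # entropy coefficient learned, initial value 0.1
    gamma=0.99,                             # discount factor
    learning_rate=0.0003,
    tau=0.005,                              # soft update coefficient
    batch_size=64,
    verbose=1,
)
model.learn(total_timesteps=20_000)         # 2 x 10^4 steps, in 50-step episodes
```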
In an embodiment, an observed state vector is provided as input to the RL agent (i.e., the controller unit), where the observed state vector is provided in equation (9) below:

$S = \begin{bmatrix} x_w & y_w & \phi_w & \bar{G} & \bar{\kappa} & \bar{\tau} \end{bmatrix}^T$ (9)

Here, $(x_w, y_w)$ are the XY coordinates of the robot's wrist, $\phi_w$ is the wrist rotation angle, $\bar{G} = [G_1, \ldots, G_5]^T$ are the top five outputs from the GG-CNN arranged in descending order of their Q metrics, and $\bar{\kappa} = [\kappa_{i,j}]^T$ and $\bar{\tau} = [\tau_{i,j}]^T$ are the collected curvatures and torsions of the fingers of the soft gripper device, arranged as the $j^{th}$ segment of the $i^{th}$ finger of the soft gripper device. The optimal control command action generated by the RL agent is provided in equation (10) below:

$A = \begin{bmatrix} h & \theta & \bar{F} \end{bmatrix}^T$ (10)

Here, $h$ is the height of the robot's wrist above the surface, which could be the surface of the table or the ground, constrained between 11 cm and 20 cm. The corresponding robot joint angles are computed during execution based on an inverse kinematics (IK) implementation for the TM5-700® in PyBullet®. It is constrained to position-only IK, as the wrist remains vertical throughout. Only the rotation of the wrist about the vertical axis, $\theta$, varies, which influences the relative positioning of the plurality of fingers of the soft gripper device with respect to the target object. This control input has a symmetry of $\pi/2$ due to the square design, leading to a range of $[0, \pi/2]$ for $\theta$. Finally, $\bar{F} = [F_1, F_2, F_3, F_4]^T$ is the vector of control inputs for each of the four fingers. Here, each segment of a finger is given the same joint angle reference, leading to an overall expected quadratic curvature profile.
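By way of a non-limiting illustrative sketch, the observed state of equation (9) may be flattened into a single vector, and the action of equation (10) bounded, as follows; the flattening order and the normalized finger command range are assumptions for illustration:

```python
import numpy as np

def build_state(wrist_xy, phi_w, top5_grasps, kappa, tau):
    """Assemble S of eq. (9). `top5_grasps` is assumed to be a (5, 4) array of
    [x, y, phi, Q] in descending order of Q; `kappa` and `tau` are (4, 10)
    arrays indexed by finger i and segment j."""
    return np.concatenate([
        np.asarray(wrist_xy, dtype=float),              # x_w, y_w
        [float(phi_w)],                                 # wrist rotation angle
        np.asarray(top5_grasps, dtype=float).ravel(),   # G-bar: 20 values
        np.asarray(kappa, dtype=float).ravel(),         # kappa-bar: 40 values
        np.asarray(tau, dtype=float).ravel(),           # tau-bar: 40 values
    ])

# Action bounds for A = [h, theta, F1..F4]: wrist height in [0.11, 0.20] m,
# wrist rotation in [0, pi/2]; the symmetric finger range is an assumed normalization.
ACTION_LOW  = np.array([0.11, 0.0,       -1.0, -1.0, -1.0, -1.0])
ACTION_HIGH = np.array([0.20, np.pi / 2,  1.0,  1.0,  1.0,  1.0])
```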
In the present disclosure, the RL agent is trained on a subset of 100 objects from the set of random CAD models supplied with the PyBullet® random URDF library. These objects were pruned from the overall set to ensure that they are neither too large nor too small to fit in the soft gripper device. This set included a combination of roughly prismatic objects, as well as objects with geometric features that would be adversarial for a parallel-jaw gripper. An object was selected at random in each training episode, with each training episode lasting for 50 steps, and overall training conducted for 2 × 10^4 steps. FIGS. 8A and 8B depict a graphical representation illustrating training curves of the reinforcement learning (RL) based agent in terms of episode reward and entropy loss over a predefined number of steps, according to some embodiments of the present disclosure. The predefined number of steps is 2 × 10^4 steps.
Experimental Results:
In the present disclosure, the trained SAC-based RL agent is compared against a baseline grasping strategy that considers a synergy of all four fingers working in tandem. In the baseline grasping strategy, the robot's wrist is aligned vertically with the center of the target object. The wrist is rotated by an angle $\theta_w$ taken from the optimal output of the GG-CNN, and all four fingers of the soft gripper device are given the same command, i.e., $\bar{F} = [0.2, 0.2, 0.2, 0.2]^T$ for closing. The height of the wrist $h$ is computed by measuring the distance between the Z-coordinate of the center of the target object and the base of the cubical box on which the fingers of the soft gripper device are attached. To release the target object onto the red target region on the table as shown in FIG. 6, the same gripper opening command $\bar{F} = [-0.2, -0.2, -0.2, -0.2]^T$ is given in both the baseline condition as well as for the RL agent.
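By way of a non-limiting illustrative sketch, the baseline synergy described above reduces to the following; the argument names and the `base_offset` term standing in for the finger-base placement are assumptions for illustration:

```python
import numpy as np

def baseline_synergy_action(grasp_xy, grasp_phi, object_center_z, base_offset):
    """Baseline: vertical wrist over the object center, rotation taken from the
    top GG-CNN output, all four fingers driven with the same command."""
    theta_w = grasp_phi                   # wrist rotation from the optimal grasp
    h = object_center_z + base_offset     # wrist height from object-center Z-coordinate
    F_close = np.full(4, 0.2)             # identical closing command for all fingers
    F_open = np.full(4, -0.2)             # identical opening command used to release
    return grasp_xy, h, theta_w, F_close, F_open
```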
During testing, the RL agent's action $A$ is frozen to the current values of $h$, $\theta$, and $\bar{F}$ either when the target object is lifted beyond 5 cm or when the maximum number of steps (i.e., 50 steps) is reached in an episode. In the present disclosure, the test setup consists of another 100 objects from the PyBullet® random URDF library not included during training. A test was considered successful if the target object was picked from its starting position and dropped onto the red target region as shown in FIG. 6. In the present disclosure, this test protocol was conducted five times for the 100 target objects with both grasping strategies (i.e., the baseline synergy and the RL agent). FIG. 9 illustrates a graphical representation providing a comparison of grasp success rates for the SAC-based RL agent and the baseline grasp synergy, according to some embodiments of the present disclosure. It is observed from FIG. 9 that the mean success rate for the baseline grasp synergy was 43.2%, while it improved to 58.4% for the SAC-based RL agent.
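By way of a non-limiting illustrative sketch, the test-time freezing rule may be expressed as follows; the Gym-style step interface and the `info` keys are assumptions for illustration rather than the disclosure's interface:

```python
def run_test_episode(env, model, lift_threshold=0.05, max_steps=50):
    """Freeze A = [h, theta, F] once the object is lifted beyond 5 cm or the
    50-step budget is reached, then report pick-and-place success."""
    obs = env.reset()
    action, frozen, info = None, False, {}
    for _ in range(max_steps):
        if not frozen:
            action, _ = model.predict(obs, deterministic=True)  # SAC policy action
        obs, _, done, info = env.step(action)
        if info.get("object_lift", 0.0) > lift_threshold:
            frozen = True                 # hold the current h, theta, F thereafter
        if done:
            break
    # Success: the object was dropped onto the red target region (FIG. 6).
    return bool(info.get("placed_on_target", False))
```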
The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined herein and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the present disclosure if they have similar elements that do not differ from the literal language of the embodiments or if they include equivalent elements with insubstantial differences from the literal language of the embodiments described herein.
The embodiments of the present disclosure provide a model-free reinforcement learning (RL) approach that modifies vision-based grasp plans generated for parallel-jaw grippers and adapts them to grasping with a four-fingered soft gripper. The observed state of the RL agent comprises the grasp plans from the vision subsystem, the deformation of the fingers of the soft gripper device, and the pose of the soft gripper device acting as the end-effector. The RL agent controls each finger separately, discovering grasp synergies during training. The method of the present disclosure is compared to a baseline grasp synergy where all four fingers simultaneously enclose the object. In simulation, a top-grasp pick-and-place success rate of 58.4% is achieved with the RL agent, compared to 43.2% with the baseline grasp synergy. In the present disclosure, the controller unit for the soft gripper device was trained in simulation and later transferred to the physical system using sim-to-real transfer, which requires much less time and fewer resources. Further, the curvature and torsion of the flexible soft fingers of the soft gripper device are used as input to the controller unit in the method of the present disclosure.
It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software processing components located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various components described herein may be implemented in other components or combinations of other components. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated herein.
Claims:
We Claim:
1. A vision-based robotic grasping system (100), comprising:
a robotic manipulator (102);
a soft gripper device (104) comprising a plurality of fingers and connected with the robotic manipulator (102);
a vision subsystem (106) comprising:
an image capturing device (106 A) mounted on the robotic manipulator; and
a Grasp Generative Convolutional Neural Network (GG-CNN) (106 B);
a controller unit (108) operatively connected to the soft gripper device (104), the robotic manipulator (102) and the vision subsystem (106), wherein the controller unit (108) comprises:
one or more data storage devices (108 A) configured to store instructions;
one or more communication interfaces (108 B); and
one or more hardware processors (108 C) operatively coupled to the one or more data storage devices (108 A) via the one or more communication interfaces (108 B), wherein the one or more hardware processors (108 C) are configured by the instructions to:
receive, (i) a plurality of successive images of a target object corresponding to one or more views of one or more scenes in a robotic environment using the image capturing device, and (ii) one or more curvature measurements and one or more torsion measurements of the plurality of fingers of the soft gripper device;
input, the plurality of successive images of the target object corresponding to the one or more views of the one or more scenes in the robotic environment to the Grasp Generative Convolutional Neural Network (GG-CNN) for generating an optimal set of grasp vectors, wherein the optimal set of grasp vectors provides an optimal grasp direction for actuating the soft gripper device to grasp the target object;
iteratively train, the controller unit with (i) the optimal set of grasp vectors, (ii) the one or more curvature measurements and the one or more torsion measurements of the plurality of fingers of the soft gripper device, and (iii) a reward function computed for a specific action of the soft gripper device using a Soft Actor-Critic (SAC) algorithm, wherein the controller unit is a reinforcement learning (RL) based agent; and
adaptively generate, one or more optimal control command actions by continuously optimizing the reward function based on one or more feedback variables obtained from the robotic environment using the iteratively trained controller unit, wherein the one or more optimal control command actions are used to generate one or more grasp plans for executing movements of one or more parts of the soft gripper device to optimally grasp the target object.
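
By way of illustration (not forming part of the claims), the data flow in claim 1 can be made concrete: the RL-based controller unit consumes the GG-CNN grasp vector together with per-finger curvature and torsion measurements, and emits per-finger control commands. The minimal sketch below assumes hypothetical shapes and names (a 6-D grasp vector, four fingers, the function build_observation); none of these specifics appear in the specification itself.

    # Illustrative sketch only, not part of the claims. Assembles the claimed
    # inputs (grasp vector, per-finger curvature and torsion) into one flat
    # observation for the RL-based controller unit. Names and dimensions are
    # assumptions, not taken from the specification.
    import numpy as np

    NUM_FINGERS = 4  # four-fingered soft gripper, per the abstract

    def build_observation(grasp_vector: np.ndarray,
                          curvatures: np.ndarray,
                          torsions: np.ndarray) -> np.ndarray:
        """Concatenate the GG-CNN grasp vector with per-finger curvature
        and torsion measurements into a single observation vector."""
        assert curvatures.shape == (NUM_FINGERS,)
        assert torsions.shape == (NUM_FINGERS,)
        return np.concatenate([grasp_vector, curvatures, torsions])

    # Example: a hypothetical 6-D grasp vector plus 4 curvature and 4 torsion
    # readings yields a 14-D observation; the trained policy maps it to four
    # per-finger control command actions.
    obs = build_observation(np.zeros(6), np.zeros(4), np.zeros(4))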

2. The vision-based robotic grasping system as claimed in claim 1, wherein each of the plurality of fingers of the soft gripper device is controlled by the one or more optimal control command actions adaptively generated by the controller unit.

3. The vision-based robotic grasping system as claimed in claim 1, wherein the robotic manipulator is a six Degree-of-freedom (DoF) robotic manipulator.

4. The vision-based robotic grasping system as claimed in claim 1, wherein the one or more curvature measurements and the one or more torsion measurements of the plurality of fingers of the soft gripper device are indicative of an abstraction of (i) deformation of the plurality of fingers of the soft gripper device in a reactive manner to the target object when a contact occurs between the plurality of fingers and the target object, and (ii) size and shape of the target object being handled.

5. The vision-based robotic grasping system as claimed in claim 1, wherein the reward function is represented as: $\sum_{i=1}^{4} w_1 e^{-w_2 d_{c,i}} + d_h w_3 \left[\frac{\Delta h - h_t}{h_f - h_t}\right]^2$, wherein the first term $w_1 e^{-w_2 d_{c,i}}$ penalizes the tips of each finger from the plurality of fingers of the soft gripper device for being away from the target object and favors grasps where the tips of each finger from the plurality of fingers closely wrap around the target object, and the second term $d_h w_3 \left[\frac{\Delta h - h_t}{h_f - h_t}\right]^2$ rewards lifting the target object off a surface and provides a measure of how tightly the target object has been grasped, and wherein $d_{c,i}$ represents the distance between a spherical tip of the $i$-th finger and the geometric center of the target object, $\Delta h$ represents the instantaneous difference in height of the target object above its initial Z-position on the surface at the start of the training, $h_t$ represents a threshold height above which the reward function is activated, $h_f$ represents a final height above the surface, and $w_1$, $w_2$, and $w_3$ represent parameters selected empirically through results from initial trials while building the reinforcement learning (RL) based agent.
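
By way of illustration (not forming part of the claims), the claimed reward decomposes into a fingertip-proximity term summed over the four fingers and a lift term that activates once the object rises above the threshold height. The sketch below is a direct transcription under stated assumptions: $d_h$ is taken as a 0/1 gate on the lift term, since the claim describes $h_t$ as the height above which the reward is activated but does not define $d_h$ explicitly, and the function name grasp_reward and all example values are placeholders.

    import numpy as np

    def grasp_reward(d_c, delta_h, h_t, h_f, w1, w2, w3):
        """Reward of claim 5: fingertip-proximity term plus a gated lift term.

        d_c     : four distances, fingertip i to the object's geometric center
        delta_h : object height above its initial Z-position on the surface
        h_t     : threshold height above which the lift term activates
        h_f     : final lift height above the surface
        w1..w3  : empirically selected weights
        """
        proximity = np.sum(w1 * np.exp(-w2 * np.asarray(d_c)))
        # Assumption: d_h is a 0/1 gate that switches the lift term on once
        # the object rises above h_t; the claim does not define d_h itself.
        d_h = 1.0 if delta_h > h_t else 0.0
        lift = d_h * w3 * ((delta_h - h_t) / (h_f - h_t)) ** 2
        return proximity + lift

    # Example with placeholder values: four fingertips 2-4 cm from the object
    # center, object lifted 5 cm toward a 10 cm target height.
    r = grasp_reward([0.02, 0.03, 0.02, 0.04], 0.05, 0.01, 0.10,
                     w1=1.0, w2=10.0, w3=5.0)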

6. The vision-based robotic grasping system as claimed in claim 1, wherein the one or more feedback variables comprise (i) position of each finger from the plurality of fingers of the soft gripper with respect to the target object, and (ii) position of the target object with respect to the ground.

7. A processor implemented method (200), comprising:
receiving (202), via a first module and a second module executed by one or more hardware processors, (i) a plurality of successive images of a target object corresponding to one or more views of one or more scenes in a robotic environment using an image capturing device, and (ii) one or more curvature measurements and one or more torsion measurements of a plurality of fingers of a soft gripper device;
inputting (204), via the first module executed by the one or more hardware processors, the plurality of successive images of the target object corresponding to the one or more views of the one or more scenes in the robotic environment to a Grasp Generative Convolutional Neural Network (GG-CNN) for generating an optimal set of grasp vectors, wherein the optimal set of grasp vectors provides an optimal grasp direction for actuating the soft gripper device to grasp the target object;
iteratively training (206), via a third module executed by the one or more hardware processors, a controller unit with (i) the optimal set of grasp vectors, (ii) the one or more curvature measurements and the one or more torsion measurements of the plurality of fingers of the soft gripper device, and (iii) a reward function computed for a specific action of the soft gripper device using a Soft Actor-Critic (SAC) algorithm, wherein the controller unit is a reinforcement learning (RL) based agent; and
adaptively generating (208), via the third module executed by the one or more hardware processors, one or more optimal control command actions by continuously optimizing the reward function based on one or more feedback variables obtained from the robotic environment using the iteratively trained controller unit, wherein the one or more optimal control command actions are used to generate one or more grasp plans for executing movements of one or more parts of the soft gripper device to optimally grasp the target object.
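
By way of illustration (not forming part of the claims), the specification names the Soft Actor-Critic (SAC) algorithm for the iterative training step (206) but does not tie it to any particular library. One common way to realize such training, assuming the simulated robotic environment is wrapped as a Gymnasium-style environment, is an off-the-shelf SAC implementation such as the one in stable-baselines3. Gymnasium's Pendulum-v1 stands in below so the snippet runs as written; in the disclosed system the environment would instead expose the grasp-vector/curvature/torsion observations and per-finger actions described in the claims.

    # A minimal sketch of iterative SAC training (step 206), assuming a
    # Gymnasium-style environment. Pendulum-v1 is only a runnable stand-in
    # for the simulated soft-gripper environment, which is not public.
    import gymnasium as gym
    from stable_baselines3 import SAC

    env = gym.make("Pendulum-v1")        # stand-in for the soft-gripper sim
    model = SAC("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=10_000)  # iterative training (step 206)

    # Once trained, the policy adaptively generates control command actions
    # (step 208) from the current observation:
    obs, _ = env.reset()
    action, _ = model.predict(obs, deterministic=True)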

8. The processor implemented method of claim 7, wherein each of the plurality of fingers of the soft gripper device is controlled by the one or more optimal control command actions adaptively generated by the controller unit.

9. The processor implemented method of claim 7, wherein the robotic manipulator is a six Degree-of-freedom (DoF) robotic manipulator.

10. The processor implemented method of claim 7, wherein the one or more curvature measurements and the one or more torsion measurements of the plurality of fingers of the soft gripper device are indicative of an abstraction of (i) deformation of the plurality of fingers of the soft gripper device in a reactive manner to the target object when a contact occurs between the plurality of fingers and the target object, and (ii) size and shape of the target object being handled.

11. The processor implemented method of claim 7, wherein the reward function is represented as: $\sum_{i=1}^{4} w_1 e^{-w_2 d_{c,i}} + d_h w_3 \left[\frac{\Delta h - h_t}{h_f - h_t}\right]^2$, wherein the first term $w_1 e^{-w_2 d_{c,i}}$ penalizes the tips of each finger from the plurality of fingers of the soft gripper device for being away from the target object and favors grasps where the tips of each finger from the plurality of fingers closely wrap around the target object, and the second term $d_h w_3 \left[\frac{\Delta h - h_t}{h_f - h_t}\right]^2$ rewards lifting the target object off a surface and provides a measure of how tightly the target object has been grasped, and wherein $d_{c,i}$ represents the distance between a spherical tip of the $i$-th finger and the geometric center of the target object, $\Delta h$ represents the instantaneous difference in height of the target object above its initial Z-position on the surface at the start of the training, $h_t$ represents a threshold height above which the reward function is activated, $h_f$ represents a final height above the surface, and $w_1$, $w_2$, and $w_3$ represent parameters selected empirically through results from initial trials while building the reinforcement learning (RL) based agent.

12. The processor implemented method of claim 7, wherein the one or more feedback variables comprise (i) position of each finger from the plurality of fingers of the soft gripper with respect to the target object, and (ii) position of the target object with respect to the ground.

Dated this 17th Day of August 2022

Tata Consultancy Services Limited
By their Agent & Attorney

(Adheesh Nargolkar)
of Khaitan & Co
Reg No IN-PA-1086

Documents

Application Documents

# Name Date
1 202221046780-STATEMENT OF UNDERTAKING (FORM 3) [17-08-2022(online)].pdf 2022-08-17
2 202221046780-REQUEST FOR EXAMINATION (FORM-18) [17-08-2022(online)].pdf 2022-08-17
3 202221046780-FORM 18 [17-08-2022(online)].pdf 2022-08-17
4 202221046780-FORM 1 [17-08-2022(online)].pdf 2022-08-17
5 202221046780-FIGURE OF ABSTRACT [17-08-2022(online)].pdf 2022-08-17
6 202221046780-DRAWINGS [17-08-2022(online)].pdf 2022-08-17
7 202221046780-DECLARATION OF INVENTORSHIP (FORM 5) [17-08-2022(online)].pdf 2022-08-17
8 202221046780-COMPLETE SPECIFICATION [17-08-2022(online)].pdf 2022-08-17
9 202221046780-FORM-26 [20-09-2022(online)].pdf 2022-09-20
10 Abstract1.jpg 2022-11-28
11 202221046780-FER.pdf 2025-06-05
12 202221046780-RELEVANT DOCUMENTS [17-10-2025(online)].pdf 2025-10-17
13 202221046780-PETITION UNDER RULE 137 [17-10-2025(online)].pdf 2025-10-17
14 202221046780-OTHERS [17-10-2025(online)].pdf 2025-10-17
15 202221046780-FER_SER_REPLY [17-10-2025(online)].pdf 2025-10-17
16 202221046780-CLAIMS [17-10-2025(online)].pdf 2025-10-17

Search Strategy

1 202221046780E_13-11-2024.pdf