
Artificial Intelligence Based Multimodal Human Robot Collaboration

Abstract: Embodiments of the present disclosure relate to a method and system for an artificial intelligence based multimodal collaboration between a human and a robot, wherein a robot identifies a gesture and an audio input provided by a human as input, correlates the gesture with the audio input to perform a task, and then performs the task. Other embodiments are also disclosed.


Patent Information

Filing Date: 20 December 2023
Publication Number: 10/2024
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Grant Date: 2025-03-25

Applicants

INDIAN INSTITUTE OF SCIENCE
C V RAMAN AVENUE, BANGALORE 560012, INDIA.

Inventors

1. Abhra Roy Chowdhury
Indian Institute of Science, C V Raman Avenue, Bangalore - 560012, Karnataka, India
2. Jayesh Prakash
Indian Institute of Science, C V Raman Avenue, Bangalore - 560012, Karnataka, India

Specification

Description:

TECHNICAL FIELD
[0001] Embodiments of the present disclosure relate to an artificial intelligence based multimodal architecture for collaboration between humans and robots, and more specifically to an artificial intelligence based multimodal architecture for collaboration between humans and a robotic manipulator arm using audio and visual gestures.

BACKGROUND
[0002] Generally, Human-Robot Collaboration is a collaborative process in which human and robot agents work jointly to perform shared tasks and thereby achieve shared goals. There are a number of technological areas and applications where robots may be required to work alongside humans, making them part of the team and capable, efficient members of human-robot teams. Collaboration is defined as a special type of coordinated activity, one in which two or more agents work jointly with each other, together performing a task or carrying out the activities needed to satisfy a shared goal. Some of the areas where such collaborations find use for robots include homes, hospitals, offices, space exploration, manufacturing, etc. Human-Robot Collaboration (HRC) is an interdisciplinary research area comprising classical robotics, human-computer interaction, artificial intelligence, process design, layout planning, ergonomics, cognitive sciences, psychology, etc.
[0003] Industrial applications of human-robot collaboration involve Collaborative Robots, also generally referred to as cobots. These cobots are known to physically interact with humans in a shared workspace to complete tasks such as collaborative manipulation or object handovers. For effective human-robot collaboration, it is generally imperative that the robot be capable of understanding and interpreting several communication mechanisms similar to the mechanisms involved in human-human interaction. This poses a huge challenge with respect to human-robot collaboration. These robots must also communicate with the interacting humans in order to coordinate their actions properly, execute the shared plan, and achieve the overall task. There is therefore a need in the art for a better, more efficient and cost-effective human-robot collaboration.

SUMMARY
[0004] Embodiments of the present disclosure relate to a method and system for artificial intelligence based multimodal collaboration between a human and a robot (reference to a robot in the text of this disclosure generally refers to a robotic manipulator arm unless specified otherwise), wherein multimodal in the present disclosure refers to the use of several modes in a single artifact. In an embodiment, the method includes identifying, by a robot, a first gesture made by a human, wherein the gesture is made by a human hand. In an embodiment, the robot may be a robotic arm placed proximate to the human, and the proximity may be pre-determined. In an embodiment, the robot is coupled with an audio device and an imaging device. In an embodiment, the audio device is configured to interact by picking up audio signals from the human. In an embodiment, the imaging device may, for example, be a camera whose output is configured through an AI module to identify gestures made by the human. In an embodiment, the audio received from a microphone attached to a computer is configured through an AI module to recognize keywords in the audio (also generally referred to in the disclosure as the robotic arm being configured to process the audio signal), and the robot processes the audio signal and the gesture made by the human using artificial intelligence to collaboratively perform tasks.
[0005] In an embodiment, a first gesture is made, for example an open hand pointing vertically upwards or a finger pointing vertically upwards, where the first gesture is configured to activate the robot or grab the attention of the robot. Once the robot is in an active state, the human operator can assign commands to the robot within a pre-defined time. If the robot does not receive any command within a pre-defined time, the robot goes back to a sleep state waiting for activation.
[0006] In an embodiment, once the robot is active, the human operator may provide an input to the robot within a pre-defined time. In an embodiment, the robot is configured to receive a second gesture, which is coupled with a voice command. The second gesture and the voice command need to be coordinated such that they are received by the robot within a pre-defined time. The second gesture and the voice command in coordination constitute a command for the robot to perform certain tasks. In an embodiment, if no input is received by the robot within a pre-defined time, the robot goes back to a sleep mode waiting for activation.
[0007] In an embodiment, the gesture(s) are identified and processed by an imaging device coupled to the robot, and the voice command (audio signal) is collected by the audio device coupled to the robot. The robot is configured to process the gesture and the voice command; if there is a correlation between the gesture and the voice command, the robot accomplishes the task, else, if there is any discrepancy between the gesture and the voice command, the robot intimates the human user of the error. In an embodiment, artificial intelligence is used to train the system and distinguish between the different types of gestures made by the human. Other embodiments are also disclosed.

BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The detailed description is described with reference to the accompanying figures. Features, aspects, and advantages of the subject matter of the present disclosure will be better understood with regard to the following description and the accompanying drawings. The figures are intended to be illustrative, not limiting, and are generally described in context of the embodiments, and it should be understood that it is not intended to limit the scope of the disclosure to these particular embodiments. In the figures, the same numbers may be used throughout the drawings to reference features and components. In order that the present disclosure may be readily understood and put into practical effect, reference will now be made to exemplary embodiments and/or cases as illustrated with reference to the accompanying figures. The figures together with detailed description below, are incorporated in and form part of the specification, and serve to further illustrate the embodiments and explain various principles and advantages.
[0009] Figure 1 is an illustration of an exemplary robotic arm (robot) used in the human-robot collaborative (HRC) environment in accordance with an embodiment of the present disclosure.
[0010] Figure 2 is an exemplary pictorial representation of a few gestures that may be used to indicate tasks to be performed by the robot in the HRC environment in accordance with an embodiment of the present disclosure.
[0011] Figure 3 is an exemplary layout mapping of the 21 key points of the human hand implemented in the MediaPipe framework for identification of gestures by the robot in the HRC environment in accordance with an embodiment of the present disclosure.
[0012] Figure 4 is an illustration of an exemplary case of implementation of the robot performing a task instructed by a human operator in a HRC environment in accordance with an embodiment of the present disclosure.
[0013] Figure 5 is an exemplary illustration of a method for operating the robot in a HRC environment in accordance with an embodiment of the present disclosure.
[0014] Figure 6 is another exemplary illustration of a method for operating the robot in a HRC environment in accordance with an embodiment of the present disclosure.
[0015] Figure 7 is an exemplary illustration of a block diagram illustrating the operation of the robot in the HRC environment in accordance with an embodiment of the present disclosure.
[0016] Figure 8 is an exemplary illustration of a confusion matrix in accordance with an embodiment of the present disclosure.
[0017] Throughout the drawings, identical reference numbers designate similar, but not necessarily identical elements. The figures are not necessarily to scale, and the size of some parts may be exaggerated to more clearly illustrate the example shown. Moreover, the drawings provide examples and/or implementations consistent with the description; however, the description is not limited to the examples and/or implementations provided in the drawings.

DETAILED DESCRIPTION
[0018] The following describes technical solutions in exemplary embodiments of the subject matter of the present disclosure with reference to the accompanying drawings. In this application as disclosed herein, "at least one" means one or more, and "a plurality of" means two or more. The term "and/or" describes an association relationship for describing associated objects and represents that three relationships may exist. For example, A and/or B may represent the following cases: Only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. The character "/" usually indicates an "or" relationship between the associated objects. "At least one item (piece) of the following" or a similar expression thereof means any combination of the items, including any combination of singular items (piece) or plural items (pieces). For example, at least one item (piece) of a, b, or c may represent a, b, c, a and b, a and c, b and c, or a, b, and c, where a, b, and c each may be singular or plural.
[0019] It should be noted that in this application articles “a”, “an” and “the” are used to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. The terms “comprise” and “comprising” are used in the inclusive, open sense, meaning that additional elements may be included. It is not intended to be construed as “consists of only”. Throughout this specification, unless the context requires otherwise, the word “comprise”, and variations such as “comprises” and “comprising”, will be understood to imply the inclusion of a stated element or step or group of elements or steps but not the exclusion of any other element or step or group of elements or steps. The term “including” is used to mean “including but not limited to”. “Including” and “including but not limited to” are used interchangeably. Throughout the present disclosure, the following terms have the indicated meanings, unless specifically stated otherwise.
[0020] Unless otherwise defined, all terms used in the disclosure, including technical and scientific terms, have the meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. By means of further guidance, term definitions are included for better understanding of the present disclosure. The term ‘about’ as used herein, when referring to a measurable value such as a parameter, an amount, a temporal duration, and the like, is meant to encompass variations of ±10% or less, preferably ±5% or less, more preferably ±1% or less, and still more preferably ±0.1% or less of and from the specified value, insofar as such variations are appropriate to perform the present disclosure. It is to be understood that the value to which the modifier ‘about’ refers is itself also specifically, and preferably, disclosed.
[0021] It should be noted that in this application, a term such as "example" or "for example" or “exemplary” is used to represent giving an example, an illustration, or descriptions. Any embodiment or design scheme described as an "example" or "for example" in this application should not be explained as being more preferable or having more advantages than another embodiment or design scheme. Rather, use of a word such as "example" or "for example" is intended to present a related concept in a specific manner.
[0022] It should be understood that in the embodiments of the present subject matter that "B corresponding to A" indicates that B is associated with A, and B can be determined based on A. However, it should be further understood that determining B based on A does not mean that B is determined based on only A. B may alternatively be determined based on A and/or other information.
[0023] In the embodiments of this application, "a plurality of" means two or more than two. Descriptions such as "first", "second" in the embodiments of this application are merely used for indicating and distinguishing between described objects, do not show a sequence, do not indicate a specific limitation on a quantity of devices in the embodiments of this application, and do not constitute any limitation on the embodiments of this application.
[0024] Exemplary embodiments of the present disclosure relate to a method and system for multimodal collaboration between a human and a robot, wherein multimodal in the present disclosure refers to the use of several modes in a single artifact, wherein these modes create a meaning. In an exemplary embodiment, the present disclosure relates to having or utilizing more than one mode of communication and operation between a human and a robot. In an exemplary embodiment, the present disclosure combines information from a variety of modalities, for example speech and vision, to significantly enhance the performance of a human-computer interaction system. Though two different modes are illustrated in the present disclosure, it should be obvious to a person skilled in the art that other modes may be included, for example text processing, and all such variations fall within the scope of the present disclosure. In an exemplary embodiment, the method includes identifying, by a robot, a first gesture made by a human, wherein the gesture is made by a human hand. In an exemplary embodiment, the robot may be a robotic arm placed proximate to the human, and the proximity may be pre-determined. In an exemplary embodiment, in the present disclosure proximity may mean a range of about 1-3 meters, wherein the human issuing the command is within 1-3 meters from the robotic arm and the tasks or set of tasks to be performed are also accessible within the prescribed range. In another exemplary embodiment, the range could vary and, for example, be configured for performing a task within a range of 10-15 meters, and the robotic arm is configured to move around within the range to perform a task or set of tasks. In an embodiment, the robot (reference made to a robot generally refers to a robotic manipulator arm in this disclosure unless otherwise specified) is coupled with an audio device and an imaging device. In an exemplary embodiment, the audio device is configured to interact by picking up audio signals from the human, and artificial intelligence is used to process the audio signals. In an exemplary embodiment, the imaging device may, for example, be a camera whose output is configured through an AI module to identify gestures made by the human, and artificial intelligence is used to process the gestures made by the human. In an exemplary embodiment, the robotic arm (reference to the robotic arm should also be read as the robotic manipulator arm unless otherwise specified in the present disclosure) is configured to process the audio signal and the gesture made by the human using artificial intelligence and collaboratively perform tasks.
[0025] In an exemplary embodiment, a first gesture is made, for example an open hand pointing vertically upwards or a finger pointing vertically upwards, where the first gesture is configured to activate the robot or grab the attention of the robot. In an exemplary embodiment, once the robot is in an active state, the human operator can assign commands to the robot within a pre-defined time. In an exemplary embodiment, if the robot does not receive any command within a pre-defined time, the robot goes back to a sleep state waiting for activation. In an exemplary case, artificial intelligence is used to process the commands received by the robot.
[0026] In an exemplary embodiment, once the robot is active, the human operator may provide an input to the robot within a pre-defined time. In an exemplary embodiment, the human input to the robot includes a second gesture coupled with a voice command, which again has to be made within a pre-defined time interval such that the robot identifies the second human input, which includes a gesture and a voice command, to perform certain tasks. In an exemplary embodiment, if no input is received within a pre-defined time, the robot goes back to a sleep mode waiting for activation. Again, in the exemplary case, artificial intelligence is used to process the commands received by the robot.
[0027] In an exemplary embodiment, the robot receives the inputs, i.e., the gesture and the voice command (audio signal), wherein the gesture is picked up by the imaging device coupled to the robot and the voice command is picked up by the audio device coupled to the robot. In an exemplary embodiment, the imaging device output is configured through an AI module to identify gestures made by the human. In an exemplary embodiment, the robot is configured to process the gesture and the voice command; if there is a correlation between the gesture and the voice command, the robot accomplishes the task, else, if there is any discrepancy between the gesture and the voice command, the robot intimates the human user of the error. Again, in the exemplary case, artificial intelligence is used to process the commands received by the robot.
[0028] In an exemplary embodiment, the method for multimodal collaboration between a human and a robot includes identifying, by a robot, a first gesture made by a human hand using artificial intelligence, wherein the first gesture is configured to activate the robot. In an exemplary embodiment, a pre-defined time may be set such that if the robot does not receive any command to perform a task within the specified time, the robot goes back to a sleep state. In an exemplary embodiment, in response to the first gesture, the robot is activated, and on activation the robot is configured for receiving inputs from the human user, who is proximate to the robot. In an exemplary embodiment, the human should be within a prescribed range from the robot for the robot to identify the gesture and/or the voice command. In an exemplary embodiment, on receiving the inputs from the human user, the robot is configured to process the inputs, wherein the inputs are related to a task or set of tasks, and the robot is configured to identify and perform a task. In the exemplary case, artificial intelligence is used to process the commands received by the robot.
[0029] In an exemplary embodiment, the robot is placed proximate to a human user. The proximity between the human operator (also referred to as the human) and the robot may be pre-defined, such as by defining a range for the human-robot collaboration (hereinafter also referred to as HRC) environment. In an exemplary embodiment, in the present disclosure proximity may mean a range of about 1-3 meters, wherein the human issuing the command is within 1-3 meters from the robotic arm and the tasks or set of tasks to be performed are also accessible within the prescribed range. In another exemplary embodiment, the range could vary and, for example, be configured for performing a task within a range of 10-15 meters, and a robot with the robotic arm is configured to move around the area specified within the range to perform a task or set of tasks, the commands essentially being executed by the robotic arm by processing the audio and visual signals using artificial intelligence. In an exemplary embodiment, on activation of the robot, the robot is configured to receive inputs from the human, where the inputs are a second gesture and an audio input (hereinafter also referred to as a voice command) in combination, wherein the second gesture and the voice command have to be in near real-time or within a pre-defined time interval. In an exemplary embodiment, if the robot receives a second gesture from within a pre-defined range and the voice command is from another source outside the defined range, the robot is configured to provide an error indicator to the human. In an exemplary case, if the voice command is provided to the robot and there is no gesture identified by the robot, then the robot is configured to provide an error indicator to the human. In the exemplary case, processing of the audio and/or visual commands by the robot is performed using artificial intelligence.
[0030] In an exemplary embodiment, the error indicator may be at least one of an audio command or a video command or a combination thereof. In an exemplary embodiment, the error indicator may be a voice command provided by the robot to the human. In an exemplary embodiment, the video signal may be a red light or a flashing red light from the robot indicating that there is an input error. In an exemplary embodiment, on performing the task or set of tasks, the robot may again be configured to provide an indicator to the human, in terms of a voice command (audio signal) or a visual signal, wherein in this case a green light may indicate that the task is accomplished. In an exemplary embodiment, while the robot is in operation performing a first task or a first set of tasks and the robot receives another input from the human, the task or the set of tasks may be put in a queue for execution, and an orange or yellow light could flash, indicating to the human that there are tasks pending to be performed by the robot. In the exemplary case, an appropriate error command may be determined using artificial intelligence and then communicated to the human operator.
[0031] In an exemplary embodiment, when the robot is in an operation/operating mode performing a first task or a first set of tasks, if a second input from the human is provided to the robot, wherein the second input includes a further gesture and a further audio input, the robot is configured to process the second input, provide an indicator to the human that the task or set of tasks is in a pending queue to be performed, and assign the new tasks in the queue to be performed. In the exemplary case, artificial intelligence is used to process the audio and/or visual signals identified by the robot. In an exemplary case, an artificial intelligence detection algorithm is used to distinguish the different gestures that are made to the robot.
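By way of a non-limiting illustration, the pending-task queue and the status lights described above can be sketched in a few lines of Python; the names (TaskQueue, STATUS_PENDING, etc.) are hypothetical and not taken from the disclosure:

```python
from collections import deque

# Hypothetical status codes mirroring the indicator lights described above:
# red = input error, green = task done, yellow/orange = tasks pending in queue.
STATUS_ERROR, STATUS_DONE, STATUS_PENDING = "red", "green", "yellow"

class TaskQueue:
    """Queues tasks received while the robot is busy and reports a status light."""

    def __init__(self):
        self.pending = deque()

    def submit(self, task, robot_busy):
        if robot_busy:
            self.pending.append(task)   # queue the new task for later execution
            return STATUS_PENDING       # flash yellow/orange: tasks are pending
        return self.execute(task)

    def execute(self, task):
        # Placeholder for actual actuation; returns green on completion.
        print(f"executing: {task}")
        return STATUS_DONE
```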
[0032] In an exemplary embodiment, the input from the human to the robot, i.e., the second gesture and the voice command, should be provided in conjunction to the robot in near real-time or at least within a pre-defined timeframe. In an exemplary embodiment, the audio input is a voice command provided to the robot, the voice command being associated with a task or set of tasks to be performed by the robot. In an exemplary embodiment, an imaging device may be coupled to the robot, wherein the imaging device is configured to identify the first gesture and/or the second gesture and/or further gestures, and distinguish between the first gesture, the second gesture and further gestures. In the exemplary case, an artificial intelligence detection algorithm is used to distinguish between the different gestures made by the human operator and also process the gestures along with the relevant audio signals.
[0033] In an exemplary embodiment, a visual perception unit (VPU) coupled to the imaging device of the robot or to the robot may be configured to activate the robot in response to the first gesture, on identifying that the first gesture is a pre-defined gesture to activate the robot. In an exemplary embodiment, an audio perception unit (APU) coupled to the robot may be configured to identify and process the voice command (audio input) from the human. In an exemplary embodiment, the VPU and APU are coupled to a processor. In an exemplary embodiment, the VPU and the APU may be configured to map the voice command received from the audio device with the second gesture received from the imaging device. In the exemplary case, processing at the VPU and/or the APU may be driven specifically by artificial intelligence (AI) algorithms. It should be obvious to a person of ordinary skill in the art that various AI algorithms are available to perform video capture and processing and audio capture and processing, and all such AI algorithms fall within the scope of the present disclosure.
[0034] In an exemplary embodiment, on negative determination of the mapping of the voice command with the second gesture, i.e., when the gesture indicates a task to “give something” and the voice command indicates a task to “take away something”, where there is a mismatch of the gesture with the voice command, the robot may be configured to provide an error indicator to the human. In an exemplary embodiment, the voice command received as input is converted to text in the APU, and the APU may be configured to identify keywords related to a task or a set of tasks from the text. In the exemplary case, the AI algorithms used in enabling the embodiments of the present disclosure may be trained and also be provided with the capability to learn and record to build a strong reference for future use.
[0035] In an exemplary embodiment, the APU may be configured to process the task or the set of tasks associated with the voice command, in conjunction with the gesture received from the VPU. In an exemplary embodiment, the robot is stationed at a pre-defined position proximate to the human, and the range of operation of the robot may be pre-defined by the human. In an exemplary embodiment, after activation of the robot, the robot may be configured to identify the second gesture and a related audio command within a specific time, and on negative determination of the second gesture and/or the voice command (i.e., no gesture and/or voice command being picked up and resolved by the robot), the robot is configured to provide an error indicator, wherein the error indicator is indicative of the error that has occurred. In an exemplary embodiment, the error indicator may be provided by audio means and/or visual means and/or a message and/or a combination thereof. Various other means for indicating the error may be incorporated, and all such means of indicating that an error has occurred fall within the scope of the present disclosure. In an exemplary embodiment, artificial intelligence and/or machine learning may be used for identifying keywords from the audio input and mapping the keywords to a task or a set of tasks to be performed. In an exemplary embodiment, mapping the keywords may produce a ranked list, or a confidence score associated with the keywords and the task or the set of tasks. In an exemplary embodiment, the robot as discussed in the present disclosure is a robotic arm. In an exemplary embodiment, an HRC environment includes at least one human and a robot, the human and the robot collaborating by the method disclosed previously in the present disclosure to perform a task or a set of tasks.
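As an illustrative sketch only, a keyword-to-task mapping with confidence scores might look as follows in Python; the table contents, the threshold and the function names are assumptions for illustration:

```python
# Hypothetical keyword-to-task table; scores stand in for classifier confidences.
KEYWORD_TASKS = {
    "fetch":    "fetch_object",
    "give":     "fetch_object",
    "back":     "return_object",
    "activate": "wake_up",
}

def rank_tasks(keyword_scores, threshold=0.5):
    """Map scored keywords to tasks and return a ranked (task, confidence) list."""
    ranked = sorted(
        ((KEYWORD_TASKS[kw], score)
         for kw, score in keyword_scores.items()
         if kw in KEYWORD_TASKS and score >= threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked  # highest-confidence task first; empty list -> error indicator

# Example: scores as they might come from the audio keyword classifier.
print(rank_tasks({"fetch": 0.91, "back": 0.12}))  # [('fetch_object', 0.91)]
```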
[0036] Reference is now made to Figure 1, which is an illustration of an exemplary robotic arm (robot) 100 used in the human-robot collaborative (HRC) environment in accordance with an embodiment of the present disclosure. As illustrated, robotic arm 105 includes base 107, where base 107 may be a standalone unit or may be affixed, for example, to a table, or may be kept on the table or any other suitable location where base 107 is stationary. Base 107 is affixed with a plurality of arms 109, and one of the plurality of arms 109, forming the end of robotic arm 105, has clip 110, wherein clip 110 is configured to hold things and provide the things requested to the human. In an exemplary embodiment, the number of arms for the robotic arm may depend on the choice of the tasks being performed by the robotic arm, and the clip at the end of the arm may have several different shapes, such as tongs or forceps or the shape of a human hand. It should be obvious to a person of ordinary skill in the art that the number of arms can be variable and the clip can have a variety of shapes, and all such variations fall within the scope of the present disclosure.
[0037] Robotic arm 105 is provided with an imaging device 120, wherein the imaging device is configured to capture visual signals 140 and provide visual signals 140 for processing to the robotic arm, where a visual perception unit and processor are configured to process visual signal 140. In an exemplary case, an AI detection algorithm is used to distinguish the various visual signals 140 captured by imaging device 120. In an exemplary case, the imaging device output is configured through an AI module to identify gestures made by the human. Robotic arm 105 also has at least one audio device 130, wherein audio device 130 is configured to receive audio input 150 (as a voice command) from the human and process voice command 150 in an audio perception unit and processor, wherein voice command 150 is processed in conjunction with visual signal 140 at that given instant of time. In an exemplary case, an AI detection algorithm is used to distinguish the various audio signals 150 captured by audio device 130. In the exemplary robotic arm 105, a single imaging device is attached to the top near the clip, and two audio devices are coupled, one at the top near the imaging device and the other at the bottom on the base of the robotic arm. Proximity distance 160 may be pre-defined such that robotic arm 105 does not receive any inputs if the human is beyond the pre-defined range. In the exemplary case, the robotic manipulator arm, imaging device and microphone may be connected to a common computer.
[0038] In an exemplary case, hand gesture 140 indicates a gesture for robotic arm 105 to get activated. Gesture 140 is recognized by imaging device 120 and processed in robotic arm 105, and based on the pre-defined gesture 140 recognized by robotic arm 105, wherein the gesture is an open hand or four fingers closed with the index finger pointed up, robotic arm 105 is moved to an active state. In the exemplary case, a plurality of audio devices 130 may be placed on robotic arm 105, wherein audio device 130 may be configured to receive as input a voice command from the human, and audio device 130 is configured to process the voice command in the APU. After processing the voice command by extracting keywords and mapping the keywords to a task or set of tasks, in conjunction with the visual signal, i.e., gesture 140, robotic arm 105 is configured to perform the task or the set of tasks assigned to each gesture 140 made to robotic arm 105. If gesture 140 or voice command 150 is received from outside the pre-determined or pre-defined range, robotic arm 105 may be configured to provide an indicator, such as an error indicator, to the human operating the robotic arm within the pre-defined range. In an exemplary case, the range may be made variable based on the type of application and environment of the HRC, and all such variations fall within the scope of the present disclosure. In the exemplary case, the APU and VPU incorporate AI algorithms for the identification of commands, processing of commands and functioning of the robot.
[0039] Reference is now made to Figure 2, which is an exemplary pictorial representation of a few gestures 200 that may be used to indicate tasks to be performed by the robot in the HRC environment in accordance with an embodiment of the present disclosure. As illustrated, gesture 210, showing an open palm with fingers pointing upwards or only the index finger pointing upwards, may be preprogrammed to indicate that such gesture 210 is configured to activate the robotic arm. This gesture may be programmed such that the task of activating the robotic arm may be performed with or without an appropriate voice command. Gesture 210 may also be followed with a voice command “activate robotic arm”, and the VPU and APU attached to the robotic arm may process the relevant signals and, based on gesture 210 and the audio input “activate”, put the robotic arm in an active mode, wherein the robotic arm may receive further commands and gestures from the human. In an exemplary case, as illustrated, gesture 215, having an open palm with fingers pointing downwards or the index finger pointing downwards, with a voice command “activate the robot”, may issue an appropriate error indicator, as there is no correlation between the gesture and the voice command. As discussed previously, the error indicator may be an audio signal to the human or a visual signal to the human or a combination thereof. It should be obvious to a person skilled in the art that various other means may be devised to map gestures to voice commands, and all such variations fall within the scope of the present disclosure. In the exemplary case, AI algorithms are used in the VPU and APU for identifying commands, processing commands and then performing the tasks associated with the gesture and commands from the human operator.
[0040] Again, as illustrated in the exemplary case, gesture 220, with an open palm wherein the palm is pointing in an upwards direction, may be mapped to the word “give” or “fetch”. A human may gesture to the robotic arm with an open palm facing upwards or sidewards while issuing a voice command fetch “pen”. The VPU processes the visual signal and the APU processes the audio input, and the robot may fetch a “pen” that is located proximate to the human. If the open palm is facing downwards and the human issues a command “fetch”, the robotic arm does not correlate the visual signal with the audio input and will be configured to provide an error indicator to the human. Again, as illustrated in the exemplary case, gesture 230, with a closed fist wherein the fist may be pointing in an upwards direction or a sidewards direction, may be mapped to the words “put back”. A human may gesture to the robotic arm with a closed fist issuing a voice command “put back” and “pen”. The VPU and APU process the visual signal and the audio input, and the robot may be configured to take the “pen” from the human and place it back in the exact location from which the “pen” had been picked up. In an embodiment, the imaging device is configured to update the pre-defined range proximate to the human in near real time, and maintain a mapping of the objects and/or other things placed proximate to the human, such that the robotic arm may be able to recognize and distinguish the objects within the pre-defined range and put the object back into the same location after the task is completed and on receipt of the command and gesture from the human. In the exemplary case given above, the pre-defined range between the human and the robot may be within 1-2 meters, to perform tasks close to the human. In another exemplary case, in an operation theatre, if such a human-robot collaboration is used, the distance between the human and robot may be in the range of 1-3 meters. In another exemplary embodiment, if such a human-robot collaboration is used for an invalid person within a room, a robot may be movable, having a manipulator robotic arm, the range may be about 10-15 meters, and the robotic arm will be configured to pick up signals and perform the tasks using the AI algorithms. This exemplary case will also include moving the robot to a particular location or different locations to perform the tasks, and such processing is done by the AI algorithms. However, in another exemplary case, the range for receiving the gesture and voice command may be defined to be about 1-3 meters, and any gestures and voice commands made to the robot beyond the pre-defined range of about 3 meters should not be identified by the robot, and the robot may return to its original place proximate to the invalid human.
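The gesture-to-verb mapping and the correlation check described above may be illustrated with the following minimal Python sketch; the labels (open_palm_up, closed_fist) are hypothetical names for the gestures of Figure 2:

```python
# Hypothetical gesture and keyword labels based on the mapping described above:
# open palm up -> give/fetch, closed fist -> put back, open palm down -> no task.
GESTURE_VERBS = {
    "open_palm_up": {"give", "fetch"},
    "closed_fist":  {"put_back", "back"},
}

def correlate(gesture, keyword):
    """Return the task verb when gesture and voice agree, else None (error)."""
    allowed = GESTURE_VERBS.get(gesture, set())
    return keyword if keyword in allowed else None

assert correlate("open_palm_up", "fetch") == "fetch"   # matched: perform the task
assert correlate("open_palm_down", "fetch") is None    # mismatch: error indicator
```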
[0041] Reference is now made to Figure 3, which is an exemplary layout mapping of the 21 key points of the human hand 300 implemented in the MediaPipe Hand Landmarker (MPHL) framework for identification of gestures by the robot in the HRC environment in accordance with an embodiment of the present disclosure. The MPHL task provides a way to detect the landmarks of the hands in an image, the image being a gesture made by the human operator and captured by the imaging device of the robot. In an exemplary case, the imaging device output is configured through an AI module to identify gestures made by the human. In accordance with the present disclosure, the imaging device captures an image made by the human operator which is related to a certain gesture. MPHL is then used by the VPU to locate key points of hands and/or render visual effects on them. MPHL operates on image data with a machine learning (ML) model, as static data or a continuous stream, and outputs hand landmarks in image coordinates, hand landmarks in world coordinates, and handedness (left/right hand) of multiple detected hands. A mapping of the 21 key points is provided in Table 1 below:

Table 1 (the 21 hand landmark indices, per the standard MediaPipe Hands convention):
0: WRIST; 1: THUMB_CMC; 2: THUMB_MCP; 3: THUMB_IP; 4: THUMB_TIP; 5: INDEX_FINGER_MCP; 6: INDEX_FINGER_PIP; 7: INDEX_FINGER_DIP; 8: INDEX_FINGER_TIP; 9: MIDDLE_FINGER_MCP; 10: MIDDLE_FINGER_PIP; 11: MIDDLE_FINGER_DIP; 12: MIDDLE_FINGER_TIP; 13: RING_FINGER_MCP; 14: RING_FINGER_PIP; 15: RING_FINGER_DIP; 16: RING_FINGER_TIP; 17: PINKY_MCP; 18: PINKY_PIP; 19: PINKY_DIP; 20: PINKY_TIP
It should also be obvious to a person of ordinary skill in the art that other techniques may be developed and used to perform a similar task, and all such variations fall within the scope of the present disclosure. In an exemplary embodiment, tracked key points may be used to train an AI/ML model using logistic regression to identify different hand gestures. In an exemplary case, 7500 frames were captured for each hand gesture and a classifier model was trained. In the exemplary case, the model was subsequently tested on a separate set of 4500 frames and an accuracy of 0.997 was achieved.
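A minimal sketch of such training, assuming the 21 tracked keypoints have already been flattened into 63-value feature vectors (21 landmarks x 3 coordinates) and using scikit-learn's logistic regression; random data stands in for the captured frames:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# X: one row per frame, 63 features (21 landmarks x 3 coordinates); y: gesture label.
# Random data stands in here for the 7500 frames per gesture described above.
rng = np.random.default_rng(0)
X_train = rng.random((15000, 63))            # e.g., 7500 frames x 2 gestures
y_train = np.repeat([0, 1], 7500)            # gesture class labels
X_test = rng.random((4500, 63))              # the separate test set
y_test = rng.integers(0, 2, 4500)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```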
[0042] Reference is now made to Figure 4, which is an illustration of an exemplary case 400 of implementation of the robot performing a task instructed by a human operator in a HRC environment in accordance with an embodiment of the present disclosure. As illustrated, the HRC environment has human 410 interacting with robotic arm 420 to perform some tasks. In the exemplary case, robotic arm 420 uses the imaging device, which may be built into the robotic arm or may be coupled to the robotic arm, wherein the imaging device may be placed at a different location to monitor a pre-defined range and connected to robotic arm 420 by wired means or wireless means or a combination thereof. In the exemplary case, the robotic manipulator arm, imaging device and microphone may be connected to a common computer.
[0043] Robotic arm 420 has a layout of the pre-defined range and can maintain a mapping of the location of objects 430 in near real time, keeping these mappings in a temporary memory. Human 410 first gestures to robotic arm 420 to get activated, with a gesture as discussed previously. After the robotic arm is activated, human 410 makes a gesture with an open palm with a command “fetch marker pen”. The visual signal, i.e., the gesture, is picked up by the VPU and indicates that the gesture is related to “fetch” or “give”. Simultaneously, an audio input “fetch marker pen” is picked up by the audio devices attached to robotic arm 420, and keywords are extracted, processed and mapped to a task or set of tasks. The robotic arm, using the correlation of the visual signal and the audio input, fetches the marker pen and hands over the marker pen to the human.
[0044] In another example, the robotic arm may be a moving arm placed within a room, wherein the room defines the pre-defined range, and there is an impaired person in the room. If the impaired person makes an appropriate signal to the moving robot to fetch water, in terms of a visual signal and audio input, the robot processes the video signal and the audio input and fetches water for the impaired person.
[0045] In yet another example, a robotic arm in accordance with the present disclosure may be placed in an operation theatre, where the surgeon may make an appropriate signal, gesture and voice, for the robot to pick up the required instruments and hand them to the surgeon during surgery, and an appropriate signal can put the used instruments into a cleaning bin and unused instruments back on the table. Several other exemplary cases may be found where such an application may be used in a HRC environment, and all such applications of the gesture combined with audio input in a HRC environment fall within the scope of the present disclosure.
[0046] In another exemplary case, multiple gestures may be performed to complete a single task. If there is a first human operating the robotic arm, and a second human alongside, the first human can perform the gesture to activate the robot and then make a gesture and provide an audio input to “fetch a pen”. After that, the first human may immediately make a second gesture pointing to the second human and provide an audio input “give pen to second human”. The robotic arm is configured to identify the gestures and the commands from the audio inputs, and first fetches the pen from the pen stand and then hands over the pen to the second human. In this exemplary manner, the robotic arm may be configured to perform multiple tasks, wherein the multiple tasks follow an order and inputs are received within a pre-determined time interval. It should be obvious to a person of ordinary skill in the art that the exemplary case illustrates the robotic arm performing two tasks, and more than two tasks may be performed, and all such variations fall within the scope of the present disclosure.
[0047] In an exemplary case, an experimental setup for enablement includes human operator 410, robotic arm 420, an imaging device, for example a simple RGB webcam, mounted at the top of the robot’s end effector (illustrated in Figure 1), a microphone and speaker used for audio inputs (illustrated in Figure 1), tool post 430 consisting of three designated slots for three different types of tools, and a controller, for example a U2D2 controller, for controlling the robotic arm; motion planning was performed through MoveIt on ROS Noetic on an Ubuntu system, and Python was used for controlling all the audio-visual perceptions.
[0048] In the exemplary case, a perception module consists of the APU and VPU. In the exemplary case, for visual perception, the imaging device mounted on the robotic arm captures the frames using OpenCV, wherein the MediaPipe framework is used to track the 21 keypoints of the human hand. In the exemplary case, the tracked keypoints are used to train an ML model using logistic regression to identify the different hand gestures made by the human operator. In the exemplary case, 7500 frames were captured for each hand gesture and a classifier model was trained. In the exemplary case, the model was then tested on a separate set of 4500 frames and achieved an accuracy of about 99.7%. In this specific exemplary case, only two gestures were trained, to define whether the user wants a tool or needs to put a tool back into the tool post. It should however be obvious to a person skilled in the art that many other gestures may be trained, and the training of two gestures in the experimental setup should not be construed as a limiting factor on the scope of the present disclosure. An exemplary confusion matrix associated with the two gestures is presented in Table 2.
Table 2
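For illustration, frame capture and keypoint extraction along these lines can be done with OpenCV and the MediaPipe Hands solution API; this is a sketch of the general technique, not the authors' exact code:

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)
cap = cv2.VideoCapture(0)  # e.g., the webcam mounted near the end effector

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV delivers BGR frames.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        # Flatten the 21 (x, y, z) keypoints into one 63-value feature vector
        # that can be fed to the trained gesture classifier.
        features = [v for p in lm for v in (p.x, p.y, p.z)]
        print(len(features))  # 63
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
```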

[0049] In the exemplary case, the audio perception module identifies keywords from the voice input, received as an audio signal, which is given by the human operator. The voice input is converted to text or a string of text, and keywords are identified from the string of text. In the exemplary case, features were extracted from the audio signal by converting it to MFCCs. In the exemplary case, 300 audio samples were taken for each keyword (“Screwdriver”, “Pen”, “Marker”, “Back”), and 2500 samples for background noises, each having a duration of about 2 seconds. The MFCC features extracted were used to train a Deep Neural Network (structure: [40, 256, 256, D(0.5), 256, D(0.5), 5]) for classification of keywords from the audio signals. The waveforms for each of the audio signals may be stored for future use. In the exemplary case, the collected dataset is split 0.8 and 0.2 for training and testing respectively. In the exemplary case, the model is tested on the test dataset and an accuracy of 0.97 was achieved for classification of the 4 keywords. The confusion matrix for prediction on the test dataset is discussed later in the disclosure (Figure 8).
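A hedged sketch of this pipeline, using librosa for MFCC extraction and Keras for a network matching the stated [40, 256, 256, D(0.5), 256, D(0.5), 5] structure; the sample rate, activations and optimizer are assumptions, as the disclosure does not specify them:

```python
import librosa
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def mfcc_features(path):
    """Load a ~2 s clip and average 40 MFCCs over time into one feature vector."""
    y, sr = librosa.load(path, sr=16000)
    return np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)

# Structure [40, 256, 256, D(0.5), 256, D(0.5), 5]: 40 MFCC inputs, three dense
# layers of 256 units with two 0.5 dropouts, and 5 outputs (4 keywords + noise).
model = keras.Sequential([
    layers.Input(shape=(40,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```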
[0050] In the exemplary case, only when both audio and visual commands are recognized is the desired task identified, based on the combination of both commands. In the exemplary case, when a gesture is recognized as ‘asking for something’, and the keyword ‘screwdriver’ is identified in the voice command, the task of fetching the tool ‘screwdriver’ from the designated slot in the tool post and placing it in the human operator’s hand is recognized, processed accordingly and then performed by the robotic arm.
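This both-modalities gate can be illustrated with a simple lookup; the task strings and labels below are hypothetical:

```python
# Hypothetical fusion rule: a task fires only when both modalities agree.
TASK_TABLE = {
    ("asking", "screwdriver"): "fetch screwdriver from its slot",
    ("asking", "pen"):         "fetch pen from its slot",
    ("asking", "marker"):      "fetch marker from its slot",
    ("returning", "back"):     "place tool back in its slot",
}

def identify_task(gesture_label, keyword):
    """Return the task only when the (gesture, keyword) pair maps to one."""
    return TASK_TABLE.get((gesture_label, keyword))  # None -> error indicator

print(identify_task("asking", "screwdriver"))  # fetch screwdriver from its slot
```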
[0051] In the exemplary case, upon identification of the task, the robot’s trajectory is planned and commands are generated to move the robot along the desired path to complete the desired task. In the exemplary case, an attention-grabbing mechanism allows the robot to keep working on other tasks instead of waiting for audio-visual commands all the time. In the exemplary case, when the operator makes a gesture in front of the robot, it implies that the operator needs some assistance and thus requires the robot’s attention. In the exemplary case, the robot then moves towards the operator and waits for audio-visual cues to provide assistance by performing the desired tasks. In the exemplary case, embodiments of the present disclosure include an AI (ML) model that is trained to recognize the hand gestures made by the human operator, Deep Neural Networks (DNN) used for training on the voice inputs, and the APU and VPU working in conjunction to complete the tasks allocated to the robot. It should be obvious that other techniques and algorithms may be used to achieve the same results, and all such techniques fall within the scope of the present disclosure.
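On a ROS Noetic / MoveIt stack such as the one described in the experimental setup, trajectory planning to a target pose might be sketched as follows; the planning group name "arm" and the target pose are assumptions specific to this illustration:

```python
import sys
import rospy
import moveit_commander
from geometry_msgs.msg import Pose

moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node("hrc_motion_planner")
arm = moveit_commander.MoveGroupCommander("arm")  # group name is setup-specific

# An assumed target pose, e.g., above the designated tool slot.
target = Pose()
target.position.x, target.position.y, target.position.z = 0.3, 0.0, 0.2
target.orientation.w = 1.0

arm.set_pose_target(target)
success = arm.go(wait=True)  # plan and execute the trajectory to the target
arm.stop()
arm.clear_pose_targets()
```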
[0052] Reference is now made to Figure 5, which is an exemplary illustration of a method 500 for operating the robot in a HRC environment in accordance with an embodiment of the present disclosure. In step 510, a command received from the operator (human) is identified. On identifying a correct gesture being made, in step 520 the robot is activated and is ready to accept inputs and perform tasks. In step 530, a hand gesture related to a task and an audio input related to the task are identified by the robot. In step 560, the hand gesture and the voice command are correlated, by extracting keywords from the voice command and checking the keywords against the task and gesture. In step 540, if there is a mismatch between the visual signal and the audio input, which has been discussed previously in detail, in step 550 the robot does nothing and provides an error indicator to the human operating the robot. As disclosed previously, all gestures are processed using an AI detection algorithm and voice inputs are processed using a DNN. It should also be obvious to a person of ordinary skill in the art that various other AI/ML and neural network algorithms, for example CNNs, may be used, and all such variations fall within the scope of the present disclosure.
[0053] Reference is now made to Figure 6, which is another exemplary illustration of a method for operating the robot in a HRC environment in accordance with an embodiment of the present disclosure. In step 610, a command received from the operator (human) is identified. On identifying a correct gesture being made, in step 620 the robot is activated and is ready to accept inputs and perform tasks. In step 630, a hand gesture related to a task and an audio input related to the task are identified by the robot. In step 660, the hand gesture and the voice command are correlated, by extracting keywords from the voice command and checking the keywords against the task and gesture. In step 640, if there is a match between the visual signal and the audio input, that is, the audio input and the gesture have been reconciled, which has been discussed previously in detail, in step 650 the robot is configured to perform the task or set of tasks. As disclosed previously, all gestures are processed using an AI detection algorithm and voice inputs are processed using a DNN. It should also be obvious to a person of ordinary skill in the art that various other AI/ML and neural network algorithms, for example CNNs, may be used, and all such variations fall within the scope of the present disclosure.
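A compact sketch of the Figure 5/6 control flow, with the perception functions injected as callables and an assumed timeout value standing in for the pre-defined interval:

```python
import time

PREDEFINED_TIMEOUT = 5.0  # seconds; an assumed value for the pre-defined interval

def run_once(get_gesture, get_keyword, correlate, perform, error_indicator):
    """One pass of the Figure 5/6 flow: activate, receive, correlate, act."""
    if get_gesture() != "activation":
        return                              # stay in sleep mode
    deadline = time.monotonic() + PREDEFINED_TIMEOUT
    gesture = keyword = None
    while time.monotonic() < deadline and not (gesture and keyword):
        gesture = gesture or get_gesture()  # VPU output (second gesture)
        keyword = keyword or get_keyword()  # APU output (voice keyword)
    task = correlate(gesture, keyword) if gesture and keyword else None
    if task is None:
        error_indicator()                   # mismatch or timeout: do nothing
    else:
        perform(task)                       # matched: execute the task
```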
[0054] Reference is now made to Figure 7, which is an exemplary illustration of a block diagram 700 illustrating the operation of the robot in the HRC environment in accordance with an embodiment of the present disclosure. Operator 710 is a human interacting with robot 740. Operator 710 (also referred to alternatively in the present disclosure as the human or human operator) provides inputs to robot 740. Robot 740 is configured to accept inputs from operator 710 and process the inputs received from operator 710. In an exemplary case, operator 710 first activates robot 740, which may either be in a sleep mode or in a task performing mode. When a first gesture is made by operator 710, as disclosed previously an open palm with the finger pointing upwards, the robot is activated. Robot 740 may be configured to intimate the operator either by audio means or visual means that it is activated, for example by simply providing an audio output such as “activated”. When robot 740 is already in an active state and performing some task(s), a first gesture made to robot 740 is identified and the robot may intimate operator 710 by audio means or visual means, for example by saying “Already Active”.
[0055] In an exemplary case, when robot 740 is active, robot 740 may be configured to perform tasks within a pre-defined range or region, called the field of operation. The field of operation may vary depending on the type of application where the robot is required to perform the tasks. In an exemplary case, in an operation theatre, robot 740 may be placed proximate to the surgeon, and the field of operation may be restricted to not more than two meters. In another exemplary case, for an impaired person in a room, the field of operation for robot 740 may be a few meters, wherein a command and gesture to robot 740 may be to fetch coffee for the impaired person. In such a case, robot 740 may be configured to fetch coffee from the table or a coffee making machine and provide the coffee to the impaired person.
[0056] In an exemplary case, in the active state, robot 740 requires at least two inputs in conjunction to perform a task. Robot 740 receives a visual signal as a gesture from operator 710 and simultaneously receives an audio input from operator 710, wherein the visual signal and the audio input are provided in near real-time or may be separated within a pre-defined time interval. If either signal is not received, or if either signal is beyond the pre-defined time, robot 740 provides an error indicator to operator 710. In an exemplary embodiment, as disclosed previously, the visual input is obtained from an imaging device attached to robot 740, and the audio input is obtained from an audio device attached to robot 740.
[0057] Robot 740 processes the visual signal and the audio signal by processing unit 720. In an exemplary case, processing unit 720 may be placed within robot 740. In another exemplary case, processing unit 720 may be a computing device, such as a computer or a mobile phone or the like, and the visual signals and audio signals may be transmitted to the computing device, processed by processing unit 720 at the computing device, and the processed task or instructions transmitted back to the robot to perform the action. In the case of the processing unit being externally connected to robot 740, the communication may be by wired means or wireless means or a combination thereof.
[0058] Processing unit 720 has a visual perception unit 722 (VPU) and an audio perception unit 726 (APU). Both VPU 722 and APU 726 work in conjunction based on the input received by robot 740 and process the input. VPU 722 is configured to recognize a gesture made to robot 740 and identify the task associated with the gesture made. As disclosed previously, if the gesture is an open palm facing upwards, robot 740 immediately recognizes that the operator is asking for something. In case there is no audio input for correlating the gesture within a pre-defined time interval, robot 740 may provide an error indicator and go back to the sleep mode. APU 726 is configured to recognize audio inputs made to robot 740. Once the audio input is received, APU 726 is configured to extract keywords from the audio input. The processed keywords and gesture are provided to a task recognition unit 730, which maps the keywords to an available set of tasks. In an embodiment, the set of tasks may be dynamically updated by operator 710 or automatically by the Artificial Intelligence or Machine Learning model being used. Task recognition unit 730 then provides the task to be performed to robot 740.
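For illustration, task recognition unit 730 with a dynamically updatable task set might be sketched as follows; the class and method names are hypothetical:

```python
class TaskRecognitionUnit:
    """Maps (gesture, keyword) pairs to tasks; the task set can be updated live."""

    def __init__(self):
        self.tasks = {}

    def register(self, gesture, keyword, task):
        # Allows the operator (or a learning model) to extend the task set.
        self.tasks[(gesture, keyword)] = task

    def recognize(self, gesture, keyword):
        return self.tasks.get((gesture, keyword))  # None -> error indicator

tru = TaskRecognitionUnit()
tru.register("open_palm_up", "pen", "fetch pen")
print(tru.recognize("open_palm_up", "pen"))  # fetch pen
```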
[0059] Only when both audio and visual commands are recognized is the desired task identified, based on the combination of both commands. In an exemplary case, when a gesture is recognized as ‘asking for something’, and the keyword ‘screwdriver’ is identified in the voice command, the task of fetching the tool ‘screwdriver’ from the designated slot in the tool post and placing it in the human operator’s hand is recognized, and the robot may perform the recognized task. In an exemplary case, upon identification of the task, the robot’s trajectory and/or path may be planned and commands provided and/or generated to move the robot along the desired path to complete the desired task.
[0060] In an exemplary case, as disclosed previously, the processing module consists of the APU and VPU. In an exemplary embodiment, for visual perception, a camera mounted on the robot arm captures the frames using any of the known techniques, and the MediaPipe framework is used to track the 21 key points on the human hand to identify the gesture. In an exemplary embodiment, the imaging device output is configured through an AI module to identify gestures made by the human.
[0061] In an exemplary embodiment, the tracked key points may be used to train a machine learning model or any artificial intelligence model using logistic regression to identify different hand gestures. In an exemplary case, 7500 frames may be captured for each hand gesture and a classifier model may be trained. It should be obvious to a person skilled in the art that the number indicated here is only exemplary in nature, and either more or fewer frames may be required to train the model. After training, the model may be tested on a separate set of 4500 frames; an accuracy of about 0.997 was achieved. In the exemplary case in accordance with the present disclosure, two gestures were trained to define whether the user wants a tool or needs to put a tool back into the tool post, and almost 100% accuracy was observed with the trained dataset.
[0062] In an exemplary embodiment, the APU identifies keywords from the voice input given by the human operator and converts the keywords identified to text. In an exemplary embodiment, features are extracted from the audio signal by converting it to MFCCs. In an exemplary case, 300 audio samples were taken for each keyword (“Screwdriver”, “Pen”, “Marker”, “Back”), and 2500 samples for background noises, each having a duration of 2 seconds. In the exemplary case, the MFCC features extracted were used to train a Deep Neural Network (structure: [40, 256, 256, D(0.5), 256, D(0.5), 5]) for classification of keywords from the audio signal. As disclosed previously, all gestures are processed using an AI detection algorithm and voice inputs are processed using a DNN. It should also be obvious to a person of ordinary skill in the art that various other AI/ML and neural network algorithms, for example CNNs, may be used, and all such variations fall within the scope of the present disclosure.
[0063] Reference is now made to Figure 8, which is an exemplary illustration of a confusion matrix 800 in accordance with an embodiment of the present disclosure. In an exemplary embodiment, the dataset collected above may be split 0.8 and 0.2 for training and testing respectively. The model was tested on the test dataset and achieved an accuracy of 0.97 for classification of the 4 keywords. The confusion matrix for predictions on the test dataset is illustrated in Figure 8. It should be obvious to a person of ordinary skill in the art that the dataset and gestures used here are only an exemplary illustration; larger datasets may be trained and a larger set of gestures mapped to tasks, and all such variations fall within the scope and spirit of the present disclosure.
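The split and evaluation described above could be reproduced with scikit-learn as follows; the feature and label files are hypothetical, the model is carried over from the preceding sketch, and the epoch and batch-size values are assumptions.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import confusion_matrix

    # X: MFCC feature vectors, y: integer keyword labels (hypothetical files)
    X = np.load("keyword_mfcc_features.npy")
    y = np.load("keyword_labels.npy")

    # 0.8 / 0.2 split for training and testing, as in the exemplary case
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0, stratify=y)

    model.fit(X_tr, y_tr, epochs=30, batch_size=32, verbose=0)
    pred = np.argmax(model.predict(X_te), axis=1)
    print(confusion_matrix(y_te, pred))  # rows: true keyword, cols: predicted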
[0064] As disclosed previously, it is important for a skilled person to realize that only when both the audio command and the visual command provided as input are recognized is the desired task identified, based on the combination of both commands, and the task may then be performed by the robot. Upon identification of the task, the robot's trajectory and path may be planned and commands generated to move the robot along the desired trajectory and path to complete the desired task. Other exemplary cases have been discussed previously with respect to the present disclosure. As disclosed previously, embodiments of the present disclosure may advantageously be used in a range of applications, from household social robots to hospital robots to industrial practices, by providing the operator with an intuitive and natural way to communicate with the robots. Advantageously, the embodiments disclosed herein are cost effective, as complex controllers and programming may be avoided.
[0065] Although the present disclosure has been described with reference to several preferred embodiments, it should be understood that the present disclosure is not limited to the preferred embodiments disclosed here. Embodiments of the present disclosure are intended to cover various modifications and equivalent arrangements within the spirit and scope of the appended claims. Although the foregoing disclosure has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practised within the scope of the appended claims. Examples of the present disclosure have been described in language specific to structural features and/or methods. It should be noted that there are many alternative ways of implementing both the process and apparatus of the present invention. Accordingly, embodiments of the present disclosure are to be considered illustrative and not restrictive, and the invention is not to be limited to the details given herein but may be modified within the scope and equivalents of the appended claims. It should be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed and explained as examples of the present disclosure.
Claims:
1. A method for artificial intelligence based multimodal collaboration between a human and a robot, the method comprising:
identifying, by a robot, a first pre-defined gesture made by a human hand, wherein the first pre-defined gesture is configured to activate the robot;
activating the robot, in response to identifying the first pre-defined gesture;
receiving inputs from the human, by the robot, wherein the inputs received are for performing a task or a set of tasks;
performing the task or the set of tasks, by the robot, wherein, on receiving the inputs from the human, the inputs comprising a second gesture and an audio input, the robot is configured to process the inputs received and perform the task or the set of tasks.

2. The method as claimed in claim 1, wherein the robot is placed proximate to the human, and wherein the interaction of the robot and the human may be within a pre-defined range.

3. The method as claimed in claims 1 and 2, wherein, on determination of the second gesture being made from within the pre-defined range and the audio input being received from outside the pre-defined range or from another source:
providing an error indicator to the human interacting with the robot.

4. The method as claimed in claims 1 and 2, wherein, on determination of the second gesture being made from outside the pre-defined range and the audio input being received from within the pre-defined range or from another source:
providing an error indicator to the human interacting with the robot.

5. The method as claimed in claim 1, wherein, while the robot is in an operation mode performing a first task or a first set of tasks, if a second input from the human is provided to the robot, the second input including a further gesture and a further audio input, the robot is configured to process the second input, provide an indicator to the human that the task or set of tasks is in a pending queue, and assign the task associated with the second input to the queue.

6. The method as claimed in claim 1, wherein, on negative determination of a mapping of a voice command with the second gesture, the robot is configured to provide an error indicator to the human user.

7. The method as claimed in claim 1, wherein, after activation, the robot is configured to identify the second gesture and the related audio command within a specific time period, and, on negative determination of the second gesture and/or the voice command, the robot is configured to provide an error indicator.

8. The method as claimed in claims 2 to 7, wherein the error indicator or the indicator comprises at least one of an audio command issued from the robot to the human, a visual signal issued from the robot to the human, or a combination thereof.

9. The method as claimed in claim 1, wherein the input from the human comprises:
a second gesture and an audio input in combination, wherein the second gesture and the audio input are provided in near real-time or within a pre-defined time interval.

10. The method as claimed in claim 1, wherein the audio input is a voice command provided to the robot by the human, and the voice command being associated with a task to be performed by the robot.

11. The method as claimed in claim 1, wherein an imaging device coupled to the robot is configured to identify the first gesture and/or the second gesture, and the robot is configured to distinguish the first gesture from the second gesture.

12. The method as claimed in claim 1, wherein a visual perception unit coupled to the imaging device of the robot is configured to identify and process the gesture from the human, wherein an output from the imaging device is configured through an AI module to identify gestures made by the human.

13. The method as claimed in claim 1, wherein an audio perception unit coupled to the robot is configured to identify and process the voice command from the human.

14. The method as claimed in claim 13, wherein the visual perception unit and the audio perception unit are configured to map the audio input with the second gesture.

15. The method as claimed in claim 1, wherein the audio input is converted to text in the audio perception unit, and the audio perception unit is configured to identify keywords from the text and relate the keywords to the task or the set of tasks.

16. The method as claimed in claim 15, wherein the task or set of tasks to be performed is identified from a mapping of the keywords to the task or set of tasks.

17. The method as claimed in claim 16, wherein the mapping is a ranked list of keywords against the task or set of tasks, or a confidence score associated with the keywords and the task or the set of tasks.

18. The method as claimed in claim 1, wherein the gestures are processed by an artificial intelligence detection algorithm.

19. The method as claimed in claim 1, wherein the audio signals are processed by a neural network algorithm.

20. An artificial intelligence based multimodal human-robot collaborative environment comprising at least a human and a robot, wherein the robot has at least a processor and a memory, the human and the robot interacting to perform a task or a set of tasks by performing the method as claimed in any of the claims 1 to 17.

Documents

Application Documents

# Name Date
1 202341087195-STATEMENT OF UNDERTAKING (FORM 3) [20-12-2023(online)].pdf 2023-12-20
2 202341087195-PROOF OF RIGHT [20-12-2023(online)].pdf 2023-12-20
3 202341087195-POWER OF AUTHORITY [20-12-2023(online)].pdf 2023-12-20
4 202341087195-FORM FOR SMALL ENTITY(FORM-28) [20-12-2023(online)].pdf 2023-12-20
5 202341087195-FORM 1 [20-12-2023(online)].pdf 2023-12-20
6 202341087195-FIGURE OF ABSTRACT [20-12-2023(online)].pdf 2023-12-20
7 202341087195-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [20-12-2023(online)].pdf 2023-12-20
8 202341087195-EVIDENCE FOR REGISTRATION UNDER SSI [20-12-2023(online)].pdf 2023-12-20
9 202341087195-EDUCATIONAL INSTITUTION(S) [20-12-2023(online)].pdf 2023-12-20
10 202341087195-DRAWINGS [20-12-2023(online)].pdf 2023-12-20
11 202341087195-DECLARATION OF INVENTORSHIP (FORM 5) [20-12-2023(online)].pdf 2023-12-20
12 202341087195-COMPLETE SPECIFICATION [20-12-2023(online)].pdf 2023-12-20
13 202341087195-FORM-9 [21-12-2023(online)].pdf 2023-12-21
14 202341087195-FORM-8 [21-12-2023(online)].pdf 2023-12-21
15 202341087195-FORM 18A [22-12-2023(online)].pdf 2023-12-22
16 202341087195-EVIDENCE OF ELIGIBILTY RULE 24C1f [22-12-2023(online)].pdf 2023-12-22
17 202341087195-RELEVANT DOCUMENTS [16-05-2024(online)].pdf 2024-05-16
18 202341087195-POA [16-05-2024(online)].pdf 2024-05-16
19 202341087195-FORM 13 [16-05-2024(online)].pdf 2024-05-16
20 202341087195-FER.pdf 2024-05-16
21 202341087195-OTHERS [12-11-2024(online)].pdf 2024-11-12
22 202341087195-FER_SER_REPLY [12-11-2024(online)].pdf 2024-11-12
23 202341087195-DRAWING [12-11-2024(online)].pdf 2024-11-12
24 202341087195-COMPLETE SPECIFICATION [12-11-2024(online)].pdf 2024-11-12
25 202341087195-CLAIMS [12-11-2024(online)].pdf 2024-11-12
26 202341087195-Proof of Right [16-01-2025(online)].pdf 2025-01-16
27 202341087195-PatentCertificate25-03-2025.pdf 2025-03-25
28 202341087195-IntimationOfGrant25-03-2025.pdf 2025-03-25

Search Strategy

1 SearchHistory(1)E_30-04-2024.pdf
2 202341087195_SearchStrategyAmended_E_humanRobotAE_18-03-2025.pdf

ERegister / Renewals

3rd: 30 Apr 2025 (From 20/12/2025 to 20/12/2026)
4th: 30 Apr 2025 (From 20/12/2026 to 20/12/2027)
5th: 30 Apr 2025 (From 20/12/2027 to 20/12/2028)
6th: 30 Apr 2025 (From 20/12/2028 to 20/12/2029)
7th: 30 Apr 2025 (From 20/12/2029 to 20/12/2030)
8th: 30 Apr 2025 (From 20/12/2030 to 20/12/2031)
9th: 30 Apr 2025 (From 20/12/2031 to 20/12/2032)
10th: 30 Apr 2025 (From 20/12/2032 to 20/12/2033)