Abstract: A method (200) for detecting objects (3110) and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within a mixed reality environment. The method (200) comprises the steps of receiving (210) audio-visual data of the environment, processing (220) the visual data and detecting objects, determining (230) pose estimation, RGB, depth, image spectroscopy and thermal imaging values for identifying characteristics (3220) of each of the detected objects, displaying (240) the detected objects along with the characteristics of the detected objects, processing (250) the audio data for recognizing the sound as a machine noise (3320) or a spoken language (3410) and assisting (260) the user in performing specific tasks based on the processed visual data, audio data and/or the thermal imaging values. [FIGURE 1]
Claims: We claim:
1. A method (200) for detecting objects (3110) and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within a mixed reality environment, the method (200) comprising the steps of:
receiving audio-visual data of the environment from an image and audio acquisition device using a Mixed Reality (MR) based Head Mounted Device (HMD (102));
processing the visual data and detecting objects (3110) within a captured scene;
determining 6DOF pose estimation, RGB values, depth values, image spectroscopy values and thermal imaging values for identifying one or more characteristics (3220) of each of the detected objects (3110);
displaying the detected objects (3110) along with the identified one or more characteristics (3220) of each of the detected objects (3110) using the HMD (102), thereby enabling a user to develop an understanding of the detected objects (3110);
processing the audio data for recognizing the sound as a machine noise (3320) or a spoken language (3410); and
assisting the user in performing specific tasks based on the processed visual data, audio data and/or the thermal imaging values.
2. The method (200) as claimed in claim 1, wherein, when the sound is recognised as the machine noise (3320), the step of assisting the user comprises:
identifying a machine and a working condition of the machine based on the sound received, the visual data and/or the thermal imaging values; and
guiding the user with next steps (3310) towards handling and/or debugging the machine in a Mixed Reality space, thereby assisting the user in performing specific tasks.
3. The method (200) as claimed in claim 1, wherein, when the sound is recognised as the spoken language (3410), the step of assisting the user comprises:
identifying a language based on the sound received; and
translating the recognised language into a user-understandable interpretation (3420) using the HMD (102), thereby assisting the user in performing specific tasks.
4. The method (200) as claimed in claim 1, wherein the detected objects (3110) are selected from previously unseen objects (3110) and/or previously seen objects (3110).
5. The method (200) as claimed in claim 4, wherein the previously unseen objects (3110) and/or previously seen objects (3110) include industrial machines, industrial tools, daily life objects (3110), food items and beverages.
6. The method (200) as claimed in claim 4, further comprising a step of registering the detected objects (3110) with the help of a feedback of the user, in case the detected objects (3110) are previously unseen objects (3110).
7. The method (200) as claimed in claim 6, wherein the step of registering the detected objects (3110) with the help of the feedback of the user, comprises the steps of:
prompting the user to provide information about the previously unseen object that is detected;
receiving the information from the user as a feedback; and
registering the previously unseen object as a previously seen object based on the information received for future reference.
8. The method (200) as claimed in claim 7, wherein the information is in the form of audio, video, hand gestures, written document and/or a combination thereof.
9. The method (200) as claimed in claim 1, wherein the one or more characteristics (3220) of each of the detected objects (3110), are selected from a group comprising material used, properties of the material, usage of the detected object, estimated life of the detected object, information regarding product lifecycle and information regarding a present state of the detected object including a temperature, quality of the material used, maintenance status and possible chances of failures of the detected object.
10. The method (200) as claimed in claim 2, wherein the working condition of the machine is selected from a normal condition and a faulty condition.
11. The method (200) as claimed in claim 1, wherein the HMD (102) comprises one or more depth cameras (402) for capturing the visual data, one or more microphones (404) for capturing the audio data, a voice recognition module (406), a processing module (414), an RGB module (408), a spectroscopy module (410), a thermal imaging module (412) and a storage module (416) having prestored data.
12. A computer system (104) for detecting objects (3110) and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within a mixed reality environment, the computer system (104) being connected with a Mixed Reality (MR) based Head Mounted Device (HMD (102)), the computer system (104) comprising:
a memory unit (1044) configured to store machine-readable instructions; and
a processor (1042) operably connected with the memory unit, the processor (1042) obtaining the machine-readable instructions from the memory unit, and being configured by the machine-readable instructions to:
receive audio-visual data of the environment from an image and audio acquisition device using a Mixed Reality (MR) based Head Mounted Device (HMD (102));
process the visual data and detect objects (3110) within a captured scene;
determine 6DOF pose estimation, RGB values, depth values, image spectroscopy values and thermal imaging values for identifying one or more characteristics (3220) of each of the detected objects (3110) using the HMD (102);
display the detected objects (3110) along with the identified one or more characteristics (3220) of each of the detected objects (3110) using the HMD (102), thereby enabling a user to develop an understanding of the detected objects (3110);
process the audio data for recognizing the sound as a machine noise (3320) or a spoken language (3410); and
assist the user in performing specific tasks based on the processed visual data, audio data and/or the thermal imaging values.
13. The computer system (104) as claimed in claim 12, wherein, when the sound is recognised as the machine noise (3320), the processor (1042) is further configured to:
identify a machine and a working condition of the machine based on the sound received, the visual data and/or the thermal imaging values; and
guide the user with next steps (3310) towards handling and/or debugging the machine in a Mixed Reality space, thereby assisting the user in performing specific tasks.
14. The computer system (104) as claimed in claim 12, wherein, when the sound is recognised as the spoken language (3410), the processor (1042) is further configured to:
identify a language based on the sound received; and
translate the recognised language into a user-understandable interpretation (3420) using the HMD (102), thereby assisting the user in performing specific tasks.
15. The computer system (104) as claimed in claim 12, wherein the detected objects (3110) are selected from previously unseen objects (3110) and/or previously seen objects (3110).
16. The computer system (104) as claimed in claim 15, wherein the previously unseen objects (3110) and/or previously seen objects (3110) include industrial machines, industrial tools, daily life objects (3110), food items and beverages.
17. The computer system (104) as claimed in claim 15, wherein the processor (1042) is further configured to register the detected objects (3110) with the help of a feedback of the user, in case the detected objects (3110) are previously unseen objects (3110).
18. The computer system (104) as claimed in claim 17, wherein the processor (1042) is further configured to register the previously unseen objects (3110) by:
prompting the user to provide information about the previously unseen object that is detected;
receiving the information from the user as a feedback; and
registering the previously unseen object as a previously seen object based on the information received for future reference.
19. The computer system (104) as claimed in claim 18, wherein the information is in the form of audio, video, hand gestures, written document and/or a combination thereof.
20. The computer system (104) as claimed in claim 12, wherein the one or more characteristics (3220) of each of the detected objects (3110), are selected from a group comprising material used, properties of the material, usage of the detected object, estimated life of the detected object, information regarding product lifecycle and information regarding a present state of the detected object including a temperature, quality of the material used, maintenance status and possible chances of failures of the detected object.
21. The computer system (104) as claimed in claim 13, wherein the working condition of the machine is selected from a normal condition and a faulty condition.
22. The computer system (104) as claimed in claim 12, wherein the HMD (102) comprises one or more depth cameras (402) for capturing the visual data, one or more microphones (404) for capturing the audio data, a voice recognition module (406), a processing module (414), an RGB module (408), a spectroscopy module (410), a thermal imaging module (412) and a storage module (416) having prestored data.
23. The computer system (104) as claimed in claim 22, wherein the prestored data in the storage module (416) includes a plurality of previously seen detected objects (3110), a plurality of sound samples corresponding to multiple languages, and sounds and thermal imaging values of several industrial machines and tools in respective normal working conditions and respective faulty working conditions.
24. A system (400) for detecting objects (3110) and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within a mixed reality environment, the system (400) comprising:
a Mixed Reality (MR) based Head Mounted Device (HMD (102)) having one or more depth cameras (402) for capturing the visual data, one or more microphones (404) for capturing the audio data, a voice recognition module (406), an RGB module (408), a spectroscopy module (410) and a thermal imaging module (412);
a processing module (414); and
a storage module (416) having prestored data;
wherein the processing module (414) is configured to:
receive audio-visual data of the environment from an image and audio acquisition device using a Mixed Reality (MR) based Head Mounted Device (HMD (102));
process the visual data and detect objects (3110) within a captured scene;
determine 6DOF pose estimation, RGB values, depth values, image spectroscopy values and thermal imaging values for identifying one or more characteristics (3220) of each of the detected objects (3110) using the HMD (102);
display the detected objects (3110) along with the identified one or more characteristics (3220) of each of the detected objects (3110) using the HMD (102), thereby enabling a user to develop an understanding of the detected objects (3110);
process the audio data for recognizing the sound as a machine noise (3320) or a spoken language (3410); and
assist the user in performing specific tasks based on the processed visual data, audio data and/or the thermal imaging values.
25. The system (400) as claimed in claim 24, wherein, when the sound is recognised as the machine noise (3320), the processing module (414) is further configured to:
identify a machine and a working condition of the machine based on the sound received, the visual data and/or the thermal imaging values; and
guide the user with next steps (3310) towards handling and/or debugging the machine in a Mixed Reality space, thereby assisting the user in performing specific tasks.
26. The system (400) as claimed in claim 24, wherein, when the sound is recognised as the spoken language (3410), the processing module (414) is further configured to:
identify a language based on the sound received; and
translate the recognised language into a user-understandable interpretation (3420) using the HMD (102), thereby assisting the user in performing specific tasks.
27. The system (400) as claimed in claim 24, wherein the detected objects (3110) are selected from previously unseen objects (3110) and/or previously seen objects (3110).
28. The system (400) as claimed in claim 27, wherein the previously unseen objects (3110) and/or previously seen objects (3110) include industrial machines, industrial tools, daily life objects (3110), food items and beverages.
29. The system (400) as claimed in claim 27, wherein the processing module (414) is further configured to register the detected objects (3110) with the help of a feedback of a user, in case the detected objects (3110) are previously unseen objects (3110).
30. The system (400) as claimed in claim 29, wherein the processing module (414) is further configured to register the previously unseen objects (3110) by:
prompting the user to provide information about the previously unseen object that is detected;
receiving the information from the user as a feedback; and
registering the previously unseen object as a previously seen object based on the information received for future reference.
31. The system (400) as claimed in claim 30, wherein the information is in the form of audio, video, hand gestures, written document and/or a combination thereof.
32. The system (400) as claimed in claim 24, wherein the one or more characteristics (3220) of each of the detected objects (3110), are selected from a group comprising material used, properties of the material, usage of the detected object, estimated life of the detected object, information regarding product lifecycle and information regarding a present state of the detected object including a temperature, quality of the material used, maintenance status and possible chances of failures of the detected object.
33. The system (400) as claimed in claim 25, wherein the working condition of the machine is selected from a normal condition and a faulty condition.
34. The system (400) as claimed in claim 24, wherein the prestored data in the storage module (416) includes a plurality of previously seen detected objects (3110), a plurality of sound samples corresponding to multiple languages, and sounds and thermal imaging values of several industrial machines and tools in respective normal working conditions and respective faulty working conditions.
35. A method for detecting objects (3110) and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within a mixed reality environment, the method comprising the steps of:
receiving audio-visual data of the environment from an image and audio acquisition device using a Mixed Reality (MR) based Head Mounted Device (HMD (102));
processing the visual data and detecting objects (3110) within a captured scene;
determining 6DOF pose estimation, RGB values, depth values, image spectroscopy values and thermal imaging values for identifying one or more characteristics (3220) of each of the detected objects (3110); and
displaying the detected objects (3110) along with the identified one or more characteristics (3220) of each of the detected objects (3110) using the HMD (102) in a Mixed Reality space, thereby enabling a user to develop an understanding of the detected objects (3110) and perform a specific task.
36. The method as claimed in claim 35, wherein the detected objects (3110) are selected from previously unseen objects (3110) and/or previously seen objects (3110).
37. The method as claimed in claim 36, wherein the previously unseen objects (3110) and/or previously seen objects (3110) include industrial machines, industrial tools, daily life objects (3110), food items and beverages.
38. The method as claimed in claim 36, further comprising a step of registering the detected objects (3110) with the help of a feedback of a user, in case the detected objects (3110) are previously unseen objects (3110).
39. The method as claimed in claim 38, wherein the step of registering the detected objects (3110) with the help of the feedback of the user, comprises the steps of:
prompting the user to provide information about the previously unseen object that is detected;
receiving the information from the user as a feedback; and
registering the previously unseen object as a previously seen object based on the information received for future reference.
40. The method as claimed in claim 39, wherein the information is in the form of audio, video, hand gestures, written document and/or a combination thereof.
41. The method as claimed in claim 35, wherein the one or more characteristics (3220) of each of the detected objects (3110), are selected from a group comprising material used, properties of the material, usage of the detected object, estimated life of the detected object, information regarding product lifecycle and information regarding a present state of the detected object including a temperature, quality of the material used, maintenance status and possible chances of failures of the detected object.
42. The method as claimed in claim 35, wherein the HMD (102) comprises one or more depth cameras (402) for capturing the visual data, one or more microphones (404) for capturing the audio data, a voice recognition module (406), a processing module (414), an RGB module (408), a spectroscopy module (410), a thermal imaging module (412) and a storage module (416) having prestored data.
Dated this the 1st day of November 2019
[VIVEK DAHIYA]
AGENT FOR THE APPLICANT- IN/PA 1491
Description: FORM 2
THE PATENTS ACT 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
[See section 10 and rule 13]
"A SYSTEM, A METHOD AND A COMPUTER SYSTEM FOR ASSISTING USERS USING AUDIO-VISUAL CUES"
We, DIMENSION NXG PRIVATE LIMITED, an Indian company, having a registered office at 501, Arcadia, Hiranandani Estate, Patlipada, Ghodbunder Road, Thane, Maharashtra -400607, India
The following specification particularly describes the invention and the manner in which it is to be performed.
FIELD OF THE INVENTION
Embodiments of the present invention generally relate to an artificial intelligence powered mixed reality system, and more particularly to a system, a method and a computer system for detecting objects and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within a mixed reality environment.
BACKGROUND OF THE INVENTION
Object detection refers to a technology incorporating computer vision and image processing techniques aimed at detecting instances of semantic objects of a certain class (humans, animals, vegetables, cars, etc.) in digital images and videos. Such technology is implemented on embedded platforms through a variety of approaches. The mainstream solution for deploying a computationally inexpensive yet robust object detection framework mostly incorporates miniaturised versions of state-of-the-art deep learning architectures such as YOLO and Faster R-CNN.
In the era of artificial intelligence, most of the tasks performed by humans are being automated to minimise human error and increase the efficiency of the tasks performed. For example, industrial automation has grown by leaps and bounds in recent times. Every industry involves a great deal of machinery and equipment in the processing, manufacturing, production and/or packaging of products, which in turn requires specific tasks such as the diagnosis and maintenance of machines and equipment in warehouses and factories. Another example is the automation of personal communication, such as the interpretation of languages, counselling and consultations. Such specific tasks are generally automated using artificial intelligence algorithms that only take into consideration the visual, constructional and operational data of the components engaged in the task.
Considering the first example, existing technologies incorporate only the visual modality to analyse machines in different sections of industry. Often, the placement of the visual input device yields an occluded view of the parts of a machine, which makes it very difficult to analyse proper or improper functioning. In the case of language interpretation, little innovation has been made towards enabling effective communication between individuals in real time; communication remains constrained by language barriers when understanding concepts and sharing thoughts with speakers of foreign languages. Existing solutions do not make use of the sounds produced by humans or machines to assist humans in performing specific tasks.
Moreover, traditional methods for detecting objects, such as scanning barcodes imprinted on objects, have been actively used for understanding details about an object in retail outlets, groceries, etc. However, none of the above solutions existing in industry has been extended to leverage transfer learning approaches in deep learning towards adaptivity over unseen objects, which would in turn help in identifying the pose and characteristics of objects in the real world.
Therefore, there is a need in the art for a system, a method and a computer system for detecting objects and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within a mixed reality environment.
OBJECT OF THE INVENTION
An object of the present invention is to provide a method for detecting objects and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within a mixed reality environment.
Another object of the present invention is to provide a system for detecting objects and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within the mixed reality environment.
Yet another object of the present invention is to provide a computer system for detecting objects and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within the mixed reality environment.
Yet another object of the present invention is to understand humans as well as machines through intelligence imparted via audio and visual modalities powered by artificial intelligence.
Yet another object of the invention is to provide a multimodal annotation tool on a head mounted device which lets users annotate unseen objects using voice or discrete hand gestures.
SUMMARY OF THE INVENTION
Embodiments of the present invention disclose a system and a method for assisting humans in performing specific tasks using cognitive acoustic technology powered by artificial intelligence in a mixed reality environment, with the aim of understanding humans as well as machines using the sound modality.
According to a first aspect of the present invention, a method for detecting objects and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within a mixed reality environment is provided. The method comprises the steps of receiving audio-visual data of the environment from an image and audio acquisition device using a Mixed Reality (MR) based Head Mounted Device (HMD), processing the visual data and detecting objects within a captured scene, determining 6DOF pose estimation, RGB values, depth values, image spectroscopy values and thermal imaging values for identifying one or more characteristics of each of the detected objects, displaying the detected objects along with the identified one or more characteristics of each of the detected objects using the HMD, thereby enabling a user to develop an understanding of the detected objects, processing the audio data for recognizing the sound as a machine noise or a spoken language, and assisting the user in performing specific tasks based on the processed visual data, audio data and/or the thermal imaging values.
In accordance with an embodiment of the present invention, the sound is recognised as the machine noise. Further, the step of assisting the user comprises identifying a machine and a working condition of the machine based on the sound received, the visual data and/or the thermal imaging values, and guiding the user with next steps towards handling and/or debugging the machine in a Mixed Reality space, thereby assisting the user in performing specific tasks.
In accordance with an embodiment of the present invention, the sound is recognised as the spoken language. Further, the step of assisting the user comprises identifying a language based on the sound received and translating the recognised language into a user-understandable interpretation using the HMD, thereby assisting the user in performing specific tasks.
In accordance with an embodiment of the present invention, the detected objects are selected from previously unseen objects and/or previously seen objects.
In accordance with an embodiment of the present invention, the previously unseen objects and/or previously seen objects include industrial machines, industrial tools, daily life objects, food items and beverages.
In accordance with an embodiment of the present invention, the method further comprises a step of registering the detected objects with the help of a feedback of the user, in case the detected objects are previously unseen objects.
In accordance with an embodiment of the present invention, the step of registering the detected objects with the help of the feedback of the user comprises the steps of prompting the user to provide information about the previously unseen object that is detected, receiving the information from the user as a feedback and registering the previously unseen object as a previously seen object based on the information received for future reference.
In accordance with an embodiment of the present invention, the information is in the form of audio, video, hand gestures, a written document and/or a combination thereof.
In accordance with an embodiment of the present invention, the one or more characteristics of each of the detected objects are selected from a group comprising material used, properties of the material, usage of the detected object, estimated life of the detected object, information regarding product lifecycle and information regarding a present state of the detected object including a temperature, quality of the material used, maintenance status and possible chances of failures of the detected object.
In accordance with an embodiment of the present invention, the working condition of the machine is selected from a normal condition and a faulty condition.
In accordance with an embodiment of the present invention, the HMD comprises one or more depth cameras for capturing the visual data, one or more microphones for capturing the audio data, a voice recognition module, a processing module, an RGB module, a spectroscopy module, a thermal imaging module and a storage module having prestored data.
In accordance with an embodiment of the present invention, the prestored data in the storage module includes a plurality of previously seen detected objects, a plurality of sound samples corresponding to multiple languages, and sounds of several industrial machines and tools in respective normal working conditions and respective faulty working conditions.
According to a second aspect of the present invention, a computer system for detecting objects and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within the mixed reality environment, the computer system being connected with the Mixed Reality (MR) based Head Mounted Device (HMD), is provided. The computer system comprises a memory unit configured to store machine-readable instructions and a processor operably connected with the memory unit, the processor obtaining the machine-readable instructions from the memory unit, and being configured by the machine-readable instructions to receive audio-visual data of the environment from an image and audio acquisition device using a Mixed Reality (MR) based Head Mounted Device (HMD), process the visual data and detect objects within a captured scene, determine 6DOF pose estimation, RGB values, depth values, image spectroscopy values and thermal imaging values for identifying one or more characteristics of each of the detected objects using the HMD, display the detected objects along with the identified one or more characteristics of each of the detected objects using the HMD, thereby enabling a user to develop an understanding of the detected objects, process the audio data for recognizing the sound as a machine noise or a spoken language, and assist the user in performing specific tasks based on the processed visual data, audio data and/or the thermal imaging values.
In accordance with an embodiment of the present invention, the sound is recognised as the machine noise. Further, the processor is configured to identify a machine and a working condition of the machine based on the sound received, the visual data and/or the thermal imaging values, and guide the user with next steps towards handling and/or debugging the machine in a Mixed Reality space, thereby assisting the user in performing specific tasks.
In accordance with an embodiment of the present invention, the sound is recognised as the spoken language. Further, the processor is configured to identify a language based on the sound received and translate the recognised language into a user-understandable interpretation using the HMD, thereby assisting the user in performing specific tasks.
In accordance with an embodiment of the present invention, the detected objects are selected from previously unseen objects and/or previously seen objects.
In accordance with an embodiment of the present invention, the previously unseen objects and/or previously seen objects include industrial machines, industrial tools, daily life objects, food items and beverages.
In accordance with an embodiment of the present invention, the processor is further configured to register the detected objects with the help of a feedback of the user, in case the detected objects are previously unseen objects.
In accordance with an embodiment of the present invention, the processor is further configured to register the previously unseen objects by prompting the user to provide information about the previously unseen object that is detected, receiving the information from the user as a feedback and registering the previously unseen object as a previously seen object based on the information received for future reference.
In accordance with an embodiment of the present invention, the information may be in the form of audio, video, hand gestures, a written document and/or a combination thereof.
In accordance with an embodiment of the present invention, the one or more characteristics of each of the detected objects are selected from a group comprising material used, properties of the material, usage of the detected object, estimated life of the detected object, information regarding product lifecycle and information regarding a present state of the detected object including a temperature, quality of the material used, maintenance status and possible chances of failures of the detected object.
In accordance with an embodiment of the present invention, the working condition of the machine is selected from a normal condition and a faulty condition.
In accordance with an embodiment of the present invention, the HMD comprises one or more depth cameras for capturing the visual data, one or more microphones for capturing the audio data, a voice recognition module, a processing module, an RGB module, a spectroscopy module, a thermal imaging module and a storage module having prestored data.
In accordance with an embodiment of the present invention, the prestored data in the storage module includes a plurality of previously seen detected objects, a plurality of sound samples corresponding to multiple languages, and sounds and thermal imaging values of several industrial machines and tools in respective normal working conditions and respective faulty working conditions.
According to a third aspect of the present invention, a system for detecting objects and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within a mixed reality environment is provided. The system comprises a Mixed Reality (MR) based Head Mounted Device (HMD) having one or more depth cameras for capturing the visual data, one or more microphones for capturing the audio data, a voice recognition module, an RGB module, a spectroscopy module and a thermal imaging module, a processing module and a storage module having prestored data. Further, the processing module is configured to receive audio-visual data of the environment from an image and audio acquisition device using the Mixed Reality (MR) based Head Mounted Device (HMD), process the visual data and detect objects within a captured scene, determine 6DOF pose estimation, RGB values, depth values, image spectroscopy values and thermal imaging values for identifying one or more characteristics of each of the detected objects using the HMD, display the detected objects along with the identified one or more characteristics of each of the detected objects using the HMD, thereby enabling a user to develop an understanding of the detected objects, process the audio data for recognizing the sound as a machine noise or a spoken language, and assist the user in performing specific tasks based on the processed visual data, audio data and/or the thermal imaging values.
In accordance with an embodiment of the present invention, the sound is recognised as the machine noise. Further, the processing module is configured to identify a machine and a working condition of the machine based on the sound received, the visual data and/or the thermal imaging values, and guide the user with next steps towards handling and/or debugging the machine in a Mixed Reality space, thereby assisting the user in performing specific tasks.
In accordance with an embodiment of the present invention, the sound is recognised as the spoken language. Further, the processing module is configured to identify a language based on the sound received and translate the recognised language into a user-understandable interpretation using the HMD, thereby assisting the user in performing specific tasks.
In accordance with an embodiment of the present invention, the detected objects are selected from previously unseen objects and/or previously seen objects.
In accordance with an embodiment of the present invention, the previously unseen objects and/or previously seen objects include industrial machines, industrial tools, daily life objects, food items and beverages.
In accordance with an embodiment of the present invention, the processing module is further configured to register the detected objects with the help of a feedback of a user, in case the detected objects are previously unseen objects.
In accordance with an embodiment of the present invention, the processing module is further configured to register the previously unseen objects by prompting the user to provide information about the previously unseen object that is detected, receiving the information from the user as a feedback and registering the previously unseen object as a previously seen object based on the information received for future reference.
In accordance with an embodiment of the present invention, the information may be in the form of audio, video, hand gestures, a written document and/or a combination thereof.
In accordance with an embodiment of the present invention, the one or more characteristics of each of the detected objects are selected from a group comprising material used, properties of the material, usage of the detected object, estimated life of the detected object, information regarding product lifecycle and information regarding a present state of the detected object including a temperature, quality of the material used, maintenance status and possible chances of failures of the detected object.
In accordance with an embodiment of the present invention, the working condition of the machine is selected from a normal condition and a faulty condition.
In accordance with an embodiment of the present invention, the prestored data in the storage module includes a plurality of previously seen detected objects, a plurality of sound samples corresponding to multiple languages, and sounds and thermal imaging values of several industrial machines and tools in respective normal working conditions and respective faulty working conditions.
According to a fourth aspect of the present invention, a method for detecting objects and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within a mixed reality environment is provided. The method comprises the steps of receiving audio-visual data of the environment from an image and audio acquisition device using a Mixed Reality (MR) based Head Mounted Device (HMD), processing the visual data and detecting objects within a captured scene, determining 6DOF pose estimation, RGB values, depth values, image spectroscopy values and thermal imaging values for identifying one or more characteristics of each of the detected objects, and displaying the detected objects along with the identified one or more characteristics of each of the detected objects using the HMD in a Mixed Reality space, thereby enabling a user to develop an understanding of the detected objects and perform a specific task.
In accordance with an embodiment of the present invention, the detected objects are selected from previously unseen objects and/or previously seen objects.
In accordance with an embodiment of the present invention, the previously unseen objects and/or previously seen objects include industrial machines, industrial tools, daily life objects, food items and beverages.
In accordance with an embodiment of the present invention, the method further comprises a step of registering the detected objects with the help of a feedback of a user, in case the detected objects are previously unseen objects.
In accordance with an embodiment of the present invention, the step of registering the detected objects with the help of the feedback of the user comprises the steps of prompting the user to provide information about the previously unseen object that is detected, receiving the information from the user as a feedback and registering the previously unseen object as a previously seen object based on the information received for future reference.
In accordance with an embodiment of the present invention, the information is in the form of audio, video, hand gestures, a written document and/or a combination thereof.
In accordance with an embodiment of the present invention, the one or more characteristics of each of the detected objects are selected from a group comprising material used, properties of the material, usage of the detected object, estimated life of the detected object, information regarding product lifecycle and information regarding a present state of the detected object including a temperature, quality of the material used, maintenance status and possible chances of failures of the detected object.
In accordance with an embodiment of the present invention, the HMD comprises one or more depth cameras for capturing the visual data, one or more microphones for capturing the audio data, a voice recognition module, a processing module, an RGB module, a spectroscopy module, a thermal imaging module and a storage module having prestored data.
BRIEF DESCRIPTION OF THE DRAWINGS
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
These and other features, benefits and advantages of the present invention will become apparent by reference to the following text and figures, with like reference numbers referring to like structures across the views, wherein:
Fig. 1 illustrates an exemplary environment of computing devices in which the various embodiments described herein may be implemented, in accordance with an embodiment of the present invention;
Fig. 2 illustrates a method for detecting objects and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within a mixed reality environment, in accordance with an embodiment of the present invention;
Fig. 3 illustrates an information flow diagram of receiving and processing audio-visual data, in accordance with an embodiment of the present invention;
Fig. 4 illustrates an information flow diagram of displaying the detected objects along with the identified characteristics, in accordance with an embodiment of the present invention;
Fig. 5 illustrates an information flow diagram of identifying a machine and a working condition of the machine based on the sound received and guiding a user with next steps towards handling and/or debugging the machine while displaying the detected objects along with the identified characteristics, in accordance with an embodiment of the present invention;
Fig. 6 illustrates an information flow diagram of identifying a language from the sound and translating the recognised language while displaying the detected objects along with the identified characteristics, in accordance with an embodiment of the present invention; and
Fig. 7 illustrates a system for detecting objects and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within a mixed reality environment, in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION OF DRAWINGS
While the present invention is described herein by way of example using embodiments and illustrative drawings, those skilled in the art will recognize that the invention is not limited to the embodiments or drawings described, and the drawings are not intended to represent the scale of the various components. Further, some components that may form a part of the invention may not be illustrated in certain figures, for ease of illustration, and such omissions do not limit the embodiments outlined in any way. It should be understood that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the scope of the present invention as defined by the appended claims. As used throughout this description, the word "may" is used in a permissive sense (i.e. meaning having the potential to) rather than the mandatory sense (i.e. meaning must). Further, the words "a" or "an" mean "at least one" and the word "plurality" means "one or more" unless otherwise mentioned. Furthermore, the terminology and phraseology used herein is solely used for descriptive purposes and should not be construed as limiting in scope. Language such as "including," "comprising," "having," "containing," or "involving," and variations thereof, is intended to be broad and encompass the subject matter listed thereafter, equivalents, and additional subject matter not recited, and is not intended to exclude other additives, components, integers or steps. Likewise, the term "comprising" is considered synonymous with the terms "including" or "containing" for applicable legal purposes. Any discussion of documents, acts, materials, devices, articles and the like is included in the specification solely for the purpose of providing a context for the present invention. It is not suggested or represented that any or all of these matters formed part of the prior art base or were common general knowledge in the field relevant to the present invention.
In this disclosure, whenever a composition or an element or a group of elements is preceded by the transitional phrase "comprising", it is understood that we also contemplate the same composition, element or group of elements with the transitional phrases "consisting of", "consisting", "selected from the group consisting of", "including", or "is" preceding the recitation of the composition, element or group of elements, and vice versa.
The present invention is described hereinafter by various embodiments with reference to the accompanying drawings, wherein reference numerals used in the accompanying drawing correspond to the like elements throughout the description. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiment set forth herein. Rather, the embodiment is provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art. In the following detailed description, numeric values and ranges are provided for various aspects of the implementations described. These values and ranges are to be treated as examples only and are not intended to limit the scope of the claims. In addition, a number of materials are identified as suitable for various facets of the implementations. These materials are to be treated as exemplary and are not intended to limit the scope of the invention.
Figure 1 illustrates an exemplary environment of computing devices in which the various embodiments described herein may be implemented, in accordance with an embodiment of the present invention.
As shown in figure 1, the environment (100) comprises a display means connected with a computer system (104). The display means may be, but is not limited to, a Head Mounted Device (HMD) (102) or a computing means having a display screen, such as a laptop, a desktop computer, a mobile computer, a handheld computer or a holographic display operated by users. Preferably, the display means is the HMD (102). The HMD (102) may be envisaged to include capabilities of generating an augmented reality (AR) environment, a mixed reality (MR) environment and a virtual reality (VR) environment in a single device. The HMD (102) is envisaged to include, but is not limited to, a number of electromagnetic radiation sensors, which encompass all kinds of sensor devices able to detect electromagnetic radiation such as visible light and infra-red (IR) radiation.
The electromagnetic radiation sensors may be used to gather and track spatial data of the real world environment as well as to track eye movement and hand gestures of a user so as to update the 3D generated object in VR, AR and/or MR. The electromagnetic radiation sensors may include an IR projector, an IR camera, an RGB camera, an RGB-D camera and a microphone. The RGB camera captures coloured imagery of the real world environment. The IR projector and the IR camera together capture depth data of the real world environment using one or more of Time-of-Flight based and passive stereoscopic depth imaging, as illustrated in the sketch below.
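As a brief illustration of the passive stereoscopic case, depth can be recovered from the disparity between two rectified views using the standard relation Z = f·B/d, where f is the focal length in pixels, B is the camera baseline and d is the disparity. A minimal Python sketch follows; the focal length and baseline values are assumptions chosen only for the example.

    import numpy as np

    def depth_from_disparity(disparity_px, focal_length_px=580.0, baseline_m=0.06):
        # Convert a disparity map (in pixels) from a rectified stereo pair into a
        # depth map (in metres) using Z = f * B / d.
        disparity = np.asarray(disparity_px, dtype=np.float64)
        depth = np.full(disparity.shape, np.inf)
        valid = disparity > 0          # zero disparity corresponds to a point at infinity
        depth[valid] = focal_length_px * baseline_m / disparity[valid]
        return depth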
The microphone is envisaged to receive and record audio. The HMD (102) may further comprise visors having a partially or fully reflective surface. In other words, the visors may have a variable transparency. The visors are used to view humans or objects in virtual reality, mixed reality or augmented reality. The HMD (102) may further include a cooling vent to ensure that the internal circuitry and devices of the HMD (102) are provided with a sufficient amount of air for convection cooling. A wire outlet may be provided to allow connecting wires and cords to connect to various components such as the power supply, computational and control units and data acquisition devices.
Further, the HMD (102) is envisaged to include extendable bands and straps and a strap lock for securing the HMD (102) in position on the head. The HMD (102) is envisaged to include one or more display sources, which may be LCD, LED or TFT screens with respective drivers. The HMD (102) may have a driver board including a part of the computational software and hardware needed to run the devices provided with the HMD (102). The HMD (102) may further include a power supply unit for receiving AC power. Moreover, the HMD (102) may include an HDMI output to allow data to be transferred and a Universal Serial Bus (USB) connector to allow data and power transfer. The HMD (102) is also envisaged to include a plurality of electronic components, for example, a graphics processing unit (GPU), and a power source providing electrical power to the HMD (102).
A Graphics Processing Unit (GPU) is a single-chip processor primarily used to manage and boost the performance of video and graphics, such as 2-D or 3-D graphics, texture mapping and hardware overlays. The GPU may be selected from, but is not limited to, NVIDIA, AMD, Intel and ARM offerings for real time 3D imaging. The power source may be built into the HMD (102). A plurality of indicators, such as LEDs, may be included in the HMD (102) to indicate various parameters such as battery level or connection/disconnection status. The indications may be colour coded for differentiation and distinctiveness.
In accordance with an embodiment, the computer system (104) connected with the HMD (102) may be encased inside the HMD (102) itself. The computer system (104) comprises a memory unit (1044) configured to store machine-readable instructions. The machine-readable instructions may be loaded into the memory unit (1044) from a non-transitory machine-readable medium, such as, but not limited to, CD-ROMs, DVD-ROMs and Flash Drives. Alternately, the machine-readable instructions may be loaded in the form of a computer software program into the memory unit (1044). The memory unit (1044) in that manner may be selected from a group comprising EPROM, EEPROM and Flash memory. Further, the computer system (104) includes a processor (1042) operably connected with the memory unit (1044). In various embodiments, the processor (1042) is one of, but not limited to, a general-purpose processor, an application specific integrated circuit (ASIC) and a field-programmable gate array (FPGA).
Figure 2 illustrates a method (200) for detecting objects (3110) and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within a mixed reality environment, in accordance with an embodiment of the present invention. The method (200) begins at step 210, where the processor (1042) of the HMD (102) is configured to receive audio-visual data of the environment from the image and audio acquisition device. The HMD (102) is envisaged to include one or more depth cameras for capturing the visual data, one or more microphones for capturing the audio data, a voice recognition module, a processing module, an RGB module, a spectroscopy module, a thermal imaging module and a storage module having prestored data. In one embodiment, the HMD (102) may include X-ray emitters and LED emitters forming a grid. In another embodiment, the X-ray emitters and the LED emitters may be provided as plug & play modules. The prestored data in the storage module may include a plurality of previously seen detected objects (3110), a plurality of sound samples corresponding to multiple languages and sounds, and a plurality of thermal imaging values of several industrial machines and tools in respective normal working conditions and respective faulty working conditions.
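By way of illustration only, the audio-visual data received at step 210 may be bundled per capture as sketched below in Python. The accessor names on the hmd object and the sampling rate are hypothetical placeholders and are not part of this specification.

    # Minimal sketch of a per-capture bundle of the HMD's sensor streams.
    # All hmd.read_* accessors are hypothetical placeholders.
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class SceneCapture:
        rgb: np.ndarray        # H x W x 3 colour image from the RGB module (408)
        depth: np.ndarray      # H x W depth map from the depth cameras (402)
        thermal: np.ndarray    # H x W image from the thermal imaging module (412)
        spectra: np.ndarray    # values from the spectroscopy module (410)
        audio: np.ndarray      # mono samples from the microphones (404)
        sample_rate: int       # audio sampling rate in Hz

    def receive_capture(hmd) -> SceneCapture:
        # Poll the acquisition devices for one synchronised capture.
        return SceneCapture(
            rgb=hmd.read_rgb(),
            depth=hmd.read_depth(),
            thermal=hmd.read_thermal(),
            spectra=hmd.read_spectra(),
            audio=hmd.read_audio(),
            sample_rate=48000,
        )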
For example, the HMD (102) comes across one or more objects in an exemplary scenario, as shown in figure 3. The scenario may include objects such as a car, a human, a saw, etc. Then, with the help of the image and audio acquisition device, such as the one or more depth cameras and the one or more microphones, the processor (1042) may receive an image of a saw, an image (visual and thermal) and audio of a live car, and a live image of a person along with audio of that person's speech. The language may be a foreign language, or a language known to the user. The same has been illustrated in figure 3.
Then, at step 220, the processor (1042) is configured to process the visual data and detect objects (3110) within a captured scene. The HMD (102) may detect objects (3110) within the captured scene on its own, or the user may select an object in the captured scene using the HMD (102) through inputs such as, but not limited to, hand gestures, finger pointing, joysticks, touch-based input and audio input. The processor (1042) may then distinguish the detected objects as being a living entity (i.e. a human, animal or plant) or a non-living entity. Referring to the above-mentioned example shown in figure 3, the processor (1042) distinguishes the saw and the car as non-living entities and the human as a living entity.
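Purely as an illustration of step 220, the detection itself may be performed with an off-the-shelf detector such as the Faster R-CNN architecture mentioned in the background. The following Python sketch uses a pretrained torchvision model; it is one possible implementation under stated assumptions, not the prescribed one.

    import torch
    import torchvision

    # Pretrained Faster R-CNN detector (COCO classes), used here only as an example.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    def detect_objects(rgb_frame, score_threshold=0.5):
        # rgb_frame: H x W x 3 uint8 array captured by the HMD's cameras.
        tensor = torch.from_numpy(rgb_frame).permute(2, 0, 1).float() / 255.0
        with torch.no_grad():
            predictions = model([tensor])[0]
        keep = predictions["scores"] > score_threshold
        return (predictions["boxes"][keep],
                predictions["labels"][keep],
                predictions["scores"][keep])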
Next, at step 230, the processor (1042) is further configured to determine 6DOF pose estimation, RGB values, depth values, image spectroscopy values and thermal imaging values for identifying one or more characteristics (3220) of each of the detected objects (3110). The detected objects (3110) may be selected from previously unseen objects and/or previously seen objects. The previously unseen objects and/or previously seen objects may include industrial machines, industrial tools, daily life objects, food items and beverages. Further, in case the detected objects (3110) are previously unseen objects, the processor (1042) may be configured to register the detected objects (3110) with the help of feedback from the user, as detailed after the following sketch.
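As an illustration of the 6DOF pose estimation mentioned at the start of this step, a minimal sketch using OpenCV's Perspective-n-Point solver is given below; the registration of previously unseen objects is then described. The 2D-3D correspondences, the object model points and the camera intrinsics are assumptions made only for the example.

    import cv2
    import numpy as np

    def estimate_6dof_pose(model_points_3d, image_points_2d, camera_matrix):
        # model_points_3d: N x 3 points on a stored object model;
        # image_points_2d: matching N x 2 keypoints detected in the frame.
        dist_coeffs = np.zeros(5)  # assume a rectified, undistorted image
        ok, rvec, tvec = cv2.solvePnP(
            model_points_3d.astype(np.float32),
            image_points_2d.astype(np.float32),
            camera_matrix, dist_coeffs)
        if not ok:
            raise RuntimeError("pose estimation failed")
        rotation_matrix, _ = cv2.Rodrigues(rvec)  # 3 rotation + 3 translation = 6DOF
        return rotation_matrix, tvec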
The processor (1042), in order to register the detected objects (3110) with the help of the feedback of the user, prompts the user to provide information about the previously unseen object that is detected. Further, the processor (1042) may receive the information from the user as feedback. Then, the processor (1042) may register the previously unseen object as a previously seen object, based on the information received, for future reference. The information may be in the form of audio, video, hand gestures, a written document and/or a combination thereof. For example, the HMD (102) captures a scene having a dog in it. For instance, the HMD (102) database is not aware of the dog or its characteristics (3220). In such cases, the user may provide feedback to the HMD (102) about the dog in the form of audio, video, hand gestures, a written document and/or a combination thereof. The processor (1042) may then register the previously unseen object as a previously seen object based on the information received, for future reference.
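The registration flow just described may be sketched as follows; every name used here (the repository mapping, the detection and feedback fields, and the prompting and capture callbacks) is a hypothetical placeholder rather than part of the specification.

    def register_unseen_object(repository, detection, prompt_user, capture_feedback):
        # Prompt the user in the MR space, collect feedback in any supported
        # modality, and persist it so the object is treated as seen next time.
        prompt_user("Unknown object detected. Please describe it by voice, "
                    "gesture, video or written document.")
        feedback = capture_feedback()   # audio / video / hand gesture / document
        repository[feedback.name] = {
            "embedding": detection.embedding,        # visual features for re-detection
            "characteristics": feedback.characteristics,
            "modality": feedback.modality,
        }
        return feedback.name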
Further, the one or more characteristics (3220) of each of the detected objects (3110) may be selected from a group comprising the material used, properties of the material, usage of the detected object, estimated life span of the detected object, information regarding the product lifecycle and information regarding a present state of the detected object, including a temperature, quality of the material used, maintenance status and possible chances of failure of the detected object.
Afterwards, at step 240, the processor (1042) is configured to display the detected objects (3110) along with the identified one or more characteristics (3220) of each of the detected objects (3110) using the HMD (102). This way, the HMD (102) may enable the user to develop an understanding of the detected objects (3110). Extending the example mentioned above (now referring to figure 4), the HMD (102) may display the saw along with its characteristics in Mixed Reality. The HMD (102) may detect the real object present in the captured scene as the saw. Further, the HMD (102) displays the saw in Mixed Reality along with characteristics (3220) of the saw, such as that the blade of the saw is made of steel, that the blade is a used-up blade, that the handle is wooden, and its remaining life. The same has been illustrated in figure 4. The information about the detected object, along with the characteristics detected by the HMD (102), may be made available to the user as, but not limited to, audio, video or image.
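Step 240 concerns the MR display itself; purely for illustration, a 2D analogue of the overlay, drawing each detected object's box and characteristics onto the camera frame with OpenCV, might look as follows. The actual HMD rendering path is not specified here, and the detection tuple format is an assumption.

    import cv2

    def overlay_characteristics(frame, detections):
        # detections: list of (box, label, characteristics) tuples, where box is
        # (x0, y0, x1, y1) in pixels and characteristics is a list of strings.
        for (x0, y0, x1, y1), label, characteristics in detections:
            cv2.rectangle(frame, (x0, y0), (x1, y1), (0, 255, 0), 2)
            for i, text in enumerate([label] + characteristics):
                cv2.putText(frame, text, (x0, max(y0 - 8 - 18 * i, 12)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        return frame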
Next, at step 250, the processor (1042) is configured to process the audio data for recognizing the sound received as a machine noise (3320) or a spoken language (3410). At this step, the processor (1042) may compare the received audio data with pre-stored audio data. The pre-stored audio data may include a plurality of sound samples corresponding to multiple spoken languages and sounds of several industrial machines and tools in respective normal working conditions and respective faulty working conditions. By comparing the received audio data with the pre-stored data in the database, the processor (1042) may differentiate among the different types of human and machine sounds it receives.
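The compare-with-pre-stored-data step can be illustrated (again, as a non-limiting sketch rather than the claimed implementation) by nearest-neighbour matching of the received audio's magnitude spectrum against stored templates; a real system would more likely use learned audio features:

```python
# Minimal sketch: classify received audio by cosine similarity of its
# normalised magnitude spectrum against pre-stored class templates.
import numpy as np

def spectrum(signal: np.ndarray) -> np.ndarray:
    mag = np.abs(np.fft.rfft(signal))
    return mag / (np.linalg.norm(mag) + 1e-9)

def classify_sound(signal: np.ndarray, templates: dict) -> str:
    """templates maps a class name ('machine_noise', 'spoken_language')
    to a pre-stored reference signal of the same length."""
    s = spectrum(signal)
    scores = {name: float(spectrum(ref) @ s) for name, ref in templates.items()}
    return max(scores, key=scores.get)

# Synthetic stand-ins: a hum-like tone for machine noise, noise for speech.
t = np.linspace(0, 1, 16000)
templates = {"machine_noise": np.sin(2 * np.pi * 120 * t),
             "spoken_language": np.random.default_rng(0).normal(size=16000)}
received = np.sin(2 * np.pi * 118 * t)  # close to the stored machine hum
print(classify_sound(received, templates))  # -> machine_noise
```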
Afterwards, at step 260, the processor (1042) is configured to assist the user in performing specific tasks based on the processed visual data, audio data and/or the thermal imaging values. In accordance with an embodiment of the present invention, if the sound is recognized to be the machine noise (3320), the processor (1042) is configured to identify a machine and a working condition of the machine based on the sound received, the visual data and the thermal imaging values. The working condition may be a normal working condition or a faulty working condition. Apart from the pre-stored audio data, the working condition of the machine can be identified by visual cues such as rust, visual faults, cracks, dents, defects, misalignment of parts, unexpected movements of parts, vibration analysis, and visual quality and dimensional analysis of the output product. Also, the working condition of the machine can be inferred from thermal cues such as overheating of specific parts. In one embodiment, the processor (1042) may detect the dents and other defects using x-ray emitters (which may scan external as well as internal defects in a material) or LED emitters forming a grid of coloured LED lights (where any break/discontinuity in the grid may be indicative of a defect in the surface/material). In one embodiment, an external dockable or plug & play sensor/projector for pattern visualization may also be used for dent and/or defect detection. The identified defects then indicate that the working condition is faulty.
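The thermal cue mentioned above can be illustrated by a per-part over-temperature check on a thermal image. This is a minimal sketch only: the part masks and the 80 °C threshold are assumptions for the illustration, not values from the specification:

```python
# Minimal sketch: flag machine parts whose peak temperature in a thermal
# image exceeds a limit, indicating a possibly faulty working condition.
import numpy as np

def overheated_parts(thermal: np.ndarray, masks: dict, limit_c: float = 80.0):
    """Return the names of parts whose peak temperature exceeds limit_c."""
    return [name for name, mask in masks.items()
            if float(thermal[mask].max()) > limit_c]

thermal = np.full((4, 4), 40.0)      # ambient readings in Celsius
thermal[0, 0] = 95.0                 # hot spot on the motor
masks = {"motor": np.zeros((4, 4), bool), "gearbox": np.zeros((4, 4), bool)}
masks["motor"][0, :2] = True
masks["gearbox"][2:, :] = True
faulty = overheated_parts(thermal, masks)
print(faulty or "normal working condition")  # -> ['motor']
```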
In case the detected object has a normal working condition, the processor (1042) may simply display the one or more characteristics of step 240 along with a message stating that the detected object is working normally in the Mixed reality space (322). In case the detected object is recognized to have a faulty working condition, the HMD (102) may accordingly guide the user with next steps towards handling and/or debugging the machine in the Mixed reality space (322). In this way, the HMD (102) may assist the user in performing specific tasks. Continuing the example mentioned above, the HMD (102), through the image and audio acquisition device, captures a scene having a car. The HMD (102), by comparing the captured scene with the pre-stored data, may display the detected car along with its characteristics (3220) in Mixed reality.
Further, the HMD (102) may be able to detect any fault in the working/functioning of the car through the received image, audio and/or thermal imaging values. As shown in figure 5, the HMD (102) may display the fault in Mixed reality and play a video or simulation in the Mixed reality space (322) depicting the steps towards handling and/or debugging the machine. Continuing the example, the HMD (102) has, for instance, detected the fault as being related to the battery of the car and concluded that the battery needs maintenance/replacement. The HMD (102) may then display a video/simulation in Mixed reality depicting the maintenance procedure and/or the removal or replacement of the battery. The video/simulation may be superimposed on the car itself or may be played in the Mixed reality space (322). Further, the HMD (102) may be able to display characteristics of the car such as the name of the car, its model year, its history, etc. The information about the detected object, along with the characteristics detected by the HMD (102), may be made available to the user in forms such as, but not limited to, audio, video or image.
In accordance with an embodiment of the present invention, if (at step 250) the processor (1042) recognizes the sound to be the spoken language, then the processor (1042) is configured to identify the spoken language after comparison with the pre-stored data. On the basis of the received sound, the processor (1042) is configured to translate the recognised language to a user understandable interpretation (3420) using the HMD (102). It is envisaged that the user may pre-select a preferred language, such as Hindi or English, in which he/she wants to see and/or listen to a translation. In this way the HMD (102) assists the user in performing specific tasks by removing the language barrier. This would be understood more clearly with the help of the example illustrated in figure 6, in which the HMD (102), through the image and audio acquisition device, captures a human. The human may be speaking a language (3410) which is foreign to the user; for example, the human says "Bonjour!". The sound is first identified to be a spoken language (3410) and then the language (3410) is recognised as French by the processor (1042). The user is envisaged to have selected English as the preferred understandable language. The HMD (102) may then translate the recognised language (3410) to the user understandable interpretation (3420), i.e. "Good Day!". The information about the detected object, along with the characteristics detected by the HMD (102), may be made available to the user in forms such as, but not limited to, audio, video or image in the MR space (322). In this way, the present invention assists the user in multiple tasks related to living and non-living entities.
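The recognise-then-translate flow of figure 6 can be sketched as follows. This is purely illustrative: real systems would use speech-recognition and machine-translation models, and the phrase table here is a hypothetical stand-in for the pre-stored data:

```python
# Minimal sketch of the figure 6 flow: identify the spoken language,
# then translate into the user's pre-selected preferred language.
# The PHRASES table is a hypothetical stand-in for pre-stored data.

PHRASES = {"Bonjour!": ("French", {"English": "Good Day!", "Hindi": "Namaste!"})}

def interpret(utterance: str, preferred: str) -> str:
    if utterance not in PHRASES:
        return "unrecognised spoken language"
    language, translations = PHRASES[utterance]
    translated = translations.get(preferred, utterance)
    return f"Detected {language}; in {preferred}: {translated}"

print(interpret("Bonjour!", "English"))
# -> Detected French; in English: Good Day!
```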
According to another aspect of the invention, there is provided a system (400) for detecting objects (3110) and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within the mixed reality environment. As illustrated in figure 7, the system (400) comprises a Mixed reality (MR) based Head Mounted Device (HMD (102)) having one or more depth cameras (402) for capturing the visual data, one or more microphones (404) for capturing the audio data, a voice recognition module (406), an RGB module (408), a spectroscopy module (410) and a thermal imaging module (412). The system (400) further comprises a processing module (414). The processing module (414) is configured to receive audio-visual data of the environment from the image acquisition device using the Mixed reality (MR) based Head Mounted Device (HMD (102)). Further, the processing module (414) is configured to process the visual data and detect objects (3110) within a captured scene. The processing module (414) is further configured to determine 6DOF pose estimation, RGB values, depth values, image spectroscopy values and thermal imaging values for identifying one or more characteristics (3220) of each of the detected objects (3110) using the HMD (102). The detected objects (3110) are selected from previously unseen objects (3110) and/or previously seen objects (3110). The previously unseen objects (3110) and/or previously seen objects (3110) include industrial machines, industrial tools, daily life objects (3110), food items and beverages. The processing module (414) is further configured to register the detected objects (3110) with the help of a feedback of a user, in case the detected objects (3110) are previously unseen objects (3110). To register a previously unseen object, the processing module (414) prompts the user to provide information about the previously unseen object that is detected, receives the information from the user as feedback, and registers the previously unseen object as a previously seen object based on the information received, for future reference. The information may be in the form of audio, video, hand gestures, a written document and/or a combination thereof.
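One way (among many) to wire these capture and analysis modules behind a single processing entry point is sketched below. Only the module names follow the text; the interfaces, the step() method and the stub acquisition functions are assumptions for the sketch:

```python
# Minimal sketch: composing the system (400) modules behind a processing
# entry point. Interfaces and names are hypothetical illustrations.
from typing import Callable

class ProcessingModule:
    def __init__(self, depth_camera: Callable, microphone: Callable,
                 voice_recognition: Callable, rgb: Callable,
                 spectroscopy: Callable, thermal: Callable):
        self.sources = {"depth": depth_camera, "audio": microphone,
                        "rgb": rgb, "spectroscopy": spectroscopy,
                        "thermal": thermal}
        self.voice_recognition = voice_recognition

    def step(self) -> dict:
        """Acquire one frame of audio-visual data and attach the
        voice-recognition result, loosely mirroring steps 210-250."""
        frame = {name: grab() for name, grab in self.sources.items()}
        frame["speech"] = self.voice_recognition(frame["audio"])
        return frame

# Stub acquisition functions stand in for the HMD (102) hardware.
pm = ProcessingModule(lambda: "depth-map", lambda: "audio-buffer",
                      lambda a: "no speech", lambda: "rgb-image",
                      lambda: "spectra", lambda: "thermal-image")
print(pm.step()["speech"])
```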
Further, the processing module (414) is configured to display the detected objects (3110) along with the identified one or more characteristics (3220) of each of the detected objects (3110) using the HMD (102), thereby enabling a user to develop an understanding of the detected objects (3110). The one or more characteristics (3220) of each of the detected objects (3110) are selected from a group comprising the material used, properties of the material, usage of the detected object, estimated life of the detected object, information regarding the product lifecycle and information regarding a present state of the detected object, including a temperature, the quality of the material used, the maintenance status and possible chances of failure of the detected object.
Further, the processing module (414) is configured to process the audio data for recognizing the sound as a machine noise (3320) or a spoken language (3410). The processing module (414) is further configured to assist the user in performing specific tasks based on the processed visual data, audio data and/or the thermal imaging values. In accordance with an embodiment of the present invention, in case the sound is recognized to be the machine noise (3320), the processing module (414) is configured to identify a machine and a working condition of the machine based on the sound received, the visual data and the thermal imaging values. The working condition may be the normal working condition or the faulty working condition. Apart from the pre-stored audio data, the working condition of the machine can be identified by visual cues such as rust, visual faults, cracks, dents, defects, misalignment of parts, unexpected movements of parts, vibration analysis, and visual quality and dimensional analysis of the output product. Also, the working condition of the machine can be inferred from thermal cues such as overheating of specific parts. The processing module (414) accordingly guides the user with next steps (3310) towards handling and/or debugging the machine in the Mixed reality space (322), thereby assisting the user in performing specific tasks.
In accordance with an embodiment of the present invention, if the processing module (414) recognizes the sound to be the spoken language, then the processing module (414) is configured to identify the language based on the sound received and translate the recognised language to a user understandable interpretation using the HMD (102), thereby assisting the user in performing specific tasks.
The system (400) further comprises a storage module (416) having pre-stored data. The pre-stored data in the data repository includes a plurality of previously seen detected objects, a plurality of sound samples corresponding to multiple spoken languages, machine sounds, and a plurality of thermal imaging values of several industrial machines and tools in respective normal working conditions and respective faulty working conditions.
In accordance with another embodiment of the present invention, there is provided a method (not shown) for detecting objects (3110) and assisting users in performing specific tasks using visual, acoustic and cognitive cues of real and virtual objects within mixed reality environment. The method may be implemented using the computer system (104) as well as the system (400). The method begins by receiving the audio-visual data of the environment from the image acquisition device using a Mixed reality (MR) based Head Mounted Device (HMD (102)). The HMD (102) is envisaged to comprise the one or more depth cameras (402) for capturing the visual data, one or more microphones (404) for capturing the audio data, a voice recognition module (406), a processing module (414), an RGB module (408), a spectroscopy module (410), a thermal imaging module (412) and a storage module (416) having prestored data.
Further, the processor (1042) may process the visual data and detect objects (3110) within a captured scene. The processor (1042) may be further configured to determine 6DOF pose estimation, RGB values, depth values, image spectroscopy values and thermal imaging values for identifying one or more characteristics (3220) of each of the detected objects (3110).
The detected objects (3110) are selected from previously unseen objects (3110) and/or previously seen objects (3110). The previously unseen objects (3110) and/or previously seen objects (3110) include industrial machines, industrial tools, daily life objects (3110), food items and beverages. The processor (1042) may be further configured to register the detected objects (3110) with the help of a feedback of a user, in case the detected objects (3110) are previously unseen objects (3110). The processor (1042), in order to register the detected objects (3110) with the help of the feedback of the user, prompts the user to provide information about the previously unseen object that is detected. Further, the processor (1042) may receive the information from the user as feedback. Then, the processor (1042) may register the previously unseen object as a previously seen object based on the information received, for future reference. The information may be in the form of audio, video, hand gestures, a written document and/or a combination thereof.
Further, the processor (1042) may be configured to display the detected objects (3110) along with the identified one or more characteristics (3220) of each of the detected objects (3110) using the HMD (102) in the Mixed reality space (322), thereby enabling the user to develop an understanding of the detected objects (3110) and perform a specific task. The one or more characteristics (3220) of each of the detected objects (3110) may be selected from a group comprising the material used, properties of the material, usage of the detected object, estimated life of the detected object, information regarding the product lifecycle and information regarding a present state of the detected object, including a temperature, the quality of the material used, the maintenance status and possible chances of failure of the detected object. The same has been illustrated in figure 4. The information about the detected object, along with the characteristics detected by the HMD (102), may be made available to the user in forms such as, but not limited to, audio, video or image.
In accordance with an embodiment of the present invention, the above-mentioned methods, the system (400) and the computer system (104) may be implemented in a number of applications. For example, the system (400) and the computer system (104) may be used to assist the user in developing an understanding of a food item in the field of view, by showing the ingredients of the food item, the sources of the ingredients, the number of calories, and nearby or best places to have the food item, etc. In another example, the system (400) and the computer system (104) may be used to assist the user in developing an understanding of a motherboard or any other electronic component and its peripherals. However, it will be appreciated by a person skilled in the art that the methods, the system (400) and the computer system (104) may be implemented in use cases other than those mentioned in the above description without departing from the scope of the present invention.
The present invention offers a number of advantages. Firstly, it is capable of analysing the audio, visual and thermal information around us and utilising it to assist individuals in performing industry-wide tasks and operations. The cognitive acoustic technology also acts as an interpreter which aims to bridge linguistic boundaries between humans all around the world, ensuring that individuals do not feel constrained by language barriers in understanding concepts and sharing their thoughts with other people. Another notable feature of the invention lies in its ability to decode the type of event using just the audio modality received over the microphone in the HMD. This feature extends its use to industry-wide purposes in the automation and maintenance sectors, where on-site workers, irrespective of their expertise in those fields, can be assisted by the invention in detecting various properties of machines, such as whether a machine is functioning properly or which motor or gear in the machine needs to be repaired. The addition of audio features will enhance the accuracy of such devices aimed at assisting humans in the maintenance of devices in warehouses and factories. The present invention may also be implemented in the field of metrology for measurement, quality control, calibration (of measuring instruments) or for any other purpose.
It also bridges the linguistic constraints due to which humans often fall short of expressing and communicating their ideas to others. The proposed invention aims to understand humans as well as machines using intelligence imparted to the audio modality through artificial intelligence. Further, this invention can be used by people associated with field-work in the domain of mechanical automation and maintenance, by training the artificial intelligence algorithm with a dataset having a correspondence between faults in a machine and the sound clips for each corresponding case.
The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments explained herein above. Rather, the embodiment is provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art.
The one or more RGB-D sensors may be a stereo based depth camera mounted on an Augmented Reality (AR) based device such as, but not limited to, an AR headset. Additionally, the one or more RGB-D sensors are envisaged to include capabilities of capturing images and determining RGB values, depth and 6DOF pose from the images. The one or more RGB-D sensors may further include, but are not limited to, a color camera (for determining RGB values), an IR camera (for capturing depth data) and a microphone array (for speech recognition).
Additionally, examples of the computing device may include, but not limited to, a wearable device such as a Head Mounted Device (HMD) or smart eyewear glasses. The computing device is envisaged to include computing capabilities such as a memory unit configured to store machine readable instructions. The machine readable instructions may be loaded into the memory unit from a non-transitory machine-readable medium, such as, but not limited to, CD-ROMs, DVD-ROMs and Flash Drives. Alternately, the machine readable instructions may be loaded in a form of a computer software program into the memory unit. The memory unit in that manner may be selected from a group comprising EPROM, EEPROM and Flash memory.
The HMD may be provided with a user interface to enable the user to interact with the computing device as well as operate the other components of the system (400). Exemplary user interfaces include, but are not limited to, one or more buttons, a gesture interface, a knob, an audio interface, a touch-based interface, and the like. The interaction with the computer system and other connected components may be performed through pressing a button, hovering the hand and/or other body parts, or providing audio input and/or tactile input through one or more fingers.
Further, one would appreciate that a communication network may also be used in the system. The communication network can be a short-range and/or a long-range communication network, wired or wireless. The communication interface includes, but is not limited to, a serial communication interface, a parallel communication interface or a combination thereof.
In general, the word "module," as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions written in a programming language such as, for example, Java, C, Python or assembly. One or more software instructions in the modules may be embedded in firmware, such as an EPROM. It will be appreciated that modules may comprise connected logic units, such as gates and flip-flops, and may comprise programmable units, such as programmable gate arrays or processors. The modules described herein may be implemented as software and/or hardware modules and may be stored in any type of computer-readable medium or other computer storage device.
Further, while one or more operations have been described as being performed by or otherwise related to certain modules, devices or entities, the operations may be performed by or otherwise related to any module, device or entity. As such, any function or operation that has been described as being performed by a module could alternatively be performed by a different server, by the cloud computing platform, or a combination thereof. It should be understood that the techniques of the present disclosure might be implemented using a variety of technologies. For example, the methods described herein may be implemented by a series of computer executable instructions residing on a suitable computer readable medium. Suitable computer readable media may include volatile (e.g. RAM) and/or non-volatile (e.g. ROM, disk) memory, carrier waves and transmission media. Exemplary carrier waves may take the form of electrical, electromagnetic or optical signals conveying digital data streams along a local network or a publicly accessible network such as the Internet.
It should also be understood that, unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "controlling" or "obtaining" or "computing" or "storing" or "receiving" or "determining" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that processes and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Various modifications to these embodiments will be apparent to those skilled in the art from the description and the accompanying drawings. The principles associated with the various embodiments described herein may be applied to other embodiments. Therefore, the description is not intended to be limited to the embodiments shown along with the accompanying drawings, but is to be accorded the broadest scope consistent with the principles and the novel and inventive features disclosed or suggested herein. Accordingly, the invention is intended to embrace all such alternatives, modifications and variations that fall within the scope of the present invention and the appended claims.