
Visual Assistance System And Method For Assisting Users For Performing A Task

Abstract: Systems and methods for assisting a user in performing a task are described. The system receives a user query, based on which an object and an intent are determined. Using the object and the intent, the system captures annotated content provided by a human agent while responding to the user query. The capturing may be performed by locating markers inserted by the human agent into multimedia content, and then retrieving the annotated content marked by the markers. The system compares the annotated content with an existing response associated with the object and the intent stored in a resolution database. Based on the comparison, the system identifies the difference and accordingly updates the resolution database by synchronizing the annotated content with the existing response in a sequence. Now, when subsequent users raise a similar query, the system is able to assist them in performing their task by providing the updated content. FIG. 1


Patent Information

Application #
Filing Date
23 August 2019
Publication Number
09/2021
Publication Type
INA
Invention Field
COMPUTER SCIENCE
Status
Email
ipr@akshipassociates.com
Parent Application

Applicants

WIPRO LIMITED
Doddakannelli, Sarjapur Road, Bangalore 560035, Karnataka, India.

Inventors

1. GAURAV SRIVASTAVA
Flat 402, Castle Town Apartment, AS Raju Nagar, Miyapur Hyderabad 500049, Telangana

Specification

TECHNICAL FIELD
The present disclosure relates in general to a visual assistance system. More particularly, but not exclusively, the present disclosure discloses a method and a system for assisting a user in real-time for performing a task.
BACKGROUND
Online guidance systems for assisting users in performing their tasks have been known for quite a long time. A wide range of companies providing services or selling products implement virtual assistance for resolving customer or user queries. This further includes visual assistance, which eliminates the need to physically call a person to the home/office and makes it more comfortable for users to have their queries addressed. It also addresses the issue of timely assistance by agents for resolving a user’s query. Such assistance is provided in different modes, like a chatbot or a human-like animated character, which are capable of interacting with users to understand and solve their queries.
However, to provide such interactive assistance, a lot of time and effort is invested in the background to keep such a visual assistance system up to date. With the advent of technology, the services or products offered by companies are changing every moment. If timely updates are not provided to the visual assistance system, it may fail to adequately assist users in real-time in addressing their queries. Hence, it is a challenge to update the visual assistance system dynamically. A further technical challenge in providing such updates is handling the enormous amount of data required for updating the visual assistance system.
The information disclosed in this background of the disclosure section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.

SUMMARY
Accordingly, the present disclosure relates to a method of assisting users for performing a task. The method comprises receiving a user query for performing a task in real-time. The method further comprises a step of determining at least one object and at least one intent associated with the user query. Further, the method comprises the step of capturing annotated content provided by a human agent while responding to the user query based on the at least one object and the at least one intent associated with the user query. The method further comprises the step of comparing the annotated content with an existing response associated with the at least one object and the at least one intent stored in a resolution database. Further, the method comprises updating the resolution database with the annotated content based on the comparison. The annotated content is synchronized with the existing response in a sequence. Further, the method comprises the step of assisting subsequent users for performing the task associated with the user query by providing the existing response along with the annotated content. In one aspect, the aforementioned method for assisting the users in real-time for performing the task may be performed by a processor using programmed instructions stored in a memory.
Further, the present disclosure relates to a visual assistance system for assisting users in performing a task. The visual assistance system comprises a processor and a memory communicatively coupled to the processor. The visual assistance system further comprises a resolution database stored in the memory. The memory further stores processor-executable instructions, which, on execution, cause the processor to perform one or more operations comprising receiving a user query for performing a task in real-time. Further, the visual assistance system is configured to determine at least one object and at least one intent associated with the user query. The visual assistance system further captures annotated content provided by a human agent while responding to the user query based on the at least one object and the at least one intent associated with the user query. Further, the visual assistance system compares the annotated content with an existing response associated with the at least one object and the at least one intent stored in the resolution database. The visual assistance system is further configured to update the resolution database with the annotated content based on the comparison. The annotated content is synchronized with the existing response in a sequence. The visual assistance system is further
configured to assist subsequent users for performing the task associated with the user query by providing the existing response along with the annotated content.
The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the figures to reference like features and components. Some embodiments of system and/or methods in accordance with embodiments of the present subject matter are now described, by way of example only, and with reference to the accompanying figures, in which:
FIG. 1 shows an exemplary environment illustrating a visual assistance system for assisting users in performing a task in accordance with some embodiments of the present disclosure;
FIG. 2 shows a detailed block diagram illustrating the visual assistance system in accordance with some embodiments of the present disclosure;
FIG. 3 shows an exemplary process of updating a resolution database of the visual assistance system in accordance with some embodiments of the present disclosure;
FIG. 4 shows a flowchart illustrating a method of assisting users for performing a task in accordance with some embodiments of the present disclosure; and
FIG. 5 illustrates a block diagram of an exemplary computer system for implementing embodiments consistent with the present disclosure.

It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and executed by a computer or processor, whether or not such computer or processor is explicitly shown.
DETAILED DESCRIPTION
In the present document, the word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment or implementation of the present subject matter described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and the scope of the disclosure.
The terms “comprises”, “comprising”, “includes”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, device or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a system or apparatus preceded by “comprises… a” does not, without more constraints, preclude the existence of other elements or additional elements in the system or method.
The present disclosure relates to a visual assistance system (alternatively also referred to as “system”) and a method for assisting users in performing a task. Although the method for assisting the user is described in conjunction with a server, the said method can also be implemented in various computing systems/devices other than the server. Assisting users over the web or the Internet has become very popular nowadays, not only because it is convenient and fast, but also because such assistance has created trust and confidence among users, and the users understand that the assistance provided to them comes from an expert.
However, to maintain this trust and confidence, it is important to enable the visual assistance system to dynamically update itself with the latest information in terms of technology and processes, which is required for efficiently assisting the users. For example, if someone buys a new smartphone equipped with the latest technologies and is unable to operate a few of its features, the first thing he/she may do is call customer care or connect with interactive systems like a chatbot or other modes (e.g. a human-like animated character or video) for resolving the query. To address the user’s query, the chatbot or other modes must be capable enough to quickly provide the solution to the user.
It may happen that the existing chatbot or other modes of visual assistance are not instantly able to provide assistance to the user. In such a scenario, the existing chatbot may pass the user query to a human agent, who provides assistance to the user.
While assisting the user, the human agent may provide multimedia content to the user. The multimedia content may be provided in the form of at least one of audio data, text data, image data, graphics data, augmented data, and video data, or a combination of these. According to an embodiment, the human agent may use markers while providing the multimedia content to the user. The markers may be invisible or inaudible to the user or even to the human agent. Such markers may comprise, for example, but are not limited to, an ultrasonic sound or a parasonic sound, which are not audible to humans. However, such markers may be identified by the visual assistance system. Based on the markers, annotated content may be retrieved from the multimedia content. Further, the visual assistance system may compare the annotated content with the existing response or existing content which was previously used for addressing the same/similar user query. It may be understood that the annotated content is nothing but the useful content marked by the human agent while interacting with the user. Based on the comparison, the system may identify the portion of the extracted useful content which may be required for updating. Hence, when subsequent users raise the same or a similar query, the visual assistance system may not have to consult the human agent again. This is because the visual assistance system, with the updated content, is now able to handle the similar user query. This way, the visual assistance system
dynamically updates itself with the newer/updated information, which further helps in assisting subsequent users with their queries.
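The update loop described above can be illustrated in miniature. The following is a sketch only: the keyword-based object/intent lookup, the in-memory dictionary standing in for the resolution database, and the callable standing in for the human agent are all assumptions for illustration, not the disclosed implementation.

```python
# Hypothetical sketch of the dynamic-update flow; names and data
# structures are illustrative, not the actual system.

def determine_object_and_intent(query):
    # Stand-in for the NLP step: key on a known phrase.
    if "printer" in query.lower():
        return ("printer software", "software installation")
    return ("unknown", "unknown")

def handle_query(query, resolution_db, ask_human_agent):
    key = determine_object_and_intent(query)
    if key in resolution_db:
        return resolution_db[key]            # existing response suffices
    annotated = ask_human_agent(query)       # agent's marked (annotated) content
    resolution_db[key] = annotated           # dynamic update for later users
    return annotated

db = {}
agent = lambda q: ["install cartridges in the printer", "insert paper into tray"]
handle_query("I want to install my printer's software", db, agent)

# A subsequent, similar query is now served from the updated database,
# without consulting the human agent again.
steps = handle_query("help installing printer software", db, lambda q: [])
```

On this toy model, the second call never reaches the agent callable, because the key derived from the similar query already exists in the database.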
In the following detailed description of the embodiments of the disclosure, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present disclosure. The following description is, therefore, not to be taken in a limiting sense.
FIG. 1 shows an exemplary environment illustrating a visual assistance system for assisting users for performing a task in accordance with some embodiments of the present disclosure.
The environment 100 includes the visual assistance system 102, a user 104, a human agent 106, user query 108, an object 110, an intent 112, a resolution database 114, existing response 116, multimedia content 118, and annotated content 120. According to an embodiment of the present disclosure, the user 104 may raise a request to the human agent 106 for performing a task. The user 104 and the human agent 106 may use devices, such as a mobile device, laptop, desktop, and the like, for communicating with each other and with the system 102. The human agent 106 may be a subject-matter expert or a domain expert who may help/assist the user 104 in performing the task during the interaction.
The communication between the user 104 and the human agent 106 may be stored in the resolution database 114. According to an embodiment of the present disclosure, the resolution database 114 may contain two different databases, one for storing the existing response 116 and another for storing the multimedia content 118. However, it may be understood by the skilled person that there may be more than two databases present for storing data such as the existing response 116 and the multimedia content 118. The existing response 116 may comprise “previous responses” provided by the human agent 106 for the user query 108. Hence, the existing response 116 may already be stored in the resolution database 114. The multimedia content 118, on the other hand, may
comprise the “current response” provided by the human agent 106 during the interaction with the user 104. Here, it may be possible that the system 102 is not able to assist users in addressing the user query 108 by using the existing response 116. This may happen because the processes or operations required for handling the user query 108 have been changed or updated, and hence the existing response 116 is not able to support the system 102 in addressing the user query 108. In such a situation, the user query 108 is passed to the human agent 106 for providing assistance to the user 104.
While interacting with the user 104, the human agent 106 may insert markers (invisible text markers or inaudible markers) into the multimedia content 118. Once the interaction between the user 104 and the human agent 106 completes, the task of the system 102 may be to update the resolution database 114 with the newest information provided by the human agent 106, from the multimedia content 118, for solving subsequent user queries. For this, the system 102 may analyse the user query 108, using Natural Language Processing (NLP), to determine an object 110 and an intent 112 associated with the user query 108. However, it may be understood by the skilled person that more than one object 110 and more than one intent 112 may be determined which may be associated with the user query 108.
Based on the object 110 and the intent 112, the annotated content 120 may be captured from the multimedia content 118. However, for capturing the annotated content 120, the visual assistance system 102 first identifies the markers in the multimedia content 118, and then retrieves the annotated content 120 based on the markers inserted by the human agent 106 while providing the multimedia content 118 to the user 104.
According to an embodiment, the annotated content 120 may comprise an annotation of a Region of Interest (ROI) associated with the object 110 and a corresponding instruction provided by the human agent 106 for performing the task. The annotated content 120 may be compared with the existing response 116 to determine the newest information required for updating the resolution database 114. As some portion of the annotated content 120 may overlap with the existing response 116, the comparison is required to identify the exact content required for updating the resolution database 114. Thus, the resolution database 114 may be updated by synchronizing the annotated content 120 (having the latest content) with the existing
response 116. When a subsequent user provides a similar user query 108, the system 102 may be able to assist him/her by providing the existing response 116 along with the annotated content 120 based on the update.
FIG. 2 shows a detailed block diagram illustrating the visual assistance system in accordance with some embodiments of the present disclosure.
The visual assistance system 102 (alternatively also referred to as “system”) may comprise an I/O interface 202, a processor 204, and a memory 206. The I/O interface 202 may include various software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 202 may allow the system 102 to interact with the user 104 and the human agent 106 through the devices associated with the user and the human agent. The memory 206 may be communicatively coupled to the processor 204. The processor 204 may be configured to perform one or more functions of the system 102 for assisting the user 104 in performing the task. In one implementation, the system 102 may comprise data 208 and modules 210 for performing various operations in accordance with the embodiments of the present disclosure. In an embodiment, the data 208 may include, without limitation, user query 108, object 110, intent 112, existing response 116, multimedia content 118, annotated content 120, and other data 212. According to an embodiment, the memory 206 may also store the resolution database 114, which may comprise two separate databases, one for storing the existing response 116 and another for storing the multimedia content 118, which further includes the annotated content 120. According to an embodiment, the existing response 116 and the multimedia content 118 may be provided in the form of at least one of audio data, voice data, text data, image data, graphics data, augmented data, and video data.
In one embodiment, the data 208 may be stored within the memory 206 in the form of various data structures. Additionally, the aforementioned data 208 can be organized using data models, such as relational or hierarchical data models. The other data 212 may store data, including temporary data and temporary files, generated by the modules 210 for performing the various functions of the system 102. For example, other data 212 may include information related to one or more pairs of markers inserted into the multimedia content 118. The one or more pairs of markers may be inaudible markers, which may include, but are not limited to, an ultrasonic marker and
a parasonic marker. According to an embodiment, the one or more pairs of markers may be invisible text markers, which may include, but are not limited to, ultraviolet markers. The purpose of using the markers is to extract only the relevant content, i.e., the annotated content 120, from the entire multimedia content 118 in order to reduce the load on the system 102, hence making the system 102 efficient.
It may happen that, during the interaction, some formal introduction or feedback-related conversation is also captured. However, such conversation may not be useful for providing assistance to the user 104. Thus, the human agent 106 may use the markers to tag only the content which is useful for providing the assistance. According to an embodiment, the human agent 106 may use any button or key or any other means for introducing the markers during the interaction. For example, after the formal introduction, if the human agent 106 believes that the upcoming information or content which he/she is going to explain to the user 104 is useful for performing the user’s task, he/she immediately presses a button, before providing the upcoming content, to insert the markers. According to another embodiment, the human agent 106 may provide the markers in a pair having a starting point of the instruction/content and an ending point of the same instruction/content for tagging.
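As a toy illustration of this pairing scheme, the agent's button press can be modelled as wrapping the useful span in start/end tokens. The visible `<mk>` tokens below are stand-ins for the invisible or inaudible markers of the disclosure, and all names are assumptions:

```python
# Hypothetical marker scheme: in the real system the markers are invisible
# or inaudible; visible tokens are used here only for illustration.

START, END = "<mk>", "</mk>"

def mark(content):
    """Tag a span of agent-provided content as useful for later extraction."""
    return f"{START}{content}{END}"

transcript = (
    "Hello, thanks for contacting support. "          # formal intro: not tagged
    + mark("Insert the installation CD and run the .exe file.")
    + " Is there anything else I can help with?"      # feedback: not tagged
)
```

Only the span between the pair is later treated as annotated content; the untagged introduction and feedback are ignored.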
In an embodiment, the user query 108 may comprise a request made by the user 104 for performing a task. For example, if the user 104 wishes to configure a printer with his/her laptop, but is not able to do it on his/her own, the user 104 may raise the request as “I want to install my printer’s software into my laptop”. According to an embodiment, the user 104 may also specify the brand name or model number of the printer (like HP™, Canon™) in the request. It may be understood by the skilled person that there may be various types of user requests 108, related to different domains, which may be made by the user 104. For example, the user 104 may raise a request for operating the latest features of his/her smartphone, or a request for operating the dashboard of his/her car for playing music.
In an embodiment, the object 110 and the intent 112 may be determined, by the system 102, using Natural Language Processing (NLP). According to another embodiment, more than one object 110 and more than one intent 112 may be determined which may be associated with the user query 108. Determining the object 110 and the intent 112 may help the system 102 to
understand the user query 108 or the task which the user 104 wishes to perform. According to an embodiment, the system 102 may split the user query 108 “I want to install my printer’s software into my laptop” into a plurality of words. Then, by using NLP, the system 102 determines the meaning of each of the split words to determine the object 110 and the intent 112. Considering the above printer configuration request “I want to install my printer’s software into my laptop”, the system 102 may determine the object 110 as “printer software” and the intent 112 as “software installation”. The system 102 may later use the object 110 and the intent 112 for retrieving the annotated content 120, as explained in subsequent paragraphs of the specification.
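The split-and-interpret step can be sketched with a toy extractor. The keyword tables below are illustrative assumptions; a production system would use a trained NLP model rather than keyword lookup:

```python
# Toy object/intent extraction; the keyword tables are illustrative
# assumptions, and a real system would use a trained NLP model instead.

OBJECT_KEYWORDS = {"printer": "printer software", "webmail": "Webmail"}
INTENT_KEYWORDS = {"install": "software installation",
                   "block": "blocking email senders"}

def object_and_intent(query):
    words = query.lower().replace("'s", "").split()    # split into words
    text = " ".join(words)
    obj = next((v for k, v in OBJECT_KEYWORDS.items() if k in text), None)
    intent = next((v for k, v in INTENT_KEYWORDS.items()
                   if any(k in w for w in words)), None)
    return obj, intent

obj, intent = object_and_intent(
    "I want to install my printer's software into my laptop")
```

For the printer query above, this sketch yields the same (object, intent) pair as the example in the text.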
In an embodiment, the resolution database 114 may comprise two separate databases, one for storing the existing response 116 and another for storing the multimedia content 118. In an embodiment, the existing response 116 may comprise previous instructions provided by the human agent 106 for addressing the user query 108, which may be related to the same printer installation request. However, according to other embodiments of the present disclosure, the existing response 116 may also comprise previous instructions, provided by the human agent 106, associated with different user queries related to different domains. For example, the existing response 116 may comprise previous instructions related to a credit card redemption process, operating a washing machine, and the like. The multimedia content 118, on the other hand, may comprise the recent response or instructions provided by the human agent 106 while interacting with the user 104 for performing the task. While providing the instructions (multimedia content 118), the human agent 106 may mark the useful instructions/content using the markers (an inaudible marker or an invisible text marker), which are invisible or inaudible to the user 104 and even to the human agent 106. The inaudible markers may be an ultrasonic sound or a parasonic sound which only the system 102 can detect. Further, the invisible text markers may include, but are not limited to, an ultraviolet marker.
Now, from the multimedia content 118, the annotated content 120 may be captured/retrieved by the system 102 based on the object 110 and the intent 112. For this, the system 102 may locate the markers present in the multimedia content 118, and then retrieve the annotated content 120 which is marked by the markers. According to an embodiment, the annotated content 120 may comprise annotations of one or more Regions of Interest (ROIs) associated with the object 110 and corresponding instructions provided by the human agent 106 for assisting the user 104 to perform the task. Referring back to the example of the user’s query
108 “I want to install my printer’s software into my laptop” along with the determined object 110 as “printer software” and the intent 112 as “software installation”, the annotated content 120 may comprise, for example, the step-wise instructions like “install cartridges in the printer”, “insert paper into tray”, “insert printer installation Compact Disk (CD) and then run the .exe file”, “connect your laptop with the printer using Universal Serial Bus (USB) cable”, and “Switch ON the printer”.
In an embodiment, the above discussed data 208 (user query 108, object 110, intent 112, existing response 116, multimedia content 118, annotated content 120, and other data 212) may be processed by one or more modules 210. In one implementation, the one or more modules 210 may also be stored as a part of the processor 204. In an example, the one or more modules 210 may be communicatively coupled to the processor 204 for performing one or more functions of the system 102.
In one implementation, the one or more modules 210 may include, without limitation, a receiving module 214, a determining module 216, a capturing module 218, an identifying module 220, a retrieving module 222, a comparing module 224, an updating module 226, an assisting module 228, a training module 230, and other modules 232. As used herein, the term module refers to an application specific integrated circuit (ASIC), an electronic circuit, a processor 204 (shared, dedicated, or group) and memory 206 that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality. The other modules 232 may include programs or coded instructions that supplement applications and functions of the system 102 for assisting the user 104 for performing the task.
Now, the manner in which the system 102 is implemented for assisting the user 104 in performing the task is discussed herein in detail with reference to an embodiment illustrated in Fig. 3. The objective of the present disclosure is to dynamically update the resolution database 114 by using the interaction captured between the user 104 and the human agent 106. According to an embodiment of the present disclosure, the user 104 may initiate the interaction by providing a user query 108 to the human agent 106 for performing the task. In response, the human agent 106 may understand the user query 108 and respond to the user 104 with the steps/instructions required for assisting the user 104 to perform the task.

However, before starting the interaction with the human agent 106, it is important to understand why such an interaction is required, since the resolution database 114 already stores the existing response 116 which was previously used for assisting users in addressing the user query 108. The reason is that the existing response 116 may now not be able to help the user 104 in performing the task associated with the user query 108. This may happen due to various reasons, for example, but not limited to, a change in the procedure for performing the task, a change in the operation for performing the task, or the user query 108 being completely new to the system 102, so that no instructions are available in the resolution database 114 for assisting the user 104. In any of these situations, the existing response 116 may not be sufficient to help the user 104 in performing the task. Hence, the aim of the present disclosure is to analyze the multimedia content 118 received from the human agent 106 to dynamically update the resolution database 114 with the newest information.
When the interaction between the user 104 and the human agent 106 is completed, the multimedia content 118, provided by the human agent 106 during the interaction, may be stored in the resolution database 114. As discussed earlier, the resolution database 114 may comprise two separate databases, one for storing the multimedia content 118 captured during the current interaction between the user 104 and the human agent 106, and another database for storing the existing response 116 which includes previously stored instructions for addressing the same user query 108 for performing the task.
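The two-store layout described above might be sketched as follows; the class and field names are illustrative assumptions, not the disclosed schema:

```python
# Illustrative two-store layout for the resolution database 114;
# field names are assumptions for the sketch only.

from dataclasses import dataclass, field

@dataclass
class ResolutionDatabase:
    # previously stored step-wise instructions, keyed by (object, intent)
    existing_responses: dict = field(default_factory=dict)
    # raw multimedia content captured from the current agent interaction
    multimedia_content: dict = field(default_factory=dict)

db = ResolutionDatabase()
key = ("Webmail", "blocking email senders")
db.existing_responses[key] = ["Open webmail", "Go to settings"]
db.multimedia_content[key] = "<recorded agent session>"
```

Keeping the two stores under the same (object, intent) key makes the later comparison between the existing response and the newly captured content a simple lookup.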
According to an embodiment, the receiving module 214 may receive the user query 108. The user query 108 may comprise, for example, “How to block senders in webmail”. This example of the user query 108 has been considered for explaining the working of the system 102 according to an embodiment of the present disclosure. However, it may be understood by the skilled person that there may be different types of user queries from different domains which the system 102 may handle. Therefore, the examples of the user query 108 taken in the above paragraphs are only for explanation purposes and not for limiting the scope of the present disclosure.
Once the user query 108 is received, in the next step, the determining module 216 may determine the object 110 and the intent 112 associated with the user query 108 using Natural Language Processing (NLP). For the user query 108 “How to block senders in webmail”, the object
110 may be determined as “Webmail” and the intent 112 may be determined as “blocking email senders”. Based on the object 110 and the intent 112, the capturing module 218 may capture the annotated content 120 provided by the human agent 106 while responding to the user query 108.
As discussed in the earlier paragraphs, the human agent 106 may insert one or more pairs of markers into the multimedia content 118 while responding to the user query 108 in order to easily locate useful content from the multimedia content 118.
Thus, in the next step, the identifying module 220 may identify one or more pairs of markers in the multimedia content 118 provided by the human agent 106 while responding to the user query 108. Upon identifying the one or more pairs of markers, in the next step, the retrieving module 222 may retrieve the annotated content 120 within each pair of the one or more pairs of markers from the multimedia content 118. According to an embodiment, the annotated content 120 may comprise an annotation of a Region of Interest (ROI) associated with the object 110 and a corresponding instruction provided by the human agent 106 for performing the task. In other words, the annotated content 120 may be stepwise instructions in the form of text/image/video snippets, or a combination thereof, for helping the user 104 in performing the task.
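For text content, the identify-then-retrieve steps might look like the following sketch. The visible `<mk>` tokens and the regular expression are assumptions standing in for the disclosure's invisible/inaudible markers:

```python
# Hypothetical extraction of annotated content between marker pairs;
# visible <mk> tokens stand in for the invisible markers of the text.

import re

START, END = "<mk>", "</mk>"

def extract_annotated(multimedia_text):
    """Return the content inside every start/end marker pair, in order."""
    pattern = re.escape(START) + r"(.*?)" + re.escape(END)
    return re.findall(pattern, multimedia_text, flags=re.DOTALL)

transcript = ("Hi there! <mk>Open the webmail settings page.</mk> One moment... "
              "<mk>Click the block option.</mk> Anything else?")
steps = extract_annotated(transcript)
```

The untagged greeting and small talk are dropped, and only the content between each marker pair survives as stepwise instructions.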
Since the aim of the present disclosure is to dynamically update the resolution database 114 with the latest information, it is important to check whether the already stored existing response 116 differs from the annotated content 120 extracted based on the markers. An example of the existing response 116 and the annotated content 120 is shown in Fig. 3. It may be understood that the existing response 116 shown in Fig. 3 may comprise the previously followed stepwise instructions for addressing the user query 108 “How to block senders in webmail”. In the existing response 116, there are five steps to be followed for blocking the senders in the webmail. However, as discussed earlier, it may happen that the webmail is upgraded, and therefore the five steps (Step 1-Step 5) of the previously stored existing response 116 may not help the user 104 to block the senders in the webmail. Further, with the latest information received from the human agent 106 in the form of multimedia content 118, from which the annotated content 120 is extracted, the system 102 may understand that the new information is helpful for addressing the user query 108. However, the system 102, at this stage, may not figure out what exact information from the newly captured information (annotated content 120) is required for updating the resolution database 114.
For this purpose, the comparing module 224 may compare the steps of the annotated content 120 with the steps of the existing response 116 which are associated with the same object 110 and intent 112. From Fig. 3, it may be observed that Step 1 and Step 2 of both tables (the existing response 116 and the annotated content 120) are the same. However, Step 3 of the two tables differs slightly. In the earlier Step 3 (of the existing response 116), the user 104 was required to click on the “block or allow” option (shown as 302A). That is, the blocking and allowing options used to appear under the same link to the user 104. However, due to the upgrade of the webmail, the interface or layout of the webmail may be slightly changed. Now, the user 104 is required to click only on the “block” option (shown as 302B) rather than clicking on the “block or allow” option. Hence, the system 102 understands that Step 3 (of the annotated content 120 table) is new information which is required to be updated. Now, after clicking on the “block” option, the next step (Step 4) is to enter the email ids which the user 104 may want to block into the “block box”. From Fig. 3, it may be observed that Step 4 in both tables is the same.
However, in the annotated content 120 table, a new Step 4a may be observed which is not present in the existing response 116 table. The new Step 4a (shown as 304) requires the user 104 to click on the “+” sign to confirm the email ids (inserted in the block box in Step 4) to block. Further, the last Step 5 in both tables is again the same, and hence no update is required for that step. Based on the above comparison explained with respect to Fig. 3, it becomes clear to the system 102 that Step 3 and Step 4a (emphasized by bold and underline) are the newest information which is required to be updated in the resolution database 114. Further, the system 102 also understands that the other steps, i.e., Step 1, Step 2, and Step 5 of the existing response 116 table, are still the same, and hence no update is required for these steps.
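The stepwise comparison just described is essentially a sequence diff. As a sketch only, it can be expressed with Python's standard `difflib.SequenceMatcher`; the step texts below paraphrase the Fig. 3 example and are assumptions, and the disclosure does not prescribe any particular diff algorithm.

```python
import difflib

# Assumed paraphrases of the Fig. 3 step texts (illustration only).
existing = [
    "Open webmail settings",
    "Go to mail options",
    "Click 'block or allow'",            # old Step 3
    "Enter email ids in the block box",
    "Save the settings",
]
annotated = [
    "Open webmail settings",
    "Go to mail options",
    "Click 'block'",                     # changed Step 3
    "Enter email ids in the block box",
    "Click '+' to confirm the email ids",  # new Step 4a
    "Save the settings",
]

matcher = difflib.SequenceMatcher(a=existing, b=annotated)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":          # only changed or newly added steps
        print(tag, annotated[j1:j2])
```

Here the matcher reports a `replace` for the changed Step 3 and an `insert` for the new Step 4a, while the unchanged Steps 1, 2, 4, and 5 fall into `equal` spans and are left alone.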
Further, the updating module 226 may update the resolution database 114 by synchronizing Step 3 and Step 4a (of the annotated content 120 table) with Steps 1-5 (of the existing response 116 table) in a sequence. In other words, the sequence of the steps may comprise: Step 1 → Step 2 → Step 3 (newest information) → Step 4 → Step 4a (newest information) → Step 5. Now, in the next step, the assisting module 228 may assist subsequent users in performing the task associated with the user query 108 by providing the existing response 116 along with the annotated content 120.
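The synchronization of the new steps into the stored sequence can be sketched by walking the diff opcodes in order and tagging which entries are newest information. The step labels and the `synchronize` helper are assumptions introduced for this illustration.

```python
import difflib

def synchronize(existing, annotated):
    """Merge captured steps into the stored response in order,
    tagging each step with whether it is newest information."""
    merged = []
    sm = difflib.SequenceMatcher(a=existing, b=annotated)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        for step in annotated[j1:j2]:
            merged.append((step, tag != "equal"))  # True = new step
    return merged

# Assumed step labels mirroring the Step 1 → ... → Step 5 example.
existing = ["Step 1", "Step 2", "Click 'block or allow'", "Step 4", "Step 5"]
annotated = ["Step 1", "Step 2", "Click 'block'", "Step 4",
             "Click '+' to confirm", "Step 5"]

for step, is_new in synchronize(existing, annotated):
    print(("NEW  " if is_new else "     ") + step)
```

Walking the opcodes preserves the original ordering, so the merged result reproduces the Step 1 → Step 2 → Step 3 (newest) → Step 4 → Step 4a (newest) → Step 5 sequence described above.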
It may be understood by the skilled person that the above discussed scenario is just an example taken for explaining the working of the system 102. There may be different types of user queries 108 in different domains for which the human agent 106 may provide the multimedia content 118. From each multimedia content 118, the system 102 may be able to update the resolution database 114. Further, the training module 230 may also train the system 102 based on a plurality of interactions between the users 104 and the human agents 106 across different domains.
FIG. 4 shows a flowchart illustrating a method of assisting users for performing a task in accordance with some embodiments of the present disclosure.
As illustrated in FIG. 4, the method 400 comprises one or more blocks for assisting a user 104 in performing a task using a visual assistance system 102. The method 400 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, and functions, which perform particular functions or implement particular abstract data types.
The order in which the method 400 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method. Additionally, individual blocks may be deleted from the methods without departing from the scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof.
At block 402, the visual assistance system 102 may receive a user query 108 from the user 104 for performing the task in real-time. The user query 108 may comprise, for example, “how to block senders in webmail”.
At block 404, the visual assistance system 102 may determine an object 110 and an intent 112 associated with the user query 108. However, according to another embodiment, it may be understood that more than one object 110 and more than one intent 112 may be determined which may be associated with the user query 108. For the user query 108 “how to block senders in webmail”, the object 110 determined may be “webmail” and the intent 112 determined may be “blocking senders”.
At block 406, the visual assistance system 102 may capture annotated content 120 provided by the human agent 106 while responding to the user query 108 based on the object 110 and the intent 112 associated with the user query 108.
At block 408, the visual assistance system 102 may compare the annotated content 120 with an existing response 116 associated with the object 110 and the intent 112 stored in a resolution database 114. As discussed earlier, the existing response 116 may comprise previous responses/instructions stored in the resolution database 114 for addressing the user query 108.
At block 410, the visual assistance system 102 may update the resolution database 114 with the annotated content 120 based on the comparison performed at block 408. According to an embodiment, the annotated content 120 may be synchronized with the existing response 116 in a sequence.
At block 412, the visual assistance system 102 may assist subsequent users for performing the task associated with the user query 108 by providing the existing response 116 along with the annotated content 120.
Computer System
FIG. 5 illustrates a block diagram of an exemplary computer system for implementing embodiments consistent with the present disclosure. In an embodiment, the computer system 500 can be the visual assistance system 102 which is used for assisting a user 104 for performing a task. According to an embodiment, the computer system 500 may receive the user query 108 from the user 104 for performing the task. The computer system 500 may comprise a central processing unit (“CPU” or “processor”) 502. The processor 502 may comprise at least one data processor for executing program components for executing user- or system-generated business processes. The processor 502 may include specialized processing units such as integrated system (bus) controllers, memory management control units, floating point units, graphics processing units, digital signal processing units, etc.
The processor 502 may be disposed in communication with one or more input/output (I/O) devices (511 and 512) via I/O interface 501. The I/O interface 501 may employ communication protocols/methods such as, without limitation, audio, analog, digital, stereo, IEEE-1394, serial bus, Universal Serial Bus (USB), infrared, PS/2, BNC, coaxial, component, composite, Digital Visual Interface (DVI), high-definition multimedia interface (HDMI), Radio Frequency (RF) antennas, S-Video, Video Graphics Array (VGA), IEEE 802.11a/b/g/n/x, Bluetooth, cellular (e.g., Code-Division Multiple Access (CDMA), High-Speed Packet Access (HSPA+), Global System for Mobile Communications (GSM), Long-Term Evolution (LTE), or the like), etc.
Using the I/O interface 501, the computer system 500 may communicate with one or more I/O devices (511 and 512).
In some embodiments, the processor 502 may be disposed in communication with a communication network 509 via a network interface 503. The network interface 503 may communicate with the communication network 509. The network interface 503 may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/100/1000 Base T), Transmission Control Protocol/Internet Protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc. The communication network 509 can be implemented as one of the different types of networks, such as intranet or Local Area Network (LAN) and such within the organization. The communication network 509 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), etc., to communicate with each other. Further, the communication network 509 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, etc.
In some embodiments, the processor 502 may be disposed in communication with a memory 505 (e.g., RAM 513, ROM 514, etc. as shown in FIG. 5) via a storage interface 504. The storage interface 504 may connect to memory 505 including, without limitation, memory drives, removable disc drives, etc., employing connection protocols such as Serial Advanced Technology Attachment (SATA), Integrated Drive Electronics (IDE), IEEE-1394, Universal Serial Bus (USB), fiber channel, Small Computer Systems Interface (SCSI), etc. The memory drives may further include a drum, magnetic disc drive, magneto-optical drive, optical drive, Redundant Array of Independent Discs (RAID), solid-state memory devices, solid-state drives, etc.
The memory 505 may store a collection of program or database components, including, without limitation, user/application data 506, an operating system 507, web browser 508 etc. In some embodiments, the computer system 500 may store user/application data 506, such as the data, variables, records, etc. as described in this invention. Such databases may be implemented as fault-tolerant, relational, scalable, secure databases such as Oracle or Sybase.
The operating system 507 may facilitate resource management and operation of the computer system 500. Examples of operating systems include, without limitation, APPLE® MACINTOSH® OS X®, UNIX®, UNIX-like system distributions (e.g., BERKELEY SOFTWARE DISTRIBUTION® (BSD), FREEBSD®, NETBSD®, OPENBSD®, etc.), LINUX® distributions (e.g., RED HAT®, UBUNTU®, KUBUNTU®, etc.), IBM® OS/2®, MICROSOFT® WINDOWS® (XP®, VISTA®, 7, 8, 10, etc.), APPLE® IOS®, GOOGLE™ ANDROID™, BLACKBERRY® OS, or the like. The user interface (I/O interface) 506 may facilitate display, execution, interaction, manipulation, or operation of program components through textual or graphical facilities. For example, user interfaces may provide computer interaction interface elements on a display system operatively connected to the computer system 500, such as cursors, icons, checkboxes, menus, scrollers, windows, widgets, etc. Graphical User Interfaces (GUIs) may be employed, including, without limitation, Apple® Macintosh® operating systems’ Aqua®, IBM® OS/2®, Microsoft® Windows® (e.g., Aero, Metro, etc.), web interface libraries (e.g., ActiveX®, Java®, JavaScript®, AJAX, HTML, Adobe® Flash®, etc.), or the like.
In some embodiments, the computer system 500 may implement the web browser 508 stored program components. The web browser 508 may be a hypertext viewing application, such as MICROSOFT® INTERNET EXPLORER®, GOOGLE™ CHROME™, MOZILLA® FIREFOX®, APPLE® SAFARI®, etc. Secure web browsing may be provided using Secure Hypertext Transport Protocol (HTTPS), Secure Sockets Layer (SSL), Transport Layer Security (TLS), etc. Web browsers 508 may utilize facilities such as AJAX, DHTML, ADOBE® FLASH®, JAVASCRIPT®, JAVA®, Application Programming Interfaces (APIs), etc. In some embodiments, the computer system 500 may implement a mail server stored program component. The mail server 516 may be an Internet mail server such as Microsoft Exchange, or the like. The mail server 516 may utilize facilities such as Active Server Pages (ASP), ACTIVEX®, ANSI® C++/C#, MICROSOFT® .NET, CGI scripts, JAVA®, JAVASCRIPT®, PERL®, PHP, PYTHON®, WEBOBJECTS®, etc. The mail server 516 may utilize communication protocols such as Internet Message Access Protocol (IMAP), Messaging Application Programming Interface (MAPI), MICROSOFT® Exchange, Post Office Protocol (POP), Simple Mail Transfer Protocol (SMTP), or the like. In some embodiments, the computer system 500 may implement a mail client 515 stored program component. The mail client 515 may be a mail viewing application, such as APPLE® MAIL, MICROSOFT® ENTOURAGE®, MICROSOFT® OUTLOOK®, MOZILLA® THUNDERBIRD®, etc.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present invention. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include Random Access Memory (RAM), Read-Only Memory (ROM), volatile memory, nonvolatile memory, hard drives, Compact Disc (CD) ROMs, Digital Video Discs (DVDs), flash drives, disks, and any other known physical storage media.
Advantages of the embodiment of the present disclosure are illustrated herein.
In an embodiment, the present disclosure provides a method for dynamically updating resolution database based on the interaction between the user and human agent to provide assistance to the subsequent users for performing the task.

In an embodiment, the present disclosure minimizes the human agent's interaction with the user for the same query, thereby reducing the cost of repeatedly consulting or hiring the human agent.
In an embodiment, the present disclosure reduces the downtime of the visual assistance system due to dynamic update of the resolution database.
In an embodiment, the present disclosure reduces the need of constant availability of the human agent for addressing the user query.
In an embodiment, the present disclosure makes the system faster and more efficient in terms of its processing speed and time by retrieving the annotated content based on the markers.
In an embodiment, the method of the present disclosure learns from the interaction between the user and the human agent for providing the assistance.
The terms "an embodiment", "embodiment", "embodiments", "the embodiment", "the embodiments", "one or more embodiments", "some embodiments", and "one embodiment" mean "one or more (but not all) embodiments of the invention(s)" unless expressly specified otherwise.
The terms "including", "comprising", “having” and variations thereof mean "including but not limited to", unless expressly specified otherwise.
The enumerated listing of items does not imply that any or all of the items are mutually exclusive, unless expressly specified otherwise.
The terms "a", "an" and "the" mean "one or more", unless expressly specified otherwise.
A description of an embodiment with several components in communication with each other does not imply that all such components are required. On the contrary, a variety of optional components are described to illustrate the wide variety of possible embodiments of the invention.
When a single device or article is described herein, it will be readily apparent that more than one device/article (whether or not they cooperate) may be used in place of a single device/article. Similarly, where more than one device or article is described herein (whether or not they cooperate), it will be readily apparent that a single device/article may be used in place of the more than one device or article, or a different number of devices/articles may be used instead of the shown number of devices or programs. The functionality and/or the features of a device may alternatively be embodied by one or more other devices which are not explicitly described as having such functionality/features. Thus, other embodiments of the invention need not include the device itself.
Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the embodiments of the present invention are intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

We Claim:
1. A method of assisting users (104) for performing a task, the method comprising:
receiving, by a visual assistance system (102), a user query (108) for performing a task in real-time;
determining, by the visual assistance system (102), at least one object (110) and at least one intent (112) associated with the user query (108);
capturing, by the visual assistance system (102), annotated content (120) provided by a human agent (106) while responding to the user query (108) based on the at least one object (110) and the at least one intent (112) associated with the user query (108);
comparing, by the visual assistance system (102), the annotated content (120) with an existing response (116) associated with the at least one object (110) and the at least one intent (112) stored in a resolution database (114);
updating, by the visual assistance system (102), the resolution database (114) with the annotated content (120) based on the comparison, wherein the annotated content (120) is synchronized with the existing response (116) in a sequence; and
assisting, by the visual assistance system (102), subsequent users for performing the task associated with the user query (108) by providing the existing response (116) along with the annotated content (120).
2. The method as claimed in claim 1, wherein the at least one object (110) and the at least one intent (112) associated with the user query (108) are determined using Natural Language Processing (NLP).
3. The method as claimed in claim 1, wherein capturing the annotated content (120) provided by the human agent (106) comprises:
identifying, by the visual assistance system (102), one or more pairs of markers in multimedia content (118) provided by the human agent (106) while responding to the user query (108); and
retrieving, by the visual assistance system (102), the annotated content (120) in each pair of the one or more pairs of markers from the multimedia content (118), wherein the annotated content (120) comprises an annotation of a Region of Interest (ROI) associated with the at least

one object (110) and corresponding instruction provided by the human agent (106) for performing the task.
4. The method as claimed in claim 3, wherein the one or more pairs of markers include at least one of an inaudible marker and an invisible text marker.
5. The method as claimed in claim 1, wherein the existing response (116) and the multimedia content (118) comprise at least one of audio data, text data, image data, graphics data, augmented data, and video data.
6. A visual assistance system (102) for assisting users (104) in performing a task, the visual assistance system (102) comprising:
a processor (204); and
a resolution database (114) stored in a memory (206), wherein the memory (206) is communicatively coupled to the processor (204), and wherein the memory (206) further stores processor-executable instructions, which, on execution, cause the processor (204) to:
receive a user query (108) for performing a task in real-time;
determine at least one object (110) and at least one intent (112) associated with the user query (108);
capture annotated content (120) provided by a human agent (106) while responding to the user query (108) based on the at least one object (110) and the at least one intent (112) associated with the user query (108);
compare the annotated content (120) with an existing response (116) associated with the at least one object (110) and the at least one intent (112) stored in the resolution database (114);
update the resolution database (114) with the annotated content (120) based on the comparison, wherein the annotated content (120) is synchronized with the existing response (116) in a sequence; and
provide assistance to subsequent users for performing the task associated with the user query (108) by providing the existing response (116) along with the annotated content (120).

7. The visual assistance system (102) as claimed in claim 6, wherein the processor (204) determines the at least one object (110) and the at least one intent (112) associated with the user query (108) using Natural Language Processing (NLP).
8. The visual assistance system (102) as claimed in claim 6, wherein the processor (204) captures the annotated content (120) provided by the human agent (106) by:
identifying one or more pairs of markers in multimedia content (118) provided by the human agent (106) while responding to the user query (108); and
retrieving the annotated content (120) in each pair of the one or more pairs of markers from the multimedia content (118), wherein the annotated content (120) comprises an annotation of a Region of Interest (ROI) associated with the at least one object (110) and corresponding instruction provided by the human agent (106) for performing the task.
9. The visual assistance system (102) as claimed in claim 8, wherein the one or more pairs of markers include at least one of an inaudible marker and an invisible text marker.
10. The visual assistance system (102) as claimed in claim 6, wherein the existing response (116) and the multimedia content (118) comprise at least one of audio data, text data, image data, graphics data, augmented data, and video data.
11. The visual assistance system (102) as claimed in claim 6, wherein the processor (204) trains the visual assistance system (102) based on a plurality of interactions between the users (104) and the human agents (106) on one or more domains.

Documents

Application Documents

# Name Date
1 201941034111-ABSTRACT [23-03-2022(online)].pdf 2022-03-23
2 201941034111-STATEMENT OF UNDERTAKING (FORM 3) [23-08-2019(online)].pdf 2019-08-23
3 201941034111-REQUEST FOR EXAMINATION (FORM-18) [23-08-2019(online)].pdf 2019-08-23
4 201941034111-AMENDED DOCUMENTS [23-03-2022(online)].pdf 2022-03-23
5 201941034111-POWER OF AUTHORITY [23-08-2019(online)].pdf 2019-08-23
6 201941034111-CLAIMS [23-03-2022(online)].pdf 2022-03-23
7 201941034111-FORM 18 [23-08-2019(online)].pdf 2019-08-23
8 201941034111-FER_SER_REPLY [23-03-2022(online)].pdf 2022-03-23
9 201941034111-FORM 13 [23-03-2022(online)].pdf 2022-03-23
10 201941034111-FORM 1 [23-08-2019(online)].pdf 2019-08-23
11 201941034111-OTHERS [23-03-2022(online)].pdf 2022-03-23
12 201941034111-DRAWINGS [23-08-2019(online)].pdf 2019-08-23
13 201941034111-PETITION UNDER RULE 137 [23-03-2022(online)].pdf 2022-03-23
14 201941034111-DECLARATION OF INVENTORSHIP (FORM 5) [23-08-2019(online)].pdf 2019-08-23
15 201941034111-POA [23-03-2022(online)].pdf 2022-03-23
16 201941034111-COMPLETE SPECIFICATION [23-08-2019(online)].pdf 2019-08-23
17 abstract 201941034111.jpg 2019-08-27
18 201941034111-Proof of Right [23-03-2022(online)].pdf 2022-03-23
19 201941034111-FER.pdf 2021-10-17
20 201941034111-Request Letter-Correspondence [27-08-2019(online)].pdf 2019-08-27
21 201941034111-Form 1 (Submitted on date of filing) [27-08-2019(online)].pdf 2019-08-27
22 201941034111-Power of Attorney [27-08-2019(online)].pdf 2019-08-27
23 201941034111-US(14)-HearingNotice-(HearingDate-04-12-2025).pdf 2025-11-03
24 201941034111-Correspondence to notify the Controller [04-11-2025(online)].pdf 2025-11-04

Search Strategy

1 SearchStrategyMatrixE_22-03-2021.pdf