Abstract: Systems and methods for automated query response management based on multi-query classification. Traditional systems and methods provide for an automated response management using business rule based algorithms but none of them provide for an accurate, artificial intelligence (AI) based solution by identifying multi-intent queries. Embodiments of the present disclosure provide for automated query response management based on AI and machine learning based multi-query classification by extracting normalized data from a set of multi-queries using a natural language processing module, classifying the normalized data by intent identification and computing a mean confidence value, a proximity relevancy factor and a final confidence factor for performing a correlation of the classified data with predefined system generated templates to generate an automated response to the set of multi-queries.
Claims:1. A method for automated query response management based on multi-query classification, the method comprising a processor implemented steps of:
obtaining, by one or more hardware processors, a first set of information comprising of a set of multi-queries by one or more users;
extracting, by a natural language processing module, a set of normalized data from the first set of information by performing a normalization of one or more multi-queries from the set of multi-queries based upon one or more machine learning algorithms, wherein the set of normalized data comprises one or more keywords matching with one or more correct intent word pairs in the set of multi-queries;
classifying, by an intent identification technique, the set of normalized data for generating a second set of information to correlate the second set of information and one or more predefined system generated templates; and
performing, based upon a final confidence factor (FCF), a correlation of the second set of information and the one or more predefined system generated templates to generate an automated response to the set of multi-queries.
2. The method of claim 1, wherein the step of classifying the set of normalized data comprises computing a set of mean confidence values based upon a plurality of confidence factors for mapping the second set of information with the predefined system generated templates to generate automated response to the set of multi-queries.
3. The method of claim 1, wherein the step of performing the correlation is preceded by computing the FCF based upon a set of mean confidence values and a proximity relevancy factor (PRF) for performing a comparison of a predefined threshold value and the FCF to generate automated response to the set of multi-queries.
4. The method of claim 3, wherein the step of computing the FCF is preceded by computing the PRF based upon the set of mean confidence values for performing a correlation of the PRF and the set of mean confidence values based upon a set of mapping rules to determine whether the correlation of the second set of information and the one or more predefined system generated templates may be performed.
5. A system comprising:
a memory storing instructions;
one or more communication interfaces; and
one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to:
obtain, by one or more hardware processors, a first set of information comprising of a set of multi-queries by one or more users;
extract, by a natural language processing module, a set of normalized data from the first set of information by performing a normalization of one or more multi-queries from the set of multi-queries based upon one or more machine learning algorithms, wherein the set of normalized data comprises one or more keywords matching with one or more correct intent word pairs in the set of multi-queries;
classify, by an intent identification technique, the set of normalized data for generating a second set of information to correlate the second set of information and one or more predefined system generated templates; and
perform, based upon a final confidence factor (FCF), a correlation of the second set of information and the one or more predefined system generated templates to generate an automated response to the set of multi-queries.
6. The system of claim 5, wherein the one or more hardware processors are further configured to classify the set of normalized data by computing a set of mean confidence values based upon a plurality of confidence factors for mapping the second set of information with the predefined system generated templates to generate automated response to the set of multi-queries.
7. The system of claim 5, wherein the one or more hardware processors are further configured to compute the FCF based upon a set of mean confidence values and a proximity relevancy factor (PRF) for performing a comparison of a predefined threshold value and the FCF to generate automated response to the set of multi-queries.
8. The system of claim 7, wherein the one or more hardware processors are further configured to compute the PRF based upon the set of mean confidence values for performing a correlation of the PRF and the set of mean confidence values based upon a set of mapping rules to determine whether the correlation of the second set of information and the one or more predefined system generated templates may be performed.
Description: FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003
COMPLETE SPECIFICATION
(See Section 10 and Rule 13)
Title of invention:
SYSTEMS AND METHODS FOR AUTOMATED QUERY RESPONSE
MANAGEMENT BASED ON MULTI-QUERY CLASSIFICATION
Applicant:
Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th Floor,
Nariman Point, Mumbai 400021,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
[0001] The present disclosure generally relates to automated query response management based on multi-query classification. More particularly, the present disclosure relates to systems and methods for automated query response management based on multi-query classification.
BACKGROUND
[0002] Electronic mail (or email) messaging is a critical tool of daily use. Email may now be considered a primary mode of communication for individuals, businesses, governmental agencies, establishments and other entities. Almost all organizations in the modern world rely on email for customer service because said entities can communicate quickly and can concurrently disseminate information to a larger audience while serving a larger customer base. A typical customer can use email messages to communicate with a business multiple times during a day. In the current context, email queries sent by customers are addressed by human agents. An agent or an employee reads the mail, initiates the required action accordingly and responds to the customer with an appropriate response. The required action comprises processing the transaction in backend enterprise systems, validating the provided data, etc.
[0003] The response to the customer is formatted by populating predefined templates with the data output by the transaction processing. The whole process requires extensive human involvement, and considerable effort is spent educating human agents on domain knowledge, soft skills and adherence to strict service level agreement (SLA) timelines. An agent might also miss a query when a customer sends multiple queries in a single mail. Further, the inbox folder often becomes cluttered because both new incoming emails and previously read emails are stored in the inbox folder until a user manually moves them out of it. Emails also clutter the sent folder, because new emails sent out by the user, as well as replies and forwards sent by the user to others, are stored there.
[0004] All of the above leads to deterioration of service quality, customer dissatisfaction and an increase in operational costs. It is also not unusual for customers to send email messages to a business after hours. Further, for service organizations providing round-the-clock customer service, millions of emails may be received, read and acted upon during a day. Hence, it becomes imperative to send replies even when a user or an employee is not around to monitor the emails, so that the business and client service may continue without any obstacles. Voice based mail response systems are also typically manually configured, with the user recording a greeting change each time the user is away from the office and again when the user returns. However, the process of manually changing voice mail greetings is especially tedious for those users who need to make changes often, such as those who travel frequently, who attend many off-site meetings or somewhat lengthy meetings, and so forth.
[0005] Still further, identifying, extracting and using data residing in emails can get extremely cumbersome because the data may reside in twenty, thirty, forty or even more than a hundred emails on a particular subject. These emails may also reside in different folders, such as the inbox, sent and other named folders, and possibly in the deleted folder too. Information on a particular topic cannot be easily collated from various emails, and printing the various emails for a hard copy reference is a voluminous job with diminishing utility as the number of emails grows. Some of the traditional systems and methods provide for an automated response to the user emails. However, they rely on keyword based, business rule based algorithms. As a result, their accuracy is low and the generated responses may be wrong.
SUMMARY
[0006] The following presents a simplified summary of some embodiments of the disclosure in order to provide a basic understanding of the embodiments. This summary is not an extensive overview of the embodiments. It is not intended to identify key/critical elements of the embodiments or to delineate the scope of the embodiments. Its sole purpose is to present some embodiments in a simplified form as a prelude to the more detailed description that is presented below.
[0007] Systems and methods of the present disclosure enable automated query response management based on multi-query classification. In an embodiment of the present disclosure, there is provided a method for automated query response management based on multi-query classification, the method comprising: obtaining, by one or more hardware processors, a first set of information comprising of a set of multi-queries by one or more users; extracting, by a natural language processing module, a set of normalized data from the first set of information by performing a normalization of one or more multi-queries from the set of multi-queries based upon one or more machine learning algorithms, wherein the set of normalized data comprises one or more keywords matching with one or more correct intent word pairs in the set of multi-queries; classifying, by an intent identification technique, the set of normalized data for generating a second set of information to correlate the second set of information and one or more predefined system generated templates; performing, based upon a final confidence factor (FCF), a correlation of the second set of information and the one or more predefined system generated templates to generate an automated response to the set of multi-queries; classifying the set of normalized data by computing a set of mean confidence values based upon a plurality of confidence factors for mapping the second set of information with the predefined system generated templates to generate automated response to the set of multi-queries; performing the correlation by computing the FCF based upon a set of mean confidence values and a proximity relevancy factor (PRF) for performing a comparison of a predefined threshold value and the FCF to generate automated response to the set of multi-queries; and computing the FCF by computing the PRF based upon the set of mean confidence values for performing a correlation of the PRF and the set of mean confidence values based upon a set of mapping rules to determine whether the correlation of the second set of information and the one or more predefined system generated templates may be performed.
[0008] In an embodiment of the present disclosure, there is provided a system for automated query response management based on multi-query classification, the system comprising one or more processors; one or more data storage devices operatively coupled to the one or more processors and configured to store instructions configured for execution by the one or more processors to: obtain, by one or more hardware processors, a first set of information comprising of a set of multi-queries by one or more users; extract, by a natural language processing module, a set of normalized data from the first set of information by performing a normalization of one or more multi-queries from the set of multi-queries based upon one or more machine learning algorithms, wherein the set of normalized data comprises one or more keywords matching with one or more correct intent word pairs in the set of multi-queries; classify, by an intent identification technique, the set of normalized data for generating a second set of information to correlate the second set of information and one or more predefined system generated templates; perform, based upon a final confidence factor (FCF), a correlation of the second set of information and the one or more predefined system generated templates to generate an automated response to the set of multi-queries; classify the set of normalized data by computing a set of mean confidence values based upon a plurality of confidence factors for mapping the second set of information with the predefined system generated templates to generate automated response to the set of multi-queries; compute the FCF based upon a set of mean confidence values and a proximity relevancy factor (PRF) for performing a comparison of a predefined threshold value and the FCF to generate automated response to the set of multi-queries; and compute the PRF based upon the set of mean confidence values for performing a correlation of the PRF and the set of mean confidence values based upon a set of mapping rules to determine whether the correlation of the second set of information and the one or more predefined system generated templates may be performed.
[0009] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The embodiments herein will be better understood from the following detailed description with reference to the drawings, in which:
[0011] Fig. 1 illustrates a block diagram of a system for automated query response management based on multi-query classification according to an embodiment of the present disclosure;
[0012] Fig. 2 is an architecture illustrating the components and flow of a system for automated query response management based on multi-query classification according to an embodiment of the present disclosure; and
[0013] Fig. 3 is a flowchart illustrating the steps involved for automated query response management based on multi-query classification according to an embodiment of the present disclosure.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0014] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
[0015] The embodiments of the present disclosure provide systems and methods for automated query response management based on multi-query classification. Electronic mail (or email) messaging has become a critical tool of everyday personal and business life. Email may now be considered a primary mode of communication for individuals, businesses, governmental agencies, establishments and other entities. Almost all organizations in the modern world rely on email for customer service because said entities can communicate quickly and can concurrently disseminate information to a larger audience while serving a larger customer base. In contrast to manual text classification, automatic text classification is time-saving and thus less expensive in terms of labor costs. It makes the email handling process more efficient. Corresponding output messages are generated based on the input messages and are automatically composed as email response messages (accessible by an email systems administrator) that are then sent to a single user or multiple users in response to their email inquiries. While email autoresponders send out a general, pre-written message response, intelligent answers may require the use of artificial intelligence and machine learning to understand human messages and respond with helpful answers in an automated way.
[0016] Hence, there is a need for a technology that provides for generating an automated response to multi-intent user queries by leveraging machine learning and natural language processing (NLP) techniques to cleanse emails and classify emails into pre-identified categories based upon a set of defined rules. The technology must extract meaningful data items (required to process the customer request) from the mail using keyword matching, pattern matching and the NLP techniques. The solution must further validate the extracted data with backend enterprise systems if required, and process the request in the backend enterprise systems by using web services exposed by the enterprise systems. Once the processing is done, depending upon the outcome, the technology must fill in the response template identified for that category (for which an action is required) and format the response. Finally, it must generate an automated reply to the customer mail containing the drafted response.
[0017] Referring now to the drawings, and more particularly to FIGS. 1 through 3, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[0018] FIG. 1 illustrates an exemplary block diagram of a system 100 for automated query response management based on multi-query classification. In an embodiment, the system 100 includes one or more processors 104, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more processors 104. The one or more processors 104 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[0019] The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
[0020] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
[0021] According to an embodiment of the present disclosure, referring to FIG. 2, the architecture and components of the system for automated mail response management based on multi-query classification may now be considered in detail. A user interface 201 allows users to connect with the system 100 and other components. A natural language processing module 202 facilitates extracting normalized data from the various information sources. A machine learning module 203 facilitates implementing machine learning algorithms, data mining, model prediction, and model building and evaluation. A rules engine 204 comprises a set of mapping rules and thereby facilitates rules definition, pattern matching and context mapping. A rules database 205 and a dictionary 206 communicate with the rules engine 204 for this purpose. A response generating module 207 provides for procuring templates, template mapping and formatting final responses for the automated response to the users. A template repository 208 comprises a plurality of predefined templates for generating the automated response to multi-queries from the users. An application programming interface (API) exchange 209 facilitates communication between application providers' requirements and operators' capabilities through cross-operator cooperation. A database 210 comprises a plurality of information organized for automated mail response management and further communicates with the response generating module 207 for transacting the organized information. An exchange server 211 may comprise a Microsoft Exchange Server™, that is, Microsoft's™ email, calendaring, contact, scheduling and collaboration platform deployed on the Windows™ Server operating system for use within a business or larger enterprise. An application programming interface 212 facilitates communication between two or more applications. An upstream application 213 comprises a plurality of applications, inter-alia, for facilitating transaction processing, transaction validation and updating of user credentials.
[0022] FIG. 3, with reference to FIGS. 1 and 2, illustrates an exemplary flow diagram of a method for automated mail response management based on multi-query classification. In an embodiment, the system 100 comprises one or more data storage devices or the memory 102 operatively coupled to the one or more hardware processors 104 and is configured to store instructions for execution of steps of the method by the one or more processors 104. The steps of the method of the present disclosure will now be explained with reference to the components of the system 100 as depicted in FIG. 1 and the flow diagram. In the embodiments of the present disclosure, the hardware processors 104, when configured by the instructions, perform one or more methodologies described herein.
[0023] In an embodiment of the present disclosure, at step 301, the one or more hardware processors 104 obtain a first set of information comprising one or more user emails. The first set of information comprising the one or more user emails may be in the form of text, numbers, digits, words, special characters etc. or any combination thereof. Further, the first set of information or the one or more user emails may comprise, inter-alia, a content, a header, a footer, the sender name and email address, the names and addresses of interested persons copied, and the signature of the sender. According to an embodiment, the one or more user emails comprise one or more queries or one or more questions or the content or the text for which an automated mail response based on multi-query classification may be generated. For example, the one or more user emails may be:
Dear XYZ,
Please quote the price for the item “car”.
Regards,
ABC
abc@xyz.com
ADD TOWERS, BANGALORE
Or
Dear XYZ,
What is the price of black car?
From – abc@abc.com
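By way of a non-limiting illustration, separating the sender details from the content of such a user email may be sketched with the Python standard library `email` package. The raw message below, including the `To` address and the `Subject` line, is a hypothetical sample, and `split_email` is an illustrative helper rather than a component named in the present disclosure.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical raw message modelled on the first example email above.
RAW_EMAIL = """\
From: ABC <abc@xyz.com>
To: XYZ <xyz@example.com>
Subject: Price request

Dear XYZ,
Please quote the price for the item "car".
Regards,
ABC
"""


def split_email(raw: str) -> dict:
    """Separate the sender details and subject from the body text."""
    message = message_from_string(raw)
    sender_name, sender_address = parseaddr(message["From"])
    return {
        "sender_name": sender_name,
        "sender_address": sender_address,
        "subject": message["Subject"],
        "body": message.get_payload(),  # plain-text body of a single-part mail
    }


parsed = split_email(RAW_EMAIL)
```

In this sketch only the `body` entry would be passed on for normalization, while the sender details remain available for addressing the automated response.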
[0024] The one or more user emails may be sent to the system 100 for handling from any device suitable for sending email, including computers, mobile phones, etc. The one or more user emails may also be sent by an automated system, e.g., for automatically generated messages or reports. The one or more user emails may be composed using an email client, such as Outlook™, Thunderbird™ etc. Further, the one or more user emails may also be composed by other means, e.g., using a contact form on a web site. In the latter case, the message may be converted to email and sent via conventional email forwarding and/or routing facilities. Alternatively, the message may not be converted to email but forwarded to the email handling system via other means, for example, using proprietary protocols. Also, the data file may be stored in storage, e.g. sequential storage, independent from the destination address.
[0025] According to an embodiment, a mail server is operable to manage and maintain the one or more user email messages for one or more users of the system 100. For example, the mail server may be operable to send and receive email to and from one or more external sources via the Internet, and to send and receive email between one or more users via an intranet. The mail server may be operable to store email messages in one or more folders, each of which may be owned or managed by the one or more users. The one or more user email messages may be stored in, and copied between, a hard drive and the memory 102, in accordance with the execution of the mail server.
[0026] According to an embodiment of the present disclosure, at step 302, the one or more hardware processors 104 perform, using a natural language processing module 202, an extraction of a set of normalized data from the first set of information by performing a normalization of one or more queries from a set of multi-queries using one or more machine learning algorithms (via the machine learning module 203), wherein the set of normalized data comprises one or more keywords matching with one or more correct intent word pairs in the set of multi-queries. The classification of the first set of information, that is, the one or more user emails (comprising the one or more queries or the one or more questions etc.) may now be considered in detail. In an embodiment, the one or more user emails which require the classification are those which may have one or more stop words or stop sentences, long sentences, or one or more queries, and/or any combination thereof.
[0027] According to an embodiment, the normalization of the one or more queries comprises preprocessing of the one or more user emails by, firstly, breaking the one or more user email bodies into short sentences. It may be noted that if the one or more user email bodies comprise one or more bridge words or linking words, the one or more user email bodies will be broken into two or multiple sentences depending upon the frequency of the one or more bridge words. For example, if the content of the one or more user emails comprises “People believe in supernatural powers but they are afraid to admit”, then the content may be broken as “People believe in supernatural powers” and “they are afraid to admit”.
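By way of a non-limiting illustration, the breaking of an email body at bridge words may be sketched as follows. The list `BRIDGE_WORDS` and the helper `split_on_bridge_words` are hypothetical, since the present disclosure does not enumerate the bridge words, and the naive split at sentence-ending punctuation is an assumption that would mishandle abbreviations such as “Bldg.”.

```python
import re

# Hypothetical bridge-word list; the specification does not enumerate one.
BRIDGE_WORDS = ("but", "and", "however", "because")


def split_on_bridge_words(text: str) -> list[str]:
    """Break text into short sentences at sentence ends and bridge words."""
    bridge_pattern = r"\b(?:" + "|".join(BRIDGE_WORDS) + r")\b"
    fragments = []
    for sentence in re.split(r"[.!?]+", text):
        for part in re.split(bridge_pattern, sentence, flags=re.IGNORECASE):
            part = part.strip(" ,;")
            if part:  # skip empty pieces left by adjacent delimiters
                fragments.append(part)
    return fragments


parts = split_on_bridge_words(
    "People believe in supernatural powers but they are afraid to admit"
)
# parts == ["People believe in supernatural powers", "they are afraid to admit"]
```

A sentence containing several bridge words yields a correspondingly larger number of fragments, which is consistent with the dependence on bridge-word frequency noted above.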
[0028] According to an embodiment of the present disclosure, the preprocessing of the one or more user emails through the normalization further comprises implementing lemmatization and stemming techniques after breaking the one or more user email bodies into the two or multiple sentences, for obtaining a normal and plain form of the sentences. This may also comprise removing the one or more stop words from the sentences. According to an embodiment, the lemmatization and stemming may be implemented to bring one or more words (from the broken sentences) to a common base form, so that each word can be identified as a verb or a noun and interpreted accordingly. However, there is a difference between the two techniques. Stemming is a crude technique, wherein the one or more words (that is, one or more actual words) are chopped for obtaining one or more base words. In lemmatization, by contrast, the system 100 (through a machine learning algorithm or natural language processing technique implemented via the natural language processing module 202) attempts to relate the context in the whole sentence and may perform a dictionary search to obtain the one or more base words. For example, consider the two statements “I am rating this movie below average” and “Rating typically decides movie quality”. With stemming, in both cases, the base word can be ‘to rate’, which is the verb form, while with lemmatization it will be ‘to rate’ (the verb form) for the first sentence and ‘rating’ (a noun) for the second sentence. As another example, consider “I am walking” or “I am eating”. In these cases, both stemming and lemmatization will provide ‘to walk’ or ‘to eat’ as the base word, which is a verb. Hence, taking an example, if the one or more user email bodies have content like “Please proceed with finance approvals”, the lemmatization may be performed to obtain the content as “Please go ahead with finance approvals” or “Please go on with finance approvals”. Similarly, if the one or more user email bodies have content like “Proceeding with finance approvals”, it may be stemmed as “Proceed finance approve”. It may be noted that the embodiments of the present disclosure support performing the preprocessing (through lemmatization and stemming) using any natural language or machine language or any combination thereof. For example, using the Natural Language Toolkit (NLTK) for Python, lemmatization may be performed (for lemmatizing the word “dancing” to extract the normalized word “dance”) as:
>>> import nltk
>>> from nltk.stem import WordNetLemmatizer
>>> lemmatizer_output=WordNetLemmatizer()
>>> lemmatizer_output.lemmatize('dancing')
'dancing'
>>> lemmatizer_output.lemmatize('dancing',pos='v')
'dance'
>>> lemmatizer_output.lemmatize('dances')
'dance'
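By way of a further non-limiting illustration, the difference between stemming and lemmatization described above may be sketched without any external library. The suffix list and the small lemma dictionary are hypothetical stand-ins for a real stemmer and a real lexical database; the sketch only shows that the stemmer ignores the role of the word in the sentence, while the dictionary lookup depends on it.

```python
# Hypothetical suffix list and lemma dictionary, for illustration only.
SUFFIXES = ("ing", "es", "ed", "s")
LEMMA_DICTIONARY = {
    ("rating", "verb"): "rate",
    ("rating", "noun"): "rating",
    ("walking", "verb"): "walk",
}


def crude_stem(word: str) -> str:
    """Chop the first matching suffix, ignoring the word's role."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def dictionary_lemma(word: str, part_of_speech: str) -> str:
    """Look up the base form of the word for its grammatical role."""
    return LEMMA_DICTIONARY.get((word, part_of_speech), word)


stem = crude_stem("rating")                      # identical in every context
verb_lemma = dictionary_lemma("rating", "verb")  # "I am rating this movie ..."
noun_lemma = dictionary_lemma("rating", "noun")  # "Rating typically decides ..."
```

Here the crude stemmer returns the same chopped form, `rat`, for both statements, whereas the dictionary lookup returns `rate` for the verb and keeps `rating` for the noun.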
[0029] According to an embodiment of the present disclosure, the preprocessing may further involve removing the one or more stop words (for example, an, but, with) from the first set of information. The purpose of removing the one or more stop words is that they add bulk to the text while carrying little meaning for analysis. Removing the one or more stop words reduces the dimensionality of the term space. For example, if the one or more user emails comprise “Can meeting be made compulsory”, the one or more stop words may be removed to obtain the content (without the one or more stop words) as “meeting, compulsory”. The one or more stop words comprise words that a search engine (in the system 100) has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query.
[0030] According to an embodiment, an example of extracting the set of normalized data from the first set of information may now be considered. Suppose the first set of information comprises: “Hi there,
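By way of a non-limiting illustration, the removal of stop words may be sketched as follows. The set `STOP_WORDS` is a hypothetical list chosen so that the example of this paragraph is reproduced; in particular, treating “made” as a stop word is an assumption, and a deployed system would rely on a fuller, curated list.

```python
# Hypothetical stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"a", "an", "the", "but", "with", "can", "be", "made", "is", "to"}


def remove_stop_words(sentence: str) -> list[str]:
    """Return the lower-cased tokens that are not stop words."""
    tokens = sentence.lower().replace("?", " ").replace(".", " ").split()
    return [token for token in tokens if token not in STOP_WORDS]


keywords = remove_stop_words("Can meeting be made compulsory")
# keywords == ["meeting", "compulsory"]
```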
I have recently moved to new address. Earlier I was staying at 240, Red Bldg., Raulf Street, London, 23 2323. I moved to new address 243, Blue Bldg, Raulf Street, London, 232 12121.
Please update address in system.
Thanks,
Ronny Sillot”
[0031] The natural language processing module 202 may then extract the set of normalized data using the one or more machine learning algorithms (and the techniques described above) as:
“I recently move to new address. Earlier I stay at 240, Red Bldg., Raulf Street, London, M4 1PW. I move to new address 243, Blue Bldg, Raulf Street, London, M4 2PW. Please update address in system.”
[0032] According to an embodiment of the present disclosure, at step 303, the set of normalized data may be classified by an intent identification technique for generating a second set of information, in order to correlate the second set of information with one or more predefined system generated templates. According to an embodiment, the classification comprises categorizing the intent of the set of normalized data (for example, whether the intent is to change an address or to ask for a statement) using an intent identification algorithm (based upon which an automated response to the set of multi-queries may finally be generated). The process of classification using the intent identification technique may now be considered in detail. The set of normalized data may be fed as an input into the intent identification algorithm, which further comprises the one or more machine learning algorithms, to generate one or more confidence values. In an embodiment, the intent identification technique (implemented via the intent identification algorithm) may select either support vector machine (SVM) modeling or a Naïve Bayes like algorithm for performing the classification.
[0033] According to an embodiment, upon execution, the intent identification algorithm will store one or more keywords with certain weightage in the database 210 for each category of intent (for example, the intent to change address will have a corresponding set of the one or more keywords and, taking another example, if the intent is to ask for the price of a product, it will have another corresponding set of the one or more keywords). The one or more keywords may comprise a word or a combination of words (N-grams) which are extracted from the set of user queries (that is, the mail body sentences) for that category. According to an embodiment, one or more weightages may then be assigned on the basis of how strongly those words in the sentence imply or relate to that intent category.
[0034] Referring again to the same example above, the set of normalized data (obtained as an output from step 302) may be generated as:
“I recently move to new address. Earlier I stay at 240, Red Bldg., Raulf Street, London, M4 1PW. I move to new address 243, Blue Bldg, Raulf Street, London, M4 2PW. Please update address in system.”
[0035] According to an embodiment of the present disclosure, for performing the classification as discussed, the intent identification technique may select either the one or more support vector machines (SVM) radial modeling or execute a Naïve Bayes like algorithm (via the one or more hardware processors 104). In an embodiment, for making the selection, the one or more hardware processors 104 scan through the previous system records comprising a corpus of emails with and without any previous classifications. Referring to table 1 below, an example of the classified corpus, comprising a plurality of email records with their respective body contents and manually assigned classification categories, may be referred. Both the SVM radial modeling and Naïve Bayes utilize the classified corpus for building their respective models. The non-classified corpus also comprises a plurality of email records with their respective body contents but no classification. Both the SVM radial modeling and Naïve Bayes utilize the non-classified corpus for building their respective models.
[0036] In an embodiment, the non-classified corpus may be used as an input to verify the accuracy or confidence factor of both the SVM radial modeling and Naïve Bayes. Based upon the accuracy percentage of each, the model with the higher accuracy may be selected. Hence, if the SVM radial modeling gives 92% accuracy and the Naïve Bayes has an accuracy of 85%, the one or more hardware processors 104 may select the SVM radial modeling. Similarly, if the Naïve Bayes has an accuracy of 82% while the SVM radial modeling has an accuracy of 55%, the one or more hardware processors 104 may select the Naïve Bayes for performing the classification. Whichever model is selected is stored in the memory 102 of the system 100 as the base model.
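The selection of the base model by accuracy may be sketched in Python as below. The function name `select_base_model` and the dictionary keys are hypothetical; the sketch assumes that the accuracy percentages have already been measured for each candidate model and simply keeps the candidate with the highest value.

```python
def select_base_model(accuracy_by_model):
    """Return the name of the candidate model with the highest accuracy (in percent)."""
    return max(accuracy_by_model, key=accuracy_by_model.get)

# Accuracy figures taken from the first example in the text.
base_model = select_base_model({"svm_radial": 92, "naive_bayes": 85})
print(base_model)  # prints svm_radial
```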
Table 1
| Set of multi-queries from the users (or actual mail) | Classification |
|---|---|
| Hi, Please change my address. I have moved to new address 3232, Abc House, London, 23 23232 | Address change |
| I would like to change my address. I moved recently to new Address 32, Home House, ABR street, 34 333 | Address change |
| Hey, Please update address. I shifted my address to new address. | Address change |
| Can you please let me know expiry of my policy no. 34434343. | Policy expiry |
| Please let me know the expiry date of my policy no. 2323232. | Policy expiry |
[0037] According to an embodiment of the present disclosure, based upon the base model selected (from either the SVM radial modeling or Naïve Bayes) for performing the classification, the one or more hardware processors 104 may further compute a set of mean confidence values for each category of a plurality of categories based upon a plurality of confidence factors assigned to each of the categories. For example, if there are Y emails from P categories to be classified and assuming that the SVM radial modeling has been selected, the SVM radial modeling may be applied on each of the emails for each classification category from the P categories. Referring to table 2 below, the SVM radial modeling algorithm may, on application, return the plurality of confidence factors (in terms of percentage) for each classified category as below:
Table 2
| Actual mail | Classification / Category | Confidence factors in percentage (computed using the SVM radial modeling) |
|---|---|---|
| Hi, Please change my address. I have moved to new address 3232, Abc House, London, 23 23232 | Address change | 98 |
| I would like to change my address. I moved recently to new Address 32, Home House, ABR street, 34 333 | Address change | 99 |
| Hey, Please update address. I shifted my address to new address. | Address change | 97 |
| Please let me know the expiry date of my policy no. 2323232. | Policy expiry | 2 |
In an embodiment, based upon the plurality of confidence factors computed using the SVM radial modeling, the one or more hardware processors may then generate the set of mean confidence values for each category from amongst the plurality of categories classified. Hence, referring to table 2 above, the mean confidence value for the “address change” category may be obtained as (98+99+97)/3=98.
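The mean confidence value computation may be illustrated with a short Python sketch. The function name `mean_confidence_by_category` and the (category, confidence) record layout are assumptions made for this example; the input values are those of table 2.

```python
from collections import defaultdict

def mean_confidence_by_category(records):
    """Average the confidence factors (in percent) of the records of each category."""
    grouped = defaultdict(list)
    for category, confidence in records:
        grouped[category].append(confidence)
    return {category: sum(values) / len(values) for category, values in grouped.items()}

# (category, confidence factor) pairs taken from table 2.
records = [
    ("Address change", 98),
    ("Address change", 99),
    ("Address change", 97),
    ("Policy expiry", 2),
]
print(mean_confidence_by_category(records))
# prints {'Address change': 98.0, 'Policy expiry': 2.0}
```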
[0038] According to an embodiment of the present disclosure, for performing the classification, a proximity relevancy factor (PRF) may then be computed using an N-gram model and a set of rules (defined by the rules engine 204 and stored in the rules database 205). In short, N-grams comprise one or more word prediction algorithms using probabilistic methods to predict the next word after observing N-1 words. Therefore, computing the probability of the next word is closely related to computing the probability of a sequence of words. For example, for the word “TEXT”, N-grams may be generated as follows:
bi-grams: _T, TE, EX, XT, T_
tri-grams: _TE, TEX, EXT, XT_, T_ _
quad-grams: _TEX, TEXT, EXT_, XT_ _, T_ _ _
It may be noted that in an embodiment, the N-gram model may be implemented using Python or any other programming language for performing the classification.
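The character N-grams shown for the word “TEXT” follow a padding convention of one leading underscore and N-1 trailing underscores. A minimal Python sketch of that convention is given below; the function name `char_ngrams` is hypothetical, and the padding rule is inferred from the example rather than stated in the text. The sketch writes the trailing padding without the spaces shown in the example (for instance “T__” rather than “T_ _”).

```python
def char_ngrams(word, n):
    """Return the character n-grams of a word padded with '_' as in the TEXT example."""
    padded = "_" + word + "_" * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("TEXT", 2))  # prints ['_T', 'TE', 'EX', 'XT', 'T_']
print(char_ngrams("TEXT", 3))  # prints ['_TE', 'TEX', 'EXT', 'XT_', 'T__']
print(char_ngrams("TEXT", 4))  # prints ['_TEX', 'TEXT', 'EXT_', 'XT__', 'T___']
```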
[0039] The process of applying N-grams for computing the PRF using the set of rules may now be considered in detail. The one or more hardware processors 104, based upon the intent identification technique, initially select an email record from the plurality of email records and then select a category from the plurality of categories. For example, the one or more hardware processors 104 may select the below content and the category as “address change”:
“I recently move to new address. Earlier I stay at 240, Red Bldg., Raulf Street, London, M4 1PW. I move to new address 243, Blue Bldg, Raulf Street, London, M4 2PW. Please update address in system.”
[0040] According to an embodiment of the present disclosure, using the N-gram model, a set of matches for the category “address change” may be obtained. In an embodiment, if there are N sequences, a maximum of N searches may be made using the N-gram model to find the set of matches for the category “address change”. Based upon direct or exact matches obtained using the N-gram model, the plurality of confidence factors may be generated as 100%. In an embodiment, if no direct or exact matches are found, a search for synonyms may further be made for all words of the content in the dictionary 206 (comprising a lexical dictionary). So, if there are Z sentences with D number of synonyms, the search may be made Z*D*N times and, again, if direct or exact matches are obtained based upon the search, the plurality of confidence factors may be generated as 100%. According to an embodiment, if the above two methods of search fail, a set of direct actual keywords may be searched from the domain table using the N-gram model to check if a direct match may be found within the provided space words limitations. Again, if direct or exact matches are obtained, the plurality of confidence factors may be generated as 100%. For example,
“I moved to new address. Kindly change my address.” Here the intent is change (of address) and the entity is address.
The second sentence, matching both intent and entity, may be searched and part-of-speech (POS) tagging may be performed like below:
Kindly -> AdjK1, Change -> VerbC1, My -> PronM1, Address -> SubA1
The above output may be stored in a hashtable like below.
{AdjK1: VerbC1: PronM1: SubA1} -> Address Change
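One way to picture the hashtable entry above is a Python dictionary keyed by the ordered sequence of tags. This is a sketch under assumptions: the names `pattern_table`, `store_pattern` and `lookup_pattern` are hypothetical, and the tag labels are copied from the example rather than produced by an actual POS tagger.

```python
# Maps an ordered tuple of tag labels to an intent category.
pattern_table = {}

def store_pattern(tagged_tokens, category):
    """Store the tag sequence of a tagged sentence as the key for a category."""
    key = tuple(tag for _word, tag in tagged_tokens)
    pattern_table[key] = category

def lookup_pattern(tagged_tokens):
    """Return the category stored for this tag sequence, or None if unknown."""
    key = tuple(tag for _word, tag in tagged_tokens)
    return pattern_table.get(key)

# (word, tag) pairs copied from the example in the text.
tagged = [("Kindly", "AdjK1"), ("Change", "VerbC1"), ("My", "PronM1"), ("Address", "SubA1")]
store_pattern(tagged, "Address Change")
print(lookup_pattern(tagged))  # prints Address Change
```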
[0041] Further, if the direct actual keywords search also fails, synonyms of the keywords may be obtained from the dictionary 206 (or the lexical dictionary) by the one or more hardware processors 104 to find a match. If direct or exact matches are obtained, the plurality of confidence factors may be generated as 100%. Finally, if none of the searches work, the one or more hardware processors 104 find out how many words in the email record match the N-gram sequences of that category (single or multiple) and further correlate the sequence of matches. Based upon the correlation and match performed, the one or more hardware processors 104 may return a set of closeness factors (for example, between 0% and 60% or more, where 0% may indicate no match and 60% or more may indicate the closest match) to indicate how close the matches are. So, the set of closeness factors for a category may be computed as:
Based upon the set of closeness factors for a category (selected earlier), the PRF may be computed as:
where, refers to the category selected (for example, address change) and is 4/5.
Hence, for the category “address change”, if the 100, the may be obtained as .
[0042] According to an embodiment, referring to table 3 below, the PRF may finally be computed as or as where the PRF is defined in terms of percentage and the weightage factor is assigned between 0% and 100% (where 0% denotes no matching, 95% or above very close matching and 100% exact matching) by the one or more hardware processors 104 depending upon the closeness of the match with the category “address change”.
Table 3
| Classification | Intent | Entity | Space Words | Weightage Factor | Prev Context |
|---|---|---|---|---|---|
| Change Address | Change | Address | 2 | 95 | Y |
| Change Address | Do | changes | 4 | 65 | Y |
| Change Address | Update | Address | 2 | 95 | Y |
| Change Address | do | modification | 1 | 90 | Y |
| Provide Statement | Provide | Statement | 4 | 95 | Y |
| Provide Statement | generate | Statement | 4 | 95 | Y |
| Provide Statement | help | Statement | 5 | 85 | Y |
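A lookup against the rows of table 3 may be sketched in Python as follows. This is an illustration under a stated assumption: the “Space Words” column is read here as the maximum number of word positions after the intent word within which the entity word must appear, which the text does not define explicitly. The names `RULES` and `match_weightage` are hypothetical.

```python
import re

# Rows of table 3: (classification, intent, entity, space words, weightage factor).
RULES = [
    ("Change Address", "change", "address", 2, 95),
    ("Change Address", "do", "changes", 4, 65),
    ("Change Address", "update", "address", 2, 95),
    ("Change Address", "do", "modification", 1, 90),
    ("Provide Statement", "provide", "statement", 4, 95),
    ("Provide Statement", "generate", "statement", 4, 95),
    ("Provide Statement", "help", "statement", 5, 85),
]

def match_weightage(sentence, classification):
    """Return the highest weightage factor of any rule matched by the sentence, else 0."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    best = 0
    for rule_class, intent, entity, space_words, weightage in RULES:
        if rule_class != classification:
            continue
        for index, token in enumerate(tokens):
            # Assumed reading: the entity must follow the intent within `space_words` words.
            if token == intent and entity in tokens[index + 1:index + 1 + space_words]:
                best = max(best, weightage)
    return best

print(match_weightage("Kindly change my address", "Change Address"))  # prints 95
```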
Based upon one or more weightage factors assigned (referring to table 3 again), and the plurality of confidence factors and the PRF computed, the second set of information after performing the classification may be generated as:
“Old Address: 240, Red Bldg., Raulf Street, London, M4 1PW
New Address: 243, Blue Bldg, Raulf Street, London, M4 2PW
Customer Name: Ronny Sillot
Email Id: ronny.s@gmail.com”
[0043] According to an embodiment of the present disclosure, at step 304, the one or more hardware processors 104 perform a correlation of the second set of information and the one or more predefined system generated templates to generate an automated response to the set of multi-queries based upon a final confidence factor (FCF). The correlation is performed using a set of mapping rules (defined by the rules engine 204 and stored in the rules database 205) for determining whether, based upon the PRF and the plurality of confidence factors computed in the step 303 above, the set of multi-queries (comprising the one or more user emails), after performing the normalization and classification, may be classified as corresponding to or matching with the category identified to generate an automated response to the set of multi-queries. For example, for the category ‘Address Change’ the mail needs to be classified as an ‘Address Change’ request.
[0044] In an embodiment, the set of mapping rules for performing the correlation of the second set of information and the one or more predefined system generated templates may be defined as below:
i. If the PRF and a mean confidence value (from amongst the set of mean confidence values computed for the plurality of categories) are both above 85% for a category, map the email to that category.
ii. If the mean confidence value is > 85% but the PRF is < 60%, consider it a false positive.
iii. If the mean confidence value is < 40% and > 10%, and the PRF is > 90%, map the email to that category.
iv. If the PRF and the mean confidence value are both in the 60% to 85% range, map the email to that category and apply the training model for N-grams, so that the next time a similar item comes the PRF will be near 100%.
v. If the PRF and the mean confidence value are both between 40% and 60%, put the email in an exception category so that manual inspection is done to validate the result.
As both the PRF and the mean confidence value for the category “address change” are greater than 90% (the PRF being and the mean confidence value computed as 98 in step 303 above), the mapping may be performed.
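The mapping rules i to v above may be expressed as a small decision function. The Python sketch below is illustrative only: the function name `apply_mapping_rules` and the returned labels are hypothetical, rule iii is read as a mean confidence value between 10% and 40%, the boundary value 60% is assigned to rule iv, and any combination not covered by the rules returns "unspecified" because the text does not state what happens in that case.

```python
def apply_mapping_rules(prf, mcv):
    """Apply mapping rules i-v to a PRF and a mean confidence value (both in percent)."""
    if prf > 85 and mcv > 85:                     # rule i
        return "map"
    if mcv > 85 and prf < 60:                     # rule ii
        return "false_positive"
    if 10 < mcv < 40 and prf > 90:                # rule iii
        return "map"
    if 60 <= prf <= 85 and 60 <= mcv <= 85:       # rule iv
        return "map_and_retrain"
    if 40 <= prf < 60 and 40 <= mcv < 60:         # rule v
        return "exception"
    return "unspecified"                          # not covered by the stated rules

print(apply_mapping_rules(100, 98))  # prints map
```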
[0045] According to an embodiment of the present disclosure, based upon the PRF and the mean confidence value, a final confidence factor (FCF) may be computed for determining whether the mapping of the second set of information and the predefined system generated templates is to be performed. In an embodiment, the FCF may be computed as below:
Suppose, the PRF is 100 and MCV=96, then the FCF will be:
[0046] According to an embodiment of the present disclosure, the mapping is to be performed only when the ( may be regarded as a predefined threshold value) to ensure high level of accuracy while generating a final automated response to the set of multi-queries. Taking above scenario, as the , that is, the , the mapping may be performed.
[0047] According to an embodiment of the present disclosure, based upon the mean confidence value, the PRF and the FCF, mapping may be performed. The methodology for performing the mapping may now be considered in detail. In an embodiment, the one or more predefined system generated templates comprise a repository of templates 208 with predefined responses for generating an automated response to the set of multi-queries by the one or more users. This may be performed by the response generating module 208. For example, for the set of queries comprising address change requests, the database 210 will comprise a template with different responses with respect to the address change. Similarly, for the set of queries comprising policy information / details requests, the database 210 may comprise a template with different responses with respect to the policy information / details. For example, in case the set of queries requests an address change, the repository of templates 208 may comprise a plurality of templates which may comprise the below predefined responses:
“Address is already changed. And customer is again requesting for the same with actually changed address.
It’s a fresh request. And indeed new address is to be updated. And customer has provided all requisite validation details like date of birth, user_id as well as new address and old address.
It’s a fresh request for address change. But Customer has not provided validation details like Date of birth, user id.
It’s a fresh request for address change. But Customer has provided incorrect validation details like Date of Birth, User id.
It’s a fresh request for address change and customer has provided incorrect old address, not matching with existing records”.
[0048] Further, according to an embodiment, for all the above scenarios (that is, predefined responses in the templates) the templates may comprise an entry with a specific status code returned by the backend application programming interface (API) 212, that is, when an address change request is sent to the system 100 via the API 212, depending upon the backend validation and processing, a status code may be generated and communicated along with the result to indicate the transaction status. For example, if a customer has asked for an address change and provided all requisite validation details like address, DOB, user_id etc., the API 212 will update the result and send back status code 101 along with the result indicating the status of the request.
[0049] In an embodiment, for example, a template for an address change (based upon the status code generated) may be represented as:
Status_Code:100
Dear Customer , your address is already updated in the system.
Status_Code:101
Dear Customer , As per your request, your new address is updated in the system.
Status_Code:102
Dear Customer , kindly provide your Date of Birth, and User Id to validate, before we update the new address in the system.
Status_Code:103
Dear Customer , please provide correct Date of Birth. The Date of Birth provided by you is not matching with our records.
Status_Code:104
Dear Customer , Your old address is not matching with our existing records. Kindly contact customer care or visit nearest branch.
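The address change template above may be pictured as a mapping from status code to response message. The Python sketch below is a hedged illustration: the names `ADDRESS_CHANGE_TEMPLATE` and `select_response` are hypothetical, and the messages are transcribed from the template with spacing normalized.

```python
# Status code -> response message, transcribed from the address change template.
ADDRESS_CHANGE_TEMPLATE = {
    100: "Dear Customer, your address is already updated in the system.",
    101: "Dear Customer, As per your request, your new address is updated in the system.",
    102: "Dear Customer, kindly provide your Date of Birth, and User Id to validate, "
         "before we update the new address in the system.",
    103: "Dear Customer, please provide correct Date of Birth. The Date of Birth provided "
         "by you is not matching with our records.",
    104: "Dear Customer, Your old address is not matching with our existing records. "
         "Kindly contact customer care or visit nearest branch.",
}

def select_response(status_code, template=ADDRESS_CHANGE_TEMPLATE):
    """Return the response message for a status code, or None if the code is unknown."""
    return template.get(status_code)

print(select_response(101))
```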
[0050] According to an embodiment of the present disclosure, using the predefined system generated templates and the status codes, the one or more hardware processors 104 perform the correlation of the second set of information (generated in the step 303 above) and the one or more predefined system generated templates. The steps for performing the correlation may now be considered in detail. According to an embodiment of the present disclosure, the one or more hardware processors 104 firstly scan all the predefined system generated templates stored in the database 210 and then upload the predefined system generated templates into a dynamic memory (cache) of an application in the system 100. The application may comprise of a word processor, a web browser or a media player or any other software that enables the one or more users to perform specific tasks.
[0051] According to an embodiment of the present disclosure, since there may be a large number of templates, the present disclosure may use any most frequently used (MFU) algorithm to upload and store only those templates that are frequently used. In an embodiment, the templates may be aggregated and then stored as aggregated content in the dynamic memory (cache) using a HashMap. The HashMap may comprise a map based collection class that is used for storing key and value pairs.
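Loading only the most frequently used templates into an in-memory map may be sketched in Python as below. This is one possible reading, not the disclosed implementation: the function name `load_frequent_templates`, the usage log and the `capacity` parameter are assumptions, and a Python dictionary stands in for the HashMap.

```python
from collections import Counter

def load_frequent_templates(all_templates, usage_log, capacity):
    """Return a dict (cache) holding the `capacity` most frequently used templates."""
    usage = Counter(usage_log)
    cache = {}
    for key, _count in usage.most_common():
        if len(cache) >= capacity:
            break
        if key in all_templates:
            cache[key] = all_templates[key]
    return cache

# Hypothetical template store and usage history.
all_templates = {"address_change": "A", "policy_expiry": "B", "statement": "C"}
usage_log = ["address_change", "policy_expiry", "address_change",
             "statement", "address_change", "policy_expiry"]
print(load_frequent_templates(all_templates, usage_log, capacity=2))
# prints {'address_change': 'A', 'policy_expiry': 'B'}
```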
[0052] According to an embodiment, the aggregation may then be performed for determining one or more response messages to be selected later (while generating an automated response to the set of multi-queries) by using a set of rules based upon the status code and mapped templates. In an embodiment, the identified intent (for example, address change) may then be mapped with one or more appropriate predefined system generated templates from amongst the predefined system generated templates (for example, the address change template for the address change intent). A backend interface (for example, BonTime®) may then process the request and return the result with the appropriate status code (for example, status_code:101 if it is successfully updated) with necessary details. Further, applying a set of mapping rules (discussed above) on the appropriate status code and mapped template, the appropriate content from the predefined system generated templates may be identified and mapped. For example, for the status code ‘101’ and ‘address change’ as the tagged template, the one or more hardware processors 104 may select the response message “-> Dear , As per your request, your new address is updated in the system”. Based upon the mapped template and content, a response message may be communicated to the one or more users, further comprising place holders with specific data fields, for example, , .
[0053] In an embodiment, a set of data values may then be stored in the memory 102 of the system 100 by the one or more hardware processors 104 for the place holders in a contextual session against the place holder names from earlier stages. For example, there may be a hashmap with entries like {CustName:’Mr. Alex Dude’; New Address:’24, Watson House, Almer Street, London, 32343’}. The one or more hardware processors 104 may then select the entries against the place holders from the hashmap-contextual session memory and further populate the entries against the place holders. Based upon the entries updated, the response message gets transformed into “Dear Mr. Alex Dude, As per your request, your new address ‘24, Watson house, Almer street, London, 323 43’ is updated in the system”.
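Populating the place holders from the contextual session may be sketched in Python as follows. The place holder syntax `<Name>`, the template string and the function name `fill_placeholders` are assumptions made for this illustration, since the text does not show how place holders are written inside the templates; the session entries are those of the hashmap example.

```python
import re

def fill_placeholders(template, session):
    """Replace each <Name> place holder with its session value; unknown names stay as-is."""
    def substitute(match):
        return session.get(match.group(1), match.group(0))
    return re.sub(r"<([^<>]+)>", substitute, template)

# Hypothetical template text using the assumed <Name> syntax.
template = ("Dear <CustName>, As per your request, your new address "
            "'<New Address>' is updated in the system")
session = {
    "CustName": "Mr. Alex Dude",
    "New Address": "24, Watson House, Almer Street, London, 32343",
}
print(fill_placeholders(template, session))
```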
[0054] According to an embodiment of the present disclosure, the advantages of the present disclosure may now be considered. By implementing artificial intelligence, natural language processing, machine learning and intent identification techniques in preprocessing and mapping, the present disclosure facilitates a higher level of accuracy in comparison with the traditional systems and methods. Some of the traditional systems and methods provide for preprocessing using well known techniques; however, for multi-intent user queries (where there are multi-queries from the one or more users in a single email) they still lack accuracy in performing mapping and generating multiple automated responses via a single email. The proposed disclosure provides for extracting meaningful data items (required to process multi-intent user emails) from an email using keyword matching, pattern matching, a corpus of data and the NLP techniques. The proposed methodology further validates the extracted data with backend enterprise systems if required, and processes the request in the backend enterprise systems by using exposed web services from the enterprise systems. Once processing is done, depending upon the outcome and data output, the proposed disclosure also provides a solution for filling up an identified response template for that category (for which an action is required), and formats the response.
[0055] The proposed disclosure not only facilitates computing of the PRF and the set of confidence factors for preprocessing using artificial intelligence, the NLP techniques etc., but, in order to ensure a high level of accuracy in mapping and generating a correct automated response for multi-user queries, also provides for computing of the FCF, based upon which an estimate may be made of the accuracy with respect to whether the set of classified data is finally to be mapped with the predefined system generated templates. This further reduces the chances of producing false negatives and false positives and of missing true positives and true negatives.
[0056] Further, the proposed disclosure performs the intent identification on individual sentences from the set of multi-queries to identify the intents, which tends to give a more accurate result. Besides, the identification of multiple intents or categories using machine learning or the NLP techniques becomes easier as individual sentences are processed. Also, the classification algorithm (in which weightage is given to keywords) further improves accuracy and reduces false positives and false negatives. Still further, the proposed disclosure is built using open source libraries and a custom classification algorithm. It has a low cost of implementation and consumes fewer resources, as it does not require very high data processing computing power and supporting hardware.
[0057] In an embodiment, the memory 102 can be configured to store any data that is associated with automated query response management based on multi-query classification. In an embodiment, the information pertaining to the first set of information, the set of normalized data, the set of classified data, the PRF and the FCF is stored in the memory 102. Similarly, the set of mapping rules that are to be used for the correlation is also stored in the memory 102. Further, all information (inputs, outputs and so on) pertaining to automated query response management based on multi-query classification may also be stored in the database 210, as history data, for reference purposes.
[0058] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[0059] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[0060] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[0061] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[0062] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, BLU-RAYs, flash drives, disks, and any other known physical storage media.
[0063] It is intended that the disclosure and examples be considered as exemplary only, with a true scope and spirit of disclosed embodiments being indicated by the following claims.
| # | Name | Date |
|---|---|---|
| 1 | 201721044041-STATEMENT OF UNDERTAKING (FORM 3) [07-12-2017(online)].pdf | 2017-12-07 |
| 2 | 201721044041-REQUEST FOR EXAMINATION (FORM-18) [07-12-2017(online)].pdf | 2017-12-07 |
| 3 | 201721044041-FORM 18 [07-12-2017(online)].pdf | 2017-12-07 |
| 4 | 201721044041-FORM 1 [07-12-2017(online)].pdf | 2017-12-07 |
| 5 | 201721044041-FIGURE OF ABSTRACT [07-12-2017(online)].jpg | 2017-12-07 |
| 6 | 201721044041-DRAWINGS [07-12-2017(online)].pdf | 2017-12-07 |
| 7 | 201721044041-COMPLETE SPECIFICATION [07-12-2017(online)].pdf | 2017-12-07 |
| 8 | 201721044041-Proof of Right (MANDATORY) [23-01-2018(online)].pdf | 2018-01-23 |
| 9 | 201721044041-FORM-26 [23-01-2018(online)].pdf | 2018-01-23 |
| 10 | abstract1.jpg | 2018-08-11 |
| 11 | 201721044041-ORIGINAL UNDER RULE 6 (1A)-310118.pdf | 2018-08-11 |
| 12 | 201721044041-OTHERS [25-04-2021(online)].pdf | 2021-04-25 |
| 13 | 201721044041-FER_SER_REPLY [25-04-2021(online)].pdf | 2021-04-25 |
| 14 | 201721044041-COMPLETE SPECIFICATION [25-04-2021(online)].pdf | 2021-04-25 |
| 15 | 201721044041-CLAIMS [25-04-2021(online)].pdf | 2021-04-25 |
| 16 | 201721044041-FER.pdf | 2021-10-18 |
| 17 | 201721044041-US(14)-HearingNotice-(HearingDate-09-02-2024).pdf | 2024-01-09 |
| 18 | 201721044041-FORM-26 [07-02-2024(online)].pdf | 2024-02-07 |
| 19 | 201721044041-FORM-26 [07-02-2024(online)]-1.pdf | 2024-02-07 |
| 20 | 201721044041-Correspondence to notify the Controller [07-02-2024(online)].pdf | 2024-02-07 |
| 21 | 201721044041-Written submissions and relevant documents [23-02-2024(online)].pdf | 2024-02-23 |
| 22 | 201721044041-PatentCertificate04-03-2024.pdf | 2024-03-04 |
| 23 | 201721044041-IntimationOfGrant04-03-2024.pdf | 2024-03-04 |
| 1 | search_strategyAE_02-09-2021.pdf |
| 2 | searchE_17-09-2020.pdf |