F O R M 2
THE PATENTS ACT, 1970
(39 of 1970)
The Patents Rules, 2003
COMPLETE SPECIFICATION
(See section 10 and rule 13)
TITLE OF THE INVENTION
METHOD AND SYSTEM FOR PERFORMING TREND ANALYSIS OF END USER BEHAVIOUR IN REAL-TIME
APPLICANT:
Zensar Technologies Limited, a company incorporated in India under the Companies Act, 1956
Having address:
Zensar Knowledge Park,
Plot # 4, MIDC, Kharadi, off Nagar road, Pune-411014,
Maharashtra, India
The following specification particularly describes the invention and the manner in which it is to be performed.
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority from Indian Provisional Patent Application No. 202021036357 filed on 24th August 2020, the entirety of which is incorporated herein by reference.
TECHNICAL FIELD
[0002] The present subject matter described herein, in general, relates to artificial intelligence (AI) based techniques for performing trend analysis of end user behaviour during a call with a service provider in real-time, to serve the end user better.
BACKGROUND
[0003] Customer satisfaction plays a pivotal role for any organization in establishing relationships with its customers and thereby increasing its profitability. Business Process Outsourcing (BPO) is a measure taken by many organizations towards establishing such relationships and attaining profitability objectives. Whenever a customer (end user) encounters a challenge with the operability of a product/service offered by the organization, he/she tends to dial the customer care center to seek, from a customer agent, a resolution to the encountered malfunction.
[0004] It has been noticed that the customer agent may or may not be aware of the resolution sought by the customer. This becomes the pivotal point where the relationship with the customer is actually established or hampered, and thereby profitability is impacted in the longer run. In the absence of any resolution, the customer tends to get offended, leading to an unpleasant conversation. Thus, it becomes of utmost importance to rate both the customer and the customer agent to maintain a prolonged relationship. Further, it is equally important to resolve any such issue between the end user and the agent, in real-time, during the call, by analysing the behavioural trend of the end user.
[0005] Currently, there exist solutions which may categorize and rate both the customer and the customer agent. However, such solutions follow an offline approach that converts an audio conversation into a textual format to analyse the sentiments in the conversation in order to rate them. Thus, there exists a need for a system and method to identify an instigation point leading to an unpleasant conversation and to rate both the customer and the customer agent in real-time. Further, there is also a need for a system and method which may provide instructions to the agent for providing recommendations/solutions to the customer, based on the behaviour of the customer, to provide him/her with a more satisfactory and pleasant experience.
SUMMARY
[0006] The present disclosure overcomes one or more shortcomings of the prior art and provides additional advantages discussed throughout the present disclosure. Additional features and advantages are realized through the techniques of the present disclosure. Other embodiments and aspects of the disclosure are described in detail herein and are considered a part of the claimed disclosure.
[0007] In one non-limiting embodiment of the present disclosure, an Artificial Intelligence (AI) based method for performing trend analysis of end user behaviour during a call with a service provider in real-time is disclosed. The method comprises processing, by a processor, speech signals from said call to identify the end user and the service provider. The method further describes accessing, by the processor, a profile of the identified end user, stored in a profile database, based on at least one of the processed speech signals and the end user contact details. Thereafter, the method describes fetching, by the processor, pre-stored pitch information associated with the profile of the end user from the profile database and analysing, by a tone analyser, the pre-stored pitch information with real-time pitch information of the end user, captured during said call, to determine a change in said pitch information, wherein the real-time pitch information is determined by continuously analysing the speech signals of the end user during said call, and wherein the real-time pitch information is indicative of the tone of the end user during said call. Further, the method describes evaluating, by a Long Short-Term Memory (LSTM) modelling unit, the determined change in said pitch information to ascertain a probability of triggered behaviour. The method further describes determining, by a sentiment modelling unit, a sentiment of the end user and calculating, by a score calculation unit, a rating score of the end user during said call, based on the sentiment and the probability of the triggered behaviour. Lastly, the method
describes providing, by an output unit, instructions, to the service provider, to adapt the conversations according to the end user behaviour, based on the rating score.
[0008] In another non-limiting embodiment, the present disclosure recites that the processing of the speech signals from said call further comprises extracting a plurality of segments from the speech signals, generating a D-vector for each segment of the plurality of segments, generating segment-wise embeddings by aggregating the D-vector for each segment, and clustering the segment-wise embeddings to segregate speech signals of the end user from speech signals of the service provider.
[0009] In yet another non-limiting embodiment, the present disclosure recites that the method further comprises detecting, by the processor, noise in the segregated speech signals by categorizing the segregated speech signals based on amplitude in each frequency region, wherein the lowest and highest amplitude regions represent the noise. Said method further recites converting, by a speech to text converter, the segregated speech signals into a textual format, and analysing, by the processor, the textual format to identify the end user and the service provider. The end user and the service provider are identified by associating textual content of the textual format with either the end user or the service provider using a Support Vector Machine (SVM) classifier.
[0010] In yet another non-limiting embodiment, the present disclosure recites that the method further comprises determining, by the processor, profiles similar to the end user profile, and calculating a similarity score for each determined profile, if the rating score is above a pre-defined threshold and the sign of the sentiment is positive. The method further comprises providing, by the processor, instructions to the service provider to make recommendations to the end user by Apriori analysis, based on recommendations made to the determined similar profiles in the past, if the calculated similarity scores for the determined profiles are above a predefined value. Otherwise, the method recites providing, by the processor, instructions to the service provider to initiate a resolution process to resolve the concern of the end user.
[0011] In yet another non-limiting embodiment of the present disclosure, the present application discloses an Artificial Intelligence (AI) system for performing trend analysis of end user behaviour during a call with a service provider in real-time. The system comprises a microphone configured to capture speech signals from said call. The system further comprises a processor in communication with the microphone. The processor is configured to: process the speech signals from said call to identify the end user and the service provider, access a profile of the identified end user, stored in a profile database, based on at least one of the processed speech signals and the end user contact details, and fetch pre-stored pitch information associated with the profile of the end user from the profile database. The system further comprises a tone analyser operatively coupled to said processor and configured to analyse the pre-stored pitch information with real-time pitch information of the end user, captured during said call, to determine a change in said pitch information, wherein the real-time pitch information is determined by continuously analysing the speech signals of the end user during said call, and wherein the real-time pitch information is indicative of the tone of the end user during said call. The system comprises a Long Short-Term Memory (LSTM) modelling unit operatively coupled to the processor and configured to evaluate the determined change in said pitch information to ascertain a probability of triggered behaviour. The system further comprises a sentiment modelling unit operatively coupled to said processor and configured to determine a sentiment of the end user. The system further comprises a score calculation unit operatively coupled to the processor and configured to calculate a rating score of the end user during said call, based on the sentiment and the probability of the triggered behaviour. Furthermore, the system comprises an output unit operatively coupled to said processor and configured to provide instructions to the service provider to adapt the conversations according to the end user behaviour.
[0012] In yet another non-limiting embodiment of the present disclosure, wherein to process the speech signals from said call, the processor is configured to: extract a plurality of segments from the speech signals, generate a D-vector for each segment of the plurality of segments, generate segment-wise embeddings by aggregating the D-vector for each segment, and cluster the segment-wise embeddings to segregate speech signals of the end user from speech signals of the service provider.
[0013] In yet another non-limiting embodiment of the present disclosure, the present application discloses that the system further comprises a speech to text converter in communication with the processor and the microphone. The at least one processor is configured to: detect noise in the segregated speech signals by categorizing the segregated speech signals based on amplitude in each frequency region, wherein the lowest and highest amplitude regions represent the noise. The processor is configured to convert, using the speech to text converter, said segregated speech signals into a textual format, and analyse the textual format to identify the end user and the service provider. The end user and the service provider are identified by associating textual content of the textual format with either the end user or the service provider using a Support Vector Machine (SVM) classifier.
[0014] In yet another non-limiting embodiment of the present disclosure, the present application discloses that the processor is further configured to: determine profiles similar to the end user profile, and calculate a similarity score for each determined profile, if the rating score is above a pre-defined threshold and the sign of the sentiment is positive; provide instructions to the service provider to make recommendations to the end user by Apriori analysis, based on recommendations made to the determined similar profiles in the past, if the calculated similarity scores for the determined profiles are above a predefined value; or provide instructions to the service provider to initiate a resolution process to resolve the concern of the end user.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] The accompanying drawings, which are incorporated in and
constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed embodiments. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the figures to reference like features and components. Some embodiments of systems and/or methods in accordance with embodiments of the present subject matter are now described, by way of example only, and with reference to the accompanying figures, in which:
[0016] Fig. 1 illustrates a network implementation of a system for performing trend
analysis of end user behaviour during a call with a service provider in real-time, in
accordance with an embodiment of the present subject matter.
[0017] Fig. 2 illustrates a block diagram of a system for performing trend analysis
of end user behaviour during a call with a service provider in real-time, in
accordance with an embodiment of the present subject matter.
[0018] Fig. 3 illustrates a method for extracting short segments from audio and
producing D-vectors for each sliding window, in accordance with an embodiment
of the present subject matter.
[0019] Fig. 4 is a flow diagram illustrating an AI based method for performing trend
analysis of end user behaviour during a call with a service provider in real-time, in
accordance with an embodiment of the present subject matter.
[0020] Fig. 5 illustrates a process of performing trend analysis of end user
behaviour during a call with a service provider in real-time, by way of various
modules, in accordance with an embodiment of the present subject matter.
[0021] It should be appreciated by those skilled in the art that any block diagrams
herein represent conceptual views of illustrative systems embodying the principles
of the present subject matter. Similarly, it will be appreciated that any flow charts,
flow diagrams, state transition diagrams, pseudo code, and the like represent
various processes which may be substantially represented in computer readable
medium and executed by a computer or processor, whether or not such computer or
processor is explicitly shown.
DETAILED DESCRIPTION
[0022] In the present document, the word “exemplary” is used herein to mean
“serving as an example, instance, or illustration.” Any embodiment or
implementation of the present subject-matter described herein as “exemplary” is
not necessarily to be construed as preferred or advantageous over other
embodiments.
[0023] While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the particular forms disclosed; on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure.
[0024] The terms “comprises”, “comprising”, “include(s)”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, system or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or system or method. In other words, one or more elements in a system or apparatus preceded by “comprises… a” does not, without more constraints, preclude the existence of other elements or additional elements in the system or apparatus.
[0025] In the following detailed description of the embodiments of the disclosure, reference is made to the accompanying drawings that form a part hereof, and which are shown by way of illustration specific embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present disclosure. The following description is, therefore, not to be taken in a limiting sense.
[0026] The present invention will be described herein below with reference to the accompanying drawings. In the following description, well known functions or constructions are not described in detail since they would obscure the description with unnecessary detail.
[0027] The present disclosure discloses an artificial intelligence (AI) enabled system that performs real-time trend analysis of customer (also referred to as end user) behaviour so that a customer agent (also referred to as a service provider) can adapt to the conversation trend. The AI system can identify an instigation point in case of an unpleasant conversation. The AI system may also suggest remedies for particular types of instigation after identifying their type. The system may also identify if the conversation trend is moving positively and may suggest possible recommendations for selling products/services suited to the end user’s profile. In this manner, the AI system ensures that the interaction remains focused around the issue and hence both the end user and the service provider engage in a more meaningful conversation.
[0028] Referring to Figure 1, a network implementation 100 of an AI system 102 for assigning ratings and providing recommendations, in real-time, to an end user and a service provider/customer agent during an ongoing conversation is disclosed. Although the present disclosure is explained considering that the system 102 is implemented on a server, it may be understood that the system 102 may be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, or a cloud-based computing environment. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 104-1, 104-2…104-N, collectively referred to as users 104 or stakeholders hereinafter, or through applications residing on the user devices 104. In one implementation, the system 102 may comprise the cloud-based computing environment in which a user may operate individual computing systems configured to execute remotely located applications. Examples of the user devices 104 may include, but are not limited to, an IoT device, an IoT gateway, a portable computer, a personal digital assistant, a handheld device, and a workstation. The user devices 104 are communicatively coupled to the system 102 through a network 106.
[0029] In one implementation, the network 106 may be a wireless network, a wired network or a combination thereof. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Hypertext Transfer Protocol Secure (HTTPS), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0030] Further referring now to Figure 2, the AI enabled system 102 is illustrated in accordance with an embodiment of the present disclosure. In one embodiment, the system 102 may include a processor 202, a microphone 204, a tone analyser 206, a Long Short-Term Memory (LSTM) modelling unit 208, a Sentiment modelling unit 210, a speech to text converter 212, a score calculation unit 214, an input/output (I/O) unit 216, and a profile database 218. The processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor 202 is configured to fetch and execute computer-readable instructions stored in the memory.
[0031] The I/O unit 216 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O unit 216 may allow the system 102 to interact with users through the user devices 104. Further, the I/O unit 216 may enable the system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O unit 216 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O unit 216 may include one or more ports for connecting a number of devices to one another or to another server.
[0032] According to an embodiment, the AI system 102 may perform trend analysis of customer (referred to herein as end user) behaviour during a call with a service provider, i.e., a customer agent, in real-time. The AI system 102 captures the voices of both the end user and the service provider in an audio format. In one implementation, the voices of both the end user and the service provider may be captured using a microphone 204 deployed on the user’s device and a voice recorder device deployed on the system 102. The captured voices of both the end user and the service provider are then analysed in order to distinguish between them. In one implementation, the voices may be detected by using a speech detection technique. In an embodiment, the speech
detection technique may use a Voice Activity Detector (VAD) to remove noise and non-speech content captured in the audio. It may be noted that the speech detection technique processes a plurality of speech signals and thereby maps each signal with its corresponding speaker i.e., either the end user or the service provider. Particularly, the processor 202 may process the speech signals from said call between the end user and service provider to identify the end user and the service provider.
[0033] In one embodiment, in order to identify the end user and the service provider using the speech signals, the processor 202 extracts a plurality of short segments (i.e., sliding windows) from the speech signals. Thereafter, the processor 202 generates a D-vector for each segment of the plurality of segments using the LSTM modelling unit 208, as shown in Fig. 3. Thereafter, the processor 202 generates segment-wise embeddings by aggregating the D-vector for each segment and clusters the segment-wise embeddings to segregate speech signals of the end user from speech signals of the service provider.
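The segmentation and clustering of paragraph [0033] may be sketched, purely for illustration, as follows. Here `embed` is a toy stand-in for the LSTM d-vector extractor, and a tiny two-cluster k-means substitutes for the clustering step; none of these simplifications are part of the disclosure itself.

```python
# Toy sketch of the sliding-window segmentation and two-speaker clustering.
# `embed` is a stand-in for the LSTM d-vector network of paragraph [0033].

def sliding_windows(samples, width, hop):
    """Extract overlapping short segments (sliding windows) from the signal."""
    return [samples[i:i + width] for i in range(0, len(samples) - width + 1, hop)]

def embed(window):
    """Stand-in d-vector: per-window mean amplitude and spread."""
    mean = sum(window) / len(window)
    spread = sum((x - mean) ** 2 for x in window) / len(window)
    return (mean, spread)

def dist(p, q):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def centroid(pts):
    """Component-wise mean of a cluster, or None when the cluster is empty."""
    if not pts:
        return None
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def two_means(points, iters=10):
    """Cluster segment-wise embeddings into two speakers (k-means, k=2)."""
    a, b = points[0], points[-1]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if dist(p, a) <= dist(p, b) else 1 for p in points]
        a = centroid([p for p, l in zip(points, labels) if l == 0]) or a
        b = centroid([p for p, l in zip(points, labels) if l == 1]) or b
    return labels
```

Segments whose embeddings fall into the same cluster are attributed to the same speaker, segregating the end user's speech from the service provider's.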
[0034] Further, the processor 202 may detect noise in the speech signals by categorizing the speech signals based on amplitude in each frequency region. The sound events with the highest/lowest amplitude are assumed to be noise. In other words, the lowest and highest amplitude regions represent the noise. Further, the processor 202 converts said speech signals into textual format using the speech to text converter 212.
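The amplitude-based noise categorization of paragraph [0034] may be sketched as follows, assuming per-band amplitudes have already been computed (e.g. from a spectrogram); the band names and values in the test are assumptions for the sketch.

```python
# Illustrative noise flagging: speech signals are categorized by amplitude in
# each frequency region, and the lowest- and highest-amplitude regions are
# taken to represent noise, per paragraph [0034].

def flag_noise_bands(band_amplitude):
    """Map each frequency region to True when it is treated as noise."""
    lowest = min(band_amplitude, key=band_amplitude.get)
    highest = max(band_amplitude, key=band_amplitude.get)
    return {band: band in (lowest, highest) for band in band_amplitude}
```

Bands flagged here would be filtered out before the speech to text conversion.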
[0035] According to one embodiment of the present disclosure, the processor 202 may analyse the textual format to identify the end user and the service provider. The processor 202 identifies the end user and the service provider by associating textual content of the textual format with either the end user or the service provider using a Support Vector Machine (SVM) classifier. The historical ways of addressing an end user are used to train the AI system 102 so that the system 102 can identify the service provider and the end user during a real-time conversation, i.e., a call. The AI system 102 uses the SVM classifier to understand if a given text is spoken by an end user or a service provider. For example: Hi! How may I help you?
Good Morning! How can I be of service to you?
Such templated statements, which are used in an organization to open a conversation with an end user, would be used to train a Service Provider Identifier model. Therefore, in real time, when a service provider/customer agent utters:
Hello! Good evening. I am XYZ. How can I assist you?
the service provider identifier model would be able to identify the speaker as a service provider, as terms offering assistance, greetings, or the organization name would be among the key features in the trained model.
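The SVM-based role identification may be sketched with scikit-learn as follows. The training sentences, labels, and the TF-IDF plus LinearSVC pipeline are illustrative assumptions; the disclosure specifies only that an SVM classifier is trained on an organization's templated opening statements.

```python
# Illustrative service-provider identifier per paragraph [0035].
# Training sentences below are hypothetical templated/customer utterances.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

agent_texts = [
    "hi how may i help you",
    "good morning how can i be of service to you",
    "hello how can i assist you today",
    "thank you for calling how may i help you",
]
customer_texts = [
    "my card is not working",
    "i want to check my claim status",
    "my car insurance policy is about to expire",
    "i need a refund for my order",
]

# Train the identifier model on the templated statements.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(agent_texts + customer_texts,
          ["service_provider"] * len(agent_texts) + ["end_user"] * len(customer_texts))

def identify_speaker(utterance):
    """Label a transcribed utterance as service provider or end user."""
    return model.predict([utterance.lower()])[0]
```

Greeting and assistance terms dominate the service-provider class, so an unseen opening such as the one above is attributed to the agent.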
[0036] According to one embodiment of the present disclosure, the processor 202 may be configured to process the textual format by removing stop words, performing lemmatization, and lowercasing. The processor 202 may also remove gibberish that occurs during speech to text conversion. Further, the processor 202 may also perform operations such as tokenization on the converted text.
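The pre-processing of paragraph [0036] may be sketched as follows. The stop-word list and the suffix-stripping "lemmatizer" are crude stand-ins assumed for the sketch; a production pipeline would use a dictionary-based lemmatizer and a fuller stop-word list.

```python
import re

# Toy pre-processing pipeline: lowercase, tokenize, drop stop words, lemmatize.
STOPWORDS = {"am", "is", "are", "the", "a", "an", "my", "to", "as", "out", "about"}

def crude_lemma(token):
    """Strip a few common suffixes to approximate the root form (toy rule)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(sentence):
    """Apply the paragraph [0036] steps to one transcribed sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [crude_lemma(t) for t in tokens if t not in STOPWORDS]
```

Applied to the worked example later in this specification, "reached" reduces to "reach" and stop words such as "is" and "about" are dropped.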
[0037] In accordance with one embodiment of the present disclosure, the system 102 models the historical data with at least the following features, such as the converted text, tone analysis, and user account information, but not limited thereto, using a supervised mechanism to predict triggered/non-triggered behaviour of the end user. The user account information may comprise information such as whether the end user is a premium customer or a standard user, and the end user’s salary, location, age, and past account activity history, but not limited thereto. The process of tone analysis is explained in the paragraphs below in more detail.
[0038] According to one embodiment of the present disclosure, the processor 202 may be configured to access the profile of the identified end user, in real time during the call, which is already stored in the profile database 218 of the system 102. The processor 202 may access the profile of the end user based on at least one of the processed speech signals and the end user contact details. In an exemplary embodiment, the profile database 218 may be configured to store the profiles of a plurality of end users who are customers of the organization and have called at least once earlier.
The processor 202 may be configured to fetch pre-stored pitch information associated with the profile of the end user from the profile database 218.
[0039] Further, the tone analyser 206 of the system 102 may continuously analyse the speech signals of the end user during said call to determine a change in the pitch information of the end user. The tone analyser 206 analyses the fetched pre-stored pitch information against the real-time pitch information of the end user captured during said call, to determine the change in said pitch information. The real-time pitch information is indicative of the tone of the end user during said call. The tone analyser 206 may pass the calculated change in the pitch information to the LSTM modelling unit 208 to determine whether said change is a normal or an abnormal change.
[0040] The LSTM modelling unit 208 is operatively coupled to the processor 202 and may evaluate the determined change in said pitch information to ascertain the probability of triggered behaviour. According to an exemplary embodiment, at least the past three conversation texts, the tone of the user, and the account details are passed to the LSTM modelling unit 208 in real time to predict the triggered behaviour.
[0041] Further, the score calculation unit 214 of the system 102 may calculate a rating score of the end user based on the sentiment associated with each sentence and the probability of the triggered behaviour from the LSTM modelling unit 208. In an exemplary aspect, the processor 202 of the system 102 may be configured to calculate the sentiment using the VADER (Valence Aware Dictionary and Sentiment Reasoner) sentiment modelling unit 210. Particularly, the rating score may be calculated as below:
Rating score = Sentiment × Trigger Probability
[0042] In accordance with an embodiment of the present disclosure, according to the rating scores, the system 102 may determine whether the conversation trend is positive or negative. Precisely, in an aspect, if the rating score is above a pre-defined threshold and the sign of the sentiment is positive, the processor 202 may determine profiles stored in the profile database 218 that are similar to the end user profile and may calculate a similarity score for each determined profile. When the calculated similarity score for a determined profile is above a predefined value, the processor 202 may provide instructions to the service provider, via the I/O unit 216,
to make recommendations to the end user based on recommendations made to the determined similar profiles in the past.
[0043] Precisely, when the calculated rating score is above a pre-defined threshold score and the sentiment is also positive, the conversation is considered to be moving positively. Thus, when the conversation between the end user and the service provider is moving positively, the service provider may upsell further products/policies to the end user which are suitable to the user’s profile (i.e., age limits, no-claims bonus level, insurance characteristics, etc.). The system 102 analyses historical customer (end user) profiles, and a similarity score is calculated using a cosine similarity technique. All the profiles where the similarity is greater than a threshold are considered for upselling the products/policies. The cosine similarity is defined as follows:
similarity(A, B) = (A · B) / (||A|| × ||B||)
where A and B are two customer (end user) profile vectors and similarity(A, B) is the similarity score. In one exemplary embodiment, all the customer (end user) profiles similar to the current customer (end user) may be analysed to understand the best product to be sold to the end user. The system 102 may determine the products to recommend using an Apriori AI algorithm and association rule mining.
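The cosine-similarity matching of paragraph [0043] may be sketched as follows; representing a profile as a plain numeric feature vector is an assumption of this sketch, as the disclosure does not specify the feature encoding.

```python
import math

# similarity(A, B) = (A . B) / (||A|| * ||B||), per paragraph [0043].

def cosine_similarity(a, b):
    """Cosine of the angle between two profile feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Profiles whose similarity to the current end user exceeds the threshold are retained as upselling candidates.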
[0044] In order to provide instructions to the service provider to make recommendations to the user, the system 102 can identify frequently purchased/sold item sets based on lift values and support values. The support value is the percentage of orders that contain the item or item set. For example, if there are 5 orders in total and an item A occurs in 3 of them, then:
support(A) = 3/5 or 60%
[0045] Further, if there are two items, A and B, the lift indicates whether there is a relationship between A and B, or whether the two items occur together in the same order simply by chance (i.e., at random). The lift value is calculated as below:
lift{A,B} = lift{B,A} = support{A,B} / (support{A} * support{B})
wherein, lift = 1 implies no relationship between A and B (i.e., A and B occur together only by chance),
lift > 1 implies that there is a positive relationship between A and B (i.e., A and B occur together more often than random), and
lift < 1 implies that there is a negative relationship between A and B (i.e., A and B occur together less often than random).
Based on the above information received from the processor 202, the I/O unit 216 may provide instructions to the service provider to adapt the conversations according to the end user behaviour.
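The support and lift computations of paragraphs [0044] and [0045] can be sketched as follows; `orders` is a hypothetical list of past orders, each a collection of item names.

```python
# Support and lift for association-rule mining over historical orders.

def support(orders, items):
    """Fraction of orders that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for order in orders if wanted <= set(order)) / len(orders)

def lift(orders, a, b):
    """lift{A,B} = support{A,B} / (support{A} * support{B})."""
    return support(orders, [a, b]) / (support(orders, [a]) * support(orders, [b]))
```

A lift above 1 marks item pairs that co-occur more often than chance, which is the signal used to pick products to recommend.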
[0046] In accordance with another embodiment of the present disclosure, when the conversation between the end user and the service provider is moving negatively (i.e., when the condition that the rating score is above a pre-defined threshold and the sign of the sentiment is positive is not satisfied), the processor 202 may provide instructions to the service provider to initiate a resolution process to resolve the concern of the end user.
[0047] Particularly, when the conversation is moving negatively, the service provider can utilize help from the system 102 to resolve the problem. The past three conversation sentences are passed through a Named Entity Recognition (NER) model. The NER model may be trained to identify the service type from the conversations. For example, if the end user is talking about claim reimbursement and the system 102 observes that the trend is moving negatively, the system 102 will query a ticket database 220 of the system 102 for such a service type (i.e., reimbursement) and may suggest the top three resolutions (the three most frequent resolutions) for such services, but it is not limited thereto. If the end user is not satisfied by the suggestions, then the end user can log a ticket with the ticket service management.
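The resolution lookup of paragraph [0047] may be sketched as follows; the ticket-record field names are hypothetical, and the NER model that yields the service type is not reproduced here.

```python
from collections import Counter

# Suggest the k most frequent past resolutions for the identified service type.

def top_resolutions(ticket_db, service_type, k=3):
    """Return the k most common resolutions recorded for `service_type`."""
    fixes = [t["resolution"] for t in ticket_db if t["service"] == service_type]
    return [fix for fix, _ in Counter(fixes).most_common(k)]
```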
[0048] In this manner, the present disclosure provides a system for real-time automated rating of a call by understanding the user behaviour. For example, “What do you mean by he is busy for today! I needed him” is a negative sentence. A general sentiment analysis for this sentence might be negative, and the overall conversation flow for this end user might give a negative analysis. But understanding that it was a triggered behaviour which led to the negativity can be used to suggest other alternatives to the end user.
[0049] The embodiments described above may be easily understood by way of the following example.
Step 1: When a service provider receives a call from an end user, the speech is continuously monitored by the system 102 to group speech waves into 2 clusters [i.e., speech segmentation].
Ex: A: Hi. I am XYZ. How may I assist you? [speech]
B: Hi. I reached out as my car insurance policy is about to expire [speech]
Step 2: From that group of speech waves, each uttered sentence is converted into text.
Ex: A: Hi. I am XYZ. How may I assist you? [text]
B: Hi. I reached out as my car insurance policy is about to expire [text]
Step 3: The system 102 may perform pre-processing such as removing stop words, lemmatization (converting words into their root form), lowercasing, etc.
Ex: A: Hi. I am XYZ. How may I assist you? [text]
Pre-processed would be: hi. i xyz. how i assist you?
B: Hi. I reached out as my car insurance policy is about to expire [text]
Pre-processed would be: hi. i reach my car insurance policy about expire
Step 4: The pre-processed text is sent to the SVM classifier to identify who is the service provider and who is the end user, i.e., to identify A as the service provider and B as the end user.
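The Step 3 pre-processing may be sketched as below. The stop-word list and the suffix-stripping "lemmatizer" are deliberately simplified stand-ins (a real pipeline would use a library such as NLTK or spaCy), but on this input the sketch reproduces the pre-processed text of the example:

```python
import re

# Toy stop-word list (illustrative only; real systems use a full list).
STOP_WORDS = {"am", "is", "are", "a", "an", "the", "may", "to", "out", "as"}

def naive_lemma(word):
    # Crude suffix stripping in place of true lemmatization.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(sentence):
    tokens = re.findall(r"[a-z']+", sentence.lower())  # lowercase + tokenize
    return " ".join(naive_lemma(t) for t in tokens if t not in STOP_WORDS)

print(preprocess("Hi. I reached out as my car insurance policy is about to expire"))
# -> hi i reach my car insurance policy about expire
```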
The sentence “hi. i xyz. how i assist you?” would be classified as assistance-related, since it resembles the greeting examples that the system 102 is trained on. Therefore, A would be identified as the service provider, and thus B would be identified as the end user.
Step 5: The end user profile is accessed based on the registered mobile number. Thereafter, the stored pitch details of the end user are compared with the actual real-time pitch details.
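The Step 4 speaker classification may be sketched with a linear SVM over TF-IDF features. The six training utterances below are hypothetical stand-ins for the scripted greetings and user requests the system 102 would actually be trained on, and the behaviour shown holds only on this toy data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical pre-processed training utterances.
sentences = [
    "hi i xyz how i assist you",                        # provider greeting
    "thank you for calling how can i help",             # provider greeting
    "let me check that for you",                        # provider response
    "hi i reach my car insurance policy about expire",  # user request
    "i want renew my policy",                           # user request
    "my claim not process yet",                         # user request
]
labels = ["provider", "provider", "provider", "user", "user", "user"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(sentences, labels)

# On this toy data, an assistance-style greeting maps to the provider.
print(clf.predict(["how may i assist you"])[0])
```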
Pitch change (pitch analyser output) = real-time user pitch − actual stored user pitch
Step 6: For every end user utterance, the trained LSTM model predicts whether it is a triggered behaviour or not. Also, the sentiment for each sentence is calculated:
rating score = triggered behaviour probability * sentiment
Thus, when the rating for the conversation changes by more than a fixed threshold and the sign of sentiment is positive, the upsell process is triggered; else the system 102 suggests a resolution for the current concern. For example, consider the below real-time conversation:
End user: Hi. I reached out as my car insurance policy is about to expire (Non triggered and neutral)
Service provider: Sure. Let me check.
Service provider: Yes, it is about to expire in 10 days.
End user: I wanted to renew (Non-triggered and +ve, Example – rating 0.3)
Service provider: We have an additional loyalty discount for your policy on top of the no claim bonus. [Service provider provided current information from database]
End user: Oh Wow! That’s great!! (Triggered and +ve, Example – rating 0.8, so a positive trigger of 0.5. If the threshold is 0.4 for this process, then the service provider/customer agent can upsell products.)
Service provider: If you include your two wheeler in the same policy you can take advantage of the discount for your two wheeler as well. Would you like me to include two wheeler in the policy? (Upsell)
The product to be upsold is predicted by analysing profiles similar to the end user and applying the Apriori algorithm to predict the next service that could be bought by the end user based on his profile.
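The Step 6 scoring and trigger logic may be sketched as follows. The probabilities and sentiment values are illustrative assumptions, chosen to reproduce the 0.3 to 0.8 rating jump in the conversation above, and the function names are not from the original disclosure:

```python
def rating(trigger_prob, sentiment):
    """rating score = triggered-behaviour probability * sentiment."""
    return trigger_prob * sentiment

def next_action(prev_rating, curr_rating, sentiment, threshold=0.4):
    """Upsell when the rating jumps past the threshold with positive sentiment."""
    if curr_rating - prev_rating >= threshold and sentiment > 0:
        return "upsell"
    return "suggest_resolution"

# Illustrative values: "I wanted to renew" ~ 0.3, "Oh Wow!" ~ 0.8.
prev = rating(trigger_prob=0.375, sentiment=0.8)   # 0.30
curr = rating(trigger_prob=0.89, sentiment=0.9)    # ~0.80

print(next_action(prev, curr, sentiment=0.9))  # -> upsell (jump of ~0.5 >= 0.4)
```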
[0050] Fig. 4 discloses an Artificial Intelligence (AI) based method 400 for
performing trend analysis of end user behaviour during a call with a service provider in real-time. At step 402, the method 400 comprises processing speech signals from said call to identify the end user and the service provider. According to an embodiment, the present disclosure describes that speech signals of both the end user and the service provider may be captured in an audio format. In one implementation, the voices of both the end user and the service provider may be captured using a microphone 204 deployed on the user’s device and a voice recorder
device deployed on the system 102. According to another embodiment of the
present disclosure, the speech signals, of both the end user and the service provider,
captured are then processed to distinguish the voices of both the end user and the
service provider. In one implementation, the speech signals may be detected by
using a speech detection technique. In an embodiment, the speech detection
technique may use a Voice Activity Detector (VAD) to remove noise and non-
speech content captured in the audio. It may be noted that the speech detection
technique processes a plurality of speech signals and thereby maps each speech
signal with its corresponding speaker i.e., either the end user or the service provider.
[0051] Though not exclusively discussed in Fig. 4, in order to process the speech signals from said call, the method describes that a plurality of segments are extracted from the speech signals. Further, a D-vector for each segment of the plurality of segments is generated. Thereafter, segment wise embedding is generated by aggregating the D-vector for each segment. The method further comprises clustering the segment wise embedding to segregate speech signals of the end user from speech signals of the service provider. According to one embodiment, the method further comprises detecting noise in the speech signals by categorizing the speech signals based on amplitude in each frequency region. The lowest and highest amplitude regions represent the noise. Further, the speech signals are converted into textual format and the textual format is analysed to identify the end user and the service provider. The end user and the service provider are identified by associating relevant content in the textual format, hereinafter referred to as textual content, with either the end user or the service provider using a Support Vector Machine (SVM) classifier. According to an embodiment, the textual format is pre-processed for removing stop words, lemmatization, and lowercasing.
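The segment-wise embedding and clustering step may be sketched as below. The random vectors stand in for the D-vectors an LSTM would produce, and the two synthetic centroids stand in for the two speakers; a real system would cluster genuine speaker embeddings:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(seed=0)

# Stand-ins for segment-wise D-vector embeddings: ten 16-dimensional
# vectors per speaker, scattered around two different centroids.
speaker_a = rng.normal(loc=0.0, scale=0.1, size=(10, 16))
speaker_b = rng.normal(loc=1.0, scale=0.1, size=(10, 16))
segments = np.vstack([speaker_a, speaker_b])

# Cluster the embeddings into 2 groups: end user vs. service provider.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(segments)
print(labels)  # first ten segments in one cluster, last ten in the other
```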
[0052] At step 404, the method 400 recites accessing the profile of the identified end user, stored in a profile database 218, based on at least one of the processed speech signals and the end user contact details. The end user contact details may comprise at least one of the following, but are not limited thereto: a mobile number or an email id. In an exemplary embodiment, the profile database 218 may be configured to store the profiles of a plurality of end users who are customers of the organization and have called at least once earlier. Further, at step 406, the method 400 describes fetching pre-stored pitch information associated with the profile of the end user from the profile database 218.
[0053] Moving ahead, at step 408 the method 400 discloses analysing the pre-stored pitch information with real-time pitch information of the end user, captured during said call, to determine change in said pitch information. The real-time pitch information is determined by continuously analysing the speech signals of the end user during said call, wherein the real-time pitch information is indicative of the tone of the end user during said call. The pitch information may be analysed using a tone analyser 206. Further, the determined change in the pitch information may be passed to the LSTM modelling unit 208 to determine whether said determined change is a normal or abnormal change.
[0054] At step 410, the method 400 describes evaluating the determined change in said pitch information along with user uttered text to ascertain probability of triggered behaviour. At step 412, the method describes determining sentiment of the end user. Thereafter, at step 414 the method describes calculating a rating score of the end user during said call, based on the sentiment and the probability of the triggered behaviour. According to an embodiment of the present disclosure, the LSTM modelling unit 208 captures the sequence of conversations and accordingly predicts whether it is a triggered behaviour. In one implementation, the triggered behaviour may also be predicted based on a set of parameters including, but not limited to, textual content, accent (tone), and the user’s account information. In one embodiment, the accent determination speech model may be developed by modelling the vocabulary and tone used by people speaking in high/low pitch. The user’s account information may include, but is not limited to, premium or standard customer, account activity history in the past, location, salary information, age, etc.
[0055] Lastly, the method 400 at step 416 describes providing instructions, to the service provider, to adapt the conversations according to the end user behaviour, based on the rating score. The method further describes determining profiles similar to the end user profile, and calculating a similarity score for each determined profile, if the rating score is above a pre-defined threshold and the sign of sentiment is positive. Further, if the calculated similarity scores for the determined profiles are above a predefined value, instructions are provided to the service provider to make recommendations to the end user based on recommendations made to determined similar profiles in the past. Otherwise, instructions are provided to the service provider to initiate a resolution process to resolve the concern of the end user.
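The branch between upselling from similar profiles and initiating a resolution may be sketched as follows. The profile vectors, the feature meanings, and the 0.95 cutoff are illustrative assumptions, not values from the original disclosure:

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity score between two numeric profile vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical numeric profiles, e.g. [normalized age, tenure, products held].
end_user = np.array([0.35, 0.6, 0.2])
stored_profiles = {
    "user_17": np.array([0.34, 0.58, 0.25]),
    "user_42": np.array([0.90, 0.10, 0.80]),
}

SIMILARITY_CUTOFF = 0.95  # assumed "predefined value"
similar = [uid for uid, p in stored_profiles.items()
           if cosine_similarity(end_user, p) > SIMILARITY_CUTOFF]

# Recommend from similar profiles if any were found, else start resolution.
action = "recommend_from_similar_profiles" if similar else "initiate_resolution"
print(similar, action)
```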
[0056] In this manner, trend analysis of end user behaviour during a call with a service provider in real-time is performed and the service provider is provided with the instructions to adapt the conversations according to the end user behaviour.
[0057] The system and method disclosed in the present disclosure can be easily understood with the help of Fig. 5 of the present disclosure. Fig. 5 illustrates a process 500 of performing trend analysis of end user behaviour during a call with a service provider in real-time, in accordance with an embodiment of the present subject matter. According to Fig. 5, the speech signals captured during the call are processed by a speech segmentation module 502. Said speech segmentation module 502 extracts short segments (i.e., sliding windows) from the captured speech signals and executes a LSTM model to produce D-vectors for each sliding window. Such speech segmentation technique is implemented to distinguish speech signals of the end user from speech signals of the service provider. Further, for each short segment, the D-vectors belonging to a segment may be aggregated to produce segment wise embeddings. Upon producing the segment wise embeddings, the segment wise embeddings may be clustered to detect the user’s and service provider’s speech segments. Thus, in this manner, the user’s speech is distinguished from the service provider’s speech.
[0058] Further, the speech to text module 504 may convert the speech signals into a textual format. It may be noted that the proposed technique facilitates to convert the speech signals, of both the end user and the service provider, into the textual format. In one implementation, the speech signals may be converted into the textual format by using a speech to text converter. Thereafter, a pre-processor module 506 may pre-process the converted text format for removing stop words, lemmatization, tokenization, and lowercasing.
[0059] Further, a speaker distinction module 508 may associate relevant content in the textual format, hereinafter referred to as textual content, with either the end user or the service provider. In order to associate the textual content, the speaker distinction module 508 may be trained with a Support Vector Machine (SVM) classifier to understand whether the textual content is spoken by the end user or the service provider. While training, it may further be taken into consideration that the service provider has a fixed set of templated scripts to greet the end user and accordingly respond to the user’s queries. Such statements/scripts may be used to train the module, which in turn helps in distinguishing the voices of the end user and the service provider. Therefore, the speaker distinction module 508 may analyse the textual format to identify the end user and the service provider.
[0060] Furthermore, a tone analyser module 510 may analyse the tone of the end user. The tone analyser module 510 may access the profile of the identified end user, stored in a profile database, based on at least one of the processed speech signals and the end user contact details. Thereafter, the tone analyser module 510 may fetch pre-stored pitch information associated with the profile of the end user from the profile database. In an exemplary embodiment, the profile database may be configured to store the profiles of a plurality of end users who are customers of the organization and have called at least once earlier. The tone analyser module 510 may analyse the pre-stored pitch information with real-time pitch information of the end user, captured during said call, to determine change in said pitch information. The real-time pitch information is determined by continuously analysing the speech signals of the end user during said call, wherein the real-time pitch information is indicative of the tone of the end user during said call.
[0061] Furthermore, a trigger detection module 512 may evaluate the determined change in said pitch information to ascertain probability of triggered behaviour. According to an embodiment of the present disclosure, the trigger detection module 512 may capture the sequence of conversations and accordingly predict whether it is a triggered behaviour. In one implementation, the triggered behaviour may also be predicted based on a set of parameters including, but not limited to, textual content, accent (tone), and the user’s account information. In one embodiment, the accent determination speech model may be developed by modelling the vocabulary and tone used by people speaking in high/low pitch. The user’s account information may include, but is not limited to, premium or standard customer, account activity history in the past, salary information, age, etc.
[0062] Furthermore, a score generator module 514 may calculate a rating score for the end user. In one aspect, the rating score may be computed as the sentiment associated with each sentence multiplied by the prediction probability from the LSTM model of the trigger detection module 512.
[0063] If the rating score is above a pre-defined threshold and the sign of sentiment is positive, an upsell module 516 may provide a recommendation for upselling a product/service/policy. Otherwise, instructions to initiate a resolution process to resolve the concern of the end user may be provided.
[0064] According to an embodiment, in order to make upsell recommendations, the upsell module 516 may determine profiles similar to the end user profile, and may calculate a similarity score for each determined profile. If the calculated similarity scores for the determined profiles are above a predefined value, the upsell module 516 may provide instructions to the service provider to make recommendations to the end user based on recommendations made to determined similar profiles in the past. In this scenario, the rating score is above a pre-defined threshold and the sign of sentiment is positive.
[0065] According to another embodiment, when the conversation trend is moving negatively, the service provider may utilize help from the system to resolve the problem. The past three conversation sentences are passed through a service identifier module 518 that includes a Named Entity Recognition (NER) model. The NER model may be trained to identify the service type from the conversations. For example, if the end user is talking about claim reimbursement and the trend is moving negatively, the system may query a ticket database for such service type (i.e., reimbursement) and may suggest the top three resolutions (top three frequent resolutions) for such services, but it is not limited thereto. If the end user is not satisfied by the suggestions, then the end user can log a ticket into ticket service management.
[0066] The advantages of the present disclosure are as below:
• Enabling automated rating of users using Artificial Intelligence.
• Providing an AI system and an AI based method to model accent-based parameters to understand triggered behaviour.
• Enabling detection of triggered behaviour of the user, which may be mitigated by suggesting remedies according to the instigation type.
• Improving customer/end user satisfaction and reducing resolution time.
[0067] The illustrated steps are set out to explain the exemplary
embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed.
[0068] Alternatives (including equivalents, extensions, variations,
deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope of the disclosed embodiments.
[0069] Furthermore, one or more computer-readable storage media may be
utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., are non-transitory. Examples include
random access memory (RAM), read-only memory (ROM), volatile memory, non-volatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[0070] Suitable processors/controllers include, by way of example, a general
purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) circuits, any other type of integrated circuit (IC), and/or a state machine.
WE CLAIM:
1. An Artificial Intelligence (AI) based method (400) for performing trend
analysis of end user behaviour during a call with a service provider in real-time, the method comprising:
processing (402), by a processor (202), speech signals from said call to identify the end user and the service provider;
accessing (404), by the processor (202), profile of the identified end user, stored in a profile database, based on at least one of the processed speech signals and the end user contact details;
fetching (406), by the processor (202), pre-stored pitch information associated with the profile of the end user from the profile database;
analysing (408), by a tone analyser (206), the pre-stored pitch information with real-time pitch information of the end user, captured during said call, to determine change in said pitch information, wherein the real-time pitch information is determined by continuously analysing the speech signals of the end user during said call, and wherein the real-time pitch information is indicative of tone of the end user during said call;
evaluating (410), by a Long Short-Term Memory (LSTM) modelling unit (208), the determined change in said pitch information to ascertain probability of triggered behaviour;
determining (412), by a sentiment modelling unit (210), sentiment of the end user;
calculating (414), by a score calculation unit (214), a rating score of the end user during said call, based on the sentiment and the probability of the triggered behaviour; and
providing (416), by an output unit (216), instructions to the service provider, to adapt the conversations according to the end user behaviour, based on the rating score.
2. The method as claimed in claim 1, wherein processing (402) the speech
signals from said call further comprises:
extracting a plurality of segments from the speech signals;
generating D-vector for each segment of the plurality of segments;
generating segment wise embedding by aggregating the D-vector for each segment; and
clustering the segment wise embedding to segregate speech signals of the end user from speech signals of the service provider.
3. The method as claimed in claim 2, further comprising:
detecting, by the processor (202), noise in the segregated speech signals by categorizing the segregated speech signals based on amplitude in each frequency region, wherein lowest and highest amplitude regions represent the noise;
converting, by a speech to text converter (212), the segregated speech signals into textual format; and
analysing, by the processor (202), the textual format to identify the end user and the service provider,
wherein the end user and the service provider are identified by associating textual content of the textual format with either the end user or the service provider using a Support Vector Machine (SVM) classifier.
4. The method as claimed in claim 1, further comprising:
determining, by the processor (202), profiles similar to the end user profile, and calculating a similarity score for each determined profile, if the rating score is above a pre-defined threshold and sign of sentiment is positive;
providing, by the processor (202), instructions to the service provider to make recommendations to the end user by Apriori analysis based on recommendations made to determined similar profiles in past, if the calculated similarity scores for the determined profiles are above a predefined value; or
providing, by the processor (202), instructions to the service provider to initiate a resolution process to resolve concern of the end user.
5. An Artificial Intelligence (AI) system (102) for performing trend analysis of
end user behaviour during a call with a service provider in real-time, the system comprising:
a microphone (204) configured to capture speech signals from said call;
a processor (202) in communication with the microphone (204), wherein the processor (202) is configured to:
process the speech signals from said call to identify the end user and the service provider;
access profile of the identified end user, stored in a profile database, based on at least one of the processed speech signals and the end user contact details; and
fetch pre-stored pitch information associated with the profile of the end user from the profile database;
a tone analyser (206) operatively coupled to said processor (202) and configured to analyse the pre-stored pitch information with real-time pitch information of the end user, captured during said call, to determine change in said pitch information, wherein the real-time pitch information is determined by continuously analysing the speech signals of the end user during said call, and wherein the real-time pitch information is indicative of tone of the end user during said call;
a Long Short-Term Memory (LSTM) modelling unit (208) operatively coupled to the processor (202) and configured to evaluate the determined change in said pitch information to ascertain probability of triggered behaviour;
a sentiment modelling unit (210) operatively coupled to said processor (202) and configured to determine sentiment of the end user;
a score calculation unit (214) operatively coupled to the processor (202) and configured to calculate a rating score of the end user during said call, based on the sentiment and the probability of the triggered behaviour; and
an output unit (216) operatively coupled to said processor (202) and configured to provide instructions to the service provider to adapt the conversations according to the end user behaviour.
6. The system as claimed in claim 5, wherein to process the speech signals
from said call, the processor (202) is configured to:
extract a plurality of segments from the speech signals;
generate D-vector for each segment of the plurality of segments;
generate segment wise embedding by aggregating the D-vector for each segment; and
cluster the segment wise embedding to segregate speech signals of the end user from speech signals of the service provider.
7. The system as claimed in claim 6, wherein the system (102) further
comprises a speech to text converter (212) in communication with the processor
(202) and the microphone (204), and wherein the processor (202) is configured to:
detect noise in the segregated speech signals by categorizing the segregated speech signals based on amplitude in each frequency region, wherein lowest and highest amplitude regions represent the noise;
convert, using the speech to text converter, said segregated speech signals into textual format; and
analyse the textual format to identify the end user and the service provider,
wherein the end user and the service provider are identified by associating textual content of the textual format with either the end user or the service provider using a Support Vector Machine (SVM) classifier.
8. The system as claimed in claim 5, wherein the processor (202) is further configured to:
determine profiles similar to the end user profile, and calculate a similarity score for each determined profile, if the rating score is above a pre-defined threshold and sign of sentiment is positive;
provide instructions to the service provider to make recommendations to the end user by Apriori analysis based on recommendations made to determined similar
profiles in past, if the calculated similarity scores for the determined profiles are above a predefined value; or
provide instructions to the service provider to initiate a resolution process to resolve concern of the end user.