Goal Driven Integrated Text Mining

Abstract: Described herein is a method for deriving actionable intelligence from customer feedback data. In an implementation, the method for deriving actionable intelligence from customer feedback data includes creating a domain ontology based on at least one domain goal. The method further includes mining information from customer feedback data and identifying at least one action item based on the information and the domain goal. Further, the method includes assessing a priority of the at least one action item and updating the domain ontology based on the assessing.

Patent Information

Application #:
Filing Date: 12 April 2011
Publication Number: 48/2012
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2023-03-10
Renewal Date:

Applicants

TATA CONSULTANCY SERVICES LIMITED

Nirmal Building 9th Floor Nariman Point Mumbai Maharashtra India

Inventors

1. DEY Lipika

TCS TOWERS Plot 249 D & E Udyog Vihar Phase IV Gurgaon-122016 India

2. HAQUE Sk. Mirajul

TCS TOWERS Plot 249 D & E Udyog Vihar Phase IV Gurgaon-122016 India

3. RAJ Nidhi

TCS TOWERS Plot 249 D & E Udyog Vihar Phase IV Gurgaon-122016 India

Claims

1. A system (102) for deriving actionable intelligence from customer feedback data, the system (102) comprising: a processor (202); and a memory (206) coupled to the processor (202), the memory (206) comprises: an acquisition module (213) to acquire at least one domain goal based on domain knowledge and input data, wherein the input data comprises the customer feedback data; an information extraction module (214) to identify at least one action item based on the input data and the at least one domain goal; an intelligence module (108) to assess relevance of the action item based on a priority associated with the action item.

2. The system as claimed in claim 1, wherein the acquisition module (213) comprises a data acquisition module to acquire the input data and isolate feedback texts from the input data.

3. The system as claimed in claim 2, wherein the acquisition module (213) comprises a knowledge acquisition module to acquire the domain knowledge, the domain knowledge having the at least one domain goal, based on a pre-specified set of documents.

4. The system as claimed in claim 3, wherein the pre-specified set of documents include a set of uniform resource locators specific to a domain.

5. The system as claimed in claim 3, wherein the domain knowledge comprises knowledge specified by a user.

6. The system as claimed in claim 1, wherein the intelligence module (108) comprises an assessment module (312) to assign a priority to the action item.

7. The system as claimed in claim 1, wherein the intelligence module (108) comprises an interaction module (314) to receive input from the user to facilitate interaction between the user and the system (100).

8. The system as claimed in claim 1, wherein the knowledge acquisition module (108) creates a domain ontology having nodes, wherein each of the nodes of the domain ontology represent at least one domain ontology concept.

9. The system as claimed in claim 8, wherein the domain ontology concepts comprise the at least one domain goal.

10. The system as claimed in claim 8, wherein the intelligence module (108) updates the domain ontology.

11. The system as claimed in claim 1, wherein the information extraction module (214) comprises: a pre-processing module (402) to provide one or more clean text units indicative of noise-free text from the input data; a natural language processing (NLP) module (404) to provide structured components that represent relationship among elements of each clean text unit; and a text mining module (406) to determine at least one pattern based on processing of the clean text units.

12. The system as claimed in claim 11, wherein the text mining module (406) comprises: a component identification module (408) to identify information components based on the structured components and identify relationships between the information components based on each clean text unit; an affect analysis module (410) to extract opinions indicative of one of customer pain-point and delights from each clean text unit based on the relationships; a tagging module (412) to tag each clean text unit based on domain ontology concepts present in the clean text units; a statistical pattern recognition module (414) to group the clean text units that are semantically cohesive based on the at least one domain goal; and a trend analysis module (416) to compute a trend partly based on the opinions and volumes of the clean text units.

13. The system as claimed in claim 12, wherein the intelligence module (108), based on the domain ontology concepts present in the feedback texts, assigns each feedback text to a node of the domain ontology.

14. The system as claimed in claim 1, wherein the intelligence module (108) assigns a weight to each node based on a number of feedback texts assigned to that node.

15. The system as claimed in claim 1, wherein the intelligence module (108) computes a satisfaction score based partly on the weight assigned to each node, and based on an overall aggregate of positivity and negativity of an opinion in each feedback text.

16. The system as claimed in claim 12, wherein the statistical pattern recognition module (414) comprises a text classification module for generating a classifier that can be applied to untagged feedback texts.

17. The system as claimed in claim 1 further comprises a reporting module to generate alerts and summarized results.

18. The system as claimed in claim 1 further comprises a workflow integration module to feed at least one action item to a workflow of the domain in an order of the priority of the action item.

19. The system as claimed in claim 1 comprises knowledge data (218) having at least one rule to process the input data.

20. The system as claimed in claim 18, wherein an output from the workflow integration module is fed to a domain ontology.

21. The system as claimed in claim 7, wherein an output of the interaction module (314) is fed to a domain ontology.

22. A computer implemented method for deriving actionable intelligence from customer feedback data in a goal-driven manner, the method comprising: creating a domain ontology based on at least one domain goal; mining information from customer feedback data; identifying at least one action item based on the information and the domain goal; assessing a priority of the at least one action item; and updating the domain ontology based on the assessing.

23. The method as claimed in claim 22, wherein the mining comprises a semi-supervised clustering technique involving a clustering algorithm to group semantically cohesive information.

24. The method as claimed in claim 22, wherein the at least one domain goal corresponds to a node of the domain ontology.

25. The method as claimed in claim 22, wherein the mining is based partly on NLP technique and a text mining technique.

26. The method as claimed in claim 25, wherein the NLP technique comprises a preprocessing technique to provide clean text units.

27. The method as claimed in claim 25, wherein the mining technique comprises: identifying, based partly on linguistic rules, information components from each clean text unit and relationships between the information components; extracting opinions from the information components based on the relationships; tagging each clean text unit based on the information components that qualify as at least one of a subject, an object, and an action; grouping the clean text units that are semantically cohesive, based on the at least one domain goal; and computing a trend partly based at least on the opinions and volumes of the clean text units aligned to each node of the domain ontology.

28. The method as claimed in claim 22, wherein the assessing comprises receiving inputs from a user to facilitate an interaction between the user and the domain ontology.

29. The method as claimed in claim 22, wherein the updating comprises tracking of interactions between user interactions.

30. The method as claimed in claim 22 comprises identifying action items based on the mining.

31. The method as claimed in claim 22, wherein the assessing comprises assigning a priority to each action item.

32. The method as claimed in claim 22 comprises integrating the action items to a workflow.

33. A computer readable media for storing computer implemented instructions, said instructions when executed perform the acts comprising: creating a domain ontology based on at least one domain goal; mining information from customer feedback data; identifying at least one action item based on the information and the domain goal; assessing a priority of the at least one action item; and updating the domain ontology based on the assessing.

Specification

FORM 2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION (See section 10, rule 13)
1. Title of the invention:
GOAL-DRIVEN INTEGRATED TEXT MINING
2. Applicant(s)
NAME: TATA CONSULTANCY SERVICES LIMITED
NATIONALITY: Indian
ADDRESS: Nirmal Building, 9th Floor, Nariman Point, Mumbai-400021, Maharashtra, India
3. Preamble to the description
COMPLETE SPECIFICATION
The following specification particularly describes the invention and the manner in which it
is to be performed.

TECHNICAL FIELD
[0001] The present subject matter, in general, relates to free-text analysis and, in
particular, relates to systems and methods for goal-driven integrated text mining of free-text customer feedback.
BACKGROUND
[0002] Customer retention is an important business consideration for any commercial
enterprise. Customer retention depends largely on the level of customer satisfaction. One method of ensuring customer satisfaction is to obtain feedback from customers regularly and then take corrective actions based on an intelligent analysis of the feedback received, so as to deliver what the customers expect. Customer feedback contains key information about levels of customer satisfaction or dissatisfaction with regard to products, services, or the overall experience of the customers in dealing with the enterprise. Customer feedback may be in the form of complaints, desires, appreciations, problem descriptions, etc. Feedback may be explicitly stated or may be implicit in the text or content of the feedback.
[0003] Due to the rapidly changing profile of connectivity, customers have the option
of using multiple channels, such as open forums on the web, focused surveys, call centers, etc., to give their views or opinions regarding products and services they use. Data of this kind has flooded almost all channels of customer interaction. Moreover, the rate at which such data is received has also become increasingly high.
SUMMARY
[0004] The subject matter described herein relates to systems and methods for
deriving actionable intelligence from customer feedback data in a goal-driven manner, which are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
In an implementation, the method for deriving actionable intelligence from customer feedback data includes creating a domain ontology based on at least one domain goal. The method further includes mining information from customer feedback data and identifying at least one action item based on the information and the domain goal. Further, the method includes assessing a priority of the at least one action item and updating the domain ontology based on the assessing.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] The above features and aspects and other features and aspects of the subject
matter will be better understood with regard to the following description, appended claims,
and accompanying figures. In the figures, the left-most digit(s) of a reference number
identifies the figure in which the reference number first appears. The use of the same
reference number in different figures indicates similar or identical items.
[0006] Fig. 1 illustrates an exemplary environment for implementing a goal-driven
integrated text mining system, in accordance with one embodiment of the present subject matter.
[0007] Fig. 2 illustrates an exemplary goal-driven integrated text mining system, in
accordance with one embodiment of the present subject matter.
[0008] Fig. 3 illustrates an expanded view of an exemplary intelligence module of the
goal-driven integrated text mining system, in accordance with one embodiment of the present subject matter.
[0009] Fig. 4 illustrates an expanded view of the exemplary information extraction
module of the goal-driven integrated text mining system, in accordance with one embodiment of the present subject matter.
[0010] Fig. 5 illustrates a method for deriving actionable intelligence from customer
feedback data in a goal-driven manner, in accordance with one embodiment of the present subject matter.
[0011] Fig. 6 provides a high-level context representation of an exemplary process
flow as illustrated in Figs. 1 to 5, in accordance with an embodiment of the present subject matter.
[0012] Fig. 7 provides a snapshot of the goal-driven integrated text mining system
acquiring knowledge from uniform resource locators and creating a domain ontology based on the knowledge, according to an embodiment of the present subject matter.

[0013] Fig. 8 provides a snapshot of the goal-driven integrated text mining system
attaching descriptions to different models and storing the descriptions in the domain ontology,
according to an embodiment of the present subject matter.
[0014] Fig. 9 provides a snapshot of the goal-driven integrated text mining system
assigning weights and information components to goals, according to an embodiment of the
present subject matter.
[0015] Figs. 10a and 10b provide snapshots of the goal-driven integrated text mining
system facilitating analyst interaction, according to an embodiment of the present subject
matter.
[0016] Figs. 11a, 11b, 11c, 11d, and 11e provide snapshots of various charts, reports,
and alerts generated by the goal-driven integrated text mining system, according to various
embodiments of the present subject matter.
DETAILED DESCRIPTION
[0017] The present subject matter is directed to systems and methods for deriving
actionable intelligence from goal-driven mining of customer feedback data. A goal or objective is a desired result a person or an entity, such as a business enterprise, envisions, plans, and commits to achieve. The person or entity may belong to a particular domain or area of activity, such as business, education, medicine, technology, etc. An entity in the business domain, for example, may have certain goals set to be achieved within a period. These goals, or business goals as they are related to the business domain, may provide a basis to the entity for orienting its resources and priorities for that period. As goals are specific to a domain, the terms “goal” and “domain goal” can be used interchangeably.
[0018] Further, a domain may have goals or objectives that are diverse in nature and
each goal may further have a set of sub-goals attached to it. These sub-goals need to be fulfilled in order to achieve an ultimate goal of the domain. The sub-goals may define a set of tasks, actions, or methods to be deployed at various levels of a hierarchy of the domain, such as a business organization, to help in the achievement of the ultimate goal. The sub-goals may include goals of a division or a department in the organization, or a goal envisioned for a particular product or service offered by the organization or the department or division of the organization.

[0019] Further, associated with each goal, there may be a set of measurable parameters
that would help in computing a level of success achieved for each goal. As most of the goals, in the context of a business enterprise, are directed to and dependent on the level of satisfaction of customers, it becomes imperative for businesses to have a measure of this critical parameter to assess success or failure of the goals or objectives, or to identify areas of improvement. The term “customer” as mentioned here refers to any direct or indirect customers of products and/or services offered by an industry or an enterprise or a business division of the enterprise. The term “customer” also includes a consumer, and thus the two terms have been used interchangeably hereinafter.
[0020] Now, customer satisfaction levels can be best judged by feedback provided by
the customers. Due to the rapidly changing profile of connectivity, customers have the option
of using multiple channels, such as open forums on the web, focused surveys, call centers,
etc., to give their inputs, even when unsolicited, regarding products and services
they use. This has led to a manifold increase in such content over all such channels.
[0021] All such customer-generated data, hereinafter interchangeably referred to as
customer input data or customer feedback data, directed to various domains is available in abundance for businesses to derive useful information out of it. All this data can provide valuable insights into long and short-term goals, for example, of a business entity. Generally, the customer input data, or simply input data, is received in text format, and is processed to provide relevant information. However, recorded data, such as a tape-recorded conversation between a customer and, for example, a call center executive over the phone may also be processed to extract the input data, in text format, by using methods known in the art.
[0022] Given the volumes, diversity, and noisy nature of the input data, it becomes
impossible for human beings to comprehend the impact of all such input data manually, unless the task is approached in a structured fashion using automated tools. Moreover, as not all of the feedback data is relevant from the domain point of view, text mining, also interchangeably referred to as information mining, from the feedback data may not prove beneficial if the feedback is not aligned to any domain goal. This is why simple word-based processing of the customer feedback data without taking the business perspective into account proves unfruitful.

[0023] There are systems that offer summary views of the content of the input data, its
elements, and relationships among the elements, etc. However, these systems are not capable of providing actionable intelligence, that is, what action is required to be taken or any other requirement that needs to be fulfilled to achieve the domain goal. Moreover, as requirements of a domain may change based on ever-changing domain goals and customer expectations, actionable items that need to be addressed may also change. Such intelligence about the actionable items to improve satisfaction levels of the customers based on changing domain requirements can only be provided by an expert who is capable of drawing intelligence from such a collection of information. Thus, the whole process of input data analysis and identifying the actionable items, hereinafter referred to as action items, becomes a human-dependent task.
[0024] To this end, the present subject matter discloses an integrated, self-evolutionary automated system, with minimal human intervention, for multi-perspective goal-driven text feedback analysis. The system facilitates for a domain, in a collective manner, an analysis of input data from customers in conjunction with domain-specific data or knowledge to provide actionable intelligence. A domain may be an industry comprising a group of enterprises engaged in marketing and delivering similar types of goods and services in competition with each other, or a single enterprise within an industry engaged in marketing and manufacturing of a range of products or services, or a business division of an enterprise responsible for marketing and manufacturing of a range of products or services, or a product or a department within a division, or a group or a team within a department, or such other entities. Hereinafter, the term “domain” shall be given a similar meaning as is explained hereinabove.
[0025] In an implementation, the method for deriving actionable intelligence from
customer feedback data includes creating a domain ontology based on at least one domain goal. The method further includes mining information from customer feedback data and identifying at least one action item based on the information and the domain goal. Further, the method includes assessing a priority of the at least one action item and updating the domain ontology based on the assessing.
[0026] The system identifies all important concepts and topics present in a collection
of data representing the input data received via various channels, such as from blogs, web-sites, forums, etc., and knowledge data received from various domain-specific documents and relates the two based on rules, priorities, and domain goals to identify certain action items. The concepts and topics may be understood as words and phrases that describe a relevant aspect that is important to that domain or to a customer in that domain. An example of such a concept may include a resolution of a camera. The system may assign a priority level to each such action item.
[0027] An analyst may then take appropriate actions by identifying information
pertaining to the domain as “regions of interest” within the input data based on priorities. The system may also analyze temporal trends in the collection to identify alarming, anomalous, and rare events and may be tuned to provide both short-term and long-term analysis based on temporal parameters. All this information extraction and analysis based on a domain goal forms a basis for actionable intelligence for that domain.
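As an illustration only, the temporal analysis mentioned above could be approximated by comparing each period's feedback volume with the periods immediately preceding it. The specification does not state how anomalous events are detected; the z-score test, the function name anomalous_periods, the window size, and the threshold below are all assumptions made for this sketch.

```python
from statistics import mean, pstdev
from typing import List


def anomalous_periods(volumes: List[int], window: int = 4,
                      threshold: float = 3.0) -> List[int]:
    """Return indices of periods whose volume deviates sharply from recent history.

    Hypothetical helper: a period is flagged when its volume lies more than
    `threshold` standard deviations from the mean of the previous `window` periods.
    """
    flagged = []
    for i in range(window, len(volumes)):
        history = volumes[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # No variation in the history, so a z-score cannot be computed; skip.
            continue
        if abs(volumes[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up weekly counts of feedback texts tagged to one ontology node.
weekly_volumes = [10, 12, 11, 9, 10, 40, 11]
spikes = anomalous_periods(weekly_volumes)
```

A short window would suit short-term analysis and a longer one long-term analysis, which is one possible reading of tuning the system "based on temporal parameters".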
[0028] In addition, the system also generates comparative reports on diversities and
similarities of customer feedback received as part of the input data from across the channels. The framework of the present system is dynamic and adaptive to accommodate changing domain goals and evolving domain vocabulary over time.
[0029] In an implementation, in addition to providing actionable intelligence, the
system provides a mechanism for building a domain ontology, which is capable of being
updated based on new knowledge acquired. An ontology structure, as one may appreciate, is a
formal representation of knowledge as a set of concepts and topics within a domain and the
relationships between those concepts. In other words, the domain ontology is a type of
knowledge structure which may store domain knowledge relevant to a particular domain. A
domain ontology models a specific domain, such as specific to an industry or an enterprise or
a business, for example, a hotel business targeted to deliver three star services. The present
system provides for the creation of the domain ontology in an extensible and semi-automated
way, that is, it can be extended to sub-domains and can be updated automatically.
[0030] The domain ontology represents particular meanings of terms as they apply to
that domain. For example, the word “card” has many different meanings. However, a domain ontology about the domain of gambling industry would model “playing card” as the meaning of the word, while a domain ontology about the domain of Information Technology would model “punched card” and “video card” meanings. Thus, as an example, a domain ontology specific to an industry, an enterprise, or a business division may store domain knowledge about products, people, services, goals, and all other allied concepts like competitor names, supplier names, etc., associated with that domain. As an example, goals in the context of a domain may include profits, turnover, number of new customers, customer loyalty index, customers lost, dissatisfied customers, number of offices, merger and acquisition plans, etc.
[0031] In an implementation, the system is provided with a set of web-sites, such as
blogs or social network sites related to a domain or a set of documents or policy papers containing information such as the type of products or services offered by the domain, its objectives, goals, and future plans. The system employs Natural Language Processing (NLP) and information extraction principles to create a semi-structured domain ontology, that is, a domain ontology having a scope for update, which can be edited by humans to derive the final domain ontology.
[0032] The final domain ontology may store, for example, description of products and
related accessories acquired in the form of words and phrases as declared in the enterprise documents or on the web-sites. In the final domain ontology, as an example, each business division of the enterprise may be represented as a node. Each node may be associated with a set of relevant products, groups, and/or services the business division is supposed to provide. Further, goals of each business division may be set as an aggregate of one or more sub-goals or tasks associated with each product group plus an additional set of goals defined explicitly for that business division in the documents and over the web-sites. These goals and sub-goals or tasks are also associated with the node(s). Hereinafter, the terms “sub-goals” and “tasks” may be used interchangeably to mean a lower level goal or objective in a hierarchy. These goals and sub-goals are expressed in terms of different parameters, such as production levels, marketing goals, stock availability, quality, cost, and customer satisfaction level, etc., associated with, for example, a set of products or services, their features, and other services associated with the products or services.
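By way of a non-limiting sketch, the node structure described above might be represented as follows. The class name OntologyNode, its fields, and the method all_goals are hypothetical and are not taken from the specification; the sketch only shows a node holding products and explicitly stated goals, with a division's goals aggregating those of its product groups.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OntologyNode:
    """One node of a domain ontology (hypothetical structure, for illustration)."""
    name: str
    products: List[str] = field(default_factory=list)
    goals: List[str] = field(default_factory=list)  # goals stated explicitly for this node
    children: List["OntologyNode"] = field(default_factory=list)

    def all_goals(self) -> List[str]:
        """Aggregate this node's explicit goals with the goals of its sub-nodes."""
        collected = list(self.goals)
        for child in self.children:
            for goal in child.all_goals():
                if goal not in collected:
                    collected.append(goal)
        return collected


# Made-up example: a business division with two product groups.
cameras = OntologyNode("cameras", products=["compact camera"], goals=["image quality"])
phones = OntologyNode("phones", products=["smartphone"], goals=["battery life"])
division = OntologyNode("consumer electronics",
                        goals=["customer satisfaction"],
                        children=[cameras, phones])
division_goals = division.all_goals()
```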
[0033] The final domain ontology storing all information about products and
associated features, services, and associated functionalities, etc., is made available to an analyst or an expert, hereinafter referred to as a user. The final domain ontology may show all such information in the form of a graphical structure to the user. The user can then build a hyper-graph for integrating ontology of different product groups, each product group representing a sub-domain. A hyper-graph may be built by using entities and edges, the entities representing nodes of the hyper-graph and edges representing relationships between the entities. The entities may be taken as a set of products, services, and features expressed in noun form. A node may have a sub-node, the former generally referred to as a parent node, while the latter a child node.
[0034] In an implementation, the hyper-graph has weights assigned to different hyper-edges connecting a node to an underlying domain ontology sub-graph. As may be understood by a person skilled in the art, a hyper-graph is a generalization of a graph, where an edge can connect any number of nodes. Edges in a hyper-graph, as one will also appreciate, are unordered sets of nodes. By assigning weights or numerical values to each of the edges, a graph structure can be extended. This indicates a relative importance of the concepts, also referred to as domain ontology concepts, directed to a node towards a final domain ontology analysis. In an implementation, weights may be assigned at run-time. The domain ontology provides relevant background knowledge for classification of different text components, hereinafter referred to as feedback texts, in the input data and identification of action items based on such classification.
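A minimal sketch of such a weighted hyper-graph is given below, assuming each hyper-edge is stored as an unordered collection of node names with one numeric weight. The class WeightedHypergraph and the measure node_importance (the sum of weights of the hyper-edges containing a node) are illustrative assumptions; the specification does not prescribe a data structure or a way of combining weights.

```python
from typing import Dict, FrozenSet, Iterable


class WeightedHypergraph:
    """Hyper-graph whose hyper-edges join any number of nodes and carry a weight."""

    def __init__(self) -> None:
        self.weights: Dict[FrozenSet[str], float] = {}

    def add_edge(self, nodes: Iterable[str], weight: float) -> None:
        # A frozenset keeps the edge unordered and lets it span any number of nodes.
        self.weights[frozenset(nodes)] = weight

    def node_importance(self, node: str) -> float:
        """Sum the weights of all hyper-edges that contain the node (assumed measure)."""
        return sum(w for edge, w in self.weights.items() if node in edge)


# Made-up example: a division node linked to concepts of its ontology sub-graph.
graph = WeightedHypergraph()
graph.add_edge(["camera division", "resolution", "lens", "zoom"], 0.6)
graph.add_edge(["camera division", "battery"], 0.3)
```

Because the weights live in a plain dictionary, they can be reassigned while the system is running, in line with the run-time assignment mentioned above.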
[0035] Further, since customer vocabulary and organizational vocabulary may not
match, the system also provides a mechanism to automatically enhance the domain ontology
with additional words from customer vocabulary through semi-supervised machine learning
techniques, like fuzzy clustering. The domain ontology nodes along with their associated
descriptions serve as a basis for tagging related feedback to these ontology nodes.
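The specification names fuzzy clustering only as an example and gives no details. As a hedged illustration, the sketch below computes the standard fuzzy c-means membership of one customer term in each ontology node, with the cluster centroids held fixed. The term vectors, the node centroids, the function name fuzzy_memberships, and the 0.7 attachment threshold are all assumptions; a full fuzzy c-means run would also re-estimate the centroids iteratively.

```python
import math
from typing import Dict, List


def fuzzy_memberships(vector: List[float],
                      centroids: Dict[str, List[float]],
                      m: float = 2.0) -> Dict[str, float]:
    """Fuzzy c-means membership of one vector in each cluster (centroids fixed).

    Uses u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), where d_j is the distance
    to centroid j and m > 1 is the fuzzifier.
    """
    dists = {name: math.dist(vector, c) for name, c in centroids.items()}
    # A vector lying exactly on a centroid belongs fully to that cluster.
    exact = [name for name, d in dists.items() if d == 0.0]
    if exact:
        return {name: (1.0 if name == exact[0] else 0.0) for name in dists}
    power = 2.0 / (m - 1.0)
    return {
        name: 1.0 / sum((d / other) ** power for other in dists.values())
        for name, d in dists.items()
    }


# Made-up two-dimensional vectors standing in for ontology node descriptions.
centroids = {"battery": [1.0, 0.0], "display": [0.0, 1.0]}
memberships = fuzzy_memberships([0.8, 0.2], centroids)  # hypothetical vector for "charge"

THRESHOLD = 0.7  # assumed cut-off for adding a customer term to a node's vocabulary
attach_to = [node for node, u in memberships.items() if u >= THRESHOLD]
```

Seeding the centroids from existing ontology nodes is one way to read "semi-supervised" here: the known domain vocabulary guides where new customer terms are attached.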
[0036] Though most of the conventional systems incorporate domain ontology, none
of them provide methodologies for building and enhancing a domain ontology automatically based on available knowledge and statistical mining of new concepts from customer feedback data. To this end, the system provides one or more action items to the user in order of priority. A domain specific action item may contain, for example, a business area to which the action item relates, domain goals that are affected by the action item, and a summary of data that caused the action item to be generated, that is, customer feedback data.
[0037] In order to identify an action item, each feedback text is assigned a relevance
score based on its alignment to a node in the domain ontology. As each node represents a domain goal or sub-goal, each domain goal is then assigned a comprehensive satisfaction score based on its support in terms of number of feedbacks assigned to that node.
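The alignment step can be sketched as follows, under stated assumptions. Relevance is taken here to be the fraction of a node's concept terms that occur in the feedback text, and support is the count of feedback texts assigned to each node. Both measures, the function names relevance and assign, and the sample ontology are hypothetical; the specification does not define how the relevance score is computed, and the opinion-based part of the satisfaction score is omitted.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple


def relevance(feedback: str, concepts: Set[str]) -> float:
    """Fraction of a node's concept terms found in the feedback (assumed measure)."""
    tokens = set(feedback.lower().split())
    return len(tokens & concepts) / len(concepts) if concepts else 0.0


def assign(feedback: str,
           ontology: Dict[str, Set[str]]) -> Tuple[Optional[str], float]:
    """Return the best-aligned node and its relevance, or (None, 0.0) if none match."""
    scores = {node: relevance(feedback, concepts) for node, concepts in ontology.items()}
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else (None, 0.0)


# Made-up ontology nodes (goals) and their concept terms.
ontology = {
    "battery life": {"battery", "charge", "drains"},
    "image quality": {"photo", "blurry", "resolution"},
}
feedback_texts = [
    "battery drains fast",
    "photo looks blurry",
    "the charge lasts long",
    "great delivery",
]

# Support of each node: the number of feedback texts assigned to it.
support = Counter()
for text in feedback_texts:
    node, score = assign(text, ontology)
    if node is not None:
        support[node] += 1
```

Feedback that matches no node, such as the last sample, is left unassigned, which reflects the earlier point that not all feedback is relevant to a domain goal.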
[0038] Further, in an implementation, all such satisfaction scores are used to derive a
priority score for each action item based on an interestingness of the action item. Interestingness may be taken as a function of several variables including domain criticality, temporal variation shown by the support data, propensity of the action item to cause other functional or technical problems, long-term and short-term impact on the domain, social implications, etc. In this way, the system as described herein allows for easy incorporation of domain perspectives and removes analyst dependency. The action items can also be integrated into a domain process workflow of the business entity for rapid action.
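One simple way to picture a priority score is a weighted combination of interestingness factors, as sketched below. The specification says only that interestingness is a function of several variables; the linear form, the factor names, the weights in WEIGHTS, and the sample action items are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ActionItem:
    """An action item with interestingness factors, each normalised to [0, 1]."""
    title: str
    factors: Dict[str, float]


# Assumed relative importance of each factor; not taken from the specification.
WEIGHTS = {"criticality": 0.5, "temporal_variation": 0.3, "impact": 0.2}


def priority(item: ActionItem) -> float:
    """Weighted sum of the item's factors; missing factors count as zero."""
    return sum(w * item.factors.get(name, 0.0) for name, w in WEIGHTS.items())


def rank(items: List[ActionItem]) -> List[ActionItem]:
    """Order action items from highest to lowest priority."""
    return sorted(items, key=priority, reverse=True)


# Made-up action items.
items = [
    ActionItem("Review delivery delays",
               {"criticality": 0.4, "temporal_variation": 0.9, "impact": 0.3}),
    ActionItem("Investigate battery complaints",
               {"criticality": 0.9, "temporal_variation": 0.2, "impact": 0.5}),
]
ordered = rank(items)
```

The ordered list is the form in which action items could be handed to a domain process workflow, highest priority first.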
[0039] The following disclosure describes systems and methods of extracting
strategically important information from customer feedback data and deriving actionable intelligence from the extracted information in an integrated and evolutionary manner. While aspects of the described systems and methods can be implemented in any number of different computing systems, environments, and/or configurations, embodiments for the information extraction system are described in the context of the following exemplary system(s) and method(s).
[0040] Fig. 1 illustrates an exemplary network environment 100 implementing a goal-
driven integrated text mining system 102, according to an embodiment of the present subject matter. The network environment 100 includes a plurality of user devices 104-1, 104-2 ..., 104-N, collectively or individually referred to as user device(s) 104, in communication with the goal-driven integrated text mining system 102, hereinafter referred to as the system 102, through a network 106. Communication links between the user devices 104 and the system 102 are enabled through a desired form of communication, for example, via dial-up modem connections, cable links, digital subscriber lines (DSL), wireless or satellite links, or any other suitable form of communication. The network 106 may be a wireless network, a wired network, or a combination thereof. The network 106 can also be an individual network or a collection of many such individual networks, interconnected with each other and functioning as a single large network, e.g., the Internet or an intranet.
[0041] The network 106 can be implemented as one of the different types of networks,
such as intranet, local area network (LAN), wide area network (WAN), the Internet, and such.

The network 106 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), etc., to communicate with each other. The network 106 may include network devices, such as network switches, hubs, routers, host bus adapters for providing a link between the system 102 and the user devices 104. The network devices within the network 106 may interact with the system 102 and the user devices 104 through communication links.
[0042] Further, the user devices 104 may be one or more processor-driven computing
devices or communication devices, such as a desktop computer, a laptop computer, a network
computer, an online terminal, a hand-held computer, a mobile phone, a landline phone, or other
similar equipment that is compatible with the network 106. The system 102 is also a
processor-driven computing device, such as a server computer, which may manage network
resources and respond to commands from the user devices 104. Feedback may be received in
various formats, such as plain text or HTML text, or can be reproduced from a recorded
conversation, for example, between a customer and a call center executive.
[0043] The system 102 may be any computing device connected to the network 106.
For instance, the system 102 may be implemented as mainframe computers, workstations, personal computers, desktop computers, hand-held devices, multiprocessor systems, personal digital assistants, laptops, network computers, minicomputers, servers and the like. The system 102 employs various modules for extracting strategically important information from customer feedback data and deriving actionable intelligence from the information extracted in an evolutionary manner. In addition, the system 102 may include multiple servers to perform mirrored tasks for users, thereby relieving congestion or minimizing traffic. The user devices 104 and the system 102 both include computer readable data storage media, such as hard disk drives and random access memory (RAM), which store program instructions and data.
[0044] As mentioned earlier, the system 102 facilitates extraction of strategically
important information from customer feedback data in an evolutionary manner. Moreover, the system 102 is customizable and interactive in nature. The system 102 employs natural language processing techniques with statistical analysis for mining text feedback and aligning the feedback with objectives of the domain entity, such as an industry, an enterprise, or a business division. Natural Language Processing (NLP) refers to a collection of techniques that are implemented as computer programs and employed to derive semantics from digital texts.
[0045] Statistical analysis, also referred to as statistical pattern recognition, relates to
finding components from unstructured and noisy text that are hard to derive based on linguistic and semantic analysis, but are discovered based on statistical similarities of the components with components identified earlier. Components, as understood by a person skilled in the art, are nouns or noun phrases expressing opinions identified after the analysis. Mining refers to the task of assessing significance of certain patterns of the information components within the feedback texts.
[0046] The system 102 is configured to extract core domain concepts and action
items, for example, the type of industry, features and categories of products and/or services on offer, productivity, information about customer base, terminologies used or referred to in that particular domain, etc., from domain-specific documents, such as policy documents, information on the company web site, etc. The system 102 converts all of these data, including domain concepts and action items, into one or more semi-structured representations of the domain goal(s). Some of the generalized domain goals of an enterprise may include profits, revenues, expansion plans, etc. For this purpose, the system 102 includes an intelligence module 108, which, in combination with other modules in the system 102, is capable of mining all such relevant domain-specific information, also referred to as action items, and prioritizing them.
[0047] The network environment 100 further comprises a database 110
communicatively coupled to the system 102. The database 110 may store all data inclusive of customer input data and domain-specific data, such as business data, along with all extracted and processed information. The database 110 may also act as a store house for storing domain knowledge and domain ontology. The database 110 may also contain rules, including domain-specific rules, linguistic rules, etc., to process the extracted information. In addition, the database 110 may have any data required for the functioning of the system 102. Although the database 110 is shown external to the system 102, it will be appreciated by a person skilled in the art that the database 110 may be internal to the system 102.
[0048] In an implementation, the system 102 receives all input data inclusive of
customer input data and domain-specific information from multiple channels, in one or more formats, for example, in HTML, plain text, text reproduced from a recorded conversation, etc. The system 102 processes this domain-specific customer input data and isolates the feedback text for further processing. As the feedback text is usually noisy and unstructured, it is first cleaned or normalized to remove noise. Cleaning or normalizing involves identifying logical partitions in the text and removing unnecessary symbols. In an implementation, spelling errors are corrected using a context-dependent algorithm.
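For illustration purposes only, and not as a limitation, the cleaning step described above may be sketched in Python as follows. The function name clean_feedback and the set of characters treated as meaningful are assumptions made for this sketch and are not taken from the system 102; the context-dependent spelling correction is omitted.

```python
import re

def clean_feedback(raw_text):
    """Return a list of clean sentence units from one raw feedback text."""
    # Strip HTML tags left over from web channels.
    text = re.sub(r"<[^>]+>", " ", raw_text)
    # Replace symbols outside an assumed set of meaningful characters.
    text = re.sub(r"[^A-Za-z0-9.,!?'\s-]", " ", text)
    # Collapse runs of whitespace.
    text = re.sub(r"\s+", " ", text).strip()
    # Identify logical partitions: split after sentence-ending punctuation.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p for p in parts if p]

sentences = clean_feedback("<p>Battery is great!!  But the ### lens is poor.</p>")
```

Each element of the returned list is one unit for subsequent analysis.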
[0049] The output following cleaning, which is a set of sentences that can be
considered as units for analysis or clean text units, is processed, analyzed, and subsequently aligned with the nodes of the domain ontology, which, as already mentioned, represent domain goals. In an implementation, normalized feedback sentences following cleaning are tagged to their respective nodes of the domain ontology based on ontology concepts present in the sentences. As the nodes of the domain ontology are based on domain goals and objectives, the customer feedback too becomes goal-oriented and aligned. In an implementation, the domain goals may be acquired by the intelligence module 108 as natural language inputs.
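For illustration purposes only, tagging a cleaned sentence to ontology nodes may be sketched as a term lookup. The ontology fragment ONTOLOGY_TERMS, its single-word concept terms, and the function name tag_sentence are hypothetical and serve only this sketch; an actual domain ontology would also hold multi-word concepts and synonyms.

```python
import re

# Hypothetical ontology fragment: node name -> single-word concept terms.
ONTOLOGY_TERMS = {
    "battery": {"battery", "charge", "charging"},
    "lens": {"lens", "zoom", "focus"},
}

def tag_sentence(sentence, ontology_terms=ONTOLOGY_TERMS):
    """Return the ontology nodes whose concept terms occur in the sentence."""
    words = set(re.findall(r"[a-z0-9']+", sentence.lower()))
    return sorted(node for node, terms in ontology_terms.items() if words & terms)

tags = tag_sentence("The zoom is slow and the battery drains fast.")
```

A sentence that mentions no known concept receives no tag and may be left for an analyst to review.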
[0050] Based on the analysis and alignment of the input data, information is extracted
and is assessed for its impact on the enterprise or a division of the enterprise or a set of
products/services offered by the enterprise or its business division to identify one or more
action items. In an implementation, the action items may be prioritized to give an output
which may form the basis for actionable intelligence for the business entity. Further, an analyst
may interact with the system 102, and, in the process, enhance the domain ontology. In
addition, the system 102 may generate reports, which may include summarized results for the
user or the business entity in the form of charts and figures. Based on the statistics, the user
can decide whether to upgrade the system 102 with new knowledge, such as words that
are extracted and saved in a repository but not recognized by a dictionary, and with new
domain rules, and, if so, how to upgrade the system 102. For illustration purposes only,
Figs. 11a and 11b show some sample reports that may be generated by the system 102.
[0051] Fig. 2 illustrates the goal-driven integrated text mining system 102, in
accordance with one embodiment of the present subject matter. The goal-driven integrated text mining system 102, hereinafter referred to as the system 102, includes processor(s) 202, input and output (I/O) interfaces 204, and a memory 206.

[0052] The processor(s) 202 can be a single processing unit or a combination of
multiple processing units. The processor(s) 202 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) 202 are configured to fetch and execute computer-readable instructions and data stored in the memory 206.
[0053] The I/O interfaces 204 may include a variety of software and hardware
interfaces, for example, interface for peripheral device(s) such as a keyboard, a mouse, an external memory, a printer, etc. Further, the I/O interfaces 204 enable the system 102 to communicate with other computing devices, such as web servers and external databases. The I/O interfaces 204 may facilitate multiple communications within a wide variety of protocols and networks, such as the network 106, including wired networks, e.g., local area network (LAN), cable, etc., and wireless networks, e.g., wireless LAN, cellular, satellite, etc. The I/O interfaces 204 may include one or more ports for connecting the system 102 to a number of computing devices, such as the user devices 104.
[0054] The memory 206 can include any computer-readable medium known in the art
including, for example, volatile memory (e.g., random access memory or RAM) and/or nonvolatile memory (e.g., flash memory). The memory 206 includes module(s) 208 and data 210. The modules 208, in general, include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types.
[0055] The modules 208 may further include, in addition to the intelligence module
108, a configuration module 212, an acquisition module 213, an information extraction module 214, and other modules 216. The intelligence module 108, as also discussed in the description of Fig. 1, is configured to derive actionable intelligence based on customer feedback data and knowledge that is specific to a particular domain based on a goal specification. The intelligence module 108 may include various sub-modules to achieve this purpose. The sub-modules and their functioning are described in detail in the descriptions of the subsequent figures.

[0056] Further, the configuration module 212 of the system 102 is used to configure
the system 102 for various tasks. For example, the configuration module 212 may set operational parameters like frequency of data extraction and report generation, create a set of generic report templates, provide weights to domain goals, etc. Then, the other module(s) 216 may include programs that supplement applications implemented by the system 102. In an implementation, the other modules 216 include an administrative module (not shown) for creating user accounts for different users. Different users like system administrator(s), domain experts, analysts, and other users may be allowed different permission levels based on their roles.
[0057] The acquisition module 213 is configured to acquire domain knowledge and
customer feedback data. The domain knowledge, such as business knowledge, may be
acquired by a knowledge acquisition module (not shown in the figure), while customer
feedback data may be acquired from the customer input data by a data acquisition module (not
shown in the figure), both modules constituting the acquisition module 213.
[0058] The knowledge acquisition module employs information extraction principles
to extract core domain concepts from domain-specific documents and convert them to semi-structured representations of domain goals. For example, the system 102 may be provided with a set of Uniform Resource Locators (URLs) or a set of text documents relevant to, for example, the business division of the enterprise. These documents are expected to contain information about objectives of the division, its intermediate goals, and certain descriptive text about how these goals can be achieved. These documents are also likely to contain list of products and services offered by the business division. In order to extract information from URLs, site-specific crawlers may be employed. Subsequently, parsers may be employed to extract text from text documents and documents in HTML (HyperText Markup Language) extracted from these URLs.
[0059] Initially, document objects like headings, hyperlinks, list items, and named
entities may be used to identify possible nodes. Subsequently or alternatively, techniques for parsing tables may be deployed to create the domain ontology in its base form to store, for example, product models, services, components, features, enterprise divisions, etc. One may refer to figures 7 and 8, which show how product description can be used to build the domain ontology. One may also employ interactive mechanisms where an analyst’s inputs may be used for the purpose of creating the domain ontology.
[0060] Further, domain goals are extracted using document processing techniques. To
this end, each domain goal may be represented as a hyper-graph containing weighted edges to different knowledge units. The domain goal may comprise, as also mentioned earlier, a set of products or services, a set of secondary products or services, and services to be offered along with these products and services in order to achieve sub-goals like ensuring high availability, high quality, low cost, and increased customer satisfaction. Further, domain goals may also be specified as rules. One may again refer to figures 7 and 8, which show how information about products, their features, and goals of the enterprise can be captured to build hyper-graphs. In an implementation, a rule language using ontology concepts as terms, fuzzy weights, and logical combination functions can be employed.
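For illustration purposes only, a domain goal held as a hyper-graph with weighted edges may be sketched as the following data structure. The class name DomainGoal, its methods, the example knowledge units, and the restriction of weights to the range 0 to 1 are assumptions of this sketch rather than features recited for the system 102.

```python
from dataclasses import dataclass, field

@dataclass
class DomainGoal:
    name: str
    # Each hyper-edge links the goal to a set of knowledge units with a weight.
    edges: list = field(default_factory=list)

    def add_edge(self, units, weight):
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must lie in [0, 1]")
        self.edges.append((frozenset(units), weight))

    def weight_of(self, unit):
        """Largest weight among the hyper-edges that contain the unit."""
        return max((w for units, w in self.edges if unit in units), default=0.0)

goal = DomainGoal("customer satisfaction")
goal.add_edge({"camera", "battery life"}, 0.8)
goal.add_edge({"camera", "price"}, 0.5)
```

A hyper-edge differs from an ordinary edge in that it may join more than two knowledge units, which is why each edge stores a set.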
[0061] Further, paragraph-level and sentence-level groupings may be used to draw
edges between the nodes. In an implementation, these nodes may be described based on the
text extracted from proximity of these nodes using techniques such as Natural Language
Processing. Each connected component of such hyper-graphs may be marked as a
business-critical knowledge component, also referred to as a domain ontology concept. These units are
then inserted into the domain ontology. The resulting graph can be edited, if required, by a
user, such as a knowledge engineer or an analyst, to derive the final domain ontology. For
illustrative purposes and not as a limitation, the aforementioned process of acquiring
knowledge from sources, such as URLs, and creating the domain ontology is shown in Fig. 7,
which shows source pages to gather information from (top left block), a crawler browsing a
web page (bottom left block), descriptive words and phrases gathered from the web page and
from knowledge extracted from an existing semi-structured ontology (bottom right block) to
create a final ontology (top right block). In an implementation, the domain ontology may also
include information about competitors and competing products offered by them.
[0062] In this way, knowledge is crawled from the web sites and enterprise documents
and entered into the domain ontology, where domain goals are defined as a part of the domain ontology. Further, weights can be assigned to different edges joining the nodes to indicate importance of the edges towards computation of satisfaction levels. In an implementation, the weights can be adjusted at run time. This helps in simulating future scenarios to see how satisfaction parameters may change with differing emphasis on different divisions, products, or services.
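For illustration purposes only, the effect of adjusting weights at run time may be sketched as follows. The use of a weighted average, the score range of -1 to 1, and the node names are assumptions of this sketch; the description does not fix a particular formula.

```python
def satisfaction(scores, weights):
    """Weighted average of per-node satisfaction scores in [-1, 1]."""
    total = sum(weights[node] for node in scores)
    if total == 0:
        return 0.0
    return sum(scores[node] * weights[node] for node in scores) / total

# Hypothetical per-node scores mined from feedback.
scores = {"battery": -0.5, "lens": 0.5}

# Equal emphasis on both nodes.
baseline = satisfaction(scores, {"battery": 1.0, "lens": 1.0})
# Simulated scenario: three times more emphasis on the battery.
what_if = satisfaction(scores, {"battery": 3.0, "lens": 1.0})
```

Only the weights change between the two calls, so the same mined scores can be re-evaluated for any number of scenarios without reprocessing the feedback.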
[0063] The association of weights allows end users to analyze the whole of the feedback
data from multiple perspectives. In this way, different divisions may look at only those aspects of customer feedback that are relevant to them rather than looking at the entire set of issues extracted from the feedback text. For illustrative purposes and not as a limitation, the aforementioned process of assigning weights to domain goals, for example, specification of a camera, in the form of an importance value is shown in Fig. 9.
[0064] Since customer vocabulary and domain vocabulary may not match, the system
102 further provides a mechanism to automatically enhance the domain ontology with additional words from customer vocabulary through semi-supervised machine learning techniques, for example, fuzzy clustering techniques.
[0065] Besides this, the data acquisition module receives all customer input data from
multiple channels in different formats. The data acquisition module processes the whole of such input data according to pre-defined schema definitions and isolates the customer feedback text(s) for further processing.
[0066] The information extraction module 214 processes output of the acquisition
module 213 to provide relevant information, which subsequently forms the basis for identifying the action items. Feedback text isolated from the customer feedback data by the data acquisition module is sent to the information extraction module 214. The information extraction module 214 cleans and pre-processes the feedback text and outputs sentences as units for subsequent analysis. The sentences are processed using NLP techniques to provide structured components, which represent relationships among elements of a sentence. Relevant information, such as opinion, trend, statistical similarities, etc., is then mined from these structured components using text-mining, interchangeably referred to as information mining, techniques. For example, the information extraction module 214 computes overall negativity or positivity of sentiments or opinions expressed in a single sentence. Overall negativity or positivity of a feedback may be computed based on an aggregate of the total number of “positive” and “negative” words or phrases in its text. For example, the overall negativity and positivity may be computed as a weighted function of positive and negative scores obtained by all sub-nodes of a node. The functioning of the information extraction module 214 is discussed in detail in the description of Figure 4.
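For illustration purposes only, the aggregation of positive and negative words into an overall score may be sketched as follows. The word lists POSITIVE and NEGATIVE, the function names, and the choice of a simple difference of counts are assumptions of this sketch; modifiers and negations are deliberately ignored here.

```python
# Assumed opinion word lists; a deployed system would use larger lexicons.
POSITIVE = {"good", "amazing", "great"}
NEGATIVE = {"poor", "terrible"}

def polarity(sentence):
    """Count of positive words minus count of negative words in a sentence."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg

def feedback_polarity(sentences):
    """Aggregate the sentence-level scores of one feedback text."""
    return sum(polarity(s) for s in sentences)

score = polarity("The lens is great but the battery is terrible and poor.")
```

A negative result indicates overall negativity, a positive result overall positivity, and zero a balanced or neutral sentence.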
[0067] The data 210, on the other hand, may include knowledge data 218, object data
220, and other data 222. The knowledge data 218 may contain rules to process information extracted from the feedback text. The object data 220 comprises all data inclusive of customer input data and domain knowledge along with all extracted and processed information. In an embodiment, the knowledge data 218 as well as the object data 220 may be stored on an external database (shown in Fig. 1) instead of being present on the system 102 (as shown in Fig. 2).
[0068] Further, the other data 222 includes data generated as a result of the execution
of one or more modules in the other modules 216. The other data 222 may also include, for example, information about activities external to the information extraction module 214. In an example, some business knowledge acquired by the system 102 may state that a product A is not for sale in a region X of a country Y, and the system 102 may then receive a feedback saying “the product A is not yet available in region X of the country Y”. Such information may be used by the system 102 to refrain from generating an alert even if a lot of consumers located in Y complain about “non-availability” of the product A.
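For illustration purposes only, the use of such business knowledge to suppress an alert may be sketched as a simple rule. The set NOT_FOR_SALE, the function name should_alert, the complaint type label, and the threshold of 100 complaints are hypothetical values chosen for this sketch.

```python
# Assumed business knowledge: (product, region) pairs that are not on sale.
NOT_FOR_SALE = {("product A", "region X")}

def should_alert(product, region, complaint_type, complaint_count, threshold=100):
    """Decide whether a volume of complaints warrants an alert."""
    # Non-availability is expected where the product is not offered at all.
    if complaint_type == "non-availability" and (product, region) in NOT_FOR_SALE:
        return False
    return complaint_count >= threshold

suppressed = should_alert("product A", "region X", "non-availability", 500)
raised = should_alert("product A", "region Z", "non-availability", 500)
```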
[0069] As an example, business goals can be taken as goals or objectives of the
enterprise and may be defined by the object data 220. Business objects may be defined and saved using an update file as the object data 220. The update file may be in any suitable format, for example, an XML (Extensible Markup Language) file. The contents of the file may be updated using the knowledge data 218. On the other hand, the knowledge data 218 is obtained following processing of the object data 220. The knowledge data 218 may change according to changes in the business needs. In an implementation, the knowledge data 218 can be verified and updated by a user, such as an analyst, which may be an administrator, a developer, or a computing system other than the system 102.
[0070] Fig. 3 illustrates an expanded view of the exemplary intelligence module 108
of the goal-driven integrated text mining system 102, in accordance with one embodiment of the present subject matter. The intelligence module 108 may include various sub-modules for identifying the action items and deriving actionable intelligence based on a priority of the action items. In addition, the intelligence module 108 facilitates updating of the domain ontology with new domain-specific concepts and vocabulary. The intelligence module 108 works in tandem with the acquisition module 213 and the information extraction module 214, which are configured to extract information from the feedback text and domain knowledge from various sources to derive actionable intelligence.
[0071] In an implementation, the intelligence module 108 includes an assessment
module 312 and an interaction module 314. The intelligence module 108 processes output of
the information extraction module 214. In an implementation, the output of the information
extraction module 214 is received by the assessment module 312. The assessment module 312
judges relevance of the extracted information for the end user. In an implementation, a
volumetric analysis of the number of feedback texts that align with a domain goal or sub-goal may
provide a measure of support for the corresponding domain goal or sub-goal. In another
implementation, overall negativity or positivity of opinions with regard to a domain goal may
provide a measure of user satisfaction corresponding to the domain goal. Negativity or
positivity corresponding to a domain goal node of the domain ontology may be taken as a
weighted sum of negativity or positivity of all its constituent nodes. The computation of
overall negativity or positivity of an opinion is discussed in detail with respect to Fig. 4.
[0072] Based on the assessment of the relevance of each domain goal in the domain
ontology, the action items may be scored and accordingly prioritized as a function of, for example, the domain goal criticality, the support, and the observed trend based on a predefined time-parameter. The support for each parent node is given by the total number of unique feedbacks finally aligned to it.
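For illustration purposes only, scoring and ranking action items may be sketched as follows. The description states only that the score is a function of criticality, support, and trend; the multiplicative form below, the interpretation of trend as the relative growth in support over the predefined time window, and all names and values are assumptions of this sketch.

```python
def priority(criticality, support, trend):
    """Assumed scoring form: rising trends amplify criticality times support."""
    return criticality * support * (1.0 + max(trend, 0.0))

def prioritize(items):
    """Return action items ordered from highest to lowest priority."""
    return sorted(
        items,
        key=lambda it: priority(it["criticality"], it["support"], it["trend"]),
        reverse=True,
    )

# Hypothetical action items.
items = [
    {"name": "battery", "criticality": 0.5, "support": 40, "trend": 0.0},
    {"name": "delivery", "criticality": 1.0, "support": 30, "trend": 0.5},
]
ranked = [it["name"] for it in prioritize(items)]
```

Here the delivery item outranks the battery item despite lower support, because its goal is more critical and its complaints are growing.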
[0073] In an implementation, following assessment of the relevance of the information
extracted, the interaction module 314 enables a user, such as an analyst, to interact with the system 102 to add concepts that may not have been part of the domain ontology earlier and thereby enhance the domain ontology. An analyst may also query the system 102 to verify its output or browse through various data and results. The interaction module 314 may also help track and save analyst interaction sequences, for example, for future reuse. For illustration purposes, and not as a limitation, the functioning of the interaction module 314 is shown with respect to Figs. 10a and 10b, which show how entities and actions can be modified to update the domain ontology and how new concepts can be tagged.

[0074] In an implementation, the system 102 may also include a reporting module and
an alert module (not shown), which can be programmed to generate alerts or summarized results for the end user on a regular basis, in the form of red flags or charts and figures. The user may also drill down through the reports to view raw and processed data. Associations among elements of the report can also be viewed at multiple levels of granularity, for example, at sentence level, feedback level, domain goal level, etc. The reporting module can also provide intelligence about the underlying content around any other metadata that is associated with the feedback text. In an implementation, the reporting module may track and store all such analyst interactions for future use and display statistics about current usage. Based on the statistics, the analyst may decide to upgrade the system 102 with the new knowledge acquired.
[0075] For illustration purpose only, and not as a limitation, Figs. 11a, 11b, 11c, 11d,
and 11e are provided which give an overview of various graphs and charts that the system 102 can generate. For example, the reporting module of the system 102 may generate graphs showing feedback volumes, an analysis of opinions, alarming trends, anomalous trends, and prioritization of action items by the system 102.
[0076] Further, the intelligence module 108 may include a workflow integration
module (not shown), which may facilitate upgrading the workflow of the business entity under consideration. In an implementation, the workflow integration module may output an actionable intelligence item based on prioritized action items generated by the assessment module 312. The actionable intelligence item can be fed directly into the workflow. The workflow integration module may also provide feedback to the system 102 about the corrective actions that may need to be taken by the domain, such as the enterprise or the division of the enterprise in question. This may prove useful in altering the assessment module 312, as and when required.
[0077] For example, in case a problem is detected with a product or a service and such
a detection has resulted in an action item requiring some improvement in the product or service, it is expected that after a specified amount of time, the problem will not be observed again. However, if the problem recurs after the stipulated time period, then it can be deemed an emergency. Such a situation may initiate a call for a review of a knowledge base, which is part of the database 110, storing all previously reported actions, existing domain logic, assessment process employed, and other allied actions. In an implementation, certain feedback is provided to the system 102 using the same rule definition language defined earlier as a part of defining domain goals.
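For illustration purposes only, detecting a recurrence after the stipulated time period may be sketched as a date comparison. The function name is_emergency, its parameters, and the example dates are hypothetical and serve only this sketch.

```python
from datetime import date, timedelta

def is_emergency(action_date, resolution_window_days, complaint_dates):
    """True if the same problem is reported after the stipulated period."""
    deadline = action_date + timedelta(days=resolution_window_days)
    return any(d > deadline for d in complaint_dates)

# Action item raised on 1 January with an assumed 30-day window.
within_window = is_emergency(date(2011, 1, 1), 30, [date(2011, 1, 15)])
recurred = is_emergency(date(2011, 1, 1), 30, [date(2011, 1, 15), date(2011, 3, 1)])
```

Complaints received inside the window are treated as part of the original problem, and only later reports trigger the review.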
[0078] Fig. 4 illustrates an expanded view of the exemplary information extraction
module 214 of the goal-driven integrated text mining system 102, in accordance with one embodiment of the present subject matter.
[0079] In an implementation, the information extraction module 214 includes a
pre-processing module 402, a natural language processing (NLP) module 404, and a text mining module 406. The pre-processing module 402 is a data cleaning and storing module, which populates a back-end database server, such as the database 110, with raw data supplemented by additional information received about the data, wherever applicable.
[0080] In an implementation, customer feedback isolated by the data acquisition
module in text format from various channels and sources is fed to the pre-processing module 402. The pre-processing module 402 cleans and normalizes the feedback text. The cleaning involves identifying logical partitions in the text and removal of unnecessary symbols. For example, spelling errors are corrected using a context-dependent algorithm, as is known in the art. The output of the pre-processing module 402 is a set of sentences that may be considered as units for analysis. Each sentence may be considered to be either a part of a feedback about a domain goal highlighting its entities, such as features and qualities expected by customers from the product(s), or a feedback about the entities themselves.
[0081] Further, the information extraction module 214 includes the NLP module 404,
which operates on all the sentences extracted from the text feedback and stored in the database 110 after cleaning. The NLP module 404 tags words of each sentence with their respective part-of-speech (POS), parses the sentences, and then analyzes the relationships or dependencies among the words. In an implementation, the NLP module 404 outputs different types of structured components, which represent relationships among various elements of a sentence.
[0082] The first type of structured component may include a parse tree stored as an
N-ary tree in the database 110. Within the N-ary tree, every word is tagged with its POS tag, every phrase is tagged with its type, and all the phrases can be traced back to a root. The second type of output component includes a set of dependency relations, which are generally binary in nature. This component defines relationship between a pair of words in a sentence. In an implementation, the words extracted from the sentences are converted and stored in their root forms. Further, synonyms may be grouped together with the help of language dictionaries and the knowledge base. In addition, the NLP module 404 implements a normalization technique to identify semantically similar phrases, which may have surface dissimilarities. All semantically equivalent phrases are transformed and stored in a normalized fashion as cleaned and semi-processed text to ensure efficient retrieval at a later stage.
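For illustration purposes only, the two structured components may be sketched as the following in-memory data structures. The class name ParseNode, the phrase and POS labels, and the triple layout of a dependency relation are assumptions of this sketch; the storage of the tree in the database 110 is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class ParseNode:
    label: str                      # phrase type (e.g. "NP") or POS tag
    word: Optional[str] = None      # set only for leaf nodes
    children: List["ParseNode"] = field(default_factory=list)
    parent: Optional["ParseNode"] = field(default=None, repr=False)

    def add(self, child):
        """Attach a child node; a node may have any number of children."""
        child.parent = self
        self.children.append(child)
        return child

    def root(self):
        """Trace this node back to the root of the tree."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

sentence = ParseNode("S")
noun_phrase = sentence.add(ParseNode("NP"))
leaf = noun_phrase.add(ParseNode("NN", "battery"))

# A binary dependency relation as (relation, governor, dependent).
relation = ("nsubj", "drains", "battery")
```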
[0083] Further, the text mining module 406 processes the output of the NLP module
404 to extract relevant information out of the cleaned and semi-processed text. In an
implementation, the text mining module 406 includes a component identification module 408,
which identifies information components like subjects, objects, actions, or descriptions and
their relationships from each sentence processed using the NLP technique. The information
components may be composite, that is, made by combining multiple existing functions into a
new component. For illustration purposes, and not as a limitation, Fig. 7 shows a snapshot
of the system 102 illustrating the functioning of the component identification module 408. In
the example shown in Fig. 7, two camera models are linked with their descriptions as
identified from the documents specified for acquiring domain knowledge.
[0084] The information components are identified using a dependency-driven analysis
technique. The component identification module 408 uses dependency relations output by a parser to construct a dependency graph. The dependency graph is traversed to identify different information components. In an implementation, traversal rules are designed based on linguistic rules of the underlying language. The linguistic components that qualify as subjects, objects, and actions are further identified as ontology concepts with varying degrees of similarity.
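For illustration purposes only, a dependency-driven identification of subject, action, and object components may be sketched as follows. The relation labels "nsubj" and "dobj" follow a common dependency labelling convention and are assumptions of this sketch, as are the function name and the single traversal rule; the linguistic rules of the described system are not reproduced here.

```python
from collections import defaultdict

def extract_components(relations):
    """Build a dependency graph and read off (subject, action, object) triples."""
    graph = defaultdict(list)
    for rel, governor, dependent in relations:
        graph[governor].append((rel, dependent))
    components = []
    for governor, edges in graph.items():
        subjects = [d for r, d in edges if r == "nsubj"]
        objects = [d for r, d in edges if r == "dobj"]
        # Traversal rule: a governor with a subject is treated as an action.
        for s in subjects:
            for o in objects or [None]:
                components.append((s, governor, o))
    return components

relations = [
    ("nsubj", "drains", "camera"),
    ("dobj", "drains", "battery"),
    ("amod", "battery", "new"),
]
components = extract_components(relations)
```

The extracted words could then be matched against ontology concepts, for example by synonym lookup, to obtain the degrees of similarity mentioned above.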
[0085] The text mining module 406 further includes an affect analysis module 410 for
extracting opinions from feedback sentences provided by the NLP module 404, thereby identifying customer pain-points and delights. Opinion mining in conventional systems considers only one aspect of feedback analysis, assuming that a feature name appears explicitly in customer feedback. However, it has been observed that customer comments may not be about product features only. The situation becomes even more prominent when analyzing feedback on services. In addition, the vocabulary used by customers is often very different from the vocabulary used by the enterprise. For example, a service unit may be responsible for “providing information,” and a customer may express dissatisfaction by stating that he “could not find the relevant information”. Handling this kind of situation requires understanding that “not finding a service” should be interpreted as a negative sentiment for “providing services”, a task that can be done with the help of ontology and not through word-based processing.
[0086] Moreover, various allied issues that feature in a discussion from where the
feedback is taken may also be rather important, even though they may not be directly related to any specific domain-related issues, such as about a product or a service, as such. For example, discussions may include topics like “content of lead in children’s toys” or “issues about packaging” or “badly made commercials” etc., which are just some indicators of negative sentiments that cannot be tracked through standard techniques like feature-based opinion mining. Further, conventional text mining systems are unable to deal with context dependent opinions.
[0087] For example, in the domain of automobiles, the word “low” may indicate both
positive and negative opinions, depending on the feature the word is used with. For instance, in the automobile domain, “low consumption” will be a desirable feature, whereas “low mileage” will be undesirable. Moreover, if a customer states that the “car is giving me only 9 in the city”, this should also be considered as a negative opinion. Thus, an opinion needs to be interpreted in conjunction with the feature it describes, the context, and prior knowledge about the opinion-feature pair. However, identifying all such possible pairs and storing the necessary knowledge about them a priori is a near-impossible task and is not scalable. The text mining techniques applied in the present system 102 provide a structured way of capturing this kind of knowledge so as to enhance the knowledge base of the system 102 and improve its performance in similar future analysis.
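For illustration purposes only, interpreting an opinion word together with the feature it describes may be sketched as a lookup that can be extended over time. The table CONTEXT_POLARITY, the fallback table DEFAULT_POLARITY, and the function names are assumptions of this sketch; the pair of “high” and “mileage” is a hypothetical entry added to show how new knowledge might be captured.

```python
# Assumed prior knowledge about (opinion word, feature) pairs.
CONTEXT_POLARITY = {
    ("low", "consumption"): +1,
    ("low", "mileage"): -1,
}
# Fallback orientation of a word taken on its own; "low" alone is ambiguous.
DEFAULT_POLARITY = {"low": 0, "good": +1, "poor": -1}

def opinion_polarity(opinion_word, feature):
    """Orientation of an opinion word in the context of a feature."""
    fallback = DEFAULT_POLARITY.get(opinion_word, 0)
    return CONTEXT_POLARITY.get((opinion_word, feature), fallback)

def learn(opinion_word, feature, polarity):
    """Capture a newly confirmed pair so that later analyses can reuse it."""
    CONTEXT_POLARITY[(opinion_word, feature)] = polarity

before = opinion_polarity("high", "mileage")
learn("high", "mileage", +1)
after = opinion_polarity("high", "mileage")
```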
[0088] The affect analysis module 410 identifies subjects and objects of all opinions expressed in a sentence along with their orientations, that is, positivity or negativity. Individual opinion expressing words may have an inherent positive or negative orientation based on their long-standing usage to express positive or negative sentiments or opinions, for example, “good”, “amazing”, “great”, “poor”, “terrible”, etc. On the other hand, opinion expressions are units built from the information components by considering individual opinion expressing words and their dependent and governing items. In order to build opinion expressions, dependency components extracted by the component identification module 408 are used. Further, opinion score computation is done based on domain-sensitive analysis, in which sentiment words, word modifiers, and negation elements are taken into account.
[0089] In an implementation, an opinion score is assigned to an entity-action pair based on negation of actions in the context of an entity. An entity-action pair can be inferred from knowledge stored in the ontology, which stores entities such as product or service names and a description or action associated with those items. The scoring takes into account the different ways in which domain actions stored in the ontology can be negated, using dictionary-based look-up tables for known synonyms and antonyms as well as implicit and explicit negations. Further, the affect analysis module 410 computes an overall negativity or positivity of sentiments or opinions expressed in a single sentence and outputs a set of pain-points and delights in the form of ontology concepts as identified from each feedback text.
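The word-level part of this scoring can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the lexicons, the weights, and the names `opinion_score` and `sentence_orientation` are hypothetical stand-ins for the dictionary-based look-up tables mentioned above, and the ontology-driven inference of entity-action pairs is not modelled.

```python
# Hypothetical lexicons; in the described system these would come from
# the ontology and the dictionary-based look-up tables.
POLARITY = {"good": 1.0, "great": 1.0, "amazing": 1.0, "poor": -1.0, "terrible": -1.0}
MODIFIERS = {"very": 1.5, "slightly": 0.5}
NEGATIONS = {"not", "never", "no"}


def opinion_score(tokens):
    """Score a tokenized opinion expression.

    Each sentiment word contributes its polarity, scaled by any modifier and
    flipped by any negation seen since the previous sentiment word.
    """
    score = 0.0
    scale = 1.0
    negated = False
    for token in tokens:
        word = token.lower()
        if word in NEGATIONS:
            negated = not negated
        elif word in MODIFIERS:
            scale *= MODIFIERS[word]
        elif word in POLARITY:
            value = POLARITY[word] * scale
            score += -value if negated else value
            scale, negated = 1.0, False  # reset context after each sentiment word
    return score


def sentence_orientation(tokens):
    """Map the aggregate score to a pain-point, a delight, or neutral."""
    score = opinion_score(tokens)
    if score < 0:
        return "pain-point"
    if score > 0:
        return "delight"
    return "neutral"
```

With these assumed tables, the tokens of “the service was not very good” aggregate to a negative score and would be reported as a pain-point.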
[0090] Further, the text mining module 406 includes an ontology-based tagging module 412, which tags each feedback sentence based on the ontology concepts present in the sentence, and thereby tags the whole feedback as indicative of the ontology concepts contained in its sentences. The total number of feedback texts acquired under each ontology node gives the strength of that ontology concept. The strengths are thereafter propagated to higher levels of ontology concepts and finally to domain goal-nodes to indicate the total strength of these concepts across all the feedback text. The support for each parent node is given by the total number of unique feedback texts finally aligned to the parent node. These strengths can be viewed directly by an analyst even when a concept node is not part of any domain goal.
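The propagation of strengths can be sketched as follows. The ontology fragment, the feedback identifiers, and the function name `propagate_support` are hypothetical; the sketch only shows how a feedback text is counted once per ancestor node, which is what makes the parent support a count of unique feedback texts.

```python
from collections import defaultdict

# Hypothetical ontology fragment: child concept -> parent concept
# (None marks a top-level goal node).
PARENT = {
    "mileage": "fuel economy",
    "consumption": "fuel economy",
    "fuel economy": "customer satisfaction",
    "customer satisfaction": None,
}


def propagate_support(tags):
    """Return the number of unique feedback texts aligned to each node.

    `tags` maps a feedback id to the set of ontology concepts tagged in it.
    A feedback text counts once per ancestor, however many children it matches.
    """
    aligned = defaultdict(set)
    for feedback_id, concepts in tags.items():
        for concept in concepts:
            node = concept
            while node is not None:
                aligned[node].add(feedback_id)
                node = PARENT.get(node)
    return {node: len(ids) for node, ids in aligned.items()}


tags = {
    "f1": {"mileage"},
    "f2": {"mileage", "consumption"},
    "f3": {"consumption"},
}
support = propagate_support(tags)
# prints: 2 3 3
print(support["mileage"], support["fuel economy"], support["customer satisfaction"])
```

Feedback `f2` mentions both child concepts but contributes only once to “fuel economy”, so the parent support is 3 rather than 4.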
[0091] Further, considering that the output of the NLP module 404 is only indicative and not exhaustive, the text mining module 406 includes a statistical pattern recognition module 414, which implements a semi-supervised clustering mechanism to group semantically cohesive feedback together. Clustering, as is known in the field of statistical data analysis, is an assignment of a set of observations into subsets, called clusters, so that observations in the same cluster are similar in some sense. The initial supervision to the clustering process is provided by one or more existing domain goals. Starting with a set of clusters, where each cluster includes a set of sentences that are aligned naturally with an ontology node, the collection is enhanced through fuzzy clustering. The process is iterative and, at the end of each iteration step, feedback sentences with sufficiently high membership to different clusters are inducted into those clusters.
[0092] In an implementation, the system executes a fuzzy clustering algorithm for
clustering the semantically cohesive feedback. The algorithm is such that it facilitates the statistical pattern recognition module 414 of the system 102 to choose initial clusters and features required for clustering. The fuzzy clustering algorithm as used herein, as an example, may have the following steps:
Step 1: Construct a word matrix $T^{k}_{ij}$ for the sentences that are made part of the cluster centers, where 0 ≤ k ≤ number of clusters, 0 ≤ j ≤ number of terms, and 0 ≤ i ≤ number of sentences.
Step 2: For k = 0, $T^{0}_{ij} = T^{0}_{ij} \times 0.25$.
Step 3: Construct $M_{ij}$ for the remaining sentences, which have not been put into any cluster except cluster 0.
Step 4: Calculate a cluster center […] for 0 ≤ j ≤ number of terms, where […] = ( […] ) / number of sentences.
Step 5: Initialize the membership matrix $U^{0} = [u_{ik}]$, where 0 ≤ i ≤ number of test sentences, 0 ≤ k ≤ number of clusters, and $u_{ik}$ = […]. Here m = 1.1, so that sentences with a higher membership value are given more weight than the others.
Step 6: Calculate $max_{i} = \max_{k} u_{ik}$ for 0 ≤ i ≤ number of sentences.
Step 7: If $max_{i}$ ≥ threshold value and $max_{i}$ equals $u_{ik}$, add $m_{ij}$ to $T^{k}_{ij}$ and remove the corresponding row from $M_{ij}$.
Step 8: If at the nth iteration […] < 0.05, then STOP; otherwise return to step 2.
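The induction loop can be illustrated with a minimal Python sketch. Several parts are assumptions rather than the specification's own formulas: the cluster center is taken to be the mean term vector, the membership uses the standard fuzzy c-means expression with fuzzifier m, the down-weighting of cluster 0 is omitted, and the loop stops when an iteration inducts no sentence instead of testing a numeric tolerance. The names `center`, `memberships`, and `induct`, the threshold of 0.9, and the toy term vectors are all hypothetical.

```python
import math


def center(rows):
    """Mean term vector of the sentences currently in a cluster (assumed form)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def memberships(row, centers, m=1.1):
    """Standard fuzzy c-means membership of one sentence in every cluster."""
    dists = [math.dist(row, c) for c in centers]
    if 0.0 in dists:  # the sentence coincides with at least one center
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    nearest = min(dists)
    # Dividing by the nearest distance keeps every ratio <= 1 and avoids overflow.
    weights = [(nearest / d) ** power for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def induct(clusters, unassigned, threshold=0.9, m=1.1):
    """Grow seed clusters until an iteration inducts no further sentence."""
    clusters = [list(rows) for rows in clusters]
    remaining = list(unassigned)
    while remaining:
        centers = [center(rows) for rows in clusters]
        still_out = []
        for row in remaining:
            u = memberships(row, centers, m)
            best = max(range(len(u)), key=u.__getitem__)
            if u[best] >= threshold:
                clusters[best].append(row)  # induct into the best cluster
            else:
                still_out.append(row)
        if len(still_out) == len(remaining):
            break  # no change in the composition of any cluster
        remaining = still_out
    return clusters, remaining


# Toy term vectors over three terms; two seed clusters of two sentences each.
seeds = [[[1, 0, 0], [1, 1, 0]], [[0, 0, 1], [0, 1, 1]]]
rest = [[1, 0, 0.1], [0.5, 0.5, 0.5]]
grown, leftover = induct(seeds, rest)
```

In this toy run the first unassigned sentence lies close to the first seed cluster and is inducted, while the second is equidistant from both centers and stays in the leftover group, which corresponds to the sentences that may later be clustered separately.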
[0093] In an implementation, each cluster is represented by its cluster center. The cluster characteristic represented by the cluster center is adjusted to reflect the changed composition of the cluster. The process of induction, that is, the addition of new feedback texts to a cluster during an iteration, continues until there is no change in the composition of any cluster. The cluster centers associated with each ontology node are presented to the analyst or the domain expert, and can be accepted as enhancements to the node. All sentences that are not assigned to any pre-designated cluster with reasonably high membership are grouped together and may be further clustered into smaller groups.
[0094] In another implementation, the statistical pattern recognition module 414 includes a text classification module (not shown). An analyst may provide a set of manually labeled feedback to the text classification module to generate a classifier, which can be applied to label unseen feedback. For example, new feedback received by the system 102 that has not been manually tagged can be assigned labels by the system 102 after the system learns labeling from the manually assigned labels. This sort of learning is known as supervised learning. In yet another implementation, the text classification module may also contain a rule-based classifier.
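The specification does not name a particular classifier. As one possible illustration of supervised labeling, the sketch below uses a multinomial naive Bayes model with add-one smoothing, which is a swapped-in standard technique and not necessarily the one used by the system 102. The training texts, the labels, and the names `train` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labeled):
    """Fit a multinomial naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the most probable label for an unseen feedback text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Manually labeled feedback, as an analyst might provide it (made-up examples).
model = train([
    ("low mileage in the city", "fuel economy"),
    ("mileage is poor", "fuel economy"),
    ("packaging was damaged", "packaging"),
    ("box arrived damaged", "packaging"),
])
```

A call such as `classify(model, "poor mileage on highway")` then assigns a label to feedback that was never manually tagged.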
[0095] The text mining module 406 may further include a trend analysis module 416, which analyzes opinion-trends and volume-trends. The trend analysis module 416 acts as a flexible component in which both the sentences for which trends are to be computed and the granularity of time can be indicated by the end user. Units for trend computation could be opinions associated with groups of feedback texts or any other component, like words or phrases, satisfying user-specified conditions. User-specified conditions refer to the specification or nature of a desirable trend as given by the user, such as an analyst, an expert, or any other end user. Trends for concepts are then classified as alarming, anomalous, or desirable based on their comparison with the user-specified conditions. For example, the user may specify that for a newly launched product the desired trend is “Rising positivity and falling negativity”, but for a service that is known to be not so popular the expected trend is “steady negativity”.
[0096] In an implementation, the trend analysis module 416 employs a linear regression model to detect rising, falling, and steady trends within the unstructured data and further classifies them as desirable, anomalous, or alarming before presenting them to the analyst for further action. An alarming trend is generated when positive opinion is steeply decreasing or negative opinion is showing a steep rise. An anomalous trend for a target concept is generated when feedback volumes for this concept show a steep fall or rise in contradiction to the desired trend.
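A minimal sketch of such a trend classifier is given below. It fits an ordinary least-squares slope to equally spaced observations and applies the alarming and anomalous rules stated above. The steepness cut-off of 0.5, the function names, and the way the desired volume trend is passed in are assumptions made for illustration only.

```python
def slope(values):
    """Least-squares slope of values observed at time steps 0, 1, 2, ...

    Requires at least two observations.
    """
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def direction(values, steep=0.5):
    """Label a series as rising, falling, or steady (cut-off is hypothetical)."""
    s = slope(values)
    if s >= steep:
        return "rising"
    if s <= -steep:
        return "falling"
    return "steady"


def classify_trend(positive, negative, volume, desired_volume="steady"):
    """Classify a concept's trend as alarming, anomalous, or desirable."""
    if direction(positive) == "falling" or direction(negative) == "rising":
        return "alarming"
    observed = direction(volume)
    if observed != "steady" and observed != desired_volume:
        return "anomalous"
    return "desirable"
```

For example, `classify_trend([10, 8, 6, 4], [1, 1, 1, 1], [5, 5, 5, 5])` reports an alarming trend because positive opinion falls steeply over the window.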
[0097] In an implementation, novel yet interesting words or phrases may also be identified as information components that exhibit high occurrence or show an increasing trend, but are not part of any pre-defined domain goals or objectives. In addition, all default parameters can be tuned according to application requirements.
[0098] Fig. 5 illustrates a method for deriving actionable intelligence from customer
feedback data, in accordance with one embodiment of the present subject matter.
[0099] At block 502, a domain ontology is created based on at least one goal specification. The domain ontology contains concepts and definitions of the concepts relevant to a specific domain, for example, an industry or business type or a specific business entity. As an example, a business entity may have a diverse collection of knowledge related to its products, services, policies, etc., encoded into the ontology. In an implementation, the goal specification is identified based on automated analysis of business documents such as organizational web-sites, strategy papers, and policy documents using text information processing techniques. As any product or service offered by the business entity has a set of basic characteristic features or concepts associated with it, these features, appearing with or without the product name, can serve as a starting point for goal-driven analysis if mentioned in customer feedback.
[00100] In an implementation, the system 102 stores acquired domain knowledge in the
form of ontology. Each node of the domain ontology corresponds to a goal having some description extracted from the business documents. The domain ontology stores all product and service names, their descriptions, and inter-relationships. Additional nodes can be created in the domain ontology to describe a complete goal hierarchy useful for analyzing results at multiple levels of specification. The goal specification also includes scoring of each goal and sub-goal based on their perceived importance to the analysis task.
[00101] For example, if a food product has an associated description ‘nutritious, delicious, low fat vanilla bean ice cream’, it can be assumed that the primary sub-goal here would be to attract health conscious customers, with the sub-sub-goals of providing nutrition and not compromising on taste. While the ultimate goal may be to increase sales of the product, which is a measurable quantity, customer feedback mining can help in assessing how well the product is performing on the sub-goals or sub-sub-goals. Thus, continuous monitoring of customer feedback can help in identifying alarming trends well in time and in taking preventive actions, if needed.

[00102] At block 504, information is mined from feedback text. In an implementation,
all input texts, inclusive of all customer feedback received from multiple channels in different formats, are received. The text is processed according to pre-defined schema definitions so as to isolate feedback text for further processing.
[00103] At block 506, at least one action item is identified based on the information and the goal specification. The system identifies action items, that is, actions that are required to be executed by the enterprise or its business division to achieve a domain goal or correct its processes, etc., to improve customer satisfaction levels. In an implementation, information is mined from the feedback text and aligned to one or more domain goals, which represent the nodes of the domain ontology, as mentioned earlier. For example, the information extraction module 214 is provided to extract a set of products and services for which goal-driven analysis is to be performed. Products and services can be linked to each other through a set of links that define hierarchical relationships among them.
[00104] Further, since customer feedback data is noisy in terms of its textual content,
conventional NLP techniques like phrase extractions do not perform very well with such feedback texts. In an implementation, a combination of NLP techniques and statistical text mining techniques may be used to align the feedback text to these descriptions and thereafter assess goal satisfaction. The NLP techniques may further involve pre-processing and text normalization techniques to remove noise from the text.
[00105] Cleaning involves identifying logical partitions in the feedback text and removing unnecessary symbols. In addition, spelling errors are corrected, for example, using a context-dependent algorithm. The output following cleaning is a set of sentences that can be considered as units of analysis. All such units are extracted from the feedback text and stored in data storage after cleaning.
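A simplified view of this cleaning step is sketched below. The regular expressions and the function name `clean` are hypothetical choices, the set of symbols treated as unnecessary is an assumption, and the context-dependent spelling correction is deliberately left out.

```python
import re


def clean(feedback):
    """Split raw feedback into sentence-level units and strip stray symbols."""
    # Replace symbols other than letters, digits, and basic punctuation.
    text = re.sub(r"[^A-Za-z0-9.,!?'\s-]", " ", feedback)
    # Treat whitespace that follows sentence-ending punctuation as a partition.
    units = re.split(r"(?<=[.!?])\s+", text)
    # Collapse runs of whitespace and drop empty units.
    return [re.sub(r"\s+", " ", unit).strip() for unit in units if unit.strip()]


units = clean("Great car!! But the mileage is ### low. Service was poor :(")
# units == ["Great car!!", "But the mileage is low.", "Service was poor"]
```

Each returned string is one unit of analysis that could then be stored and passed to the natural language processing stage.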
[00106] In natural language processing, words in each sentence are tagged with
corresponding Parts-of-Speech they represent and the sentences are parsed and analyzed for dependencies among words. The output of the natural language processing consists of different types of structured components, which represent the relationship among words or elements of each sentence.
[00107] Thereafter, the output of the natural language processing is processed based on the statistical text mining techniques. The statistical text mining techniques may include identification of various components and their relationships within each sentence, extraction of opinions from the sentences to identify customer pain-points and delights, categorization of the sentences using tagging methods as indicative of ontology concepts contained in the sentences, collection of semantically cohesive feedback together, and identification of trends such as opinion and volume trends.
[00108] At block 508, the impact of an action item is assessed. In an implementation, relevance of the extracted information to the end user is judged based on volumetric analysis of the feedback texts that align with the domain goal. Total negativity or positivity analysis provides a measure of user satisfaction corresponding to that goal. Action items are prioritized by a score computed as a function of domain goal priority, support, and the trend observed over a predefined time-parameter. Novel but interesting elements are identified as those information components which exhibit high occurrence or show an increasing trend but are not part of any pre-defined domain goal or objective. These elements can be entered into the ontology after validation by human analysts.
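The specification states only that the priority is a function of goal priority, support, and observed trend, without giving its form. The sketch below assumes one simple multiplicative form purely for illustration; the class `ActionItem`, its fields, and the functions `priority` and `rank` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ActionItem:
    name: str
    goal_priority: float  # analyst-assigned importance of the aligned goal, 0..1
    support: int          # unique feedback texts aligned to the goal
    trend_slope: float    # slope of negative-opinion volume over the chosen window


def priority(item, total_feedback):
    """Assumed scoring: goal priority x share of feedback x trend urgency."""
    share = item.support / total_feedback
    urgency = 1.0 + max(item.trend_slope, 0.0)  # only rising negativity adds urgency
    return item.goal_priority * share * urgency


def rank(items, total_feedback):
    """Order action items from highest to lowest priority."""
    return sorted(items, key=lambda it: priority(it, total_feedback), reverse=True)


items = [
    ActionItem("fix packaging", goal_priority=0.5, support=20, trend_slope=0.0),
    ActionItem("improve mileage", goal_priority=0.8, support=30, trend_slope=1.0),
]
ordered = rank(items, total_feedback=100)
```

Under these assumed numbers, “improve mileage” ranks first because it combines a higher goal priority, more supporting feedback, and rising negativity.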
[00109] At block 510, the domain ontology is updated based on the impact assessment. Based on the statistics, the analyst can decide whether and how to upgrade the system with new knowledge about words that are present in the repository but not recognized by the dictionary, and also with new domain-specific rules. In another implementation, feedback about the action taken by the business entity is provided to the system 102 in order to modify the system 102. This is useful in altering the assessment mechanism of the system 102. The feedback is provided to the system using the same rule definition language defined earlier as a part of specifying domain goals.
[00110] In addition, associations among the elements can be viewed at multiple levels of granularity: sentence level, feedback level, goal level, etc. The system can also provide intelligence about the underlying content around any other metadata that is associated with the feedback. All analyst interactions can be tracked and stored for future use. Statistics about current usage are also displayed through this module.
[00111] Figure 6 provides a high-level context representation of the exemplary process flow, in accordance with an embodiment. The illustration is merely an example of how to derive actionable intelligence based on domain knowledge and feedback text, and thus should not be construed as limiting. The illustration is based on the description as provided earlier with respect to Figs. 1 to 5.
[00112] Figures 7 to 11 represent various illustrations of different stages of deriving actionable intelligence, according to different embodiments of the present subject matter. Figures 7 to 11 respectively illustrate how to acquire business knowledge and create a domain ontology, associate descriptions with ontology nodes, assign weights to the nodes, facilitate interaction with an analyst, etc., in addition to various charts and graphs that the system 102 may generate.
[00113] The above description has been given in reference to a goal-driven integrated text mining system 102 used for deriving actionable intelligence from consumer generated feedback in text format. However, a person skilled in the art will appreciate that the goal-driven integrated text mining system 102 can be conveniently used for mining information from customer feedback in other formats and may be implemented in different domains such as various industry domains, business domains, and enterprise domains.
[00114] Although embodiments for goal-driven integrated text mining system 102 for
tracking and updating business processes have been described in language specific to structural features and/or methods, it is to be understood that the invention is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as exemplary embodiments for the goal-driven integrated text mining system 102 and methods.

I/We Claim:
1. A system (102) for deriving actionable intelligence from customer feedback data, the
system (102) comprising:
a processor (202); and
a memory (206) coupled to the processor (202), the memory (206) comprises:
an acquisition module (213) to acquire at least one domain goal based on domain knowledge and input data, wherein the input data comprises the customer feedback data;
an information extraction module (214) to identify at least one action item based on the input data and the at least one domain goal;
an intelligence module (108) to assess relevance of the action item based on a priority associated with the action item.
2. The system as claimed in claim 1, wherein the acquisition module (213) comprises a data acquisition module to acquire the input data and isolate feedback texts from the input data.
3. The system as claimed in claim 2, wherein the acquisition module (213) comprises a knowledge acquisition module to acquire the domain knowledge, the domain knowledge having the at least one domain goal, based on a pre-specified set of documents.
4. The system as claimed in claim 3, wherein the pre-specified set of documents include a set of uniform resource locators specific to a domain.
5. The system as claimed in claim 3, wherein the domain knowledge comprises knowledge specified by a user.

6. The system as claimed in claim 1, wherein the intelligence module (108) comprises an assessment module (312) to assign a priority to the action item.
7. The system as claimed in claim 1, wherein the intelligence module (108) comprises an interaction module (314) to receive input from the user to facilitate interaction between the user and the system (100).
8. The system as claimed in claim 1, wherein the knowledge acquisition module (108) creates a domain ontology having nodes, wherein each of the nodes of the domain ontology represent at least one domain ontology concept.
9. The system as claimed in claim 8, wherein the domain ontology concepts comprise the at least one domain goal.
10. The system as claimed in claim 8, wherein the intelligence module (108) updates the domain ontology.
11. The system as claimed in claim 1, wherein the information extraction module (214) comprises:
a pre-processing module (402) to provide one or more clean text units indicative of noise-free text from the input data;
a natural language processing (NLP) module (404) to provide structured components that represent relationship among elements of each clean text unit; and
a text mining module (406) to determine at least one pattern based on processing of the clean text units.

12. The system as claimed in claim 11, wherein the text mining module (406) comprises:
a component identification module (408) to identify information components based on the structured components and identify relationships between the information components based on each clean text unit;
an affect analysis module (410) to extract opinions indicative of one of customer pain-point and delights from each clean text unit based on the relationships;
a tagging module (412) to tag each clean text unit based on domain ontology concepts present in the clean text units;
a statistical pattern recognition module (414) to group the clean text units that are semantically cohesive based on the at least one domain goal; and
a trend analysis module (416) to compute a trend partly based on the opinions and volumes of the clean text units.
13. The system as claimed in claim 12, wherein the intelligence module (108), based on the domain ontology concepts present in the feedback texts, assigns each feedback text to a node of the domain ontology.
14. The system as claimed in claim 1, wherein the intelligence module (108) assigns a weight to each node based on a number of feedback texts assigned to that node.
15. The system as claimed in claim 1, wherein the intelligence module (108) computes a satisfaction score based partly on the weight assigned to each node, and based on an overall aggregate of positivity and negativity of an opinion in each feedback text.
16. The system as claimed in claim 12, wherein the statistical pattern recognition module (414) comprises a text classification module for generating a classifier that can be applied to untagged feedback texts.

17. The system as claimed in claim 1 further comprises a reporting module to generate alerts and summarized results.
18. The system as claimed in claim 1 further comprises a workflow integration module to feed at least one action item to a workflow of the domain in an order of the priority of the action item.
19. The system as claimed in claim 1 comprises knowledge data (218) having at least one rule to process the input data.
20. The system as claimed in claim 18, wherein an output from the workflow integration module is fed to a domain ontology.
21. The system as claimed in claim 7, wherein an output of the interaction module (314) is fed to a domain ontology.
22. A computer implemented method for deriving actionable intelligence from customer feedback data in a goal-driven manner, the method comprising:
creating a domain ontology based on at least one domain goal;
mining information from customer feedback data;
identifying at least one action item based on the information and the domain goal;
assessing a priority of the at least one action item; and
updating the domain ontology based on the assessing.
23. The method as claimed in claim 22, wherein the mining comprises a semi-supervised clustering technique involving a clustering algorithm to group semantically cohesive information.
24. The method as claimed in claim 22, wherein the at least one domain goal corresponds to a node of the domain ontology.

25. The method as claimed in claim 22, wherein the mining is based partly on NLP technique and a text mining technique.
26. The method as claimed in claim 25, wherein the NLP technique comprises a preprocessing technique to provide clean text units.
27. The method as claimed in claim 25, wherein the mining technique comprises:
identifying, based partly on linguistic rules, information components from each clean text unit and relationships between the information components;
extracting opinions from the information components based on the relationships;
tagging each clean text unit based on the information components that qualify as at least one of a subject, an object, and an action;
grouping the clean text units that are semantically cohesive, based on the at least one domain goal; and
computing a trend partly based at least on the opinions and volumes of the clean text units aligned to each node of the domain ontology.
28. The method as claimed in claim 22, wherein the assessing comprises receiving inputs from a user to facilitate an interaction between the user and the domain ontology.
29. The method as claimed in claim 22, wherein the updating comprises tracking of interactions between user interactions.
30. The method as claimed in claim 22 comprises identifying action items based on the mining.
31. The method as claimed in claim 22, wherein the assessing comprises assigning a priority to each action item.

32. The method as claimed in claim 22 comprises integrating the action items to a workflow.
33. A computer readable media for storing computer implemented instructions, said instructions when executed perform the acts comprising:
creating a domain ontology based on at least one domain goal;
mining information from customer feedback data;
identifying at least one action item based on the information and the domain goal;
assessing a priority of the at least one action item; and
updating the domain ontology based on the assessing.

Documents

Application Documents

#	Name	Date
1	Form-3.pdf	2018-08-10
2	Form-1.pdf	2018-08-10
3	Drawings.pdf	2018-08-10
4	1205-MUM-2011-FORM 26(17-6-2011).pdf	2018-08-10
5	1205-MUM-2011-FORM 18(19-8-2011).pdf	2018-08-10
6	1205-MUM-2011-FORM 1(15-4-2011).pdf	2018-08-10
7	1205-MUM-2011-FER.pdf	2018-08-10
8	1205-MUM-2011-CORRESPONDENCE(19-8-2011).pdf	2018-08-10
9	1205-MUM-2011-CORRESPONDENCE(17-6-2011).pdf	2018-08-10
10	1205-MUM-2011-CORRESPONDENCE(15-4-2011).pdf	2018-08-10
11	1205-MUM-2011-OTHERS [14-12-2018(online)].pdf	2018-12-14
12	1205-MUM-2011-FER_SER_REPLY [14-12-2018(online)].pdf	2018-12-14
13	1205-MUM-2011-COMPLETE SPECIFICATION [14-12-2018(online)].pdf	2018-12-14
14	1205-MUM-2011-CLAIMS [14-12-2018(online)].pdf	2018-12-14
15	1205-MUM-2011-US(14)-HearingNotice-(HearingDate-31-01-2023).pdf	2022-12-05
16	1205-MUM-2011-Correspondence to notify the Controller [09-12-2022(online)].pdf	2022-12-09
17	1205-MUM-2011-FORM-26 [24-01-2023(online)].pdf	2023-01-24
18	1205-MUM-2011-Correspondence to notify the Controller [31-01-2023(online)].pdf	2023-01-31
19	1205-MUM-2011-US(14)-ExtendedHearingNotice-(HearingDate-03-02-2023).pdf	2023-02-01
20	1205-MUM-2011-Correspondence to notify the Controller [01-02-2023(online)].pdf	2023-02-01
21	1205-MUM-2011-Written submissions and relevant documents [15-02-2023(online)].pdf	2023-02-15
22	1205-MUM-2011-PatentCertificate10-03-2023.pdf	2023-03-10
23	1205-MUM-2011-IntimationOfGrant10-03-2023.pdf	2023-03-10

Search Strategy

1	search_1205MUM2011_11-06-2018.pdf