
Systems And Methods For Neural Network Based Voice Enabled Analytical Reports Generation

Abstract: Systems and methods for neural network based voice-enabled analytical reports generation are provided. In traditional systems and methods, data structures in which the relationships between a user's speech and text are correlated tend to be product specific. Embodiments of the present disclosure provide for neural network based voice-enabled analytical reports generation using a Deep Neural Network (DNN) technique, wherein a set of voice-based user inputs is captured and converted into a set of textual information; the set of textual information is compared with a corpus of documents using fuzzy techniques to extract a plurality of user intents and queries corresponding to one or more voice-enabled analytical reports to be generated; one or more analytical reports are generated in a textual form; and the one or more analytical reports in the textual form are converted into the one or more voice-enabled analytical reports by implementing one or more deep learning techniques.


Patent Information

Application #: 201821028818
Filing Date: 31 July 2018
Publication Number: 06/2020
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Parent Application:

Applicants

Tata Consultancy Services Limited
Nirmal Building, 9th Floor, Nariman Point, Mumbai - 400021, Maharashtra, India

Inventors

1. KSHIRSAGAR, Mahesh
Tata Consultancy Services Limited, Unit 130/131, SDF 5, Seepz, Andheri (E), Mumbai - 400096, Maharashtra, India
2. SIRMOKADAM, Sumukh
Tata Consultancy Services Limited, Olympus - A, Opp Rodas Enclave, Hiranandani Estate, Ghodbunder Road, Patlipada, Thane West - 400607, Maharashtra, India
3. RAMCHANDANI, Sahil
Tata Consultancy Services Limited, Unit 130/131, SDF 5, Seepz, Andheri (E), Mumbai - 400096, Maharashtra, India
4. KOPPARAPU, Sunilkumar
Tata Consultancy Services Limited, Yantra Park, Voltas Compound, Subhash Nagar, Pokhran Road 2, Thane West - 400606, Maharashtra, India
5. GUPTE, Pankaj
Tata Consultancy Services Limited, B-3, Nirlon Knowledge Park, Nirlon Complex, Goregaon East, Mumbai - 400063, Maharashtra, India

Specification

Claims:

1. A method of neural network based voice-enabled analytical reports generation, the method comprising processor implemented steps of:
capturing, by one or more hardware processors, a set of voice-based user inputs via a voice capturing device (201);
converting, by an Automatic Speech Recognition (ASR) Engine, the set of voice-based user inputs into a set of textual information comprising a plurality of text commands, wherein the conversion is performed using a Deep Neural Network (DNN) technique (202);
comparing, using one or more fuzzy matching techniques, the set of textual information with a corpus of documents, wherein the corpus of documents comprises a plurality of words or sentences similar to the set of textual information (203);
extracting, based upon the comparison, a first set of information comprising a plurality of user intents and entities corresponding to one or more voice-enabled analytical reports to be generated, wherein the first set of information is extracted by a cognitive engine using one or more natural language processing techniques (204);
extracting, based upon the first set of information, a second set of information comprising a plurality of queries on the one or more voice-enabled analytical reports to be generated (205);
generating, based upon the second set of information, one or more analytical reports in a textual form corresponding to the plurality of user intents and entities, wherein the one or more analytical reports comprise analytical data corresponding to the one or more voice-enabled analytical reports to be generated (206); and
converting, using a text-to-voice engine, the one or more analytical reports in the textual form into the one or more voice-enabled analytical reports by implementing one or more deep learning techniques (207).

2. The method of claim 1, wherein the step of comparison comprises tokenizing, using the one or more fuzzy matching techniques, the set of textual information into a plurality of tokenized words, wherein the plurality of tokenized words comprise words closely matching one or more words from the set of textual information.

3. The method of claim 2, wherein the tokenization comprises generating, based upon the plurality of tokenized words, one or more sentences closely matching the set of textual information, by implementing the one or more fuzzy matching techniques.

4. A system (100) for neural network based voice-enabled analytical reports generation, the system (100) comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
capture a set of voice-based user inputs via a voice capturing device;
convert, by an Automatic Speech Recognition (ASR) Engine (201), the set of voice-based user inputs into a set of textual information comprising a plurality of text commands, wherein the conversion is performed using a Deep Neural Network (DNN) technique;
compare, using one or more fuzzy matching techniques, the set of textual information with a corpus of documents, wherein the corpus of documents comprises a plurality of words or sentences similar to the set of textual information;
extract, based upon the comparison, a first set of information comprising a plurality of user intents and entities corresponding to one or more voice-enabled analytical reports to be generated, wherein the first set of information is extracted by a cognitive engine using one or more natural language processing techniques;
extract, based upon the first set of information, a second set of information comprising a plurality of queries on the one or more voice-enabled analytical reports to be generated;
generate, based upon the second set of information, one or more analytical reports in a textual form corresponding to the plurality of user intents and entities, wherein the one or more analytical reports comprise analytical data corresponding to the one or more voice-enabled analytical reports to be generated; and
convert, using a text-to-voice engine (202), the one or more analytical reports in the textual form into the one or more voice-enabled analytical reports by implementing one or more deep learning techniques.

5. The system (100) of claim 4, wherein the one or more hardware processors (104) are configured to perform the comparison by tokenizing the set of textual information into a plurality of tokenized words using the one or more fuzzy matching techniques, wherein the plurality of tokenized words comprise words closely matching one or more words from the set of textual information.

6. The system (100) of claim 5, wherein the one or more hardware processors (104) are configured to generate, based upon the plurality of tokenized words, one or more sentences closely matching the set of textual information, by implementing the one or more fuzzy matching techniques.
Description:

FORM 2

THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENT RULES, 2003

COMPLETE SPECIFICATION
(See Section 10 and Rule 13)

Title of invention:

SYSTEMS AND METHODS FOR NEURAL NETWORK BASED VOICE-ENABLED ANALYTICAL REPORTS GENERATION

Applicant

Tata Consultancy Services Limited
A company Incorporated in India under the Companies Act, 1956
Having address:
Nirmal Building, 9th floor,
Nariman point, Mumbai 400021,
Maharashtra, India

The following specification particularly describes the invention and the manner in which it is to be performed.


TECHNICAL FIELD
[001] The disclosure herein generally relates to neural network based voice-enabled analytical reports generation, and, more particularly, to systems and methods for neural network based voice-enabled analytical reports generation.

BACKGROUND
[002] An Artificial Neural Network (also known as a Neural Network) is a computational model based on the structure and functions of biological neural networks. It is composed of many interconnected units called artificial neurons (also referred to as neurons hereinafter), which are organized in layers. A typical Neural Network comprises three kinds of layers: an input layer, a hidden layer, and an output layer. The data is fed to the network through the input layer and the result is obtained at the output layer. The intermediate set of neurons in the hidden layer initially maps the input space to a linearly separable representation where the final decision is taken. The hidden layer makes a Neural Network capable of solving non-linear problems.
[003] To train a neural network, a training data set is used. Once a neural network is trained, the neural network can be used to perform pattern recognition or other tasks on a target data set, which contains the target pattern or object to be processed by the neural network.
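As a non-limiting illustration (not part of the specification), the layered structure and weight adjustment described above may be sketched in PyTorch; the layer sizes, synthetic batch, and learning rate below are hypothetical placeholders:

import torch
import torch.nn as nn

# a tiny three-layer network: input layer -> hidden layer -> output layer
model = nn.Sequential(
    nn.Linear(10, 32),   # input layer feeding a hidden layer of 32 neurons
    nn.ReLU(),           # non-linearity; what lets the network solve non-linear problems
    nn.Linear(32, 3),    # output layer producing scores for 3 classes
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# one training step on a synthetic batch (a stand-in for a real training data set)
x = torch.randn(8, 10)         # 8 samples with 10 features each
y = torch.randint(0, 3, (8,))  # 8 target class labels
optimizer.zero_grad()
loss = loss_fn(model(x), y)    # compare predictions against targets
loss.backward()                # backpropagate the error
optimizer.step()               # adjust the connection weights

Once trained in this manner, the network can be applied to a target data set for pattern recognition, as noted above.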
[004] Voice-based analytics is the process of analyzing audio recordings to gather information, bring structure to speakers' interactions, or expose information buried in those interactions. Voice-based (or audio) analytics often includes elements of automatic speech recognition, where the identities of spoken words or phrases are determined. One use of speech analytics applications is to spot spoken keywords or phrases, either as alerts on live audio or as a post-processing step on recorded speech. Other uses include categorization of speech, for example in the contact center environment, to identify calls from unsatisfied customers.

SUMMARY
[005] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for neural network based voice-enabled analytical reports generation is provided, the method comprising: capturing, by one or more hardware processors, a set of voice-based user inputs via a voice capturing device; converting, by an Automatic Speech Recognition (ASR) Engine, the set of voice-based user inputs into a set of textual information comprising a plurality of text commands, wherein the conversion is performed using a Deep Neural Network (DNN) technique; comparing, using one or more fuzzy matching techniques, the set of textual information with a corpus of documents, wherein the corpus of documents comprises a plurality of words or sentences similar to the set of textual information; extracting, based upon the comparison, a first set of information comprising a plurality of user intents and entities corresponding to one or more voice-enabled analytical reports to be generated, wherein the first set of information is extracted by a cognitive engine using one or more natural language processing techniques; extracting, based upon the first set of information, a second set of information comprising a plurality of queries on the one or more voice-enabled analytical reports to be generated; generating, based upon the second set of information, one or more analytical reports in a textual form corresponding to the plurality of user intents and entities; converting, using a text-to-voice engine, the one or more analytical reports in the textual form into the one or more voice-enabled analytical reports by implementing one or more deep learning techniques; tokenizing, using the one or more fuzzy matching techniques, the set of textual information into a plurality of tokenized words, wherein the plurality of tokenized words comprise words closely matching one or more words from the set of textual information; and generating, based upon the plurality of tokenized words, one or more sentences closely matching the set of textual information, by implementing the one or more fuzzy matching techniques.
[006] In another aspect, there is provided a system for neural network based voice-enabled analytical reports generation, the system comprising a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: capture a set of voice-based user inputs via a voice capturing device; convert, by an Automatic Speech Recognition (ASR) Engine (201), the set of voice-based user inputs into a set of textual information comprising a plurality of text commands, wherein the conversion is performed using a Deep Neural Network (DNN) technique; compare, using one or more fuzzy matching techniques, the set of textual information with a corpus of documents, wherein the corpus of documents comprises a plurality of words or sentences similar to the set of textual information; extract, based upon the comparison, a first set of information comprising a plurality of user intents and entities corresponding to one or more voice-enabled analytical reports to be generated, wherein the first set of information is extracted by a cognitive engine using one or more natural language processing techniques; extract, based upon the first set of information, a second set of information comprising a plurality of queries on the one or more voice-enabled analytical reports to be generated; generate, based upon the second set of information, one or more analytical reports in a textual form corresponding to the plurality of user intents and entities; convert, using a text-to-voice engine, the one or more analytical reports in the textual form into the one or more voice-enabled analytical reports by implementing one or more deep learning techniques; tokenize the set of textual information into a plurality of tokenized words using the one or more fuzzy matching techniques, wherein the plurality of tokenized words comprise words closely matching one or more words from the set of textual information; and generate, based upon the plurality of tokenized words, one or more sentences closely matching the set of textual information, by implementing the one or more fuzzy matching techniques.

[007] In yet another aspect, there is provided one or more non-transitory machine readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause the one or more hardware processors to perform a method for neural network based voice-enabled analytical reports generation, the method comprising: capturing a set of voice-based user inputs via a voice capturing device; converting, by an Automatic Speech Recognition (ASR) Engine, the set of voice-based user inputs into a set of textual information comprising a plurality of text commands, wherein the conversion is performed using a Deep Neural Network (DNN) technique; comparing, using one or more fuzzy matching techniques, the set of textual information with a corpus of documents, wherein the corpus of documents comprises a plurality of words or sentences similar to the set of textual information; extracting, based upon the comparison, a first set of information comprising a plurality of user intents and entities corresponding to one or more voice-enabled analytical reports to be generated, wherein the first set of information is extracted by a cognitive engine using one or more natural language processing techniques; extracting, based upon the first set of information, a second set of information comprising a plurality of queries on the one or more voice-enabled analytical reports to be generated; generating, based upon the second set of information, one or more analytical reports in a textual form corresponding to the plurality of user intents and entities; converting, using a text-to-voice engine, the one or more analytical reports in the textual form into the one or more voice-enabled analytical reports by implementing one or more deep learning techniques; tokenizing, using the one or more fuzzy matching techniques, the set of textual information into a plurality of tokenized words, wherein the plurality of tokenized words comprise words closely matching one or more words from the set of textual information; and generating, based upon the plurality of tokenized words, one or more sentences closely matching the set of textual information, by implementing the one or more fuzzy matching techniques.
[008] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS
[009] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[010] FIG. 1 illustrates a block diagram of a system for neural network based voice-enabled analytical reports generation, in accordance with some embodiments of the present disclosure.
[011] FIG. 2 is an architectural diagram depicting components and flow of the system for neural network based voice-enabled analytical reports generation, in accordance with some embodiments of the present disclosure.
[012] FIGS. 3A and 3B constitute a flow diagram illustrating the steps involved in the process of neural network based voice-enabled analytical reports generation, in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS
[013] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope and spirit being indicated by the following claims.
[014] Embodiments of the present disclosure provide for systems and methods for neural network based voice-enabled analytical reports generation. Voice-based analytical data generation using Neural Networks typically comprises processing an input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence; processing the alternative representation for the input acoustic sequence using an attention-based Neural Network to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance.
[015] In general, voice data not only comprises valuable words and sentences, which facilitate better analytics intelligence, but further comprises tone, intensity, and vocal intent, providing for deeper sentiment analytics. Voice analytics facilitates faster resolution of queries. Voice analytics is thus important in the context of technological advancements and the rising demand for more precise risk management solutions. The ability of voice analytics to derive key insights from unstructured data and convert them into specific business solutions and opportunities is acting as a catalyst for the adoption of this solution.
[016] However, generating voice-based analytical data from voice data (received as an input from speech recognition devices) is challenging and poses complex problems. While using an analytics application, an end user is often compelled to orient himself to one or more ways of performing various steps (for example, pattern recognition), meaning that the end user needs to know which analytics application to use and, based upon the application identified, learn about the subject area that application addresses.
[017] Further, the end user has to learn about usage of the analytics application, that is, about the metaphor and navigation of the analytics application, which features are available in it, and how to use them; for example, which parameters have to be passed for a specific report / graph, and how. Typically, the end user is presented with a series of report / graph layouts and is then expected, based on his / her experience, to select the best-fit layout.
[018] Thus, a significant amount of training is required for the end user to understand the scope of the application, the features of the application, and how to use the application. Also, training methods tend to be very product specific. Moreover, data structures in which the relationships between a user's speech and text are correlated tend to be product specific.
[019] Hence, the significant training effort applied to a speech recognition program may not be reusable for any other program or system. In some cases, speakers must re-train systems between version updates of the same program. Temporary or permanent changes to a user's voice patterns affect performance and may require retraining. This significant training burden and lack of portability between products have worked against wide-scale adoption of voice-based analytical data generation.
[020] Hence, a need exists for a system and / or technology that understands what the end user expects in terms of the solution (expected to be generated), the overall scope and usage aspects of the analytics solution, and how the end user will interact with an analytics application comprising voice as an input and voice-based analytical data generation as an output (via reports), and that, based upon the end user's expectations, gets trained to generate voice-based analytical data.
[021] Referring now to the drawings, and more particularly to FIGS. 1 through 3B, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[022] FIG. 1 illustrates an exemplary block diagram of a system 100 for neural network based voice-enabled analytical reports generation, in accordance with an embodiment of the present disclosure. In an embodiment, the system 100 includes one or more processors 104, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more processors 104. The one or more processors 104 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory 102. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[023] The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
[024] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
[025] According to an embodiment of the present disclosure, referring to FIG. 2, the architecture of the proposed disclosure may be considered in detail. An Automatic Speech Recognition (ASR) Engine 201, converts voice-based user inputs into textual information. A text-to-voice engine 202, facilitates a conversion of any analytical output into voice-enabled analytical output. The decision framework 203 identifies reports to be fetched corresponding to an analytical output (or data) to be generated. The one or more reports 204 comprise the analytical output (or data) to be generated. A query builder 205 facilitates building of queries.
[026] FIGS. 3A and 3B, with reference to FIGS. 1 and 2, illustrate an exemplary flow diagram of a method for neural network based voice-enabled analytical reports generation, in accordance with some embodiments of the present disclosure. In an embodiment, the system 100 comprises one or more data storage devices of the memory 102 operatively coupled to the one or more hardware processors 104 and is configured to store instructions for execution of steps of the method by the one or more processors 104. The steps of the method of the present disclosure will now be explained with reference to the components of the system 100 as depicted in FIG. 1 and the flow diagram. In the embodiments of the present disclosure, the hardware processors 104, when configured with the instructions, perform one or more methodologies described herein.
[027] According to an embodiment of the present disclosure, at step 301, the one or more hardware processors 104 capture a set of voice-based user inputs via a voice capturing device (not shown in the figure), for example, a microphone. In an embodiment, the system 100 supports a voice recognition software application (not shown in the figure). The voice recognition software comprises a standalone software application that can be integrated with word processing applications, e-mail applications, calendar applications, and so on. The voice recognition software operates with the use of a plurality of input devices capable of receiving a user's voice for input into the software application. The microphone may typically be used in conjunction with the voice recognition software for capturing the set of voice-based user inputs from a user. In an example implementation, the set of voice-based user inputs that may be captured via the voice capturing device may be as below:
“Hey, I want to see income statement of XYZ Enterprises”
[028] According to an embodiment of the present disclosure, at step 302, the one or more hardware processors 104 convert, by implementing the Automatic Speech Recognition (ASR) Engine 201, the set of voice-based user inputs into a set of textual information comprising a plurality of text commands, wherein the conversion is performed using a Deep Neural Network (DNN) technique. The one or more hardware processors 104 initially store the set of voice-based user inputs in the memory 102 of the system 100 so that they can be communicated to the ASR Engine 201. The ASR Engine 201 is trained with training data using one or more DNN techniques.
[029] In general, neural networks typically comprise a large number of interconnected nodes. In some classes of neural networks, the nodes are separated into different layers, and the connections between the nodes are characterized by associated weights. Each node has an associated function causing it to generate an output dependent on the signals received on each input connection and the weights of those connections. Neural networks are adaptive, in that the connection weights can be adjusted to change the response of the network to a particular input or class of inputs.
[030] Typically, in speech recognition, Deep Neural Networks (DNNs) may be employed for phoneme recognition. An input speech data set may be a part of the training data used to train the DNNs. The training data may comprise multiple speech data sets associated with multiple speakers. During training, different speech data with corresponding speaker representation data are applied to the DNN as input. With each speech data set and corresponding speaker representation data applied as input, a prediction of the corresponding phoneme is generated, and the weighting coefficients of the deep neural network are updated based on the generated prediction and information in the training data. Alternatively, the input speech data may be deployment speech data collected by a speech recognition system.
[031] According to an embodiment of the present disclosure, unlike the traditional systems and methods, the ASR Engine 201 (as implemented in the proposed methodology) leverages both trained acoustic and language models. Acoustic model processing is used to model a received utterance. An output of the acoustic model is a lattice for each utterance; the lattice comprises an intermediate probability model in the form of a directed graph. The language model is used to calculate the probability of the whole transcript / sentence, not just the probability of the named entity. The language model processing attempts to utilize context information to compose phones from the candidate lattices into more meaningful word sequences. In an embodiment, in the case of the language model, lexicons are also considered so that all the nuances associated with voice-based interaction with an analytics application are gainfully trained.
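For illustration only, the joint use of acoustic and language model evidence may be sketched as a simple rescoring of candidate transcripts; the candidate strings, log-probabilities, and the lm_weight parameter below are hypothetical placeholders, not values from the disclosure:

def rescore(candidates, lm_weight=0.8):
    # candidates: (transcript, acoustic_log_prob, language_model_log_prob) tuples;
    # the language model scores the whole sentence, not individual words
    return max(candidates, key=lambda c: c[1] + lm_weight * c[2])

best, _, _ = rescore([
    ("I want to see income statement of XY Enterprises", -12.1, -9.4),
    ("I want to sea income statement of XY Enterprises", -12.0, -15.2),
])
print(best)  # the language model prefers "see" over "sea" in this context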
[032] In an embodiment of the present disclosure, the ASR Engine 201 thus gets trained on both the acoustic and language models and, by implementing the one or more DNN techniques, the set of textual information may be generated as below:
“Hey, I want to see income statement of XY Enterprises”
It may be noted that the trailing “Z” is missing from “XYZ” in the set of textual information generated.
[033] According to an embodiment of the present disclosure, at step 303, the one or more hardware processors 104 compare, by implementing one or more fuzzy matching techniques, the set of textual information with a corpus of documents, wherein the corpus of documents comprises a plurality of words or sentences similar to the set of textual information. In an embodiment, the corpus of documents may further comprise the training data with which the ASR Engine 201 is trained in the step 302. The step of comparison may now be considered in detail.
[034] In an embodiment, the set of textual information is initially compared with the training data or the corpus of documents using the one or more fuzzy matching techniques, for example, a Levenshtein distance algorithm, a Jaro-Winkler algorithm, a Soundex technique, or any combination thereof. The training data may further be tokenized into tokenized words by the one or more fuzzy matching techniques. As is known in the art, fuzzy matching refers to an assortment of techniques to determine whether searched strings approximately match some given pattern string. Each implementation of fuzzy matching uses some similarity function, that is, an algorithm for determining whether the input and searched strings are similar to each other. Examples of common similarity functions comprise the Levenshtein distance algorithm and the n-gram distance technique.
[035] In an embodiment, initially the set of textual information is tokenized into a plurality of tokenized words, wherein the plurality of tokenized words comprise words closely matching one or more words from the set of textual information. Further, each of the plurality of tokenized words is compared with the tokenized words from the training data and with the plurality of words or sentences from the corpus of documents using the one or more fuzzy matching techniques. The one or more hardware processors 104 generate, based upon the comparison, a plurality of closest matching words.
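A minimal sketch of such token-level fuzzy matching, using a plain-Python Levenshtein distance (the vocabulary below is an illustrative stand-in for the tokenized training data, not the corpus used by the system):

def levenshtein(a, b):
    # classic dynamic-programming edit distance between two strings
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def closest_word(token, vocabulary):
    # return the vocabulary word with the smallest edit distance to the token
    return min(vocabulary, key=lambda w: levenshtein(token.lower(), w.lower()))

print(closest_word("XY", ["XYZ", "Enterprises", "income"]))          # -> XYZ
print(closest_word("Enterprise", ["XYZ", "Enterprises", "income"]))  # -> Enterprises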
[036] In an example implementation of the step 303, each of the words like “Hey”, “Statement”, “XY”, and “Enterprise” may be compared with the training data or the corpus of documents, and based upon the comparison, the plurality of closest matching words may be generated as “XYZ” and “Enterprises”.
[037] In an embodiment, the tokenization comprises generating, based upon the plurality of tokenized words, one or more sentences closely matching with the set of textual information, by implementing the one or more fuzzy matching techniques. The one or more hardware processors 104 compare the one or more sentences generated with a plurality of sentences corresponding to the training data or the corpus of documents to generate a closest matching sentence.
[038] Considering an example scenario, the one or more sentences closely matching with the set of textual information may be generated as below:
“Hey, I want to see income statement of XY Enterprises”; or
“I want to XYZ's Automatch percentage”; or
“What is XYZ's Monthly Automatch Percentage”
[039] The plurality of sentences corresponding to the training data or the corpus of documents may comprise (as an example):
“Please bring up XYZ's income statement”;
“Please show income statement for XYZ”; and
“What is the other income and Expenses for XYZ”
[040] Based upon the comparison, the closest matching sentence may be generated as:
“I want to see income statement of XY Enterprises”
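By way of illustration, the sentence-level closest match can be sketched with Python's standard difflib; the similarity cutoff is an assumed value, and the sentence actually returned depends on the similarity function used:

import difflib

corpus = [
    "Please bring up XYZ's income statement",
    "Please show income statement for XYZ",
    "What is the other income and Expenses for XYZ",
]
query = "I want to see income statement of XY Enterprises"
# returns the corpus sentence most similar to the query (best match first)
print(difflib.get_close_matches(query, corpus, n=1, cutoff=0.3))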
[041] The step of comparison provides for a high level of accuracy as compared to traditional systems and methods, particularly when the accuracy of a speech or voice recognition device is low due to various parameters like the presence of noise or a low-quality microphone.
[042] According to an embodiment of the present disclosure, at step 304, the one or more hardware processors 104 extract, based upon the comparison, a first set of information comprising a plurality of user intents and entities corresponding to one or more voice-enabled analytical reports to be generated, wherein the first set of information is extracted by a cognitive engine (not shown in the figure) using one or more natural language processing techniques. In an embodiment, the entity recognition process detects entities mentioned in the closest matching sentence generated based upon the comparison, and links together multiple occurrences of the same entity.
[043] The entity recognition, inter alia, performs a syntactic analysis, an entity detection of proper nouns, and a resolution of one or more coreferences (that is, when two or more expressions in a text refer to the same person or thing in linguistics), and then determines to which named entities the text in the closest matching sentence probably refers. These are referred to as recognized entities or recognized named entities. In an embodiment, different rules may be applied in the entity recognition for determining which antecedent is likely the best candidate entity.
[044] In an embodiment, the one or more natural language processing techniques may comprise, for example, a Named Entity Recognition (NER) technique for identifying a plurality of analytical entities by assigning analytical constructs with data artifacts based on a pre-defined analytical vocabulary, to extract the first set of information. As is known in the art, the plurality of analytical entities may comprise (but is not limited to) types, measures, dimensions, and variants. As used herein, the term “dimension” refers to a structure that categorizes facts and measures. Examples of dimensions include products, people, financial elements, time, and the like. For example, a sales report may be viewed across the dimension of a product, a store, geography, a date, a quantity, revenue generated, and the like. A measure is measurement data that can be manipulated, usually denoted in some metric, for example, units, currency, etc. A variant can be a form or version that differs in some respect from other forms. For example, “profit” may be identified as a “measure” and “Asia region” as a “variant.”
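As an illustrative sketch only (the disclosure does not mandate any particular library), named entities could be detected with an off-the-shelf NER pipeline such as spaCy; the labels actually produced depend on the pretrained model, and intent extraction would require a separately trained component:

import spacy

nlp = spacy.load("en_core_web_sm")  # pretrained English pipeline with a statistical NER component
doc = nlp("I want to see income statement of XYZ Enterprises for Quarter 4")
for ent in doc.ents:
    # e.g. "XYZ Enterprises" would typically be tagged ORG; "Quarter 4" may be tagged DATE
    print(ent.text, ent.label_)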
[045] In an example implementation, the first set of information may be extracted using the one or more natural language processing techniques by the cognitive engine as below:
Intent – “Income Statement Report” and / or “Automatch Percentage”
Entities – “Quarter 4” and “Line of Business and Region”
[046] According to an embodiment of the present disclosure, at step 305, the one or more hardware processors 104 extract, based upon the first set of information, a second set of information comprising a plurality of queries on the one or more voice-enabled analytical reports to be generated. In an embodiment, initially, the first set of information comprising the plurality of user intents and entities is communicated by the one or more hardware processors 104 to a decision framework 203. The decision framework 203 facilitates identification of one or more reports 204 to be fetched corresponding to a set of analytical data to be generated (via the one or more voice-enabled analytical reports) in a textual form, wherein the set of analytical data (to be generated) corresponds to the plurality of user intents and entities. Further, the decision framework 203 also determines if a canned report corresponding to the set of analytical data to be generated is available.
[047] As used herein, the term “queries” (from the expression “plurality of queries”) is referred to in the broadest sense to refer, to either two or more questions, one or more commands, or some form of input used as a control variable by the system, or any combination thereof. For example, a query may consist of a question directed to a particular topic, such as “what is a network” in the context of a remote learning application. In an e-commerce application a query might consist of a command to “list all books by XYZ” for example. Similarly, while the answer in a remote learning application consists of text that is rendered into audible form, it could also be returned as another form of multi-media information, such as a graphic image, a sound file, a video file, etc. depending on the requirements of the particular application.
[048] According to an embodiment of the present disclosure, if the decision framework 203 determines that the one or more reports 204 corresponding to the set of analytical data to be generated are available, the one or more hardware processors 104 generate the one or more reports 204 along with the canned report corresponding to the set of analytical data. However, if the decision framework 203 determines that the canned report is not available, the second set of information comprising the plurality of queries corresponding to the set of analytical data to be generated is extracted. In an embodiment, the queries may be built by the one or more hardware processors 104 via the query builder 205.
[049] In an example implementation, the second set of information comprising the plurality of queries may be extracted as below:
“Select revenue from tableName 'Income Statement' where (enterprise_name='XYZ')”; and
“Select revenue from tableName 'Income Statement' where (enterprise_name='XYZ Enterprises' from Location_name='Mumbai' whose LineofBusiness='Multimedia')”.
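A toy sketch of such a query builder follows; the table name, column names, and the intent-to-table mapping are hypothetical, introduced purely for illustration:

def build_query(intent, entities):
    # map a recognized intent to a (hypothetical) report table
    tables = {"Income Statement Report": "Income_Statement"}
    # render each extracted entity as a named bind parameter in the WHERE clause
    where = " AND ".join(f"{column} = :{column}" for column in entities)
    return f"SELECT revenue FROM {tables[intent]} WHERE {where}", entities

sql, params = build_query(
    "Income Statement Report",
    {"enterprise_name": "XYZ Enterprises", "location_name": "Mumbai"},
)
print(sql)  # SELECT revenue FROM Income_Statement WHERE enterprise_name = :enterprise_name AND ...

Using named bind parameters rather than string interpolation keeps the generated queries safe to execute through a database driver.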
[050] According to an embodiment of the present disclosure, at step 306, the one or more hardware processors 104 generate, based upon the second set of information, one or more analytical reports in a textual form corresponding to the plurality of user intents and entities, wherein the one or more analytical reports comprise analytical data corresponding to the one or more voice-enabled analytical reports to be generated. The set of analytical data in the textual form is generated via the one or more reports 204, wherein the one or more reports 204 comprise the set of analytical data (or final output). In an embodiment, the one or more hardware processors 104 initially execute the plurality of queries to obtain one or more data points for generating the one or more reports 204.
[051] As is known in the art, a data point is the smallest individual entity on a chart. On non-Shape charts, the one or more data points are represented differently depending upon the chart type. For example, a line series consists of one or more connected data points. On Shape charts, the one or more data points are represented by individual slices or segments that add up to the whole chart. For example, on a pie chart, each piece is a data point. The one or more data points form a series. By default, all formatting options are applied to each of the one or more data points in the series.
[052] According to an embodiment of the present disclosure, at step 307, the one or more hardware processors 104 convert, using the text-to-voice engine 202, the one or more analytical reports in the textual form into the one or more voice-enabled analytical reports by implementing one or more deep learning techniques. As is known in the art, deep learning techniques perform a non-linear mapping between input and output features with a deep-layered, hierarchical structure. In voice conversion, deep neural networks are being used as conversion models that map source features to target features.
[053] In an embodiment, the text-to-voice engine 202 leverages the one or more deep learning techniques (for example, a Bidirectional Recurrent Neural Network) to produce or synthesize natural-sounding speech with an ability to change the pitch, pronunciation, and speed of the synthesized speech. The proposed methodology facilitates supporting one or more open-source libraries as well as commercial text-to-speech libraries for generating the voice-enabled analytical reports.
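While the disclosure describes a deep-learning text-to-voice engine, the final text-to-speech step may be sketched with the open-source pyttsx3 library, substituted here purely for illustration (the spoken text is taken from the example report below):

import pyttsx3

engine = pyttsx3.init()          # uses the platform's native speech backend
engine.setProperty("rate", 160)  # speaking speed, in words per minute
engine.say("Total revenue for this quarter was USD 25000 "
           "and total expenses were USD 15000.")
engine.runAndWait()              # block until the utterance has been spoken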
[054] In an example implementation, the voice-enabled analytical reports (comprising final analytical output) may be generated as:
“Total revenue for this quarter was USD 25000 and total expenses were USD 15000” and
“This quarter saw a hike in Wage Expenses and Utilities expenses. Net Income stood at USD 10000”.
[055] According to an embodiment of the present disclosure, advantages of the proposed methodology may now be considered in detail. Any kind of analytics application, comprising a reporting application or an advanced visualization application, may be implemented with the proposed methodology. Analytics applications that are quite complex to use and call for significant time to train an end user may be implemented with the proposed methodology to generate voice-enabled analytical reports, without requiring the end user to get trained in generating analytical output (or reports). The proposed methodology provides for an intrinsic, human way of interaction, which is voice-enabled, wherein the end user focuses only on the output and, through a series of interactive and iterative interactions, obtains optimal output(s) via the proposed methodology.
[056] The proposed methodology thus analyzes and understands the end user's business area, comprising the analytical problem(s) the end user intends to analyze, and based on that, by implementing the steps of the disclosure, presents the one or more categories of reports / graphs that are available, comprising all associated parameters, seeks inputs corresponding to the associated parameters, and finally recommends the one or more reports 204 or even graph layouts which would best represent the output. This leads to easy usage of the analytics applications.
[057] In an embodiment, the memory 102 can be configured to store any data that is associated with neural network based voice-enabled analytical reports generation. In an embodiment, the information pertaining to the set of voice-based user inputs, the set of textual information, the comparison of the set of textual information with the corpus of documents, the first set of information, the second set of information, the one or more analytical reports generated in the textual form, and the one or more voice-enabled analytical reports is stored in the memory 102.
[058] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[059] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[060] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[061] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
[062] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[063] It is intended that the disclosure and examples be considered as exemplary only, with a true scope and spirit of disclosed embodiments being indicated by the following claims.

Documents

Application Documents

# Name Date
1 201821028818-STATEMENT OF UNDERTAKING (FORM 3) [31-07-2018(online)].pdf 2018-07-31
2 201821028818-REQUEST FOR EXAMINATION (FORM-18) [31-07-2018(online)].pdf 2018-07-31
3 201821028818-FORM 18 [31-07-2018(online)].pdf 2018-07-31
4 201821028818-FORM 1 [31-07-2018(online)].pdf 2018-07-31
5 201821028818-FIGURE OF ABSTRACT [31-07-2018(online)].jpg 2018-07-31
6 201821028818-DRAWINGS [31-07-2018(online)].pdf 2018-07-31
7 201821028818-COMPLETE SPECIFICATION [31-07-2018(online)].pdf 2018-07-31
8 201821028818-FORM-26 [05-09-2018(online)].pdf 2018-09-05
9 Abstract1.jpg 2018-09-14
10 201821028818-Proof of Right (MANDATORY) [06-12-2018(online)].pdf 2018-12-06
11 201821028818-Proof of Right (MANDATORY) [26-12-2018(online)].pdf 2018-12-26
12 201821025418-ORIGINAL UR 6(1A) FORM 26-120918.pdf 2019-02-13
13 201821028818-ORIGINAL UR 6(1A) FORM 1-271218.pdf 2019-04-12
14 201821028818- ORIGINAL UR 6(1A) FORM 1-121218.pdf 2019-04-18
15 201821028818-OTHERS [25-07-2021(online)].pdf 2021-07-25
16 201821028818-FER_SER_REPLY [25-07-2021(online)].pdf 2021-07-25
17 201821028818-COMPLETE SPECIFICATION [25-07-2021(online)].pdf 2021-07-25
18 201821028818-CLAIMS [25-07-2021(online)].pdf 2021-07-25
19 201821028818-FER.pdf 2021-10-18
20 201821028818-US(14)-HearingNotice-(HearingDate-01-04-2024).pdf 2024-03-01
21 201821028818-FORM-26 [27-03-2024(online)].pdf 2024-03-27
22 201821028818-Correspondence to notify the Controller [27-03-2024(online)].pdf 2024-03-27
23 201821028818-FORM-26 [30-03-2024(online)].pdf 2024-03-30
24 201821028818-FORM-26 [01-04-2024(online)].pdf 2024-04-01
25 201821028818-Written submissions and relevant documents [15-04-2024(online)].pdf 2024-04-15

Search Strategy

1 2020-12-2912-24-00E_30-12-2020.pdf