
System And Method For Summarization Of Speech To Text/Speech Translation

Abstract: Disclosed is a system (100) that facilitates summarization of speech to text/speech translation. The system (100) includes an input unit (102) configured to receive an input speech in a first language. The system (100) further includes a data processing apparatus (104) coupled to the input unit. The data processing apparatus (104) includes processing circuitry (110) that is configured to transcribe the input speech in the first language into corresponding textual data in the first language. The processing circuitry (110) is further configured to summarize the textual data in the first language and then translate the textual data summarized in the first language into the textual data summarized in a second language. Furthermore, the processing circuitry (110) is configured to convert the textual data summarized in the second language into an output speech summarized in the second language. FIG. 2 is the reference figure.


Patent Information

Application #: 202421035153
Filing Date: 03 May 2024
Publication Number: 33/2024
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Parent Application:

Applicants

IITI DRISHTI CPS FOUNDATION
IIT Indore, Khandwa Road Simrol, Indore, Madhya Pradesh, 453552, India

Inventors

1. Nivedita Sethiya
Sr. M.I.G. No.1, Janata Colony, Mandsaur, Madhya Pradesh, 458001, India
2. Balaram Sarkar
Abhaycharan Ashram, Mayapur, Nadia, West Bengal, 741313, India
3. Chandresh Kumar Maurya
CSE department, IIT Indore, Khandwa Road Simrol, Indore, Madhya Pradesh, 453552, India

Specification

Description:
TECHNICAL FIELD
The present disclosure generally relates to the field of language translation and summarization. More particularly, the present disclosure relates to a system and a method for summarization of speech to text/speech translations.
BACKGROUND
Nowadays, whenever someone wants to listen to content available in an audio/video, such as a podcast, a talk, a lecture, or a news broadcast, they have to go through the entire audio/video to listen to and understand the content delivered in it. The problem is more pronounced when the content delivered in the audio/video is presented in a language that is not known to the user. Currently, in such a scenario, the user has to look for subtitles in a language known to them and keep reading the subtitles to understand the content delivered in the audio/video.
At present, various techniques are available for transcribing input audio into text and vice versa. Specifically, various speech-to-text (STT) techniques are available in the state of the art. However, none of the state of the art provides a solution to the abovementioned problem, namely converting content delivered in an unknown language in a long audio/video into a language known to the listener within a short interval of time.
Thus, there is a need for a system and method that facilitates summarization of speech to text/speech translation.
SUMMARY
In an embodiment of the present disclosure, a system is disclosed. The system includes an input unit, a data processing apparatus, and an output unit coupled to each other. The input unit is configured to receive an input speech in a first language. The data processing apparatus includes processing circuitry. The processing circuitry is configured to transcribe the input speech in the first language into corresponding textual data in the first language. Further, the processing circuitry is configured to summarize the textual data in the first language. Further, the processing circuitry is configured to translate the textual data summarized in the first language into the textual data summarized in a second language. Further, the processing circuitry is configured to convert the textual data summarized in the second language into an output speech summarized in the second language.
In some embodiments of the present disclosure, the output unit is configured to provide an output in the form of the output speech summarized in the second language.
In some embodiments of the present disclosure, the processing circuitry is configured to transcribe the input speech via an automatic speech recognition technique.
In another embodiment of the present disclosure, a method is disclosed. The method includes the step of receiving, by way of an input unit, an input speech in a first language. The method further includes the step of transcribing, by way of a processing circuitry coupled to the input unit, the input speech in the first language into corresponding textual data in the first language. The method further includes the step of summarizing, by way of the processing circuitry, the textual data in the first language. The method further includes the step of translating, by way of the processing circuitry, the textual data summarized in the first language into the textual data summarized in a second language.
In some embodiments of the present disclosure, the method further includes converting, by way of the processing circuitry, the textual data summarized in the second language into an output speech summarized in the second language.
In some embodiments of the present disclosure, the method further includes providing, by way of an output unit, an output speech summarized in the second language.
In some embodiments of the present disclosure, the method includes enabling an automatic speech recognition technique for transcribing the input speech.
BRIEF DESCRIPTION OF DRAWINGS
Other objects, features, and advantages of the embodiments will be apparent from the following description when read with reference to the accompanying drawings, in which like reference numerals denote corresponding parts throughout the several views:
The diagrams are for illustration only and thus do not limit the present disclosure, wherein:
FIG. 1 illustrates a block diagram of a system for summarization of speech to text/speech translations, in accordance with an embodiment of the present disclosure;
FIG. 2 illustrates a block diagram of a data processing apparatus of the system of FIG. 1, in accordance with an embodiment of the present disclosure;
FIG. 3 illustrates a flow chart of a method for summarization of speech to text/speech translations, in accordance with an embodiment of the present disclosure.
To facilitate understanding, like reference numerals have been used, where possible, to designate like elements common to the figures.
DETAILED DESCRIPTION
Various embodiments of the present disclosure provide a system and a method for summarization of speech to text/speech translations. The following description provides specific details of certain embodiments of the disclosure illustrated in the drawings to provide a thorough understanding of those embodiments. It should be recognized, however, that the present disclosure can be reflected in additional embodiments and the disclosure may be practiced without some of the details in the following description.
The various embodiments including the example embodiments are now described more fully with reference to the accompanying drawings, in which the various embodiments of the disclosure are shown. The disclosure may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure is thorough and complete, and fully conveys the scope of the disclosure to those skilled in the art. In the drawings, the sizes of components may be exaggerated for clarity.
As mentioned, there remains a need for a solution that enables a listener to understand, within a short interval of time and in a language known to the listener, content that is delivered in an unknown language in a long audio/video. The present aspect, therefore, provides a system and a method for summarization of speech to text/speech translations.
The aspects herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting aspects that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the aspects herein. The examples used herein are intended merely to facilitate an understanding of ways in which the aspects herein may be practiced and to further enable those of skill in the art to practice the aspects herein. Accordingly, the examples should not be construed as limiting the scope of the aspects herein.
FIG. 1 is a block diagram of a system 100 for summarization of speech to text/speech translations (hereinafter referred to as “the system 100”), in accordance with an embodiment of the present disclosure. The system 100 may include an input unit 102, a data processing apparatus 104 and an output unit 106. The data processing apparatus 104 may be communicatively coupled to the input unit 102 and the output unit 106, by way of a communication network 108. In some embodiments of the present disclosure, the input unit 102, the data processing apparatus 104, and the output unit 106 may be communicatively coupled through separate communication networks established therebetween that may be wired and/or wireless.
The input unit 102 may be configured to facilitate a user to input data, receive data, and/or transmit data within the system 100. Specifically, the input unit 102 may be configured to receive one or more speech signals (hereinafter referred to as “speech signals”) from the user in a first language. In some embodiments of the present disclosure, the input unit 102 may include one of a microphone, a smartphone, a tablet, a voice-activated remote control, an in-car voice command system, a smartwatch, a headset with a microphone, a voice assistant device, and/or a combination thereof. Embodiments of the present disclosure are intended to include or otherwise cover any type of known and later developed input unit, without deviating from the scope of the present disclosure. Although FIG. 1 illustrates that the system 100 includes a single input unit (i.e., the input unit 102), it will be apparent to a person skilled in the art that the scope of the present disclosure is not limited to it. In various other aspects, the system 100 may include multiple input units without deviating from the scope of the present disclosure. In such a scenario, each input unit is configured to perform one or more operations in a manner similar to the operations of the input unit 102 as described herein.
The output unit 106 may be configured to output a summarized text/speech in the at least one second language that is generated from the input speech signal in the at least one first language. In some embodiments of the present disclosure, the output unit 106 may include any one of a computer monitor, a smartphone, a tablet, a projector screen, and/or a combination thereof. Embodiments of the present disclosure are intended to include and/or otherwise cover any type of known and later developed output unit, without deviating from the scope of the present disclosure.
The communication network 108 may include suitable logic, circuitry, and interfaces that may be configured to provide a plurality of network ports and a plurality of communication channels for transmission and reception of data related to operations of various entities (such as the input unit 102, the output unit 106, and the data processing apparatus 104) of the system 100. Each network port may correspond to a virtual address (or a physical machine address) for transmission and reception of the communication data. For example, the virtual address may be an Internet Protocol Version 4 (IPv4) address (or an IPv6 address) and the physical address may be a Media Access Control (MAC) address. The communication network 108 may be associated with an application layer for implementation of communication protocols based on one or more communication requests from the input unit 102, the output unit 106, and the data processing apparatus 104. The communication data may be transmitted or received via the communication protocols. Examples of the communication protocols may include, but are not limited to, Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), Simple Mail Transfer Protocol (SMTP), Domain Name System (DNS) protocol, Common Management Interface Protocol (CMIP), Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), Long Term Evolution (LTE) communication protocols, or any combination thereof.
In an aspect of the present disclosure, the communication data may be transmitted or received via at least one communication channel of a plurality of communication channels in the communication network 108. The communication channels may include, but are not limited to, a wireless channel, a wired channel, or a combination of wireless and wired channels. The wireless or wired channel may be associated with a data standard which may be defined by one of a Local Area Network (LAN), a Personal Area Network (PAN), a Wireless Local Area Network (WLAN), a Wireless Sensor Network (WSN), a Wide Area Network (WAN), a Wireless Wide Area Network (WWAN), a Metropolitan Area Network (MAN), a satellite network, the Internet, a fiber optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, and a combination thereof. Aspects of the present disclosure are intended to include or otherwise cover any type of communication channel, including known, related art, and/or later developed technologies.
The data processing apparatus 104 may be a network of computers, a framework, or a combination thereof, that may provide a generalized approach to create a server implementation. In some embodiments of the present disclosure, the data processing apparatus 104 may be a server. Examples of the data processing apparatus 104 may include, but are not limited to, personal computers, laptops, mini-computers, mainframe computers, any non-transient and tangible machine that can execute a machine-readable code, cloud-based servers, distributed server networks, or a network of computer systems. The data processing apparatus 104 may be realized through various web-based technologies such as, but not limited to, a Java web-framework, a .NET framework, a personal home page (PHP) framework, or any other web-application framework. The data processing apparatus 104 may include one or more processing circuitries of which processing circuitry 110 is shown and a database 112.
The processing circuitry 110 may be configured to execute various operations associated with the system 100. The processing circuitry 110 may be configured to receive at least one input speech signal in a first language from the input unit 102, execute the one or more operations associated with the system 100 by communicating one or more commands and/or instructions over the communication network 108, and provide at least one output speech signal/text to the output unit 106 that is the summary of the input speech signal in a second language. Examples of the processing circuitry 110 may include, but are not limited to, an ASIC processor, a RISC processor, a CISC processor, a FPGA, and the like. Embodiments of the present disclosure are intended to include and/or otherwise cover any type of the processing circuitry 110 including known, related art, and/or later developed technologies.
The database 112 may be configured to store the logic, instructions, circuitry, interfaces, and/or codes of the processing circuitry 110 for executing various operations. The database 112 may be further configured to store therein, data associated with the input unit 102. It will be apparent to a person having ordinary skill in the art that the database 112 may be configured to store various types of data associated with the system 100, without deviating from the scope of the present disclosure. Examples of the database 112 may include but are not limited to, a Relational database, a NoSQL database, a Cloud database, an Object-oriented database, and the like. Further, the database 112 may include associated memories that may include, but is not limited to, a ROM, a RAM, a flash memory, a removable storage drive, a HDD, a solid-state memory, a magnetic storage drive, a PROM, an EPROM, and/or an EEPROM. Embodiments of the present disclosure are intended to include or otherwise cover any type of the database 112 including known, related art, and/or later developed technologies.
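Purely as an illustration, and not as part of the claimed subject matter, the sketch below shows how the database 112 might hold data associated with the input unit 102 using Python's built-in sqlite3 module; the table name and columns are hypothetical, and any of the database types listed above could equally be used.

import sqlite3

# Hypothetical schema for data associated with the input unit 102; the disclosure
# does not prescribe any particular database technology or layout.
conn = sqlite3.connect("system_100.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS speech_jobs (
           job_id INTEGER PRIMARY KEY AUTOINCREMENT,
           audio_path TEXT NOT NULL,
           first_language TEXT,
           second_language TEXT,
           summary_text TEXT
       )"""
)
# Record one hypothetical input received from the input unit 102.
conn.execute(
    "INSERT INTO speech_jobs (audio_path, first_language, second_language) VALUES (?, ?, ?)",
    ("input.wav", "en", "hi"),
)
conn.commit()
conn.close()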
FIG. 2 illustrates the data processing apparatus 104 of the system 100 of FIG. 1, in accordance with an aspect of the present disclosure. The data processing apparatus 104 may include the processing circuitry 110, the database 112, a network interface 200, and an input-output (I/O) interface 202 communicatively coupled to one another by way of a communication bus 204.
In some embodiments of the present disclosure, the processing circuitry 110 may include a data collection engine 206, a transcribing engine 208, a summarizing engine 210, a translating engine 212, and a converting engine 214. The data collection engine 206, the transcribing engine 208, the summarizing engine 210, the translating engine 212, and the converting engine 214 may be coupled to each other by way of a second communication bus 216.
The network interface 200 may include suitable logic, circuitry, and interfaces that may be configured to establish and enable a communication between the data processing apparatus 104 and different components of the system 100 (e.g., the input unit 102 and the output unit 106), via the communication network 108. The network interface 200 may be implemented by use of various known technologies to support wired or wireless communication of the data processing apparatus 104 with the communication network 108. The network interface 200 may include, but is not limited to, an antenna, a radio frequency (RF) transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a coder-decoder (CODEC) chipset, a subscriber identity module (SIM) card, and a local buffer circuit.
The I/O interface 202 may include suitable logic, circuitry, interfaces, and/or code that may be configured to receive inputs (e.g., orders) and transmit outputs via a plurality of data ports in the data processing apparatus 104. The I/O interface 202 may include various input and output data ports for different I/O devices. Examples of such I/O devices may include, but are not limited to, a touch screen, a keyboard, a mouse, a joystick, a projector audio output, a microphone, an image-capture device, a liquid crystal display (LCD) screen and/or a speaker.
The processing circuitry 110 may be configured to perform one or more operations associated with the system 100 by way of the data collection engine 206, the transcribing engine 208, the summarizing engine 210, the translating engine 212, and the converting engine 214. In some embodiments of the present disclosure, the data collection engine 206 may be configured to receive input data from the input unit 102. In some embodiments of the present disclosure, the input data may be an input speech signal or an input audio signal in a first language. In some embodiments of the present disclosure, the data collection engine 206 may be configured to provide the received input speech signal in the first language to the transcribing engine 208.
In some embodiments of the present disclosure, the transcribing engine 208 may be configured to receive the input speech signal in the first language and transcribe the input speech signal in the first language to corresponding textual data in the first language. In some embodiments of the present disclosure, the transcribing engine 208 may be configured to transcribe the input speech signal in the first language via an automatic speech recognition (ASR) technique.
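By way of a non-limiting example, the transcription performed by the transcribing engine 208 could be realized with an off-the-shelf ASR model. The sketch below assumes the Hugging Face transformers library and the openai/whisper-small checkpoint; the model choice and the file name input.wav are illustrative assumptions, not requirements of the disclosure.

from transformers import pipeline

# One possible ASR realization of the transcribing engine 208 (model choice is an assumption).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# "input.wav" is a hypothetical file containing the input speech in the first language.
result = asr("input.wav")
transcript_first_language = result["text"]
print(transcript_first_language)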
In some embodiments of the present disclosure, the summarizing engine 210 may be configured to receive the transcribed textual data and summarize it into summarized textual data in the first language. In some embodiments of the present disclosure, for summarizing, the summarizing engine 210 may be configured to identify the main idea in the transcribed textual data and provide an abstractive summary of the transcribed textual data. In some embodiments of the present disclosure, the summarizing engine 210 may be configured to allow a user to set a length of the summarized textual data.
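As a non-limiting illustration of abstractive summarization with a user-settable summary length, the sketch below maps the user's length preference onto the max_length/min_length parameters of a transformers summarization pipeline; the BART checkpoint, the parameter values, and the placeholder transcript are assumptions.

from transformers import pipeline

# Abstractive summarizer standing in for the summarizing engine 210 (model choice is an assumption).
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def summarize(text, user_max_tokens=60, user_min_tokens=10):
    # The length bounds stand in for the user-settable summary length described above.
    output = summarizer(text, max_length=user_max_tokens, min_length=user_min_tokens,
                        do_sample=False, truncation=True)
    return output[0]["summary_text"]

# Placeholder transcript; in the system 100 this would come from the transcribing engine 208.
transcript_first_language = ("The lecture introduces speech recognition, explains acoustic and "
                             "language models, and closes with a discussion of evaluation metrics "
                             "such as word error rate.")
summary_first_language = summarize(transcript_first_language)
print(summary_first_language)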
In some embodiments of the present disclosure, the translating engine 212 may be configured to receive the summarized textual data in the first language and translate the summarized textual data in the first language into translated summarized textual data in a second language.
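For illustration only, the translation step of the translating engine 212 might use a pretrained machine translation model. The sketch below assumes a Helsinki-NLP MarianMT checkpoint for English (first language) to Hindi (second language); the language pair, the model, and the placeholder summary are assumptions, as the disclosure is language-agnostic.

from transformers import pipeline

# Illustrative translation step (the English-to-Hindi pair and model are assumptions).
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-hi")

summary_first_language = "The lecture explains the basics of speech recognition."  # placeholder summary
summary_second_language = translator(summary_first_language)[0]["translation_text"]
print(summary_second_language)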
In some embodiments of the present disclosure, the converting engine 214 may be configured to convert the translated summarized textual data in the second language into translated summarized speech data in the second language. Further, the converting engine 214 may be configured to provide the output in the form of the translated summarized output speech in the second language (hereinafter "output speech") via the output unit 106.
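As a non-limiting sketch of the conversion performed by the converting engine 214, the example below uses the third-party gTTS library to synthesize speech from the translated summarized text; the library, the Hindi language code, the placeholder text, and the output file name are illustrative assumptions, and any text-to-speech engine could be substituted.

from gtts import gTTS

# Placeholder for the translated summarized textual data in the second language (assumed Hindi).
summary_second_language = "यह व्याख्यान वाक् पहचान की मूल बातें समझाता है।"

# Synthesize the output speech; the resulting audio file would be provided via the output unit 106.
tts = gTTS(text=summary_second_language, lang="hi")
tts.save("output_speech_second_language.mp3")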
FIG. 3 illustrates a flow chart of a method 300 for summarization of speech to text/speech translations (hereinafter referred to as “the method 300”), in accordance with an aspect of the present disclosure.
At step 302, an input unit 102 may receive an input speech in a first language.
At step 304, a data processing apparatus 104 may transcribe the input speech in the first language into corresponding textual data in the first language.
At step 306, the data processing apparatus 104 may summarize the textual data in the first language.
At step 308, the data processing apparatus 104 may translate the textual data summarized in the first language into the textual data summarized in a second language.
At step 310, the data processing apparatus 104 may convert the textual data summarized in the second language into an output speech summarized in the second language. An illustrative end-to-end sketch of steps 302-310 is given below.
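The following is a minimal end-to-end sketch of the method 300, chaining publicly available components (Hugging Face transformers pipelines and gTTS). All model names, language codes, length limits, and file names are assumptions made for illustration; they are not mandated by the disclosure, and a realization of the system 100 could use entirely different components.

from transformers import pipeline
from gtts import gTTS

# Assumed off-the-shelf models; chunk_length_s lets Whisper handle long recordings.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small", chunk_length_s=30)
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-hi")

def summarize_speech(audio_path, out_path="output_speech.mp3", max_len=120, min_len=40):
    # Steps 302/304: receive the input speech and transcribe it in the first language.
    text_first = asr(audio_path)["text"]
    # Step 306: summarize the textual data in the first language
    # (long transcripts may need chunking; truncation is used here for simplicity).
    summary_first = summarizer(text_first, max_length=max_len, min_length=min_len,
                               do_sample=False, truncation=True)[0]["summary_text"]
    # Step 308: translate the summarized textual data into the second language (assumed Hindi).
    summary_second = translator(summary_first)[0]["translation_text"]
    # Step 310: convert the summarized textual data into output speech in the second language.
    gTTS(text=summary_second, lang="hi").save(out_path)
    return summary_second

# Example usage with a hypothetical recording:
# summarize_speech("lecture_in_english.wav")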
The foregoing discussion of the present disclosure has been presented for purposes of illustration and description. It is not intended to limit the present disclosure to the form or forms disclosed herein. In the foregoing Detailed Description, for example, various features of the present disclosure are grouped together in one or more aspects, configurations, or embodiments for the purpose of streamlining the disclosure. The features of the aspects, configurations, or embodiments may be combined in alternate aspects, configurations, or embodiments other than those discussed above. This method of disclosure is not to be interpreted as reflecting an intention that the present disclosure requires more features than are expressly recited in each aspect. Rather, as the following aspects reflect, inventive aspects lie in less than all features of a single foregoing disclosed aspect, configuration, or embodiment. Thus, the following aspects are hereby incorporated into this Detailed Description, with each aspect standing on its own as a separate aspect of the present disclosure.
Moreover, though the description of the present disclosure has included description of one or more aspects, configurations, or embodiments and certain variations and modifications, other variations, combinations, and modifications are within the scope of the present disclosure, e.g., as may be within the skill and knowledge of those in the art, after understanding the present disclosure. It is intended to obtain rights which include alternative aspects, configurations, or embodiments to the extent permitted, including alternate, interchangeable and/or equivalent structures, functions, ranges or steps to those disclosed, whether or not such alternate, interchangeable and/or equivalent structures, functions, ranges or steps are disclosed herein, and without intending to publicly dedicate any patentable subject matter.
As one skilled in the art will appreciate, the system 100 includes a number of functional blocks in the form of a number of units and/or engines. The functionality of each unit and/or engine goes beyond merely executing one or more computer algorithms to carry out one or more procedures and/or methods in a predefined sequential manner; rather, each unit and/or engine contributes one or more objectives to the overall functionality of the system 100. Each unit and/or engine is not limited to an algorithmic and/or coded form; rather, it may be implemented by way of one or more hardware elements operating together to achieve one or more objectives contributing to the overall functionality of the system 100. Further, as will be readily apparent to those skilled in the art, all the steps, methods and/or procedures of the system 100 are generic and procedural in nature and are not specific and sequential.
Certain terms are used throughout the following description and aspects to refer to particular features or components. As one skilled in the art will appreciate, different persons may refer to the same feature or component by different names. This document does not intend to distinguish between components or features that differ in name but not structure or function. While various aspects of the present disclosure have been illustrated and described, it will be clear that the present disclosure is not limited to these aspects only. Numerous modifications, changes, variations, substitutions, and equivalents will be apparent to those skilled in the art, without departing from the spirit and scope of the present disclosure.
Claims:
1. A system (100) comprising:
an input unit (102) configured to receive an input speech in a first language;
a data processing apparatus (104) coupled to the input unit (102) and comprising:
processing circuitry (110) configured to:
transcribe the input speech in the first language into corresponding textual data in the first language;
summarize the textual data in the first language; and
translate the textual data summarized in the first language into the textual data summarized in a second language.
2. The system (100) as claimed in claim 1, wherein the processing circuitry (110) is further configured to convert the textual data summarized in the second language into an output speech summarized in the second language.
3. The system (100) as claimed in claim 1, further comprising an output unit (106) that is configured to provide an output in the form of the output speech summarized in the second language.
4. The system (100) as claimed in claim 1, wherein the processing circuitry (110) is configured to transcribe the input speech via an automatic speech recognition technique.
5. A method (300) comprising:
receiving, by way of an input unit (102), an input speech in a first language;
transcribing, by way of a processing circuitry (110) coupled to the input unit (102), the input speech in the first language into corresponding textual data in the first language;
summarizing, by way of the processing circuitry (110), the textual data in the first language; and
translating, by way of the processing circuitry (110), the textual data summarized in the first language into the textual data summarized in a second language.
6. The method (300) as claimed in claim 5, further comprising converting, by way of the processing circuitry (110), the textual data summarized in the second language into an output speech summarized in the second language.
7. The method (300) as claimed in claim 5, further comprising providing an output, by way of an output unit (106), in the form of the output speech summarized in the second language.
8. The method (300) as claimed in claim 5, wherein for transcribing the input speech, by way of the processing circuitry (110), an automatic speech recognition technique is enabled.

Documents

Application Documents

# Name Date
1 202421035153-STATEMENT OF UNDERTAKING (FORM 3) [03-05-2024(online)].pdf 2024-05-03
2 202421035153-FORM FOR SMALL ENTITY(FORM-28) [03-05-2024(online)].pdf 2024-05-03
3 202421035153-FORM FOR SMALL ENTITY [03-05-2024(online)].pdf 2024-05-03
4 202421035153-FORM 1 [03-05-2024(online)].pdf 2024-05-03
5 202421035153-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [03-05-2024(online)].pdf 2024-05-03
6 202421035153-EVIDENCE FOR REGISTRATION UNDER SSI [03-05-2024(online)].pdf 2024-05-03
7 202421035153-DRAWINGS [03-05-2024(online)].pdf 2024-05-03
8 202421035153-DECLARATION OF INVENTORSHIP (FORM 5) [03-05-2024(online)].pdf 2024-05-03
9 202421035153-COMPLETE SPECIFICATION [03-05-2024(online)].pdf 2024-05-03
10 Abstract1.jpg 2024-05-29
11 202421035153-FORM-26 [11-06-2024(online)].pdf 2024-06-11
12 202421035153-FORM-9 [12-08-2024(online)].pdf 2024-08-12
13 202421035153-Proof of Right [04-11-2024(online)].pdf 2024-11-04
14 202421035153-MSME CERTIFICATE [05-11-2024(online)].pdf 2024-11-05
15 202421035153-FORM28 [05-11-2024(online)].pdf 2024-11-05
16 202421035153-FORM 18A [05-11-2024(online)].pdf 2024-11-05
17 202421035153-FER.pdf 2024-11-08
18 202421035153-FORM 3 [25-11-2024(online)].pdf 2024-11-25
19 202421035153-PA [31-12-2024(online)].pdf 2024-12-31
20 202421035153-FORM28 [31-12-2024(online)].pdf 2024-12-31
21 202421035153-EVIDENCE FOR REGISTRATION UNDER SSI [31-12-2024(online)].pdf 2024-12-31
22 202421035153-EDUCATIONAL INSTITUTION(S) [31-12-2024(online)].pdf 2024-12-31
23 202421035153-ASSIGNMENT DOCUMENTS [31-12-2024(online)].pdf 2024-12-31
24 202421035153-8(i)-Substitution-Change Of Applicant - Form 6 [31-12-2024(online)].pdf 2024-12-31
25 202421035153-FER_SER_REPLY [02-01-2025(online)].pdf 2025-01-02
26 202421035153-DRAWING [02-01-2025(online)].pdf 2025-01-02
27 202421035153-COMPLETE SPECIFICATION [02-01-2025(online)].pdf 2025-01-02
28 202421035153-US(14)-HearingNotice-(HearingDate-15-07-2025).pdf 2025-07-04
29 202421035153-Correspondence to notify the Controller [11-07-2025(online)].pdf 2025-07-11
30 202421035153-Written submissions and relevant documents [25-07-2025(online)].pdf 2025-07-25

Search Strategy

1 searchE_07-11-2024.pdf
2 searchAE_14-01-2025.pdf