Abstract: A SYSTEM AND METHOD FOR CONVERTING, PROCESSING AND DELIVERING OF TEXT INPUT INTO AUDIO OUTPUT
Aspects of the present disclosure relate to a system and method for converting, processing and delivering a text input as audio output while ensuring that the textual content is maintained in the output audio. The invention discloses improved text-to-audio speech processing that converts text from an electronic textual input into an audio output that includes speech associated with the text as well as audio contextual cues. As a method for converting text to audio speech, an embodiment of the invention includes at least: selecting a document or textual input to be converted to audio; parsing the selected document into constituent components; converting text in the selected document to audio in correspondence to the digital voice library; and creating an audio file based on the converted text. This provides improved text-to-audio processing that can present contextual information to listeners.
Claims: I/We Claim:
1. A system for converting, processing and delivering of text input into audio output, wherein the system comprises:
an electronic device with a processor and memory storing the input text document for execution by the processor [102];
at least one look-up logic module [104] for parsing a document to identify the plurality of components in the document;
a digital voice library [106] for matching each component to at least one available voice recording and converting the text to audio;
a sentence construction algorithm for arranging the components to form sentences; and
a voice output device [108] for announcing the input text as the desired voice output after conversion.
2. The system for converting, processing and delivering of text input into audio output as claimed in Claim 1, wherein the electronic device constitutes a non-transitory computer readable storage medium with a processor and storage for parsing a document into plurality of documents.
3. The system for converting, processing and delivering of text input into audio output as claimed in Claim 1, wherein the look-up module segregates the textual input into speech components corresponding to various databases such as an abbreviation database, a phrase database, a word database and a new word log.
4. The system for converting, processing and delivering of text input into audio output as claimed in Claim 1, wherein the digital voice library constitutes a plurality of voice recordings and speech items.
5. A method for converting, processing and delivering of text input into audio output, wherein the method comprises:
an electronic device with a processor and memory storing the input text for execution by the processor;
at least one look-up logic module for parsing a document to identify the plurality of components in the document;
a digital voice library for matching each component to at least one available voice recording and converting the text to audio;
a sentence construction algorithm for arranging the components to form sentences; and
a voice output device for announcing the input text as the desired voice output after conversion.
6. The method for converting, processing and delivering of text input into audio output as claimed in Claim 5, wherein the digital voice library constitutes a plurality of speech items and voice recordings corresponding to the text components via a look-up logic module.
7. The method for converting, processing and delivering of text input into audio output as claimed in Claim 5, wherein the audio is stored in Linear Predictive Coding (LPC), Codebook Excited Linear Prediction (CELP), or any other formats that may be used in either text-to-speech or text-to-voice systems.
8. The method for converting, processing and delivering of text input into audio output as claimed in Claim 5, wherein the voice output device constitutes a speaker, a sound box or any other audio output device for announcing the audio of the converted text.
Description: A SYSTEM AND METHOD FOR CONVERTING, PROCESSING AND DELIVERING OF TEXT INPUT INTO AUDIO OUTPUT
TECHNICAL FIELD
[0001] The present disclosure generally relates to text-to-audio processing, and more particularly to an enhanced system and method of text-to-audio processing for improved document review.
BACKGROUND
[0002] Background description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
[0003] Systems and methods for converting text to audio and text-to-speech are well known for use in various applications. For users with impaired vision, listening to the resulting speech for a document is particularly important. Regardless of the reasons for listening to speech associated with a document, conventional text-to-speech processing is often not able to impart to the user (listener) contextual information about the text that is being spoken.
[0004] Further, in recent years, documents have become more complex and more diversified. As a result, today's documents can have many different formats and contain various different document elements, including links, headings, tables, captions, footnotes, etc., which makes text-to-audio processing more challenging. Thus, there is a need to provide improved text-to-audio processing that can present contextual information to listeners.
[0005] For users desiring to listen to documents while on-the-go, text-to-audio processing can generate audio output that can be listened to while on-the-go. However, text-to-audio speech processing is processor-intensive, making it impractical for many portable devices that have limited processing power. Hence, there is also a need to manage creation, delivery and consumption of audio outputs that provide speech associated with documents.
[0006] A substantial number of works on this subject precede the present invention, mainly pertaining to text-to-speech or text-to-voice processing. However, most of these systems or methods fail to disclose a complete processing unit in which the text is not only converted to audio output but the requisite textual content portrayed through the various components of the document is also delivered. Certain systems are often unable to adapt to the varying language proficiency of different users. This lack of flexibility often prevents users with limited language proficiency from understanding complex text-to-speech output.
[0007] There is a persistent need to manage creation, delivery and consumption of audio outputs that provide speech associated with documents as it would also ensure to provide the requisite textual content to the users or the listeners as per the various components disclosed in the textual document in the form of an enhanced audio output.
[0008] The methods and systems of the said prior art mainly comprise systems and/or technologies for mere text-to-speech conversion. Such conversion only converts the text to audio and fails to impart the textual content of a document comprising several parts such as headings, sub-headings, body, footnotes, links, etc.
[0009] Therefore, the present disclosure overcomes the above-mentioned problems associated with traditionally available methods or systems; any of the above-mentioned inventions can be used with the presently disclosed technique, with or without modification.
[0010] All publications herein are incorporated by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Where a definition or use of a term in an incorporated reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.
[0011] As used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
[0012] The recitation of ranges of various text-to-speech or text-to-audio processing herein is merely intended to serve as a shorthand method of referring individually to each separate technology falling within the range. Unless otherwise indicated herein, each individual technology is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided with respect to certain embodiments herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention.
[0013] Groupings of alternative elements or embodiments of the invention disclosed herein are not to be construed as limitations. Each group member can be referred to and claimed individually or in any combination with other members of the group or other elements found herein. One or more members of a group can be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is herein deemed to contain the group as modified, thus fulfilling the written description of all Markush groups used in the appended claims.
OBJECTS OF THE INVENTION
[0014] It is an object of the present disclosure to provide a system for converting and processing textual input to deliver audio output.
[0015] It is an object of the present disclosure to provide a method for converting and processing text to audio output for its users.
[0016] It is an object of the present disclosure to provide a system and method for identifying and storing the various components of textual input for further processing through conversion into audio output.
[0017] It is another object of the present disclosure to provide a system and method to convert a textual document into audio output so as to deliver the enhanced textual content in an audio format.
SUMMARY
[0018] The present disclosure relates to a system and method for converting, processing and delivering of text input into audio output. It discloses a system wherein text is administered as an input and is subjected to processing so that it is delivered as an audio output to its users.
[0019] An aspect of the present disclosure relates to a system and method of converting, processing and delivering of text input into audio output. The method discloses improved text-to-audio speech processing that converts text from an electronic textual input into an audio output that includes speech associated with the text as well as audio contextual cues.
[0020] In another aspect of the present disclosure, the invention can be implemented in numerous ways, including as a method or a system, and several embodiments of the invention are disclosed herein. As a computer-implemented method for converting text to audio speech, an embodiment of the invention includes at least: selecting a document or textual input to be converted to audio; parsing the selected document into constituent components; converting text in the selected document to audio in correspondence to the digital voice library; and creating an audio file based on the converted text.
[0021] One should appreciate that although the present disclosure has been explained with respect to a defined set of functional modules, any other module or set of modules can be added/deleted/modified/combined and any such changes in architecture/construction of the proposed system are completely within the scope of the present disclosure. Each module can also be fragmented into one or more functional sub-modules, all of which are also completely within the scope of the present disclosure.
[0022] Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] The accompanying drawings are included to provide a further understanding of the present disclosure, and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
[0024] FIG. 1 illustrates the various components [100] involved in the system for conversion of input text [102] to output audio [108].
[0025] FIG. 2 illustrates an exemplary method [200] for the conversion of the textual content administered as an input [202] to an audio output [212], maintaining the proficiency of the converted textual content for better understanding by the user.
DETAILED DESCRIPTION
[0026] Aspects of the present disclosure relate to a system and method for converting, processing and delivering text input as audio output, wherein textual content is administered as an input that is processed, converted and delivered as an audio output while ensuring that the various textual components of the text or document are maintained. The present disclosure relates to a method for converting text to audio speech by selecting a document or textual input to be converted to audio; parsing the selected document into constituent components; converting text in the selected document to audio in correspondence to the digital voice library; and creating an audio file based on the converted text.
[0027] In the following description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the present invention. It will be apparent to one skilled in the art that embodiments of the present invention may be practiced without some of these specific details.
[0028] Embodiments of the present invention include various steps, which will be described below. The steps may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor programmed with the instructions to perform the steps. Alternatively, steps may be performed by a combination of hardware, software, and firmware and/or by human operators.
[0029] Embodiments of the present invention may be provided as a computing device, which may include one or more storage media tangibly embodying thereon instructions and unique identities of the device; the instructions may be used to prevent an unauthorized user from altering or erasing the unique identities of the device. The storage media may include, but are not limited to, semiconductor memories such as ROMs, random access memories (RAMs), programmable read-only memories (PROMs), erasable PROMs (EPROMs), electrically erasable PROMs (EEPROMs), flash memory, or other types of media/machine-readable media suitable for storing unique ID(s) of the device and electronic instructions (e.g., computer programming code, such as software or firmware).
[0030] Various methods described herein may be practiced by combining one or more machine-readable storage media containing the code/instruction according to the present invention with appropriate standard device hardware to execute the instruction contained therein. An apparatus for practicing various embodiments of the present invention may involve one or more computers (say server) (or one or more processors within a single computer) and storage systems containing or having network access to cloud server synced with multiple users in accordance with various methods described herein, and the method steps of the invention could be accomplished by following the methods stated or with modifications or minor alterations in the technique.
[0031] If the specification states a component or feature “may”, “can”, “could”, or “might” be included or have a characteristic, that particular component or feature is not required to be included or have the characteristic.
[0032] Although the present invention has been described with relation to text-to-audio processing, it should be appreciated that the same has been done merely to illustrate the invention in an exemplary manner and any other purpose or function for which the explained structure or configuration can be used, is covered within the scope of the present disclosure.
[0033] Exemplary embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. These embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those of ordinary skill in the art. Moreover, all statements herein reciting embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future (i.e., any elements developed that perform the same function, regardless of structure).
[0034] Various terms as used herein are shown below. To the extent a term used in a claim is not defined below, it should be given the broadest definition persons in the pertinent art have given that term as reflected in printed publications and issued patents at the time of filing.
[0035] In an embodiment of the present disclosure, FIG. 1 illustrates the components for text-to-audio processing [100], comprising an electronic device wherein the text is administered as an input [102], a look-up logic module [104] for parsing the input text into various speech components, and a digital voice library [106] for matching the corresponding segmented text inputs and structuring them into a sentence to be delivered as audio output [108].
[0036] In an embodiment of the present disclosure, FIG. 2 illustrates the exemplary method of text-to-audio processing [200], wherein the text to be converted is administered as an input [202], followed by parsing of the input into various textual components [204] such as phrases, abbreviations, words and phonemes; looking up the components through the look-up logic module [206]; identifying the corresponding sound recordings previously stored in the digital voice library [208]; structuring the components into a sentence [210]; and consequently delivering the result as output audio [212].
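The flow of FIG. 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the dictionaries `VOICE_LIBRARY` and `ABBREVIATIONS`, the recording file names and all function names are hypothetical, and the "audio output" is represented only as an ordered list of recording identifiers.

```python
import re

# Hypothetical digital voice library [208]: text component -> recording id.
VOICE_LIBRARY = {
    "good morning": "rec_phrase_001.wav",
    "doctor": "rec_word_014.wav",
    "smith": "rec_word_027.wav",
}

# Hypothetical abbreviation database consulted by the look-up logic [206].
ABBREVIATIONS = {"dr.": "doctor"}


def parse_components(text):
    """Step [204]: split the input into lower-cased tokens (a trailing '.' is kept)."""
    return re.findall(r"[a-z0-9']+\.?", text.lower())


def look_up(tokens):
    """Step [206]: expand abbreviations, then merge tokens into known phrases."""
    words = [ABBREVIATIONS.get(t, t.rstrip(".")) for t in tokens]
    components, i = [], 0
    while i < len(words):
        # Prefer the longest phrase starting at i that has a recording;
        # fall back to the single word at position i.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if j - i == 1 or candidate in VOICE_LIBRARY:
                components.append(candidate)
                i = j
                break
    return components


def match_recordings(components):
    """Step [208]: pair each component with its recording, or None if absent."""
    return [(c, VOICE_LIBRARY.get(c)) for c in components]


def construct_sentence(matches):
    """Step [210]: keep the found recordings in reading order."""
    return [rec for _, rec in matches if rec is not None]


def text_to_audio(text):
    """Steps [202] to [212] chained; returns the ordered recording ids."""
    return construct_sentence(match_recordings(look_up(parse_components(text))))
```

For the input "Good morning Dr. Smith." this sketch selects one phrase recording followed by two word recordings. Components with no recording are simply skipped here; handling of such new words is a separate concern.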
[0037] In an aspect, the traditional methods mainly include systems and/or technologies for mere conversion of text input to audio output. Such conversion often fails to distinguish and differentiate textual components like headings, sub-headings, body and footnotes. As a result, the output sounds like a robotic translation, which is unnatural and fails to render the textual content properly to the listeners.
[0038] In an aspect, the processing of text-to-speech or text-to-audio output is necessary because it is highly desirable for visually impaired individuals and is also demanded by people who prefer to listen on the go instead of reading. Thus, it is important to provide a system or a method for efficient conversion, followed by processing of the audio after the text input, so as to provide all the intricate details of the textual content submitted for conversion to audio content. This would immensely help visually impaired individuals to understand written content by listening, and would also aid listeners who would like to utilize their time on the go by listening to the audio output.
[0039] In an embodiment of the present disclosure, the present invention is a potential replacement for synthetic text-to-speech systems, and the digital voice library element acts as a resource for other text-to-speech or text-to-voice systems. It includes a computerized text-to-audio system which may accept text as an input and provide high-quality, natural, voice-like audio output. According to the present invention, there is a digital voice library that has a plurality of speech items for the various components of a text, based on words, phrases, abbreviations, new words, syllables and others. This library is utilized to generate audio output that sounds more natural and maintains the various inflections of the textual content. The audio output is often processed and stored in Linear Predictive Coding (LPC), Codebook Excited Linear Prediction (CELP) or any other suitable file format which is used further in text-to-audio or text-to-speech systems.
[0040] The incoming text is analyzed, wherein the text is parsed into its constituent speech elements, which are looked up via the look-up logic module to identify the various speech components. The components are then matched with the corresponding voice inflections by accessing the digital voice library. The voice components are arranged and structured to form a sentence, and the specific voice output is then delivered.
[0041] The present disclosure relies on access to the digital voice library to look up the corresponding text input via look-up logic and match the respective inflection or pronunciation of the textual content. The input text is received at the input interface in the form of words, abbreviations, phrases, syllables and other suitable text forms. In most cases, documents are parsed into various speech components, which are looked up in the respective databases to find the corresponding voice recordings in the digital voice library.
[0042] In an embodiment of the present disclosure, the components of a parsed document, for example abbreviations, are looked up through the look-up module to convey the appropriate expanded word for each abbreviation from the abbreviation database [304] of the look-up control module, based on analysis, in order to convey the contextual information pertaining to the use of the abbreviation in the input text.
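A context-dependent abbreviation look-up of this kind might be sketched as below. The disclosure does not specify the analysis used, so the rule here (an abbreviation directly followed by a capitalized token is treated as a title, otherwise as a suffix) is purely an illustrative assumption, as are the names `ABBREVIATION_DB` and `expand_abbreviation` and the sample entries.

```python
# Hypothetical abbreviation database: each entry lists the expansion to use
# when the abbreviation precedes a capitalized name and when it does not.
ABBREVIATION_DB = {
    "dr.": {"before_name": "doctor", "after_name": "drive"},
    "st.": {"before_name": "saint", "after_name": "street"},
}


def expand_abbreviation(tokens, index):
    """Return the expansion of tokens[index], or the token itself if unknown.

    Assumed rule: if the next token starts with an upper-case letter, the
    abbreviation is read as a title; otherwise it is read as a suffix.
    """
    token = tokens[index]
    entry = ABBREVIATION_DB.get(token.lower())
    if entry is None:
        return token
    next_is_name = index + 1 < len(tokens) and tokens[index + 1][:1].isupper()
    return entry["before_name"] if next_is_name else entry["after_name"]


def expand_all(tokens):
    """Expand every abbreviation in a token list, leaving other tokens as-is."""
    return [expand_abbreviation(tokens, i) for i in range(len(tokens))]
```

With the tokens `["Dr.", "Smith", "lives", "on", "Elm", "St."]`, the first token expands to "doctor" and the last to "street", so the same abbreviation database yields different spoken words depending on position.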
[0043] This aspect of the present disclosure relates to a system as well as a method for processing textual input to deliver an audio output that maintains the intricacies of the input text, with its various inflections and textual contents. After the abbreviations are analyzed, the phrases in the text input are looked up in the word phrases database [300] in the look-up logic module, wherein the related or identical phrase is located and conveyed in order to look for the corresponding voice recording in the digital voice library.
[0044] Once the voice recordings for the input text have been identified with regard to the meaning of the text input, the segregated voice recordings are composed and arranged in order to form a suitable sentence. In certain embodiments, the structuring of the sentence is done using algorithms in order to preserve the inflections pertaining to the text input.
[0045] Inflection and pitch changes occur in a definite pattern when a sentence is spoken, and that pattern is not determined by the writing alone. Certain parts of the textual content nevertheless carry emphasis, which can be maintained and enhanced only through proper analysis of the segregated components of the sentences, so that the output audio is as close as possible to the textual input. During conversion of textual content to audio output, several words are often not present in the database; these new words are subsequently added to the database, along with voice recordings recreated by breaking the words into syllables and retrieving the pronunciation of the syllables from the digital voice library.
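The handling of new words described above can be illustrated with a short sketch. Real syllabification is language-dependent and is not specified in the disclosure; the greedy longest-match split below is a stand-in, and `WORD_RECORDINGS`, `SYLLABLE_RECORDINGS`, `NEW_WORD_LOG` and the file names are hypothetical.

```python
# Hypothetical stores: whole-word recordings, syllable recordings, new word log.
WORD_RECORDINGS = {"radio": ["word_radio.wav"]}
SYLLABLE_RECORDINGS = {
    "pod": "syl_pod.wav",
    "cast": "syl_cast.wav",
    "ing": "syl_ing.wav",
}
NEW_WORD_LOG = []


def split_into_syllables(word):
    """Greedy longest-match split against known syllables.

    Returns None when some part of the word matches no stored syllable.
    Greedy matching does not backtrack, so it can miss a valid split.
    """
    syllables, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in SYLLABLE_RECORDINGS:
                syllables.append(word[i:j])
                i = j
                break
        else:
            return None  # no known syllable starts at position i
    return syllables


def recordings_for(word):
    """Return recordings for a word, building them from syllables if it is new."""
    word = word.lower()
    if word in WORD_RECORDINGS:
        return WORD_RECORDINGS[word]
    NEW_WORD_LOG.append(word)  # record that the word was not in the database
    syllables = split_into_syllables(word)
    if syllables is None:
        return []
    recordings = [SYLLABLE_RECORDINGS[s] for s in syllables]
    WORD_RECORDINGS[word] = recordings  # add the new word to the database
    return recordings
```

After the first request for "podcasting", the word is logged once and stored with its three syllable recordings, so later requests are served directly from the word database.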
[0046] The textual content is finally processed and structured to form a sentence and then delivered as audio through an output device, wherein the output device constitutes a speaker, a headphone or any other audible device component. In certain embodiments, the audio output is stored in an audio format such as Linear Predictive Coding (LPC), Codebook Excited Linear Prediction (CELP), or any other format that may be used in either text-to-speech or text-to-voice systems, so that it can be listened to repeatedly by multiple users. In certain embodiments of the present disclosure, the multiple users listening to the audio output might be connected over a network server.
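Storing the assembled output as a reusable file can be sketched with Python's standard `wave` module. Uncompressed WAV is used here only as a stand-in for "any other format"; it is not LPC or CELP, and the helper names and the silent placeholder recordings are assumptions made for illustration.

```python
import os
import tempfile
import wave


def write_silence(path, n_frames, rate=8000):
    """Create a placeholder mono 16-bit recording (stand-in for a library entry)."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * n_frames)


def concatenate(recordings, out_path):
    """Join recordings in order into one file; all inputs must share a format."""
    if not recordings:
        raise ValueError("no recordings to join")
    fmt = None
    with wave.open(out_path, "wb") as out:
        for path in recordings:
            with wave.open(path, "rb") as src:
                this_fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if fmt is None:
                    # The first recording fixes the output format.
                    fmt = this_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif this_fmt != fmt:
                    raise ValueError(f"format mismatch in {path}")
                out.writeframes(src.readframes(src.getnframes()))
    return out_path


def frame_count(path):
    """Return the number of audio frames stored in a WAV file."""
    with wave.open(path, "rb") as src:
        return src.getnframes()


def demo():
    """Build two placeholder recordings, join them, and return the frame count."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "a.wav")
        second = os.path.join(tmp, "b.wav")
        write_silence(first, 100)
        write_silence(second, 250)
        return frame_count(concatenate([first, second], os.path.join(tmp, "out.wav")))
```

Because the joined sentence is written to a file, it can be replayed or served to several listeners without repeating the look-up and assembly steps.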
[0047] In another aspect, the present disclosure overcomes the persistent limitation of the traditional state of the art by managing the creation, delivery and consumption of audio outputs that provide speech associated with documents, providing the requisite textual content to the users or listeners, as per the various components disclosed in the textual document, in the form of an enhanced audio output.
[0048] While the foregoing describes various embodiments of the invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. The scope of the invention is determined by the claims that follow. The invention is not limited to the described embodiments, versions or examples, which are included to enable a person having ordinary skill in the art to make and use the invention when combined with information and knowledge available to the person having ordinary skill in the art.
| # | Name | Date |
|---|---|---|
| 1 | 202241007576-STATEMENT OF UNDERTAKING (FORM 3) [12-02-2022(online)].pdf | 2022-02-12 |
| 2 | 202241007576-REQUEST FOR EARLY PUBLICATION(FORM-9) [12-02-2022(online)].pdf | 2022-02-12 |
| 3 | 202241007576-POWER OF AUTHORITY [12-02-2022(online)].pdf | 2022-02-12 |
| 4 | 202241007576-FORM-9 [12-02-2022(online)].pdf | 2022-02-12 |
| 5 | 202241007576-FORM 1 [12-02-2022(online)].pdf | 2022-02-12 |
| 6 | 202241007576-DRAWINGS [12-02-2022(online)].pdf | 2022-02-12 |
| 7 | 202241007576-DECLARATION OF INVENTORSHIP (FORM 5) [12-02-2022(online)].pdf | 2022-02-12 |
| 8 | 202241007576-COMPLETE SPECIFICATION [12-02-2022(online)].pdf | 2022-02-12 |