Smart Speech Synthesis System And Method Thereof

Abstract: A smart speech synthesis system (100) is disclosed. The system (100) is housed within an application server (108) with a dedicated processor (110). The system (100) operates in conjunction with a storage medium (112) containing executable programming instructions. A registration module (114) enables user registration via a computer application (104), acquiring user details through a designated user device (102). A data upload module (116) enables users to upload vocal speech samples via their device's microphone, storing them in a database (106). A speech synthesis module (118) uses a Bidirectional Encoder Representations from Transformers (BERT) linguistic model to process and synthesize the uploaded sample, identifying corrections and errors and providing recommendations. A data transmission module (120) delivers this feedback to the user device (102). The system (100) offers a personalized and technologically advanced approach to speech enhancement and synthesis. Claims: 10, Figures: 4. Figure 1A is selected.

Patent Information

Application #

Filing Date

05 December 2023

Publication Number

01/2024

Publication Type

INA

Invention Field

COMPUTER SCIENCE

Status

Parent Application

Applicants

SR University

SR University, Ananthasagar, Warangal, Telangana-506371, India (IN) Email ID: patent@sru.edu.in Mb: 08702818333

Inventors

1. Dr. Eranki. L. N.

Center for AI and CS, Ananthasagar, Hasanparthy (PO), Warangal-506371 India

Specification

Description:
BACKGROUND
Field of Invention
[001] Embodiments of the present invention generally relate to smart speech technology and, particularly, to a smart speech synthesis system.
Description of related art
[002] Speech is a fundamental mode of human communication, influencing various aspects of daily life, from personal interactions to professional engagements. The quality, clarity, and intelligibility of speech hold paramount importance in ensuring effective communication. However, individuals with speech impairments or linguistic challenges often face hurdles in conveying their messages.
[003] Existing speech improvement and synthesis systems provide valuable tools, but there remains a need for a more intelligent and adaptable solution. The “smart speech synthesis system” aims to address this gap by integrating cutting-edge technologies to enhance speech quality and facilitate seamless communication.
[004] There is thus a need for an improved and advanced smart speech synthesis system that can address the aforementioned limitations in a more efficient manner.
SUMMARY
[005] Embodiments in accordance with the present invention provide a smart speech synthesis system. The system comprises a processor located on an application server. The system further comprises a storage medium comprising programming instructions executable by the processor. The storage medium comprises: a registration module configured to register a user using a computer application by receiving user details through a user device; a data upload module configured to enable the user to upload a vocal speech sample to a database using a microphone of the user device; a speech synthesis module configured to synthesize and process the uploaded vocal speech sample on a Bidirectional Encoder Representations from Transformers (BERT) linguistic model for finding corrections, errors, recommendations, or a combination thereof on the uploaded vocal speech sample; and a data transmission module configured to transmit the corrections, the errors, the recommendations, or a combination thereof to the user device.
[006] Embodiments in accordance with the present invention further provide a method for speech improvement and synthesis. The method comprises the steps of: registering a user using a computer application by receiving user details through a user device; enabling the user to upload a vocal speech sample to a database using a microphone of the user device; synthesizing and processing the uploaded vocal speech sample on a Bidirectional Encoder Representations from Transformers (BERT) linguistic model for finding corrections, errors, recommendations, or a combination thereof on the uploaded vocal speech sample; and transmitting the corrections, the errors, the recommendations, or a combination thereof to the user device.
[007] Embodiments of the present invention may provide a number of advantages depending on their particular configuration. First, embodiments of the present application may provide a smart speech synthesis system.
[008] Next, embodiments of the present application may provide a smart speech synthesis system that is easy to use and easy to understand.
[009] These and other advantages will be apparent from the description of the embodiments provided herein.
[0010] The preceding is a simplified summary to provide an understanding of some embodiments of the present invention. This summary is neither an extensive nor exhaustive overview of the present invention and its various embodiments. The summary presents selected concepts of the embodiments of the present invention in a simplified form as an introduction to the more detailed description presented below. As will be appreciated, other embodiments of the present invention are possible utilizing, alone or in combination, one or more of the features set forth above or described in detail below.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The above and still further features and advantages of embodiments of the present invention will become apparent upon consideration of the following detailed description of embodiments thereof, especially when taken in conjunction with the accompanying drawings, and wherein:
[0012] FIG. 1A illustrates a block diagram of a smart speech synthesis system, according to an embodiment of the present invention;
[0013] FIG. 1B illustrates a storage medium of the smart speech synthesis system, according to an embodiment of the present invention;
[0014] FIG. 1C illustrates a model of the smart speech synthesis system, according to an embodiment of the present invention; and
[0015] FIG. 2 depicts a flowchart of a method for speech improvement and synthesis, according to an embodiment of the present invention.
[0016] The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word "may" is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include”, “including”, and “includes” mean including but not limited to. To facilitate understanding, like reference numerals have been used, where possible, to designate like elements common to the figures. Optional portions of the figures may be illustrated using dashed or dotted lines, unless the context of usage indicates otherwise.
DETAILED DESCRIPTION
[0017] The following description includes the preferred best mode of one embodiment of the present invention. It will be clear from this description of the invention that the invention is not limited to these illustrated embodiments but that the invention also includes a variety of modifications and embodiments thereto. Therefore, the present description should be seen as illustrative and not limiting. While the invention is susceptible to various modifications and alternative constructions, it should be understood that there is no intention to limit the invention to the specific form disclosed, but, on the contrary, the invention is to cover all modifications, alternative constructions, and equivalents falling within the scope of the invention as defined in the claims.
[0018] In any embodiment described herein, the open-ended terms "comprising", "comprises", and the like (which are synonymous with "including", "having", and "characterized by") may be replaced by the respective partially closed phrases "consisting essentially of", "consists essentially of", and the like, or the respective closed phrases "consisting of", "consists of", and the like.
[0019] As used herein, the singular forms “a”, “an”, and “the” designate both the singular and the plural, unless expressly stated to designate the singular only.
[0020] FIG. 1A illustrates a block diagram of a smart speech synthesis system 100 (hereinafter referred to as the system 100), according to an embodiment of the present invention. In an embodiment of the present invention, the system 100 may work on a Bidirectional Encoder Representations from Transformers (BERT) linguistic model. The Bidirectional Encoder Representations from Transformers (BERT) linguistic model may be bidirectional rectified, in an embodiment of the present invention.
[0021] According to an embodiment of the present invention, the system 100 may comprise a user device 102, a computer application 104, a database 106, an application server 108, a processor 110, and a storage medium 112.
[0022] In an embodiment of the present invention, the user device 102 may be a device used by a user to upload a vocal speech sample to the system 100. The user device 102 may further be configured to receive corrections, errors, and recommendations from the system 100, in an embodiment of the present invention. The user device 102 may be, but not limited to, a personal computer, a consumer device, and the like. Embodiments of the present invention are intended to include or otherwise cover any type of the user device 102 including known, related art, and/or later developed technologies. In an embodiment of the present invention, the personal computer may be, but not limited to, a desktop, a server, a laptop, and the like. Embodiments of the present invention are intended to include or otherwise cover any type of the personal computer including known, related art, and/or later developed technologies.
[0023] Further, in an embodiment of the present invention, the consumer device may be, but not limited to, a tablet, a mobile phone, a notebook, a netbook, a smartphone, a wearable device, and so forth. Embodiments of the present invention are intended to include or otherwise cover any type of the consumer device including known, related art, and/or later developed technologies.
[0024] According to an embodiment of the present invention, the user device 102 may comprise software applications such as, but not limited to, a photo editing application, a video editing application, an audio editing application, and the like. In a preferred embodiment of the present invention, the user device 102 may comprise the computer application 104 which may be a computer-readable program installed in the user device 102 for executing functions associated with the system 100.
[0025] In an embodiment of the present invention, the computer application 104 when logged in by an admin may provide an admin-related interface for operating the system 100. The computer application 104 when logged in by the user may provide a user-related interface for operating the system 100, in an embodiment of the present invention.
[0026] In an embodiment of the present invention, the database 106 may be adapted to store the vocal speech sample uploaded by the user using the user device 102. In another embodiment of the present invention, the database 106 may store the user details. According to embodiments of the present invention, the user details may be, but not limited to, a username, a user age, a user gender, a password, a point of contact of the user, and so forth. Embodiments of the present invention are intended to include or otherwise cover any details associated with the user that may be stored in the database 106, including known, related art, and/or later developed technologies.
[0027] According to embodiments of the present invention, the database 106 may be for example, but not limited to, a distributed database, a personal database, an end-user database, a commercial database, a Structured Query Language (SQL) database, a non-SQL database, an operational database, a relational database, an object-oriented database, a graph database, a cloud server database, and so forth. Embodiments of the present invention are intended to include or otherwise cover any type of the database 106 including known, related art, and/or later developed technologies.
[0028] Further, the database 106 may be a cloud server database, in an embodiment of the present invention. In an embodiment of the present invention, the cloud server may be remotely located. In an exemplary embodiment of the present invention, the cloud server may be a public cloud server. In another exemplary embodiment of the present invention, the cloud server may be a private cloud server. In yet another embodiment of the present invention, the cloud server may be a dedicated cloud server. According to embodiments of the present invention, the cloud server may be, but not limited to, a Microsoft Azure cloud server, an Amazon AWS cloud server, a Google Compute Engine (GCE) cloud server, an Amazon Elastic Compute Cloud (EC2) cloud server, and so forth. Embodiments of the present invention are intended to include or otherwise cover any type of the cloud server including known, related art, and/or later developed technologies.
[0029] In an embodiment of the present invention, the application server 108 may be hardware on which the processor 110 may be installed. According to embodiments of the present invention, the application server 108 may be, but not limited to, a motherboard, a wired board, a mainframe, and so forth. Embodiments of the present invention are intended to include or otherwise cover any type of the application server 108, including known, related art, and/or later developed technologies.
[0030] In an embodiment of the present invention, the processor 110 may be located on the application server 108. The processor 110 may be configured to execute the computer-readable instructions to generate an output relating to the system 100. According to embodiments of the present invention, the processor 110 may be, but not limited to, a Programmable Logic Controller (PLC) unit, a microprocessor, a development board, and so forth. Embodiments of the present invention are intended to include or otherwise cover any type of the processor 110 including known, related art, and/or later developed technologies.
[0031] In an embodiment of the present invention, the storage medium 112 may store the computer programmable instructions in the form of programming modules. The storage medium 112 may be a non-transitory storage medium, in an embodiment of the present invention. The storage medium 112 may communicate with the processor 110, which may execute the computer-readable set of instructions present in the storage medium 112, in an embodiment of the present invention.
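By way of a non-limiting illustration, the storage role of the database 106 described above may be sketched as follows. The sketch assumes a SQL database (one of the listed options) and uses Python's built-in sqlite3 module as a stand-in. The table names, column names, and the helper functions open_database, store_user, and store_sample are hypothetical and do not appear in the specification. Storing a salted hash instead of the raw password is likewise an assumption of the sketch.

```python
import hashlib
import os
import sqlite3


def open_database() -> sqlite3.Connection:
    """Create an in-memory stand-in for the database 106 (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " username TEXT PRIMARY KEY, age INTEGER, gender TEXT,"
        " contact TEXT, salt BLOB, password_hash BLOB)"
    )
    conn.execute(
        "CREATE TABLE samples ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " username TEXT REFERENCES users(username), audio BLOB)"
    )
    return conn


def store_user(conn, username, age, gender, contact, password) -> None:
    """Store the user details; the password is kept only as a salted hash."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    conn.execute(
        "INSERT INTO users VALUES (?, ?, ?, ?, ?, ?)",
        (username, age, gender, contact, salt, digest),
    )
    conn.commit()


def store_sample(conn, username, audio: bytes) -> int:
    """Store one uploaded vocal speech sample and return its row id."""
    cur = conn.execute(
        "INSERT INTO samples (username, audio) VALUES (?, ?)", (username, audio)
    )
    conn.commit()
    return cur.lastrowid
```

A cloud-hosted database would replace the in-memory connection; the two-table layout is only one possible arrangement.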
[0032] According to embodiments of the present invention, the storage medium 112 may be, but not limited to, a Random-Access Memory (RAM), a Static Random-access Memory (SRAM), a Dynamic Random-access Memory (DRAM), a Read Only Memory (ROM), an Erasable Programmable Read-only Memory (EPROM), an Electrically Erasable Programmable Read-only Memory (EEPROM), a NAND Flash, a Secure Digital (SD) memory, a cache memory, a Hard Disk Drive (HDD), a Solid-State Drive (SSD) and so forth. Embodiments of the present invention are intended to include or otherwise cover any type of the storage medium 112, including known, related art, and/or later developed technologies. In an embodiment of the present invention, the storage medium 112 may further be explained in conjunction with FIG. 1B.
[0033] FIG. 1B illustrates the storage medium 112 of the system 100, according to an embodiment of the present invention. The storage medium 112 may comprise the computer-executable instructions in the form of programming modules such as a registration module 114, a data upload module 116, a speech synthesis module 118, and a data transmission module 120.
[0034] In an embodiment of the present invention, the registration module 114 may be configured to register the user on the system 100 using the computer application 104 by receiving the user details. Upon successful registration, the registration module 114 may generate an identification name and a password for the corresponding registered user, in an embodiment of the present invention. In an embodiment of the present invention, the identification name and the password generated may be a series of characters. According to embodiments of the present invention, the character may be, but not limited to, an alphabetical character, a numerical character, a special character, and so forth. Embodiments of the present invention are intended to include or otherwise cover any type and any number of characters in the identification name and the password generated by the registration module 114, including known, related art, and/or later developed technologies.
[0035] Further, after the successful generation of the identification name and the password, the registered user may be eligible to log into the computer application 104 using the identification name and the password generated using the registration module 114. Further, after logging in to the computer application 104, the registered user may be presented with the user-related interface. After a successful login process of the user, the registration module 114 may transmit an activation signal to activate the data upload module 116.
[0036] In an embodiment of the present invention, the data upload module 116 may be activated upon receipt of the activation signal from the registration module 114. The data upload module 116 may be configured to enable the user to upload the vocal speech sample to the database 106 using a microphone of the user device 102, in an embodiment of the present invention. Upon uploading the vocal speech sample, the data upload module 116 may transmit a synthesis signal to the speech synthesis module 118.
[0037] In an embodiment of the present invention, the speech synthesis module 118 may be activated upon receipt of the synthesis signal from the data upload module 116. The speech synthesis module 118 may be configured to synthesize and process the uploaded vocal speech sample on the Bidirectional Encoder Representations from Transformers (BERT) linguistic model for finding corrections, errors, recommendations, and so forth on the uploaded vocal speech sample, in an embodiment of the present invention. Upon finding the corrections, the errors, the recommendations, or a combination thereof, the speech synthesis module 118 may transmit a data transmission signal to the data transmission module 120.
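By way of a non-limiting illustration, the credential generation and login behaviour of the registration module 114 may be sketched as follows. The class name RegistrationModule, the function generate_credentials, the character sets, and the lengths are hypothetical choices made for the sketch. A True result from login stands in for the activation signal. Credentials are held in memory in plain form only to keep the sketch short.

```python
import secrets
import string

# Hypothetical character sets; the specification allows any characters.
NAME_CHARS = string.ascii_lowercase + string.digits
PASSWORD_CHARS = string.ascii_letters + string.digits + "!@#$%"


def generate_credentials(name_length: int = 8, password_length: int = 12):
    """Generate an identification name and a password as random character series."""
    ident = "".join(secrets.choice(NAME_CHARS) for _ in range(name_length))
    password = "".join(secrets.choice(PASSWORD_CHARS) for _ in range(password_length))
    return ident, password


class RegistrationModule:
    """Illustrative stand-in for the registration module 114."""

    def __init__(self) -> None:
        self._credentials = {}  # identification name -> password
        self._details = {}      # identification name -> user details

    def register(self, user_details: dict):
        """Register a user and return the generated identification name and password."""
        ident, password = generate_credentials()
        self._credentials[ident] = password
        self._details[ident] = dict(user_details)
        return ident, password

    def login(self, ident: str, password: str) -> bool:
        """Return True (the activation signal in this sketch) on a successful login."""
        stored = self._credentials.get(ident)
        return stored is not None and secrets.compare_digest(stored, password)
```

The secrets module is used instead of random because the generated values act as credentials.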
[0038] In an embodiment of the present invention, the data transmission module 120 may be activated upon receipt of the data transmission signal from the speech synthesis module 118. The data transmission module 120 may be configured to transmit the corrections, the errors, the recommendations, or a combination thereof to the user device 102, in an embodiment of the present invention.
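By way of a non-limiting illustration, the signal chain among the data upload module 116, the speech synthesis module 118, and the data transmission module 120 may be sketched as follows. All class and method names are hypothetical. The specification does not detail how the BERT linguistic model is applied to the sample, so the function placeholder_analyzer is a trivial stand-in and not a real model. A Python list stands in for the database 106 and for delivery to the user device 102.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Feedback:
    corrections: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    recommendations: List[str] = field(default_factory=list)


def placeholder_analyzer(sample: bytes) -> Feedback:
    """Stand-in for the BERT-based analysis; NOT a real model."""
    if not sample:
        return Feedback(errors=["empty sample"], recommendations=["record again"])
    return Feedback(recommendations=["no issues found by placeholder"])


class DataTransmissionModule:
    def __init__(self) -> None:
        self.sent: List[Feedback] = []  # stands in for delivery to the user device

    def on_transmission_signal(self, feedback: Feedback) -> None:
        self.sent.append(feedback)


class SpeechSynthesisModule:
    def __init__(self, analyzer: Callable[[bytes], Feedback],
                 downstream: DataTransmissionModule) -> None:
        self.analyzer = analyzer
        self.downstream = downstream

    def on_synthesis_signal(self, sample: bytes) -> None:
        # Process the sample, then raise the data transmission signal.
        self.downstream.on_transmission_signal(self.analyzer(sample))


class DataUploadModule:
    def __init__(self, database: list, downstream: SpeechSynthesisModule) -> None:
        self.database = database
        self.downstream = downstream
        self.active = False

    def on_activation_signal(self) -> None:
        self.active = True

    def upload(self, sample: bytes) -> None:
        if not self.active:
            raise RuntimeError("upload module not activated")
        self.database.append(sample)                 # store the sample
        self.downstream.on_synthesis_signal(sample)  # raise the synthesis signal
```

Each module knows only the next module in the chain, which mirrors the one-way signalling described above.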
[0039] FIG. 1C illustrates a model 122 of the system 100, according to an embodiment of the present invention. In an embodiment of the present invention, the model 122 may be the Bidirectional Encoder Representations from Transformers (BERT) linguistic model. The Bidirectional Encoder Representations from Transformers (BERT) linguistic model may be bidirectional rectified, in an embodiment of the present invention.
[0040] FIG. 2 depicts a flowchart of a method 200 for speech improvement and synthesis using the system 100, according to an embodiment of the present invention.
[0041] At step 202, the system 100 may register the user using the computer application 104 by receiving user details through the user device 102.
[0042] At step 204, the system 100 may enable the user to upload the vocal speech sample to the database 106 using the microphone of the user device 102.
[0043] At step 206, the system 100 may synthesize and process the uploaded vocal speech sample on the Bidirectional Encoder Representations from Transformers (BERT) linguistic model for finding the corrections, the errors, the recommendations, and so forth on the uploaded vocal speech sample.
[0044] At step 208, the system 100 may transmit the corrections, the errors, the recommendations, and so forth to the user device 102.
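By way of a non-limiting illustration, the ordering of steps 202 to 208 of the method 200 may be sketched as an ordered table of functions sharing one context. The function names, the context keys, and the analyze callable are hypothetical. The analyze callable stands in for the BERT linguistic model and is supplied by the caller.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]


def step_202_register(ctx: Context) -> None:
    # A user counts as registered once user details have been received.
    ctx["registered"] = bool(ctx.get("user_details"))


def step_204_upload(ctx: Context) -> None:
    if not ctx["registered"]:
        raise ValueError("user must be registered before uploading")
    ctx["database"] = [ctx["sample"]]  # list stands in for the database


def step_206_process(ctx: Context) -> None:
    analyze = ctx["analyze"]  # hypothetical stand-in for the BERT model
    ctx["feedback"] = analyze(ctx["database"][-1])


def step_208_transmit(ctx: Context) -> None:
    ctx["delivered"] = ctx["feedback"]  # stands in for sending to the user device


METHOD_200: List[Tuple[int, Callable[[Context], None]]] = [
    (202, step_202_register),
    (204, step_204_upload),
    (206, step_206_process),
    (208, step_208_transmit),
]


def run_method_200(user_details: dict, sample: bytes,
                   analyze: Callable[[bytes], Any]):
    """Run the four steps in order; return the step trace and delivered feedback."""
    ctx: Context = {"user_details": user_details, "sample": sample, "analyze": analyze}
    trace = []
    for number, step in METHOD_200:
        step(ctx)
        trace.append(number)
    return trace, ctx["delivered"]
```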
[0045] While the invention has been described in connection with what is presently considered to be the most practical and various embodiments, it is to be understood that the invention is not to be limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims.
[0046] This written description uses examples to disclose the invention, including the best mode, and also to enable any person skilled in the art to practice the invention, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the invention is defined in the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal language of the claims.
Claims:
CLAIMS
I/We Claim:
1. A smart speech synthesis system (100), the system (100) comprising:
a processor (110) located on an application server (108); and
a storage medium (112) comprising programming instructions executable by the processor (110), characterised in that the storage medium (112) comprises:
a registration module (114) configured to register a user using a computer application (104) by receiving user details through a user device (102);
a data upload module (116) configured to enable the user to upload a vocal speech sample to a database (106) using a microphone of the user device (102);
a speech synthesis module (118) configured to synthesize and process the uploaded vocal speech sample on a Bidirectional Encoder Representations from Transformers (BERT) linguistic model for finding corrections, errors, recommendations, or a combination thereof on the uploaded vocal speech sample; and
a data transmission module (120) configured to transmit the corrections, the errors, the recommendations, or a combination thereof to the user device (102).
2. The system (100) as claimed in claim 1, wherein the Bidirectional Encoder Representations from Transformers (BERT) linguistic model is bidirectional rectified.
3. The system (100) as claimed in claim 1, wherein the database (106) is a cloud database.
4. The system (100) as claimed in claim 1, wherein the database (106) is configured to store the received user details.
5. The system (100) as claimed in claim 4, wherein the user details are selected from a username, a user age, a user gender, a password, a point of contact of the user, or a combination thereof.
6. The system (100) as claimed in claim 1, wherein the computer application (104) is installed on a user device (102).
7. A method (200) for speech improvement and synthesis using a smart speech synthesis system (100), the method (200) characterised by steps of:
registering a user using a computer application (104) by receiving user details through a user device (102);
enabling the user to upload a vocal speech sample to a database (106) using a microphone of the user device (102);
synthesizing and processing the uploaded vocal speech sample on a Bidirectional Encoder Representations from Transformers (BERT) linguistic model for finding corrections, errors, recommendations, or a combination thereof on the uploaded vocal speech sample; and
transmitting the corrections, the errors, the recommendations, or a combination thereof to the user device (102).
8. The method (200) as claimed in claim 7, wherein the computer application (104) is installed on a user device (102).
9. The method (200) as claimed in claim 7, wherein the database (106) is a cloud database.
10. The method (200) as claimed in claim 7, wherein the database (106) is configured to store the received user details.
Date: November 29, 2023
Place: Noida

Dr. Keerti Gupta
Agent for the Applicant
(IN/PA-1529)

Documents

Application Documents

#	Name	Date
1	202341082957-STATEMENT OF UNDERTAKING (FORM 3) [05-12-2023(online)].pdf	2023-12-05
2	202341082957-REQUEST FOR EARLY PUBLICATION(FORM-9) [05-12-2023(online)].pdf	2023-12-05
3	202341082957-POWER OF AUTHORITY [05-12-2023(online)].pdf	2023-12-05
4	202341082957-OTHERS [05-12-2023(online)].pdf	2023-12-05
5	202341082957-FORM-9 [05-12-2023(online)].pdf	2023-12-05
6	202341082957-FORM FOR SMALL ENTITY(FORM-28) [05-12-2023(online)].pdf	2023-12-05
7	202341082957-FORM 1 [05-12-2023(online)].pdf	2023-12-05
8	202341082957-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [05-12-2023(online)].pdf	2023-12-05
9	202341082957-EDUCATIONAL INSTITUTION(S) [05-12-2023(online)].pdf	2023-12-05
10	202341082957-DRAWINGS [05-12-2023(online)].pdf	2023-12-05
11	202341082957-DECLARATION OF INVENTORSHIP (FORM 5) [05-12-2023(online)].pdf	2023-12-05
12	202341082957-COMPLETE SPECIFICATION [05-12-2023(online)].pdf	2023-12-05
13	202341082957-Proof of Right [15-02-2024(online)].pdf	2024-02-15