Abstract: This disclosure relates generally to text-to-speech synthesis and more particularly to a system and method for rendering textual messages using a customized natural voice. In one embodiment, a system for rendering textual messages using a customized natural voice is disclosed, comprising a processor and a memory communicatively coupled to the processor. The memory stores processor instructions which, on execution, cause the processor to receive present textual messages and at least one of previous textual messages, a response to the previous textual messages, or a receiver’s context. The processor further predicts a final emotional state of the sender’s customized natural voice based on an intermediate emotional state and the receiver’s context. The processor further synthesizes the sender’s customized natural voice based on the predicted final emotional state, and on voice samples and voice parameters associated with the at least one sender. FIG. 2
Claims: WE CLAIM
1. A method of rendering one or more present textual messages using customized natural voice of at least one sender, the method comprising:
receiving, by a customized voice synthesizer, the one or more present textual messages from the at least one sender and at least one of: one or more previous textual messages from the at least one sender, a response to the one or more previous textual messages, or a receiver’s context;
predicting, by the customized voice synthesizer, a final emotional state of the sender’s customized natural voice based on an intermediate emotional state and the receiver’s context, wherein the intermediate emotional state is based on emotions associated with at least one of the one or more present textual messages, the one or more previous textual messages, the response to the one or more previous textual messages, or the receiver of the one or more present textual messages; and
synthesizing, by the customized voice synthesizer, the sender’s customized natural voice based on the predicted final emotional state of the sender’s customized natural voice, and on voice samples and voice parameters associated with the at least one sender.
2. The method as claimed in claim 1, further comprising generating a voice dataset, wherein the voice dataset comprises at least one of the voice parameters and the voice samples associated with the at least one sender.
3. The method as claimed in claim 2, wherein the voice samples are determined from at least one of voice calls, videos, audios, social sites, public domains or previously built databases.
4. The method as claimed in claim 1, wherein the voice parameters comprise at least one of pitch, rate, quality of voice, amplitude, style of speaking, tone, user’s pronunciation, prosody or pauses taken between sentences.
5. The method as claimed in claim 1, wherein the receiver’s context comprises at least one of receiver’s location, receiver’s state, receiver’s health condition or receiver’s preferences.
6. The method as claimed in claim 1, further comprising summarizing the one or more present textual messages based on at least one of content of the one or more present textual messages and the receiver’s preferences.
7. The method as claimed in claim 1, wherein the final emotional state of the sender’s customized natural voice is predicted using deep learning techniques.
8. The method as claimed in claim 1, wherein predicting the final emotional state of the sender’s customized natural voice comprises:
determining, by a deep neural network, an intermediate emotional vector that is associated with the intermediate emotional state;
assigning, by the deep neural network, weightages to the intermediate emotional vector and the receiver’s context based on at least one of: a time lapse between receiving the one or more previous textual messages and receiving the one or more present textual messages, a time lapse between receiving the response to the one or more previous textual messages and receiving the one or more present textual messages, an overall emotion associated with the one or more present textual messages, and the receiver associated with the one or more present textual messages; and
predicting, by the deep neural network, a final emotional vector based on the intermediate emotional vector and the weightages, wherein the final emotional vector is associated with the final emotional state of the sender’s customized natural voice.
9. The method as claimed in claim 1, wherein the customized natural voice is synthesized using deep learning techniques.
10. A system for rendering one or more present textual messages using customized natural voice of at least one sender, the system comprising:
a processor;
a memory communicatively coupled to the processor, wherein the memory stores processor-executable instructions which, on execution, cause the processor to:
receive the one or more present textual messages from the at least one sender and at least one of: one or more previous textual messages from the at least one sender, a response to the one or more previous textual messages, or a receiver’s context;
predict a final emotional state of the sender’s customized natural voice based on an intermediate emotional state and the receiver’s context, wherein the intermediate emotional state is based on emotions associated with at least one of the one or more present textual messages, the one or more previous textual messages, the response to the one or more previous textual messages, or the receiver of the one or more present textual messages; and
synthesize the sender’s customized natural voice based on the predicted final emotional state of the sender’s customized natural voice, and on voice samples and voice parameters associated with the at least one sender.
11. The system as claimed in claim 10, wherein the processor is further configured to generate a voice dataset, wherein the voice dataset comprises at least one of the voice parameters and the voice samples associated with the at least one sender.
12. The system as claimed in claim 11, wherein the voice samples are determined from at least one of voice calls, videos, audios, social sites, public domains or previously built databases.
13. The system as claimed in claim 10, wherein the voice parameters comprise at least one of pitch, rate, quality of voice, amplitude, style of speaking, tone, user’s pronunciation, prosody or pauses taken between sentences.
14. The system as claimed in claim 10, wherein the receiver’s context comprises at least one of receiver’s location, receiver’s state, receiver’s health condition or receiver’s preferences.
15. The system as claimed in claim 10, wherein the processor is further configured to summarize the one or more present textual messages based on at least one of content of the one or more present textual messages or the receiver’s preferences.
16. The system as claimed in claim 10, wherein the final emotional state of the sender’s customized natural voice is predicted using deep learning techniques.
17. The system as claimed in claim 10, wherein the processor is configured to predict the final emotional state of the sender’s customized natural voice by:
determining an intermediate emotional vector that is associated with the intermediate emotional state;
assigning weightages to the intermediate emotional vector and the receiver’s context based on at least one of: a time lapse between receiving the one or more previous textual messages and receiving the one or more present textual messages, a time lapse between receiving the response to the one or more previous textual messages and receiving the one or more present textual messages, an overall emotion associated with the one or more present textual messages, and the receiver associated with the one or more present textual messages; and
predicting a final emotional vector based on the intermediate emotional vector and the weightages, wherein the final emotional vector is associated with the final emotional state of the sender’s customized natural voice.
18. The system as claimed in claim 10, wherein the customized natural voice is synthesized using deep learning techniques.
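The weighted emotional-vector prediction recited in claims 8 and 17 can be sketched in plain Python. This is a minimal illustrative sketch, not the claimed deep-neural-network implementation: the emotion label set, the per-emotion scores, the exponential half-life weighting of the time lapse, and all function names are assumptions introduced for illustration.

```python
import math

# Hypothetical emotion label set; the claims do not enumerate one.
EMOTIONS = ["neutral", "happy", "sad", "angry"]

def intermediate_vector(message_scores, history_scores):
    """Combine per-emotion scores of the present message and of the
    previous messages/responses into an intermediate emotional vector
    (stand-in for the deep-neural-network step of claim 8)."""
    return [(m + h) / 2.0 for m, h in zip(message_scores, history_scores)]

def time_lapse_weight(seconds_elapsed, half_life=3600.0):
    """Assign a weightage that decays with the time lapse between the
    previous and present messages: older context counts for less.
    The exponential half-life form is an illustrative assumption."""
    return math.exp(-seconds_elapsed * math.log(2) / half_life)

def predict_final_vector(inter_vec, context_vec, seconds_elapsed):
    """Blend the intermediate emotional vector with a receiver-context
    vector using the time-lapse weightage, then normalise so the final
    emotional vector sums to 1."""
    w = time_lapse_weight(seconds_elapsed)
    final = [w * i + (1.0 - w) * c for i, c in zip(inter_vec, context_vec)]
    total = sum(final)
    return [f / total for f in final]
```

For example, a message scored as mostly "happy" against a neutral receiver context keeps its happy bias when the messages are close in time, but drifts toward the context vector as the time lapse grows, which mirrors the weightage assignment of claims 8 and 17.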
Dated this 31st day of March, 2017
Swetha SN
Of K&S Partners
Agent for the Applicant
Description: TECHNICAL FIELD
This disclosure relates generally to text-to-speech synthesis and more particularly to a system and method for rendering textual messages using a customized natural voice.
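The overall flow the disclosure describes — receive a textual message, predict a final emotional state, then synthesize speech from the sender's voice dataset — can be sketched as follows. Every name, field, and default here is an illustrative assumption; the sketch returns a rendering plan rather than audio, since the actual deep-learning synthesiser is outside its scope.

```python
from dataclasses import dataclass, field

@dataclass
class VoiceDataset:
    """Hypothetical per-sender voice parameters and samples
    (cf. the voice dataset of claims 2-4)."""
    pitch: float = 1.0       # relative pitch multiplier
    rate: float = 1.0        # speaking-rate multiplier
    amplitude: float = 1.0   # loudness multiplier
    samples: list = field(default_factory=list)  # recorded voice samples

def synthesize(text: str, dataset: VoiceDataset, final_emotion: dict) -> dict:
    """Stand-in for the deep-learning synthesiser: combine the text,
    the dominant predicted emotion, and the sender's voice parameters
    into a rendering plan."""
    dominant = max(final_emotion, key=final_emotion.get)
    return {"text": text, "emotion": dominant,
            "pitch": dataset.pitch, "rate": dataset.rate,
            "amplitude": dataset.amplitude}
```

A downstream vocoder would then consume such a plan together with the stored voice samples to produce the customized natural voice.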
| # | Name | Date |
|---|---|---|
| 1 | Power of Attorney [31-03-2017(online)].pdf | 2017-03-31 |
| 2 | Form 5 [31-03-2017(online)].pdf | 2017-03-31 |
| 3 | Form 3 [31-03-2017(online)].pdf | 2017-03-31 |
| 4 | Form 18 [31-03-2017(online)].pdf_67.pdf | 2017-03-31 |
| 5 | Form 18 [31-03-2017(online)].pdf | 2017-03-31 |
| 6 | Form 1 [31-03-2017(online)].pdf | 2017-03-31 |
| 7 | Drawing [31-03-2017(online)].pdf | 2017-03-31 |
| 8 | Description(Complete) [31-03-2017(online)].pdf_68.pdf | 2017-03-31 |
| 9 | Description(Complete) [31-03-2017(online)].pdf | 2017-03-31 |
| 10 | PROOF OF RIGHT [21-06-2017(online)].pdf | 2017-06-21 |
| 11 | Correspondence by Agent_Form 1_23-06-2017.pdf | 2017-06-23 |
| 12 | Abstract_201741011632.jpg | 2017-06-30 |
| 13 | 201741011632-REQUEST FOR CERTIFIED COPY [07-06-2018(online)].pdf | 2018-06-07 |
| 14 | 201741011632-Response to office action (Mandatory) [11-06-2018(online)].pdf | 2018-06-11 |
| 15 | 201741011632-PETITION UNDER RULE 137 [06-04-2021(online)].pdf | 2021-04-06 |
| 16 | 201741011632-Information under section 8(2) [06-04-2021(online)].pdf | 2021-04-06 |
| 17 | 201741011632-FORM 3 [06-04-2021(online)].pdf | 2021-04-06 |
| 18 | 201741011632-FER_SER_REPLY [06-04-2021(online)].pdf | 2021-04-06 |
| 19 | 201741011632-FER.pdf | 2021-10-17 |
| 20 | 201741011632-PatentCertificate06-06-2023.pdf | 2023-06-06 |
| 21 | 201741011632-IntimationOfGrant06-06-2023.pdf | 2023-06-06 |
| 22 | 201741011632-PROOF OF ALTERATION [11-09-2023(online)].pdf | 2023-09-11 |
| 1 | searchstrategyE_05-10-2020.pdf | |