Abstract: A voice crafting system adapted to craft, in accordance with pre-determined rules, a song sung by a non-recognised user, said sung song comprising voice components and melody components, to include voice components and melody components of a recognised user, said system comprising: database means adapted to store at least one song of at least one recognised user having its constituent voice components and melody components; selecting means adapted to select at least one song from said database means; first parsing means adapted to parse said selected song into its constituent voice components and its constituent melody components in accordance with pre-defined rules to provide first voice components and first melody components; first feature extraction means adapted to extract pre-defined features from said first voice components and said first melody components in relation to said recognised user; receiving and recording means adapted to receive and record a song sung by a non-recognised user of said system; second parsing means adapted to parse said received recorded song into its constituent voice components and its constituent melody components in accordance with pre-defined rules to provide second voice components and second melody components; separating means adapted to separate said parsed voice components and said parsed melody components; second feature extracting means adapted to extract pre-defined features from said second voice components and said second melody components in relation to said non-recognised user; mapping means adapted to map first voice components with second voice components and first melody components with second melody components; and feature processing means adapted to process said extracted pre-defined features from said mapped voice components and said mapped melody components to provide a voice crafted song.
FORM-2
THE PATENTS ACT, 1970
(39 of 1970)
&
THE PATENTS RULES, 2006
PROVISIONAL SPECIFICATION
(See section 10 and rule 13)
SYSTEM FOR VOICE CRAFTING
TATA CONSULTANCY SERVICES LTD.,
an Indian Company,
of Nirmal Building,
9th floor, Nariman Point,
Mumbai - 400 021, Maharashtra, India
The following specification particularly describes the nature of the invention.
FIELD OF THE INVENTION
The present invention relates to the field of speech analysis and synthesis. Particularly, the present invention relates to the field of analyzing speech of one person and synthesizing it to sound like another person.
BACKGROUND OF THE INVENTION AND PRIOR ART
Songs are an integral part of cinema and the entertainment industry. Very rarely will one come across an Indian movie without a song. More often, a playback singer, different from the performer/actor in the movie, is employed to render the song. The playback singer has to mimic the voice of the actor so that the final rendering of the song sounds as if sung by the performer/actor and hence integrates into the overall cinema experience.
However, in an attempt to mimic the performer/actor, the singer is not able to concentrate on the melody of the song, leading to multiple retakes of the song recording, which in turn increases both time and cost.
Therefore, there is a need for a system which will enable the singers to focus on the song and will add the required personalized touch of the performer/actor by merging the melody of the expert singer with the voice and expressions of the performers/actors.
OBJECTS OF THE INVENTION
It is an object of the present invention to provide a system which enables a song to retain the voice characteristics of the amateur while sounding like an expert in terms of melody.
It is another object of the present invention to provide a system which enables an amateur singer to sound like an expert singer even when the amateur has no formal training in singing.
It is still another object of the present invention to provide a system which enables the playback singer to concentrate on the melody of the song rather than try to mimic another person.
It is yet another object of the present invention to provide a system which generates the crafted song having the melody of the expert while retaining the voice characteristics of the amateur.
SUMMARY OF THE INVENTION
The present invention envisages a system for voice crafting. In accordance with the preferred embodiment of the present invention, the system for voice crafting processes an amateur singer's utterance of the lyrics to make it sound like a song sung by an expert singer. The present invention retains the voice characteristics of the amateur but makes it sound like an expert in terms of melody and rendering.
In accordance with one aspect of the present invention, the essential mechanism that accomplishes voice crafting is the ability to separate out the voice and the melody characteristics from the voice using speech signal processing and 'intelligently' overlaying the melody characteristics of the expert on to the voice characteristics of the amateur.
In accordance with an embodiment of the present invention, the system comprises an input means adapted to receive the same segment of speech or song spoken or sung by both an amateur and an expert. The segments of the voices of the amateur and the expert are sent to the feature extraction means. The feature extraction means further comprises a voiced unvoiced detection means for detecting the voiced and the unvoiced regions in the speech waveform, a pitch extraction means which marks the pitch periods from the voiced regions and a pitch synchronous linear predictor feature extraction means which, for each marked pitch period, extracts the Linear Predictor Coefficients (LPC).
In accordance with another embodiment of the present invention, the extracted features of the expert and amateur speech are then sent to a voice crafting means which by means of feature substitution, feature deletion and feature insertion produces an output speech sample which is melodious while sounding like it has been sung by the amateur.
In accordance with the present invention, the system for voice crafting finds use in movie dubbing and in the music industry, specifically, in the entertainment industry, cinema and television. Voice crafting enables a song to sound as if it has actually been sung by the actor but with the playback singer's expertise.
BRIEF DESCRIPTION OF ACCOMPANYING DRAWINGS
The invention will now be described in relation to the accompanying
drawings, in which:
Figure 1 illustrates a schematic of the voice crafting system;
-1 JUL 2009
Figure 2 illustrates the pitch synchronous feature extraction from the speech waveforms, in accordance with the present invention; and
Figure 3 illustrates dynamic time warping of the features extracted from the speech waveforms, in accordance with the present invention.
DESCRIPTION OF THE INVENTION
The system for voice crafting will now be described with reference to the accompanying drawings which do not limit the scope and ambit of the present invention. The description and the drawings are provided purely by way of example and illustration.
In accordance with the preferred embodiment of the present invention, the system for voice crafting processes an amateur singer's utterance of the lyrics to make it sound like a song sung by an expert.
The essential mechanism that accomplishes voice crafting is the ability to separate out the voice and the melody characteristics from the voice using speech signal processing and 'intelligently' overlaying the melody characteristics of the expert on to the voice characteristics of the amateur.
Referring to the drawings, Figure 1 shows a block diagram of the voice crafting system, represented by reference numeral 10 of Figure 1. The system 10 comprises three main components, namely an input means represented by block 12 of Figure 1, a feature extraction means represented by block 14 of Figure 1, and a voice crafting means represented by block 22 of Figure 1.
In accordance with an embodiment of the present invention, the voice crafting system 10 requires the same segment of speech or song spoken or sung by both the amateur and the expert as its inputs. For the purpose of voice crafting the recorded music segment sung by an expert is represented by Xe(f) and the same music segment sung by an amateur is represented by Xa(f). These music segments are given as inputs to the input means 12.
In accordance with another embodiment of the present invention, the melody/prosody characteristic of the music segment sung by the expert is captured by using the feature extraction means 14 and then implanted onto the music segment sung by the amateur by the voice crafting means 22.
In accordance with still another embodiment of the present invention, the feature extraction means 14 comprises a voiced and unvoiced detection means represented by block 16 of Figure 1, a pitch extraction means represented by block 18 of Figure 1, and a pitch synchronous linear predictor feature extraction means represented by block 20 of Figure 1.
The feature extraction means 14 receives the segments sung by the expert and the amateur. The feature extraction means 14 extracts the amateur's and the expert singer's pitch synchronous linear predictor coefficients/features. The feature extraction means extracts Ne features from Xe for the expert singer and Na features from Xa for the amateur singer.
The voice signal generated by the expert singer is referred to as Xe(f); further, Xe(f) is represented as the product of Se(f), the source function, and Ve(f), the vocal tract function, namely, Xe(f) = Se(f)Ve(f). Similarly, the voice signal generated by the amateur is Xa(f) = Sa(f)Va(f).
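As a concrete illustration of this source-filter model, the sketch below (a minimal sketch, assuming an all-pole LPC vocal tract and using hypothetical filter coefficients, not values from the invention) inverse-filters a signal with A(z) to estimate the source and then resynthesizes it through V(z) = 1/A(z); the round trip recovers the original signal.

```python
import numpy as np

def inverse_filter(x, a):
    """Apply A(z) to x: source estimate s[n] = sum_k a[k] * x[n-k]."""
    return np.convolve(x, a)[:len(x)]

def all_pole_filter(s, a):
    """Apply V(z) = 1/A(z) to s: x[n] = s[n] - sum_{k>=1} a[k] * x[n-k]."""
    x = np.zeros_like(s, dtype=float)
    for n in range(len(s)):
        acc = s[n]
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * x[n - k]
        x[n] = acc
    return x

rng = np.random.default_rng(0)
a = np.array([1.0, -0.5, 0.25])       # hypothetical A(z) coefficients
x = rng.standard_normal(200)          # stand-in voice segment X
source = inverse_filter(x, a)         # S: source function estimate
rebuilt = all_pole_filter(source, a)  # S passed back through V = 1/A(z)
```

Because the all-pole recursion exactly inverts the FIR inverse filter (with zero initial conditions), `rebuilt` matches `x` to floating-point precision, illustrating that X(f) = S(f)V(f) is a lossless decomposition.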
These signals are then passed to the voiced and unvoiced detection means 16, which identifies the voiced and unvoiced regions in the speech waveform, extracts the voiced regions and passes them to the pitch extraction means 18, which marks the pitch periods. The pitch synchronous linear predictor feature extraction means 20 then extracts, from each marked pitch period, the linear predictor coefficients, as seen in Figure 2.
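The linear predictor coefficient extraction of block 20 can be sketched with a textbook Levinson-Durbin (autocorrelation method) routine. The `lpc` function below is an illustrative stand-in, and the AR(2) toy signal is a hypothetical substitute for a voiced speech frame, not the system's actual input.

```python
import numpy as np

def lpc(frame, order):
    """Linear predictor coefficients a[0..order] (a[0] = 1) via the
    autocorrelation (Levinson-Durbin) method."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        nxt = a.copy()
        nxt[i] = k
        for j in range(1, i):
            nxt[j] = a[j] + k * a[i - j]    # order-update of the predictor
        a = nxt
        err *= (1.0 - k * k)                # prediction error shrinks
    return a

# Toy AR(2) signal with known coefficients:
# x[n] = 0.5*x[n-1] - 0.25*x[n-2] + e[n], so A(z) = 1 - 0.5z^-1 + 0.25z^-2.
rng = np.random.default_rng(1)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(len(e)):
    x1 = x[n - 1] if n >= 1 else 0.0
    x2 = x[n - 2] if n >= 2 else 0.0
    x[n] = e[n] + 0.5 * x1 - 0.25 * x2
coeffs = lpc(x, 2)   # expect roughly [1, -0.5, 0.25]
```

In the system proper this routine would run once per marked pitch period, yielding the Ne and Na pitch synchronous feature vectors.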
The pitch extraction means further calculates the average pitch of Xe and Xa and uses these averages to calculate the pitch scaling factor Pfac = Pa/Pe.
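A minimal sketch of this average-pitch calculation and the scaling factor, assuming an autocorrelation-based pitch estimate and using synthetic sine-tone stand-ins for the expert segment Xe and the amateur segment Xa:

```python
import numpy as np

def estimate_pitch(x, fs, fmin=60.0, fmax=400.0):
    """Average pitch in Hz from the dominant autocorrelation peak -
    an illustrative stand-in for the pitch extraction means."""
    x = x - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # non-negative lags
    lo, hi = int(fs / fmax), int(fs / fmin)            # search window in samples
    lag = lo + int(np.argmax(r[lo:hi]))
    return fs / lag

fs = 16000
t = np.arange(fs) / fs
pe = estimate_pitch(np.sin(2 * np.pi * 200 * t), fs)   # "expert" tone, ~200 Hz
pa = estimate_pitch(np.sin(2 * np.pi * 150 * t), fs)   # "amateur" tone, ~150 Hz
pfac = pa / pe                                         # Pfac = Pa / Pe
```

With these stand-in tones Pfac comes out near 150/200 = 0.75; the voice crafting means uses this factor to scale the expert's pitch durations toward the amateur's.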
Further, the extracted features are sent to the voice crafting means 22. The voice crafting means performs dynamic time warping to synchronize the expert singer's Ne pitch synchronous features and the amateur's Na pitch synchronous features. The voice crafting means 22 scales the pitch duration of the expert singer by the pitch scaling factor Pfac.
Internally, the feature extraction means 14 extracts pitch synchronized salient features (coefficients) from the speech signal and the voice crafting means 22 uses these parameters to align the two speech segments using dynamic programming.
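The dynamic programming alignment can be sketched with a textbook dynamic time warping implementation. The toy "expert" and "amateur" feature sequences below are hypothetical; diagonal steps in the returned path correspond to feature substitutions, and off-diagonal steps to the feature insertions and deletions mentioned above.

```python
import numpy as np

def dtw_align(A, B):
    """Dynamic time warping between two feature sequences (one row per
    pitch period); returns the alignment path and the total cost."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from (n, m) to (1, 1) along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1, j - 1], i - 1, j - 1),
                      (D[i - 1, j], i - 1, j),
                      (D[i, j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    return path[::-1], D[n, m]

# Toy features: the "amateur" holds the value 1 one frame longer.
expert = np.array([[0.0], [1.0], [2.0], [3.0]])
amateur = np.array([[0.0], [1.0], [1.0], [2.0], [3.0]])
path, cost = dtw_align(expert, amateur)
```

Here the optimal path maps expert frame 1 onto both amateur frames 1 and 2 (an insertion on the amateur side) at zero total cost, which is exactly the kind of correspondence the voice crafting means needs before stitching features.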
Referring to Figure 3, once the two speech samples are aligned, the voice and the melody characteristics embedded in the speech signal are extracted from both the speech samples of the amateur and the expert. Now, the melody characteristics of the expert singer and the voice characteristics of the amateur singer are stitched together by the steps of feature substitution, feature deletion and feature insertion to produce an output speech sample which is melodious while 'sounding' like it has been sung by the amateur.
The crafted signal, which has the voice characteristics of the amateur and the melody of the expert, can be constructed as Xcraft(f) = Se'(f)Va(f), where Se'(f) is a modified (usually pitch scaled) form of Se(f).
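A minimal sketch of this construction, assuming all-pole LPC vocal tract models with hypothetical coefficients for the expert and the amateur (pitch scaling of Se is omitted here): the expert's source, recovered by inverse filtering, is passed through the amateur's vocal tract filter to give Xcraft.

```python
import numpy as np

def inverse_filter(x, a):
    """A(z) applied to x: source estimate s[n] = sum_k a[k] * x[n-k]."""
    return np.convolve(x, a)[:len(x)]

def all_pole_filter(s, a):
    """1/A(z) applied to s: y[n] = s[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = np.zeros_like(s, dtype=float)
    for n in range(len(s)):
        acc = s[n]
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def craft(x_expert, a_expert, a_amateur):
    """Xcraft = Se(f) * Va(f): the expert's source run through the
    amateur's vocal tract (pitch scaling of Se omitted in this sketch)."""
    s_e = inverse_filter(x_expert, a_expert)   # Se: expert source
    return all_pole_filter(s_e, a_amateur)     # Va: amateur vocal tract

rng = np.random.default_rng(2)
x_e = rng.standard_normal(300)        # stand-in expert segment Xe
a_e = np.array([1.0, -0.6, 0.3])      # hypothetical expert A(z)
a_a = np.array([1.0, -0.2, 0.1])      # hypothetical amateur A(z)
x_craft = craft(x_e, a_e, a_a)
```

As a sanity check, crafting with identical expert and amateur filters reproduces the expert signal exactly, while distinct filters yield a signal carrying the expert's source through the amateur's tract.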
A method for performing voice crafting in accordance with the present invention includes the following steps:
1. From the speech waveform (Xe) of the expert singer, pitch synchronous linear predictor coefficients/features Ne are extracted. This is done by first:
o identifying the voiced and the unvoiced regions in the speech waveform;
o extracting and marking the pitch from the voiced regions; and
o extracting, for each marked pitch period, the linear predictor coefficients (features);
2. From the speech waveform (Xa) of the amateur singer, pitch synchronous linear predictor coefficients/features Na are extracted. This is done by first:
o identifying the voiced and the unvoiced regions in the speech waveform;
o extracting and marking the pitch from the voiced regions; and
o extracting, for each marked pitch period, the linear predictor coefficients (features);
3. calculating the average pitch of Xe (say Pe) and the average pitch of Xa (say Pa) and calculating the pitch scaling factor, namely, Pfac = Pa/Pe;
4. performing dynamic time warping to synchronize the expert's Ne pitch synchronous features and the amateur's Na pitch synchronous features. This usually involves three operations: feature substitution, feature deletion and feature insertion;
5. scaling the pitch duration of the expert singer by the pitch scaling factor Pfac; and
6. using the time warped features of the amateur and the scaled pitch duration of the expert to reconstruct the speech waveform, which is the crafted speech file.
The technical advancements of the present invention include:
• providing a system which enables a novice singer to sound like an expert singer even when the novice has no formal training in singing;
• providing a system which does not require the playback singer to mimic the person performing on the screen. This allows the playback singer to concentrate on the melody of the song rather than try to mimic another person; and
• providing a system which generates the crafted song having the melody of the expert while retaining the voice characteristics of the amateur.
While considerable emphasis has been placed herein on the particular features of this invention, it will be appreciated that various modifications can be made, and that many changes can be made in the preferred embodiments without departing from the principles of the invention. These and other modifications in the nature of the invention or the preferred embodiments will be apparent to those skilled in the art from the disclosure
herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
Dated this 30th day of J
MOHAN DEWAN
Of R. K. DEWAN & CO.
APPLICANT'S PATENT ATTORNEY
| # | Name | Date |
|---|---|---|
| 1 | 1555-MUM-2009-FORM 18(30-11-2010).pdf | 2010-11-30 |
| 2 | 1555-MUM-2009-CORRESPONDENCE(30-11-2010).pdf | 2010-11-30 |
| 3 | Other Patent Document [08-10-2016(online)].pdf | 2016-10-08 |
| 4 | Other Document [10-02-2017(online)].pdf | 2017-02-10 |
| 5 | Examination Report Reply Recieved [10-02-2017(online)].pdf | 2017-02-10 |
| 6 | Description(Complete) [10-02-2017(online)].pdf_307.pdf | 2017-02-10 |
| 7 | Description(Complete) [10-02-2017(online)].pdf | 2017-02-10 |
| 8 | Claims [10-02-2017(online)].pdf | 2017-02-10 |
| 9 | Abstract [10-02-2017(online)].pdf | 2017-02-10 |
| 10 | 1555-MUM-2009-Written submissions and relevant documents (MANDATORY) [18-10-2017(online)].pdf | 2017-10-18 |
| 11 | 1555-MUM-2009-Response to office action (Mandatory) [19-03-2018(online)].pdf | 2018-03-19 |
| 12 | 1555-MUM-2009-PatentCertificate31-03-2018.pdf | 2018-03-31 |
| 13 | 1555-MUM-2009-IntimationOfGrant31-03-2018.pdf | 2018-03-31 |
| 14 | RTOA_1555_ MUM_2009.pdf | 2018-08-10 |
| 15 | Form26.pdf | 2018-08-10 |
| 16 | CompleteSpecification_amended & Clean.pdf | 2018-08-10 |
| 17 | Claims_amended+Clean.pdf | 2018-08-10 |
| 18 | Abstract_amended & Clean.pdf | 2018-08-10 |
| 19 | abstract1.jpg | 2018-08-10 |
| 20 | 1555-MUM-2009_EXAMREPORT.pdf | 2018-08-10 |
| 21 | 1555-MUM-2009-ORIGINAL UNDER RULE 6 (1A)-261017.pdf | 2018-08-10 |
| 22 | 1555-MUM-2009-HearingNoticeLetter.pdf | 2018-08-10 |
| 23 | 1555-MUM-2009-FORM 5(1-7-2010).pdf | 2018-08-10 |
| 24 | 1555-mum-2009-form 3.pdf | 2018-08-10 |
| 25 | 1555-mum-2009-form 26.pdf | 2018-08-10 |
| 26 | 1555-mum-2009-form 2.pdf | 2018-08-10 |
| 28 | 1555-mum-2009-form 2(title page).pdf | 2018-08-10 |
| 29 | 1555-mum-2009-form 2(title page)-(provisional)-(1-7-2009).pdf | 2018-08-10 |
| 30 | 1555-MUM-2009-FORM 2(TITLE PAGE)-(1-7-2010).pdf | 2018-08-10 |
| 31 | 1555-mum-2009-form 2(1-7-2010).pdf | 2018-08-10 |
| 32 | 1555-mum-2009-form 1.pdf | 2018-08-10 |
| 33 | 1555-MUM-2009-FORM 1(8-12-2009).pdf | 2018-08-10 |
| 34 | 1555-mum-2009-form 1(1-7-2009).pdf | 2018-08-10 |
| 35 | 1555-mum-2009-drawing.pdf | 2018-08-10 |
| 36 | 1555-mum-2009-drawing(provisional)-(1-7-2009).pdf | 2018-08-10 |
| 37 | 1555-MUM-2009-DRAWING(1-7-2010).pdf | 2018-08-10 |
| 38 | 1555-mum-2009-description(provisional).pdf | 2018-08-10 |
| 40 | 1555-MUM-2009-DESCRIPTION(COMPLETE)-(1-7-2010).pdf | 2018-08-10 |
| 41 | 1555-mum-2009-correspondence.pdf | 2018-08-10 |
| 42 | 1555-MUM-2009-CORRESPONDENCE(9-8-2011).pdf | 2018-08-10 |
| 43 | 1555-MUM-2009-CORRESPONDENCE(8-12-2009).pdf | 2018-08-10 |
| 44 | 1555-MUM-2009-CORRESPONDENCE(1-7-2010).pdf | 2018-08-10 |
| 45 | 1555-MUM-2009-CLAIMS(1-7-2010).pdf | 2018-08-10 |
| 46 | 1555-MUM-2009-ABSTRACT(1-7-2010).pdf | 2018-08-10 |
| 47 | 1555-MUM-2009-Response to office action [23-04-2021(online)].pdf | 2021-04-23 |
| 48 | 1555-MUM-2009-RELEVANT DOCUMENTS [29-09-2021(online)].pdf | 2021-09-29 |
| 49 | 1555-MUM-2009-RELEVANT DOCUMENTS [26-09-2022(online)].pdf | 2022-09-26 |
| 50 | 1555-MUM-2009-RELEVANT DOCUMENTS [28-09-2023(online)].pdf | 2023-09-28 |
| 51 | 1555-MUM-2009-Template for order under Rule 6(5) for missed communication-13-11-2025.pdf | 2025-11-13 |