
A System And Method For Voice Customization Of A Selected Entity On A Digital Display System

Abstract: A system and a method for an interactive voice changer for a selected entity on a digital display system are provided. The system includes a core processing module configured to generate a first query for a look-up table to fetch a selected entity using a first unique identifier, to find a match between a plurality of entities present in a video frame and the selected entity using a face-search sub-module, to detect a lip movement for the selected entity using a lip gesture movement extraction sub module, to generate an interrupt for a voice controller, to generate a second query for the look-up table to fetch a second unique identifier and to generate a third query for a voice database to fetch a selected voice sample using the second unique identifier. The method includes changing the voice of the selected entity with the selected voice sample.


Patent Information

Application #:
Filing Date: 11 April 2011
Publication Number: 42/2012
Publication Type: INA
Invention Field: COMPUTER SCIENCE
Status:
Email:
Parent Application:
Patent Number:
Legal Status:
Grant Date: 2021-07-26
Renewal Date:

Applicants

SAMSUNG ELECTRONICS COMPANY
416 MAETAN-DONG, YEONGTONG-GU, SUWON-SI, GYEONGGI-DO 442-742

Inventors

1. ADITI GARG
SAMSUNG INDIA SOFTWARE CENTER, 10TH FLOOR, TOWER A, LOGIX CYBER PARK, C28-29, SECTOR 62, NOIDA, U.P 201 301.
2. KASTHURI JAYACHAND YADLAPALLI
SAMSUNG INDIA SOFTWARE CENTER, 10TH FLOOR, TOWER A, LOGIX CYBER PARK, C28-29, SECTOR 62, NOIDA, U.P 201 301.

Specification

A SYSTEM AND METHOD FOR VOICE CUSTOMIZATION OF A SELECTED ENTITY ON A DIGITAL DISPLAY SYSTEM

FIELD OF INVENTION

[0001] This invention relates to the field of broadcasting and more particularly to the
field of digital display system.

BACKGROUND

[0002] Typically, an internet protocol television (IPTV) is a system used for delivering multimedia services, for example audio and video information, across an internet protocol (IP) based network. Examples of the multimedia services include, but are not limited to, live television (TV), video on demand and time shifted programming. Conventional methods are available for replacing the face of an entity present in, for example, a video clip. The entity denotes the face of a particular user selected character in the video clip. A face recognition method is used for providing efficient replacement of the user selected entity with another entity in the video clip. Further, conventional methods also allow a user to change dialog for the user selected entity. The user can select a first dialog from a scene and replace the first dialog with a second dialog. However, a technique to change the voice of a selected entity is desired.

[0003] In light of the foregoing discussion, there is a need for an efficient system and method for customizing the voice of the selected entity.

SUMMARY

[0004] Embodiments of the present disclosure described herein provide a method and system for voice customization of a selected entity on a digital display system.

[0005] An example of a method for voice customization of a selected entity on a digital display system includes capturing one or more entities present in a first video frame. The method also includes listing the one or more entities present in the first video frame using a first presentation module. The method further includes selecting a first entity from a plurality of entities present in the first video frame. Further, the method includes selecting a voice sample for the first entity. Moreover, the method includes storing a second unique identifier for the selected voice sample in a look-up table. The method also includes finding a match between a plurality of entities present in an input video frame and the first entity. The method further includes detecting a lip movement for the selected entity in the input video frame using a lip gesture movement extraction sub module. Further, the method includes changing the voice of the selected entity with the selected voice sample.

[0006] An example of a system for voice customization of a selected entity on a digital display system includes a core processing module configured to generate a first query for a look-up table to fetch a first entity using a first unique identifier, to find a match between a plurality of entities present in an input video frame and the first entity using a face-search sub-module, to detect a lip movement of the selected entity in the input video frame using a lip gesture movement extraction sub module, to generate an interrupt for a voice controller, to generate a second query for the look-up table to fetch a second unique identifier and to generate a third query for a voice database to fetch a selected voice sample using the second unique identifier.

BRIEF DESCRIPTION OF FIGURES

[0007] In the accompanying figures, similar reference numerals refer to identical or functionally similar elements. These reference numerals are used in detailed description to illustrate various embodiments and to explain various aspects and advantages of the
present disclosure.

[0008] FIG. 1 illustrates a block diagram of a system for customizing voice of a
selected entity on a digital display system in accordance with one embodiment.

[0009] FIG. 2 is a flowchart illustrating a method for customizing voice of a selected
entity on a digital display system in accordance with one embodiment.

[0010] FIG. 3 is a flowchart illustrating a method for selecting and updating an entity
using a first presentation module in accordance with one embodiment.

[0011] FIG. 4 is a block diagram illustrating a user interface along with a look-up
table (LUT) for selection of an entity in accordance with one embodiment.

[0012] FIG. 5 is a flowchart illustrating a method for selecting a voice sample for
voice customization using a voice sub-sampler module in accordance with one
embodiment.

[0013] FIG. 6 is a block diagram illustrating a user interface along with a look-up
table (LUT) for selecting a voice sample in accordance with one embodiment.

[0014] FIG. 7 is a flowchart illustrating a method for customizing voice using a core
processing module in accordance with one embodiment.

[0015] Persons skilled in the art will appreciate that elements in the figures are illustrated for simplicity and clarity and have not been drawn to scale. For example, the dimensions of some of the elements in the figure may be exaggerated relative to other elements to help to improve understanding of various embodiments of the present disclosure.

DETAILED DESCRIPTION

[0016] It should be observed that the method steps and system components have been represented by conventional symbols in the figures, showing only those specific details which are relevant for an understanding of the present disclosure. Further, details that may be readily apparent to a person ordinarily skilled in the art may not have been disclosed. In the present disclosure, relational terms such as first and second, and the like, may be used to distinguish one entity from another entity, without necessarily implying any actual relationship or order between such entities.

[0017] Embodiments in the present disclosure as described herein provide a system
and a method for an interactive voice changer for a selected entity on a digital display
system.

[0018] FIG. 1 illustrates a block diagram of a system 100 for customizing voice of a
selected entity on a digital display system in accordance with one embodiment.

[0019] FIG. 1 consists of a first video frame 105, a face detector module 110, a first presentation module 115 for selecting an entity, a look up table (LUT) 120 for saving a first entity 140, a second presentation module 125 for selecting a voice sample, a
second unique identifier 130, a first unique identifier 195, an input video frame 135, a core processing module 145, the core-processing module 145 further including a face search sub module 150, a lip gesture movement extraction sub module 155 and a voice controller 160, a user inputted voice sample file 165, a recorded voice 170, a voice sub sampler module 180, the voice sub sampler module 180 further including a voice processing module 175 and a record module 185 and a voice database 190.

[0020] The first video frame 105 is displayed on the digital display system. Examples of digital display systems can include, but are not limited to, a computer, IPTV, VOD and internet TV. Examples of the first video frame 105 include, but are not limited to, a scene in a movie, a broadcast stream, a live video and a video clip. Examples of the digital display system include, but are not limited to, a consumer electronic (CE) device and an internet protocol television (IPTV). The digital display system receives the first video frame through a network. Examples of the network include, but are not limited to, a wireless network, internet, intranet, Bluetooth, small area network (SAN), metropolitan area network (MAN) and Ethernet. The first video frame 105 includes a plurality of entities. Entities can be regarded as a plurality of characters present in the first video frame 105. The user can select a particular entity from the plurality of entities present in the first video frame 105 for voice customization. The particular entity selected by the user is regarded as the first entity 140, herein, for simplicity. The user enables a voice configuration option in the digital display system for voice customization. On selection of the voice configuration option, the face detector module 110 is invoked to capture the first video frame 105. The face detector module extracts one or more entities present in the captured first video frame 105. The face detector module can employ multiple characteristic features to extract the one or more entities present in the first video frame 105. Examples of characteristic features include, but are not limited to, skin texture, motion, size, shape and location. Further, the face detector module can employ various algorithms to extract the entities present in the first video frame 105. Subsequently, the entities present in the first video frame 105 are listed using, for example, the first presentation module 115. The entities listed on the first presentation module 115 facilitate the user in selecting a particular entity for voice customization. Upon selection of the entity by the user, the first entity 140 is stored in the look up table (LUT) 120. The first entity 140 is associated with the first unique identifier 195. The first unique identifier 195 represents the first entity. Further, the first unique identifier 195 is exclusive to the first entity 140. The LUT 120 further includes the second unique identifier 130. The second unique identifier represents a voice sample used for voice customization. A plurality of voice samples is stored in the voice database 190. The user can select a particular voice sample for voice customization from the plurality of voice samples stored in the voice database 190. The particular voice sample selected by the user for voice customization is regarded as the selected voice sample, herein, for simplicity. Further, the second presentation module 125 is used to list the voice samples stored in the voice database. The second presentation module 125 facilitates the user in selecting the particular voice sample to be used for voice customization.
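The mapping maintained by the LUT 120 can be pictured with the following Python sketch. This is only a minimal illustration under assumed names (LUTEntry, VoiceLUT and their methods are hypothetical, not taken from the disclosure); it merely shows how a first unique identifier could key a selected entity together with the second unique identifier of the chosen voice sample.

# Illustrative sketch only; the class and method names are hypothetical.
import itertools
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class LUTEntry:
    first_unique_id: int                      # identifies the selected entity (face)
    entity_face: bytes                        # e.g. an encoded face crop or template
    second_unique_id: Optional[int] = None    # identifies the selected voice sample

class VoiceLUT:
    """Maps a selected entity to the voice sample chosen for it."""
    def __init__(self) -> None:
        self._entries: Dict[int, LUTEntry] = {}
        self._next_id = itertools.count(1)

    def store_entity(self, entity_face: bytes) -> int:
        """Store a selected entity and generate its first unique identifier."""
        first_id = next(self._next_id)
        self._entries[first_id] = LUTEntry(first_id, entity_face)
        return first_id

    def attach_voice(self, first_id: int, second_unique_id: int) -> None:
        """Associate a voice sample, by its second unique identifier, with the entity."""
        self._entries[first_id].second_unique_id = second_unique_id

    def fetch_entity(self, first_id: int) -> LUTEntry:
        """Serves the first query: return the stored entity for the identifier."""
        return self._entries[first_id]

    def fetch_voice_id(self, first_id: int) -> Optional[int]:
        """Serves the second query: return the voice sample identifier, if any."""
        return self._entries[first_id].second_unique_id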

[0021] The voice sub-sampler module 180 is responsible for processing the voice sample selected by the user. Examples of the voice samples include one of the recorded voice sample 170, a built-in voice sample (not shown) stored in the voice database 190 and the user inputted voice sample 165. Typically, the built-in voice samples are provided by the service provider. The voice sub-sampler module 180 further eliminates noise present in the voice samples. Examples of noise include, but are not limited to, random noise and quantization noise. Furthermore, the voice sub-sampler module 180 enhances the quality of the particular voice sample by passing it through smoothing filters prior to storing the particular voice sample in the voice database 190. Further, the voice sub-sampler module 180 facilitates the user to record a voice sample in real time using, for example, the record module 185. Furthermore, the user can input a voice sample to the voice sub-sampler module 180 from the web. The user inputted voice samples and the recorded voice samples are processed by the voice processing module 175. The processed voice samples are entered into the voice database 190. Upon entry of the voice sample in the voice database 190, the second unique identifier 130 is generated. Each of the voice samples stored in the voice database is associated with a corresponding second unique identifier. The second unique identifier generated for each of the voice samples stored in the voice database is used to identify the voice samples exclusively. Subsequently, the plurality of voice samples is listed for user selection using the second presentation module 125. The user can select a particular voice sample from the voice samples listed on the second presentation module 125. On selection of the particular voice sample for voice change, the second unique identifier 130 corresponding to the particular voice sample selected by the user is stored in the LUT 120. The second unique identifier 130 is used to map the particular voice sample selected by the user from the voice database 190 to the first entity 140.
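The disclosure does not specify which smoothing filters the voice sub-sampler module 180 applies; the sketch below only illustrates the kind of conditioning described, assuming the voice sample is available as a one-dimensional NumPy array of audio samples, with a moving-average filter standing in for the unspecified smoothing filters.

# Illustrative sketch of voice-sample conditioning; the actual filters used by
# the voice sub-sampler module 180 are not specified in the disclosure.
import numpy as np

def smooth_voice_sample(samples: np.ndarray, window: int = 5) -> np.ndarray:
    """Apply a simple moving-average filter to reduce random noise."""
    kernel = np.ones(window) / window
    return np.convolve(samples, kernel, mode="same")

def normalize(samples: np.ndarray) -> np.ndarray:
    """Rescale to the [-1, 1] range so quantization noise is not amplified later."""
    peak = np.max(np.abs(samples))
    return samples / peak if peak > 0 else samples

def process_voice_sample(samples: np.ndarray) -> np.ndarray:
    """Conditioning run before the sample is stored in the voice database 190."""
    return normalize(smooth_voice_sample(samples))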

[0022] The voice sub-sampler module is followed by the core processing module 145, which includes the face search sub-module 150, the lip gesture movement extraction sub-module 155 and the voice controller 160. The core processing module 145 is the heart of the system 100. The core processing module 145 determines whether the voice configuration option has been enabled in the system 100.

[0023] On enabling the voice configuration option, the core processing module 145 receives the input video frame 135. The input video frame 135 can be regarded as a video clip on which the user can apply voice customization. The core processing module 145 generates the first query for the look up table (LUT) 120. The first query is used to fetch the user selected first entity 140 stored in the LUT 120. The first entity 140, represented by the first unique identifier 195, serves as an input for the face search sub-module 150. The face search sub-module 150 is responsible for capturing a plurality of entities in the input video frame 135. The face search sub-module 150 finds a match between the plurality of entities present in the input video frame 135 and the first entity 140. The core processing module 145 uses, for example, an image processing technique to find the match between the plurality of entities present in the input video frame 135 and the first entity 140. Upon finding a successful match for the first entity 140 among the plurality of entities present in the input video frame 135, the core processing module 145 invokes the lip gesture movement extraction sub-module 155. The lip gesture movement extraction sub-module 155 is used to determine the presence of lip movement for the selected entity in the input video frame 135. The lip gesture movement extraction sub-module 155 analyzes the input video frame 135 to determine the presence of lip movement for the selected entity. Upon detection of the lip movement for the selected entity in the input video frame 135, the lip gesture movement extraction sub-module 155 generates an interrupt for the voice controller 160. The voice controller 160 is used to change the voice of the selected entity. The voice controller 160 generates a second query for the LUT 120 to fetch the second unique identifier 130 stored in the LUT 120. Further, the voice controller 160 generates a third query for the voice database 190 to fetch the selected voice sample using the second unique identifier 130. The voice controller 160 customizes the voice by altering voice characteristics, for example, timbre, pitch and the like. The voice change can also be performed using, in one example, a voice morphing method. The voice morphing method can also be referred to as a voice transformation method.
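The overall control flow described above can be summarised, purely as an illustrative sketch, in the following Python function. The helper callables (face_search, lip_movement_detected, voice_controller) and the lut/voice_db interfaces are hypothetical stand-ins for the sub-modules and stores discussed in this paragraph.

# Illustrative control-flow sketch of the core processing module 145; the
# helper callables are hypothetical stand-ins for the face search sub-module 150,
# the lip gesture movement extraction sub-module 155 and the voice controller 160.
def process_frame(frame, lut, voice_db, first_id,
                  face_search, lip_movement_detected, voice_controller):
    entity = lut.fetch_entity(first_id)            # first query: fetch selected entity 140
    match = face_search(frame, entity)             # search the input video frame 135
    if match is None:
        return frame                               # no match: leave the frame unchanged
    if not lip_movement_detected(frame, match):
        return frame                               # no lip movement: bypass voice change
    second_id = lut.fetch_voice_id(first_id)       # second query: voice sample identifier
    voice_sample = voice_db[second_id]             # third query: fetch the voice sample
    return voice_controller(frame, match, voice_sample)   # replace the entity's voice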

[0024] The look up table (LUT) 120 is used to map the selected entity to which the voice change is to be applied. The LUT 120 stores the first entity 140, the second unique identifier 130 and the first unique identifier 195 for a specified time period. Upon selection of the entity from the first presentation module 115, the corresponding first entity 140 is entered in the LUT 120. Further, the LUT 120 generates the first unique identifier 195 upon entry of the first entity 140 in the LUT 120. Furthermore, upon selection of the selected voice sample from the second presentation module 125, the second unique identifier 130 for the selected voice is entered in the LUT 120.

[0025] Upon detection of the selected entity bearing lip movement in the input video frame 135, the second unique identifier is extracted from the LUT. The second unique identifier is used to fetch, from the voice database, the voice sample which is to be applied to the selected entity. The voice controller module 160 extracts voice characteristics, for example, timbre, pitch and the like, for customizing the voice of the selected entity. Further, voice customization can be performed without interrupting the user viewing the digital display system.

[0026] FIG. 2 is a flowchart illustrating a method for customizing voice of a selected entity on a digital display system in accordance with one embodiment. The method starts at step 205. At step 210, one or more entities present in a first video frame are captured. The one or more entities can be regarded as, in one example, the face of a character present in the first video frame. Examples of the first video frame include, but are not limited to, a movie clip and a broadcast video. The one or more entities present in the first video frame can be captured using, in one example, a face detector module. The face detector module can employ multiple characteristic features to capture the one or more entities present in the first video frame. Examples of characteristic features include, but are not limited to, skin texture, motion, size, shape and location. Further, the face detector module can employ various algorithms to capture the one or more entities present in the first video frame.

[0027] At step 215, the one or more entities present in the first video frame are listed. The listing of the one or more entities present in the first video frame can be performed using, in one example, a first presentation module. Further, the first presentation module displays the one or more entities present in the first video frame. A user can select a particular entity from the one or more entities that are listed using the first presentation module.

[0028] At step 220, the user selects a first entity from the one or more entities present in the first video frame. The entities present in the first video frame can be listed using the first presentation module. Selection of the first entity among the one or more entities present in the first video frame can be performed using a user interface. Examples of the user interface include, but are not limited to, a graphical user interface (GUI), a touch screen and a command line interface. In one example, the user can select the first entity among the one or more entities present in the first video frame by providing an input to the first presentation module using the GUI.

[0029] In one embodiment, the first entity selected by the user can be stored in a look-up table (LUT). The LUT can be configured to generate a first unique identifier to identify the first entity. The first unique identifier represents the first entity. Similarly, a plurality of first unique identifiers can be generated to represent a corresponding plurality of entities present in the first presentation module. Hence, one or more entities can be stored in the LUT. The LUT can be implemented using, for example, a processor.

[0030] In another embodiment, a hash table can also be used to store the one or more entities present in the first video frame.

[0031] At step 225, a first voice sample is selected. The first voice sample can be stored in a voice database. The voice database can be located locally or remotely. The voice database includes a plurality of voice samples. In one embodiment, the first voice sample can be represented using a second unique identifier. The second unique identifier representing the first voice sample can be stored in the LUT. Similarly, a plurality of second unique identifiers can be generated to represent corresponding plurality of voice samples stored in the voice database.

[0032] In another embodiment, a hash table can also be used to store the second unique identifier representing the first voice sample.

[0033] At step 230, a match between the one or more entities present in an input video frame and the first entity is found. The matching between the one or more entities present in the input video frame and the first entity is performed using, for example, a face-search sub-module. The face-search sub-module compares the one or more entities present in the input video frame with the first entity. Digital image processing techniques can be employed for comparison of the one or more entities with the first entity. Further, the face-search sub-module matches the one or more entities present in the input video frame with the first entity. On subsequent matching of the one or more entities present in the input video frame with the first entity selected by the user, a lip movement for an entity selected from among the one or more entities present in the input video frame is determined.
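The disclosure leaves the comparison technique open (digital image processing or facial recognition algorithms); as one hedged illustration, the face-search step could threshold a similarity between simple face descriptors, as in the sketch below. The embed_face descriptor here is a deliberately crude placeholder, not the method of the disclosure.

# Illustrative matching sketch; embed_face is a crude placeholder descriptor.
import numpy as np

def embed_face(face_image: np.ndarray) -> np.ndarray:
    """Placeholder descriptor: a normalized, down-sampled grayscale patch."""
    patch = face_image[::4, ::4].astype(np.float64).ravel()
    norm = np.linalg.norm(patch)
    return patch / norm if norm > 0 else patch

def faces_match(candidate: np.ndarray, selected: np.ndarray,
                threshold: float = 0.9) -> bool:
    """Declare a match when the cosine similarity of the descriptors exceeds a
    threshold. Assumes both face crops share the same resolution."""
    return float(np.dot(embed_face(candidate), embed_face(selected))) >= threshold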

[0034] In one embodiment, various facial recognition algorithms can also be used by the face-search sub-module for finding the match between the one or more entities present in the input video frame and the first entity.

[0035] At step 235, the first entity selected among the one or more entities present in the input video frame is analyzed for the presence of the lip movement. The lip movement in the selected entity is detected using a Lip Gesture Movement Extraction Sub-Module. The Lip Gesture Movement Extraction Sub-Module employs a speech processing technique for analyzing the presence of the lip movement in the selected entity.

[0036] In one embodiment, the Lip Gesture Movement Extraction Sub-Module determines a need for a voice change. The Lip Gesture Movement Extraction Sub-Module detects the presence of the lip movement. If the lip movement is present in the selected entity, the Lip Gesture Movement Extraction Sub-Module initiates a process that performs the voice change. Further, if the lip movement is not present in the selected entity, the Lip Gesture Movement Extraction Sub-Module bypasses the process that performs the voice change.
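The initiate-or-bypass decision of this embodiment can be pictured with the short sketch below. The mouth-region heuristic (lower third of the face crop) and the numeric threshold are assumptions made only for illustration; the disclosure does not fix how the lip movement is detected.

# Illustrative gating sketch; the mouth-region heuristic and threshold are assumptions.
import numpy as np

def lip_movement_present(prev_face: np.ndarray, curr_face: np.ndarray,
                         threshold: float = 12.0) -> bool:
    """Compare the lower third of consecutive face crops (a rough mouth region)."""
    h = prev_face.shape[0]
    prev_mouth = prev_face[2 * h // 3:, :].astype(np.float64)
    curr_mouth = curr_face[2 * h // 3:, :].astype(np.float64)
    return float(np.mean(np.abs(curr_mouth - prev_mouth))) > threshold

def gate_voice_change(prev_face, curr_face, start_voice_change, bypass):
    """Initiate the voice change only when lip movement is detected, else bypass it."""
    if lip_movement_present(prev_face, curr_face):
        start_voice_change()
    else:
        bypass()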

[0037] In another embodiment, various algorithms can be embedded into the Lip Gesture Movement Extraction Sub-Module to detect the presence of lip movement in the selected entity.

[0038] At step 240, the voice of the selected entity is changed. The voice of the selected entity can be changed using, for example, a voice controller. Changing of voice includes replacing the voice of the selected entity with the first voice sample. In one example, the first voice sample can be stored in the voice database. The voice database includes a plurality of voice samples. The voice controller employs various voice synthesis techniques to change the voice of the selected entity with the first voice sample.

[0039] In one embodiment, the Lip Gesture Movement Extraction Sub-Module invokes the voice controller to change the voice of the selected entity with the first voice sample. In one example, the invoking can be performed using an interrupt. The Lip Gesture Movement Extraction Sub-Module generates the interrupt. Further, the interrupt enables the voice controller to change the voice of the selected entity with the first voice sample. Further, the voice change can be applied for a specified period of time. The specified period of time indicates the span during which the voice change can be applied. The method ends at step 245.
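How the interrupt and the specified period of time interact is not fixed by the disclosure; the following sketch is one assumed arrangement, in which the interrupt opens a timed window during which the voice controller substitutes the selected voice sample. The class name and the use of playback timestamps are illustrative assumptions.

# Illustrative sketch only; timestamps and the window mechanism are assumptions.
class VoiceChangeWindow:
    def __init__(self, duration_s: float) -> None:
        self.duration_s = duration_s          # the specified period of time
        self._active_until = None

    def on_interrupt(self, time_s: float) -> None:
        """Called when the lip movement interrupt is raised; opens the window."""
        self._active_until = time_s + self.duration_s

    def is_active(self, time_s: float) -> bool:
        """The voice controller applies the selected voice only while this is True."""
        return self._active_until is not None and time_s <= self._active_until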

[0040] FIG. 3 is a flowchart illustrating a method for selecting and updating an entity using a first presentation module in accordance with one embodiment. The method starts at step 305. At step 310, a first video frame is received as input to the face detection module. The first video frame, in one example, can include a scene in a movie, a broadcast stream, a live video and a video clip. The first video frame further includes a plurality of entities. The entity can be regarded as the face of a character in the first video frame.

[0041] At step 315, the first video frame is captured using the face detection module.
Examples of image capturing techniques include, but are not limited to, digital image
processing techniques and chroma key techniques.

[0042] At step 320, one or more entities present in the first video frame are extracted
by the face detection module. Extracting the entities can be done by identifying multiple
characteristic features associated with the entities present in the first video frame.
Examples of characteristic features include, but are not limited to, skin texture, motion,
size, shape and location. Further, various algorithms can be employed for capturing the
one or more entities present in the first video frame.

[0043] At step 325, the one or more entities present in the first video frame are listed.
The listing of the one or more entities present in the first video frame can be performed
using, in one example, a first presentation module. Further, the first presentation
module displays the one or more entities present in the first video frame. A user can
select a particular entity from the one or more entities that are listed using the first
presentation module. The particular entity selected by the user can be regarded as a
first entity herein for simplicity.

[0044] At step 330, the user selects a first entity from the one or more entities present in the first video frame. The entities present in the first video frame can be listed using the first presentation module. Selection of the first entity among the one or more entities present in the first video frame can be performed using a user interface. Examples of the user interface include, but are not limited to, a graphical user interface (GUI), a touch screen and a command line interface.

[0045] At step 335, the first entity is stored in a look-up table (LUT). The LUT can be configured to generate a first unique identifier to identify the first entity. The first unique identifier represents the first entity. Similarly, a plurality of first unique identifiers can be generated to represent a corresponding plurality of entities present in the first presentation module. Hence, one or more entities can be stored in the LUT.

[0046] The method ends at step 340.

[0047] FIG. 4 is a block diagram illustrating a user interface along with a look-up table (LUT) for selection of an entity in accordance with one embodiment. FIG. 4 consists of a digital display system 405, a video frame 410 including one or more entities, a first entity 415, a second entity 420, a Look up table (LUT) 425 including a first unique identifier 430 for identifying the selected first entity 415.

[0048] The digital display system 405 displays the one or more entities present in the video frame 410. Examples of digital display systems can include, but are not limited to, a computer, IPTV, VOD and internet TV. The one or more entities are detected from the video frame 410 using, for example, a face detector module. The detected one or more entities present in the video frame 410 are listed for user selection using, for example, a presentation module. The list presented by the presentation module, herein, contains the first entity 415 and the second entity 420 for user selection. Further, in one example, the user selects the first entity 415. Furthermore, in another example, the user can select the second entity 420. The first entity 415 selected by the user is stored in the look up table (LUT) 425. The look-up table (LUT) 425 generates the first unique identifier 430 for the selected first entity 415. The first unique identifier 430 represents the selected first entity 415. Further, the LUT can generate another first unique identifier that represents the second entity. Similarly, a plurality of first unique identifiers can be generated to represent a corresponding plurality of entities present in the presentation module. Hence, one or more entities can be stored in the LUT 425.

[0049] FIG. 5 is a flowchart illustrating a method for selecting a voice sample for voice customization in accordance with one embodiment.

[0050] The voice sub-sampler module is responsible for processing the voice samples supplied by the user. Examples of the voice samples include one of a recorded voice sample, a pre-configured voice sample and a user inputted voice sample.

[0051] The method starts at step 505. At step 510, the user is provided with an option of selecting a pre-configured voice out of a plurality of pre-configured voice samples stored in the voice database. The pre-configured voices can be regarded as built-in voice samples. The built-in voice samples are stored in a voice database. Typically, the built-in voice samples are provided by the service provider. If the user wishes to select a pre-configured voice for voice customization, then the user selects the pre-configured voice sample out of the plurality of voice samples stored in the voice database, as shown in step 525. Further, if the user does not wish to use a pre-configured voice sample for voice customization, then the user can use a recorded voice sample for voice customization. At step 515, the user can record the voice sample using a record module. If the user wishes to use the recorded voice sample for voice customization, then the process of recording begins, as shown in step 530. Further, if the user does not wish to use the recorded voice sample for voice customization, then the user can input a voice sample that can be used for voice customization, as shown in step 520. Further, at step 535, the recorded voice sample is processed using the voice sub-sampler module. The voice sub-sampler module eliminates various kinds of noise, for example, random noise and quantization noise, present in the recorded voice sample. The voice sub-sampler module enhances the recorded voice sample by passing the recorded voice sample through smoothing filters prior to storing it in the voice database, as shown in step 540. At step 540, the recorded voice sample is stored in the voice database. Furthermore, the user inputted voice sample for voice customization, as shown in step 520, is processed using the voice sub-sampler module, as shown in step 535. At step 540, the user inputted voice sample is stored in the voice database. The method ends at step 545.
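The branching of FIG. 5 (pre-configured sample, real-time recording, or user-supplied sample, followed by processing and storage) can be sketched as below. The enum values and the record_audio / load_user_sample / process callables are hypothetical placeholders; the conditioning step simply stands in for the voice sub-sampler processing described above.

# Illustrative sketch of the FIG. 5 selection flow; names are hypothetical.
from enum import Enum, auto

class VoiceSource(Enum):
    PRECONFIGURED = auto()   # built-in sample already in the voice database (step 525)
    RECORDED = auto()        # recorded in real time via the record module (step 530)
    USER_INPUT = auto()      # supplied by the user, e.g. from the web (step 520)

def obtain_voice_sample(source, voice_db, preconfigured_id=None,
                        record_audio=None, load_user_sample=None,
                        process=lambda sample: sample):
    """Return a voice sample for customization, storing newly supplied samples."""
    if source is VoiceSource.PRECONFIGURED:
        return voice_db[preconfigured_id]              # select a built-in sample
    raw = record_audio() if source is VoiceSource.RECORDED else load_user_sample()
    processed = process(raw)                           # step 535: sub-sampler processing
    new_id = max(voice_db, default=0) + 1              # step 540: assign an identifier
    voice_db[new_id] = processed                       # step 540: store in the database
    return processed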

[0052] FIG. 6 is a block diagram illustrating a user interface along with a look-up table (LUT) for selecting a voice sample in accordance with one embodiment. FIG. 6 includes a digital display system 605, a record module 610 and a look up table 640.

[0053] The digital display system 605 displays one or more entities present in a video frame. Examples of digital display systems can include, but are not limited to, a computer, IPTV, VOD and internet TV. The user selects an entity from the one or more entities present in the video frame. Selection can be performed, in one example, by dragging the cursor and placing it on the selected entity, by providing inputs using the keyboard or by using touch pads. In one example, the selected entity can be a character present in the video frame, as shown in 635. The selected entity, as shown in 635, is stored in the look-up table (LUT) 640. Further, the look-up table (LUT) 640 generates the first unique identifier 645. The first unique identifier 645 is exclusive to the selected entity 635. Similarly, a plurality of first unique identifiers for a corresponding plurality of selected entities can be stored in the LUT 640. In one example, the user wishes to record the voice sample using the record module 610. Examples of voice samples available, herein, for customizing the voice of the selected entity 635 include, but are not limited to, a robot voice sample 615, a celebrity voice sample 620 and a baby voice sample 625. The voice samples discussed above are stored in the voice database. Each of the voice samples stored in the voice database is associated with a second unique identifier exclusive to that voice sample. The choice of the voice sample used for the voice change depends upon the user. Upon selection of the voice sample for voice change, the second unique identifier 630 corresponding to the selected voice sample is stored in the LUT 640. The second unique identifier 630 is used to fetch the voice sample from the voice database. The selected voice sample can be used for customizing the voice of the selected entity.

[0054] FIG. 7 is a flowchart illustrating a method for customizing voice using a core processing module in accordance with one embodiment. The method starts at step 705. At step 710, the core processing module receives an input video frame. The input video frame, in one example, can include a scene in a movie, a video clip and a broadcast video stream. At step 715, the core processing module checks if the user wishes for voice customization. If the user wishes for voice customization, then the input video frame is analyzed, as shown in step 720. Further, if the user does not wish for voice customization, then the core processing module bypasses the steps involved in the voice customization process.

[0055] At step 720, the input video frame is analyzed. The core processing module analyzes the video frame by capturing one or more entities present in the input video frame. The capturing of one or more entities can be performed using a face search sub-module. The face search sub-module can capture the entities by identifying multiple characteristic features associated with the entities present in the input video frame. Examples of characteristic features include, but are not limited to, skin texture, motion, size, shape and location. Further, various algorithms can be employed for capturing the one or more entities present in the input video frame.

[0056] At step 725, the core processing module generates a first query for a look up table (LUT) to fetch a user selected first entity that is stored in the LUT. The user selected first entity stored in the LUT serves as the input for the face search sub-module.

[0057] At step 730, the core processing module finds a match between the plurality of entities present in the input video frame and the first entity stored in the LUT. The match between the plurality of entities present in the input video frame and the first entity stored in the LUT can be found using the face search sub-module.

[0058] At step 735, the core processing module determines if a match between the plurality of entities present in the input video frame and the first entity stored in the LUT is found. Upon a successful match, the core processing module checks the input video frame to determine the presence of a lip movement for the selected entity. Further, if the match is not found, the steps involved in voice customization are bypassed, as shown in step 765.

[0059] At step 740, the core processing module checks the input video frame to determine the presence of the lip movement for the corresponding selected entity. A lip movement gesture extraction sub-module can be used to determine the lip movement for the corresponding selected entity. Upon determining the lip movement in the input video frame for the corresponding selected entity, an interrupt is generated for a voice controller. Further, if the lip movement is not present for the corresponding selected entity, the steps involved in voice customization are bypassed, as shown in step 765.

[0060] At step 745, the lip movement gesture extraction sub-module generates the interrupt for the voice controller. The interrupt is generated to signify the implementation of voice customization for the selected entity. The interrupt is generated for the voice controller upon the presence of the lip movement in the input video frame for the corresponding selected entity.

[0061] At step 750, the voice controller generates a second query for the LUT to fetch a second unique identifier. The second unique identifier represents a selected voice. The selected voice is used to customize voice for the selected entity. The second query facilitates the procurement of the second unique identifier that represents the selected voice.

[0062] At step 755, a third query is generated for the voice database to fetch the selected voice sample using the second unique identifier. The voice database stores a plurality of voice samples that can be used for voice customization. Each of the voice samples stored in the voice database is associated with a corresponding second unique identifier. Hence, a plurality of voice samples associated with corresponding second unique identifiers can be stored in the voice database. The third query is used for the procurement of the selected voice sample from the voice database.

[0063] At step 760, the voice of the selected entity is replaced with the selected voice. The voice controller is used to replace the voice of the selected entity with the selected voice sample. The voice change performed by the voice controller further includes changing the characteristics of the voice, for example, timbre, pitch and the like. The method ends at step 770.
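As a final hedged illustration of the characteristic changes mentioned here, the sketch below alters pitch by naive resampling of the waveform. This is only a stand-in for whatever voice morphing or synthesis technique an implementation would actually use; real voice transformation would preserve duration and timbre far more carefully.

# Illustrative pitch alteration by naive resampling; not the disclosed method.
import numpy as np

def shift_pitch(samples: np.ndarray, semitones: float) -> np.ndarray:
    """Resample the waveform so its pitch moves by the given number of semitones.
    Note: this naive approach also changes the duration of the clip."""
    factor = 2.0 ** (semitones / 12.0)
    old_idx = np.arange(samples.size)
    new_idx = np.arange(0, samples.size, factor)
    return np.interp(new_idx, old_idx, samples.astype(np.float64))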

[0064] In the preceding description, the present disclosure and its advantages have been described with reference to specific embodiments. However, it will be apparent to a person of ordinary skill in the art that various modifications and changes can be made without departing from the scope of the present disclosure, as set forth in the claims below. Accordingly, the specification and figures are to be regarded as illustrative examples of the present disclosure, rather than in a restrictive sense. All such possible modifications are intended to be included within the scope of the present disclosure.


I/We claim:

1. A method for customizing voice of a selected entity on a digital display system,
the method comprising:

capturing one or more entities present in a first video frame using a face detector module;
listing the one or more entities present in the first video frame using a first presentation module;

selecting a first entity from a plurality of entities present in the first video frame, wherein the first entity is stored in a look-up table;

selecting a first voice sample for the first entity, wherein the first voice sample is selected from a voice database;

determining a match between a plurality of entities present in an input video frame and the first entity using a face-search sub-module;

detecting a lip movement for a selected entity in the input video frame using a lip gesture movement extraction sub module; and

changing the voice of the selected entity with the first voice sample using a voice controller.

2. The method as claimed in claim 1, wherein the digital display system comprises
one of a consumer electronic (CE) device and an internet protocol television (IPTV).

3. The method as claimed in claim 1, wherein the one or more entities comprises a plurality of characters present in the first video frame.

4. The method as claimed in claim 1, wherein the first video frame comprises one of a scene in a movie, a broadcast stream, a live video and a video clip.

5. The method as claimed in claim 1, wherein the one or more entities captured using the face detector module are listed using the first presentation module.

6. The method as claimed in claim 5, wherein the one or more entities are listed on the first presentation module using one or more user interfaces.

7. The method as claimed in claim 1, wherein the entity is selected from a plurality of entities present in the first video frame using one or more user interfaces.

8. The method as claimed in claim 1, wherein the look-up table generates a first unique identifier for the first entity.

9. The method as claimed in claim 1, wherein the first voice sample comprises one of a default voice sample, a recorded voice sample and a user inputted voice sample.

10. The method as claimed in claim 9, wherein the default voice sample, the recorded voice sample and the user inputted voice sample are stored in the voice database.

11. The method as claimed in claim 9, wherein one of the recorded voice sample and the user inputted voice sample are processed using a voice sub-sampler module.

12. The method as claimed in claim 1, wherein the voice database generates a second unique identifier for the selected voice sample.

13. The method as claimed in claim 12, wherein the second unique identifier is stored in the look-up table.

14. The method as claimed in claim 1, wherein changing the voice of the selected entity with the first voice sample is performed using a voice controller.

15. The method as claimed in claim 1, wherein changing the voice of the selected entity with the first voice sample is performed for a predetermined time period.

16. A system for customizing voice of a selected entity on a digital display system,
the system comprising:
a core processing module configured to generate a first query for a look-up table to fetch a first entity using a first unique identifier;
find a match between a plurality of entities present in an input video frame and the first entity using a face-search sub-module;
detect a lip movement for the selected entity using a lip gesture movement extraction sub module;
generate an interrupt for a voice controller;
generate a second query for the look-up table to fetch a second unique identifier; and
generate a third query for the voice database to fetch a first voice sample using the second unique identifier.

17. The system as claimed in claim 16, wherein the core processing module comprises at least one of the face search sub-module, the lip gesture movement extraction sub module and the voice controller.

18. The system as claimed in claim 17, wherein at least one of the plurality of entities present in a first video frame and the first entity is received by the face-search sub-module.

Documents

Application Documents

# Name Date
1 1248-CHE-2011 CLAIMS 11-04-2011.pdf 2011-04-11
2 1248-CHE-2011 CORRESPONDENCE OTHERS 11-04-2011.pdf 2011-04-11
3 1248-CHE-2011 ABSTRACT 11-04-2011.pdf 2011-04-11
4 1248-CHE-2011 POWER OF ATTORNEY 11-04-2011.pdf 2011-04-11
5 1248-CHE-2011 FORM-5 11-04-2011.pdf 2011-04-11
6 1248-CHE-2011 FORM-3 11-04-2011.pdf 2011-04-11
7 1248-CHE-2011 FORM-2 11-04-2011.pdf 2011-04-11
8 1248-CHE-2011 FORM-1 11-04-2011.pdf 2011-04-11
9 1248-CHE-2011 DRAWINGS 11-04-2011.pdf 2011-04-11
10 1248-CHE-2011 DESCRIPTION (COMPLETE) 11-04-2011.pdf 2011-04-11
11 1248-CHE-2011 CORRESPONDENCE OTHERS 25-04-2013.pdf 2013-04-25
12 1248-CHE-2011 FORM-18 25-04-2013.pdf 2013-04-25
13 1248-CHE-2011 FORM-13 18-07-2015.pdf 2015-07-18
14 Form 13_Address for service.pdf 2015-07-20
15 Amended Form 1.pdf 2015-07-20
16 Form 3 [27-06-2017(online)].pdf 2017-06-27
17 1248-CHE-2011-FORM-26 [27-11-2017(online)].pdf 2017-11-27
18 1248-CHE-2011-RELEVANT DOCUMENTS [22-02-2018(online)].pdf 2018-02-22
19 1248-CHE-2011-Changing Name-Nationality-Address For Service [22-02-2018(online)].pdf 2018-02-22
20 1248-CHE-2011-FER.pdf 2019-04-24
21 1248-CHE-2011-RELEVANT DOCUMENTS [19-10-2019(online)].pdf 2019-10-19
22 1248-CHE-2011-FORM-26 [19-10-2019(online)].pdf 2019-10-19
23 1248-CHE-2011-FORM 13 [19-10-2019(online)].pdf 2019-10-19
24 1248-CHE-2011-PETITION UNDER RULE 137 [21-10-2019(online)].pdf 2019-10-21
25 1248-CHE-2011-FORM 3 [21-10-2019(online)].pdf 2019-10-21
26 1248-CHE-2011-FER_SER_REPLY [21-10-2019(online)].pdf 2019-10-21
27 1248-CHE-2011-DRAWING [21-10-2019(online)].pdf 2019-10-21
28 1248-CHE-2011-COMPLETE SPECIFICATION [21-10-2019(online)].pdf 2019-10-21
29 1248-CHE-2011-CLAIMS [21-10-2019(online)].pdf 2019-10-21
30 1248-CHE-2011-ABSTRACT [21-10-2019(online)].pdf 2019-10-21
31 Correspondence by Agent_Power of Attorney_28-10-2019.pdf 2019-10-28
32 1248-CHE-2011-PatentCertificate26-07-2021.pdf 2021-07-26
33 1248-CHE-2011-IntimationOfGrant26-07-2021.pdf 2021-07-26
34 1248-CHE-2011-PROOF OF ALTERATION [20-01-2023(online)].pdf 2023-01-20
35 1248-CHE-2011-PROOF OF ALTERATION [20-01-2023(online)]-2.pdf 2023-01-20
36 1248-CHE-2011-PROOF OF ALTERATION [20-01-2023(online)]-1.pdf 2023-01-20
37 1248-CHE-2011-RELEVANT DOCUMENTS [26-09-2023(online)].pdf 2023-09-26

Search Strategy

1 2019-04-0911-17-31_09-04-2019.pdf

ERegister / Renewals

3rd: 24 Sep 2021

From 11/04/2013 - To 11/04/2014

4th: 24 Sep 2021

From 11/04/2014 - To 11/04/2015

5th: 24 Sep 2021

From 11/04/2015 - To 11/04/2016

6th: 24 Sep 2021

From 11/04/2016 - To 11/04/2017

7th: 24 Sep 2021

From 11/04/2017 - To 11/04/2018

8th: 24 Sep 2021

From 11/04/2018 - To 11/04/2019

9th: 24 Sep 2021

From 11/04/2019 - To 11/04/2020

10th: 24 Sep 2021

From 11/04/2020 - To 11/04/2021

11th: 24 Sep 2021

From 11/04/2021 - To 11/04/2022

12th: 24 Mar 2022

From 11/04/2022 - To 11/04/2023

13th: 23 Mar 2023

From 11/04/2023 - To 11/04/2024

14th: 28 Mar 2024

From 11/04/2024 - To 11/04/2025

15th: 27 Mar 2025

From 11/04/2025 - To 11/04/2026