Abstract: The present invention relates to a system and method for generating an animated visual appearance representing a user based on an audio message. An audio message is received from a first user device and processed to extract one or more phonemes and emotion information associated with the audio message. Thereafter, a visual appearance of a first user of the first user device is retrieved based on the emotion information, and movement data is retrieved based on the extracted one or more phonemes. The visual appearance of the first user is then processed based on the movement data to generate one or more animated visual appearances, and the one or more animated visual appearances are integrated with the audio message to generate an audio-visual message corresponding to the audio message. Lastly, the generated audio-visual message is transmitted to a second user device.
[0001] The present disclosure generally relates to communications. More
specifically, the present disclosure relates to generating an animated visual
appearance representing a user based on an audio message.
BACKGROUND
[0002] Messaging services/applications allow users to communicate without
being physically present at the same location. The messaging services allow users
to communicate via a number of communication mechanisms, such as telephony,
email, multimedia messaging, and instant messaging. One or more of these
communication mechanisms allow a user to record audio and send the recorded
audio to another user.
[0003] However, when a user plays a received audio message on a messaging
platform/application, only an audio player is displayed on the screen of the user's device.
It would be advantageous if the user could also see a visual representation,
depicting the user who has sent the audio message, while the audio message is being
played. Displaying such a visual representation would make the audio message player of
the messaging platform more attractive and would enhance the communication.
In view of the foregoing, there exists a need in the art for a solution which
overcomes the above-mentioned problems.
SUMMARY
[0004] Exemplary aspects are directed to a system and method for generating an
animated visual appearance representing a user based on an audio message. The
system may provide visual effects along with audio effects while playing a
received audio message. In another embodiment, the present invention provides a
system which provides the animated visual appearance of the user who has sent
the audio message. In yet another embodiment, the present invention provides a
system that provides visual effects representing the emotions and sentiments
associated with the received audio message.
[0005] In an exemplary embodiment, the present disclosure describes a method
for generating an animated visual appearance of a user from an audio message.
The method comprises receiving an audio message to be sent, from a first user
device, to a second user device. The method further comprises processing the
audio message to extract one or more phonemes and emotion information
associated with the audio message. As a subsequent step, the method
discloses retrieving a visual appearance of a first user of the first user device from
a database unit, based on the emotion information. In a next step, the method
discloses retrieving movement data from the database unit based on the extracted
one or more phonemes. Thereafter, the method discloses processing the visual
appearance of the first user, based on the movement data for each of the one or
more phonemes, to generate one or more animated visual appearances
corresponding to the one or more phonemes. In a further step, the method discloses
integrating the one or more animated visual appearances with the audio message
to generate an audio-visual message corresponding to the audio message. Lastly,
the method discloses transmitting the generated audio-visual message to the
second user device.
[0006] In another embodiment, the present disclosure describes that the movement
data comprises at least one of mouth expressions, lip movements, head
movements, eye expressions, eyebrow movements, or a combination thereof.
[0007] In yet another embodiment, the present disclosure further describes that
integrating the one or more animated visual appearances with the received audio
message comprises: synthesizing the one or more animated visual appearances
corresponding to the one or more phonemes based on time stamps stored over a
blockchain in association with the one or more phonemes; and synchronising the
synthesized animated visual appearances with the audio message.
[0008] In yet another embodiment, the present disclosure further describes that
retrieving the movement data from the database unit comprises mapping the
extracted one or more phonemes with one or more predefined phonemes stored in
the database unit, and fetching movement data associated with the mapped one or
more predefined phonemes in the database unit.
[0009] In yet another embodiment, the present disclosure describes that the visual
appearance of the first user comprises any one of: an emoji of the first user, an
avatar of the first user, and a facial image of the first user, wherein different visual
appearances of the first user are stored in the database unit in association with
different emotion information, and wherein the emotion information represents
emotion of the first user while sending the audio message.
[0010] In yet another embodiment, the present disclosure is directed to a system
for generating an animated visual appearance of a user from an audio
message. The system comprises a first user device communicatively coupled to a
second user device via a server. The server comprises a database unit and a
receiver configured to receive an audio message to be sent, from the first user device,
to the second user device. The server comprises a processing unit that processes
the audio message to extract one or more phonemes and emotion
information associated with the audio message and retrieves a visual appearance
of a first user of the first user device from the database unit based on the emotion
information. The processing unit further retrieves movement data from the
database unit based on the extracted one or more phonemes, and processes the visual
appearance of the first user, based on the movement data for each of the one or
more phonemes, to generate one or more animated visual appearances
corresponding to the one or more phonemes. The server further comprises an
integrating unit configured to integrate the one or more animated visual
appearances with the audio message to generate the audio-visual message
corresponding to the audio message, and a transmitter configured to transmit the
generated audio-visual message to the second user device.
[0011] In another embodiment, the present disclosure describes that the movement
data comprises at least one of mouth expressions, lip movements, head
movements, eye expressions, eyebrow movements, or a combination thereof.
[0012] In yet another embodiment, the present disclosure further describes that
the integrating unit is further configured to synthesize the one or more animated
visual appearances corresponding to the one or more phonemes based on time
stamps stored over a blockchain in association with the one or more phonemes;
and synchronise the synthesized animated visual appearances with the audio
message.
[0013] In yet another embodiment, the present disclosure further describes that
the processing unit is further configured to retrieve the movement data from the
database unit by mapping the extracted one or more phonemes with one or more
predefined phonemes stored in the database unit and fetching movement data
associated with the mapped one or more predefined phonemes in the database
unit.
[0014] In yet another embodiment, the present disclosure describes that the visual
appearance of the first user comprises any one of: an emoji of the first user, an
avatar of the first user, and a facial image of the first user, wherein different visual
appearances of the first user are stored in the database unit in association with
different emotion information. The emotion information represents the emotion of
the first user while sending the audio message.
[0015] The foregoing summary is illustrative only and is not intended to be in any
way limiting. In addition to the illustrative aspects, embodiments, and features
described above, further aspects, embodiments, and features will become apparent
by reference to the drawings and the following detailed description.
OBJECTIVES OF THE INVENTION
[0016] An object of the present invention is to provide a system and method for generating
an animated visual appearance representing the user based on an audio message.
[0017] Another object of the present invention is to provide a system and method to
efficiently and accurately display the emotions and expressions of a user who has
sent the audio message to another user's device.
[0018] Yet another object of the present invention is to provide a system and method to
provide visual effects along with audio effects while playing a received audio
message.
[0019] Yet another object of the present invention is to provide a system and method
to provide visual effects that represent the emotions and sentiments associated
with the received audio message.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] The accompanying drawings, which are incorporated in and constitute a
part of this disclosure, illustrate exemplary embodiments and, together with the
description, serve to explain the disclosed embodiments. In the figures, the leftmost digit(s) of a reference number identifies the figure in which the reference
number first appears. The same numbers are used throughout the figures to
reference like features and components. Some embodiments of system and/or
methods in accordance with embodiments of the present subject matter are now
described, by way of example only, and with reference to the accompanying
figures, in which:
[0021] FIG. 1 is a block diagram of a system for generating an animated visual
appearance representing a user based on an audio message.
[0022] FIG. 2 is a block diagram of a server of the system for generating an animated
visual appearance representing a user based on an audio message.
[0023] FIG. 3 is a flow diagram illustrating a process of generating an animated
visual appearance representing a user based on an audio message.
[0024] It should be appreciated by those skilled in the art that any block diagrams
herein represent conceptual views of illustrative systems embodying the
principles of the present subject matter. Similarly, it will be appreciated that any
flow charts, flow diagrams, state transition diagrams, pseudo code, and the like
represent various processes which may be substantially represented in a computer
readable medium and executed by a computer or processor, whether or not such
computer or processor is explicitly shown.
DETAILED DESCRIPTION
[0025] In the present document, the word “exemplary” is used herein to mean
“serving as an example, instance, or illustration.” Any embodiment or
implementation of the present subject-matter described herein as “exemplary” is
not necessarily to be construed as preferred or advantageous over other
embodiments.
[0026] While the disclosure is susceptible to various modifications and alternative
forms, specific embodiments thereof have been shown by way of example in the
drawings and will be described in detail below. It should be understood, however,
that it is not intended to limit the disclosure to the particular forms disclosed; on
the contrary, the disclosure is to cover all modifications, equivalents, and
alternatives falling within the scope of the disclosure.
[0027] The terms “comprises”, “comprising”, “include(s)”, or any other variations
thereof, are intended to cover a non-exclusive inclusion, such that a setup, system
or method that comprises a list of components or steps does not include only those
components or steps but may include other components or steps not expressly
listed or inherent to such setup or system or method. In other words, one or more
elements in a system or apparatus preceded by “comprises… a” does not,
without more constraints, preclude the existence of other elements or additional
elements in the system or apparatus.
[0028] In the following detailed description of the embodiments of the disclosure,
reference is made to the accompanying drawings that form a part hereof, and in
which are shown by way of illustration specific embodiments in which the
disclosure may be practiced. These embodiments are described in sufficient detail
to enable those skilled in the art to practice the disclosure, and it is to be
understood that other embodiments may be utilized and that changes may be made
without departing from the scope of the present disclosure. The following
description is, therefore, not to be taken in a limiting sense.
[0029] The present invention will be described herein below with reference to the
accompanying drawings. In the following description, well-known functions or
constructions are not described in detail since they would obscure the description
with unnecessary detail.
[0030] The present invention relates to a system and method for generating an
animated visual appearance of a user based on an audio message. The system
comprises a server that is accessible by user devices over a network. In another
embodiment, the system and method in the present disclosure provide visual
effects along with audio effects while playing a received audio message. The
visual effects may comprise providing an animated visual appearance of the user
who has sent the audio message. The visual effects represent emotions and
sentiments associated with the received audio message.
[0031] Referring to figure 1, an exemplary system/environment 100 is disclosed
for generating an animated visual appearance representing a user based on a
received audio message. In an aspect, various elements/entities of the system 100,
such as a server 110 and user devices 120A and 120B as shown in figure 1, may
communicate with each other via a network. The server 110 may
remain operatively connected to the one or more user devices 120A and 120B to
receive, process and forward communications received from a source user
device to a destined user device. The network may include one or more types of
networks including, but not limited to, the internet, a local area network, a wide area network,
a peer-to-peer network, and/or other similar technologies for connecting the various
entities discussed above.
[0032] In an exemplary aspect, in fig. 1, only two user devices 120A and 120B
are shown for the sake of simplicity; this should not be construed as limiting the scope,
and multiple user devices may be present in the system 100. The user devices
120A and 120B may be operated by users for communications. The user devices
120A and 120B may represent desktop computers, laptop computers, mobile
devices (e.g., smart phones or personal digital assistants), tablet devices, or other
types of computing devices, which have computing, messaging and networking
capabilities. The user device 120 may be equipped with one or more computer
storage devices (e.g., RAM, ROM, PROM, SRAM, etc.), a communication unit and
one or more processing devices (e.g., central processing units) that are capable of
executing computer program instructions. According to an exemplary
embodiment, the communication may be in the form of an exchange of one or more of
the following, but not limited to: text, audio, video, emoji, stickers, animations,
images, audio-visual media, etc.
[0033] According to an exemplary embodiment, a user of the user device 120A
(hereafter called the “first user device”) may send an audio message to a user of
the user device 120B (hereafter called the “second user device”). The server 110
facilitates communication between the first user device 120A and the second user
device 120B. The server 110 receives the audio message from the first user device
120A to be sent to the second user device 120B. Further, the server 110 may
process the received audio message and may generate an animated visual
appearance of the user of the first user device 120A. Thereafter, the server 110
may integrate the audio message and the animated visual appearance of the user to
generate an audio-visual message corresponding to the audio message. The server
110 may transmit the generated audio-visual message to the second user device
120B. When the user of the second user device 120B plays the received message,
the visual appearance of the user of the first user device is displayed to the user
of the second user device 120B along with the audio. In this manner, the emotions,
sentiments and expressions of the user who sent the message are displayed
efficiently to the receiving user. The above-mentioned technique(s) are described in
more detail in the paragraphs below.
[0034] Fig. 2 illustrates a block diagram of the server 110 that is configured for
generating an animated visual appearance representing a user based on the
received audio message. The server 110 may comprise various entities such as a
receiver 200, a processing unit 202, a database unit 204, an integrating unit 206,
and a transmitter 208. These various entities may communicate with each other
via wireless or wired links. The processing unit 202 may comprise one or more
processors to process the communications received from the user devices 120.
[0035] In one aspect, the receiver 200 may receive the audio message from the
first user device 120A to be sent to the second user device 120B. Upon receiving
the audio message, the processing unit 202 may process the audio message to
extract one or more phonemes present in the audio of the audio message. The
processing unit 202 also determines emotion information associated with the
audio message by processing the audio message. Furthermore, the processing unit
202 retrieves a visual appearance of the user of the first user device 120A
(hereafter called the “first user”) from the database unit 204 based on the emotion
information. In an exemplary aspect, the visual appearance of the first user may
comprise any one of: an emoji of the first user, an avatar of the first user, and a
facial image of the first user, but is not limited thereto. The database unit 204 of the
server may be configured to store different visual appearances of the first user in
association with different emotion information. According to an exemplary
embodiment, the emotion information represents the emotion/sentiment of the first
user while sending the audio message.
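By way of a non-limiting illustration, the following Python sketch shows how this stage of the processing unit 202 might be organised, assuming a simple key-value store standing in for the database unit 204. The phoneme and emotion analysis is stubbed out, and all names used here (Phoneme, APPEARANCE_DB, extract_phonemes, classify_emotion, retrieve_visual_appearance) are hypothetical and not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class Phoneme:
    symbol: str     # e.g. "AY", "L", "AH", "V"
    start_ms: int   # time stamp of the phoneme within the audio message
    end_ms: int

# Hypothetical content of the database unit 204: one stored visual
# appearance of the first user per emotion label.
APPEARANCE_DB = {
    "happy": "avatar_happy.png",
    "sad": "avatar_sad.png",
    "neutral": "avatar_neutral.png",
}

def extract_phonemes(audio: bytes) -> list[Phoneme]:
    """Stub for the phoneme extraction performed on the audio message."""
    # A real implementation would run speech analysis / forced alignment.
    return [Phoneme("AY", 0, 180), Phoneme("L", 180, 300), Phoneme("AH", 300, 450)]

def classify_emotion(audio: bytes) -> str:
    """Stub emotion classifier returning an emotion label for the message."""
    return "happy"

def retrieve_visual_appearance(emotion: str) -> str:
    """Retrieve the visual appearance stored in association with the emotion."""
    return APPEARANCE_DB.get(emotion, APPEARANCE_DB["neutral"])

audio_message = b"...raw audio bytes..."
phonemes = extract_phonemes(audio_message)
appearance = retrieve_visual_appearance(classify_emotion(audio_message))
```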
[0036] Thereafter, the processing unit 202 may retrieve movement data from the
database unit based on the extracted one or more phonemes. According to an
exemplary embodiment, the movement data stored in the database unit 204 may
comprise at least one of mouth expressions, lip movements, head movements,
eye expressions, eyebrow movements, or a combination thereof, but is not limited
thereto. For retrieving the movement data from the database unit, the processing
unit may map the extracted one or more phonemes with one or more predefined
phonemes stored in the database unit and may fetch the movement data associated
with the mapped one or more predefined phonemes in the database unit.
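A minimal sketch of this mapping-and-fetching step follows, again with a plain dictionary standing in for the database unit 204; the phoneme symbols and movement labels are assumptions made for illustration only.

```python
# Hypothetical movement data stored in the database unit 204,
# keyed by predefined phonemes.
MOVEMENT_DB = {
    "AY": {"mouth": "open_wide", "lips": "spread"},
    "L":  {"mouth": "half_open", "lips": "tongue_visible"},
    "AH": {"mouth": "open", "chin": "lowered"},
    "UW": {"lips": "tightened"},
}

def retrieve_movement_data(extracted_phonemes: list[str]) -> list[dict]:
    """Map each extracted phoneme onto a predefined phoneme and fetch the
    movement data associated with it in the database unit."""
    movement_data = []
    for symbol in extracted_phonemes:
        # Here the mapping is an exact symbol match; a fuller system might fall
        # back to the closest predefined phoneme when no exact match is stored.
        movement_data.append(MOVEMENT_DB.get(symbol, {}))
    return movement_data

print(retrieve_movement_data(["AY", "L", "AH", "V"]))  # unknown "V" yields {}
```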
[0037] Further, the processing unit 202 may process the visual appearance of the
first user, based on the movement data for each of the one or more phonemes, to
generate one or more animated visual appearances corresponding to the one or
more phonemes. According to an exemplary embodiment, the processing unit 202
may modify the retrieved visual appearance, such as the emoji/avatar/facial image,
to reflect the movement corresponding to each extracted phoneme. The processing
unit 202 generates one or more animated visual appearances corresponding to each
of the extracted phonemes.
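This generation step can be pictured as producing one modified appearance (a frame description) per phoneme, as in the sketch below; the frame representation is a simplification introduced for illustration, not the claimed rendering method.

```python
def animate_appearance(base_appearance: str, movement_data: list[dict]) -> list[dict]:
    """Generate one animated visual appearance per phoneme by applying the
    movement data for that phoneme to the retrieved base appearance."""
    frames = []
    for movement in movement_data:
        # A real renderer would deform the emoji/avatar/facial image; here the
        # "frame" simply records the base appearance plus the applied movement.
        frames.append({"base": base_appearance, **movement})
    return frames

frames = animate_appearance(
    "avatar_happy.png",
    [{"mouth": "open_wide"}, {"chin": "lowered"}, {"lips": "tightened"}],
)
```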
[0038] Thereafter, the integrating unit 206 may integrate the one or more animated
visual appearances with the audio message to generate the audio-visual message
corresponding to the audio message. According to an exemplary embodiment, for
integrating the one or more animated visual appearances with the received audio
message, the integrating unit 206 may synthesize the one or more animated visual
appearances corresponding to the one or more phonemes based on time stamps
stored over a blockchain in association with the one or more phonemes. The
integrating unit 206 may retrieve the time stamps, associated with the one or more
phonemes, stored over the blockchain.
[0039] According to an exemplary embodiment, the processing unit 202 may
determine and store the time stamps over the blockchain while extracting the one
or more phonemes from the audio message. Furthermore, the integrating unit 206
may synchronise the synthesized animated visual appearances with the audio
message. Such synchronization ensures that the animated visual appearances are
completely in line with each of the phonemes and match the length of the
audio message.
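The synthesis and synchronisation can be sketched as ordering the animated appearances by their phoneme time stamps and trimming the sequence to the length of the audio message. In this simplified sketch the time stamps are passed in as a plain list; their retrieval from the blockchain is abstracted away, and the function name is an assumption.

```python
def synthesize_and_synchronise(frames: list[dict],
                               timestamps_ms: list[int],
                               audio_length_ms: int) -> list[tuple[int, dict]]:
    """Order the animated visual appearances by their time stamps and keep only
    those that fall within the duration of the audio message, so that the video
    track starts and ends together with the audio."""
    timeline = sorted(zip(timestamps_ms, frames), key=lambda pair: pair[0])
    return [(t, frame) for t, frame in timeline if 0 <= t <= audio_length_ms]

video_track = synthesize_and_synchronise(
    frames=[{"mouth": "open_wide"}, {"lips": "tightened"}],
    timestamps_ms=[0, 450],
    audio_length_ms=1200,
)
```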
[0040] After generation of the audio-visual message corresponding to the audio
message, the transmitter 208 of the server 110 transmits the audio-visual message to the
second user device 120B. When the user of the second user device 120B plays the
received message, the visual appearance of the user of the first user device is
displayed to the user of the second user device 120B along with the audio. In this
manner, the emotions, sentiments, and expressions of the user who sent the message
are displayed efficiently to the receiving user.
[0041] In this manner, the system 100 may generate an animated visual
appearance representing the user based on an audio message. The system 100
efficiently and accurately displays the emotions and expressions of a user who has
sent the audio message to another user's device. Further, the system 100 provides
visual effects along with audio effects while playing a received audio message.
The visual effects may represent emotions and sentiments associated with the
received audio message.
[0042] According to an exemplary embodiment of the present disclosure, the
database unit 204 may be configured to store information representing the visual
appearances, movement data and emotion data of users, and phonemes, but is not
limited thereto. The visual appearance, movement data, emotion data, and
phonemes may be used to generate the animated visual appearance of the user
who sends the audio message. According to an exemplary embodiment, the
above-mentioned information may be stored in the database unit 204 by way of
training the server 110. According to an exemplary embodiment, the server 110
may be trained to identify the movement data and emotion data for various users,
associated with the phonemes, based on videos of the various users. The server
110 may store the movement data, such as facial expression, head movement, eye
expression, lip movement, and emotion, in the database unit 204 in association
with the phonemes of the user's voice.
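One possible, purely illustrative layout of the per-user records that such training could populate in the database unit 204 is sketched below; the field names and record structure are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class UserRecord:
    """Illustrative per-user record held in the database unit 204."""
    user_id: str
    # Visual appearances stored in association with emotion information.
    appearances: dict[str, str] = field(default_factory=dict)  # emotion -> appearance id
    # Movement data stored in association with predefined phonemes, learned
    # from videos of the user while training the server 110.
    movements: dict[str, dict] = field(default_factory=dict)   # phoneme -> movement data

first_user = UserRecord(
    user_id="first_user",
    appearances={"happy": "avatar_happy.png", "sad": "avatar_sad.png"},
    movements={"AY": {"mouth": "open_wide"}, "UW": {"lips": "tightened"}},
)
```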
[0043] For explaining the embodiments defined in paragraphs [0031]-[0042], let
us consider that while saying “I LOVE YOU”, the first user is generally in an
emotional state of “happy” and smiles by stretching his lips. Accordingly, the server
110 may be trained and the database unit 204 may store movement data for each
phoneme corresponding to “I LOVE YOU”. Such movement data may comprise
“opening of the mouth for pronouncing ‘I’”, “lowering the chin while opening the
mouth for pronouncing ‘LOVE’”, and “a tightening of the lips for pronouncing ‘U’”.
[0044] Now, whenever the user sends an audio message saying “I LOVE YOU”, the
server 110 may extract one or more phonemes and emotion information associated
with the audio message “I LOVE YOU”. The server 110 may determine that the emotion
information indicates that the user is “happy”. Thus, the server may retrieve an
avatar of the user which reflects the “happy” emotional state of the user. Such an
emotional state may be reflected by selecting an avatar which has a smiling face
with stretched lips, or by the colour of the face, or by the background of the avatar, etc.,
but is not limited thereto. The server 110 may retrieve the movement data such as “opening
of the mouth for pronouncing ‘I’”, “lowering the chin while opening the mouth for
pronouncing ‘LOVE’”, and “a tightening of the lips for pronouncing ‘U’” from the
database unit 204. Thereafter, the server 110 may animate the retrieved avatar of the
user based on the retrieved movement data. Further, the server 110 may integrate
the animated visual appearances with the audio message and may forward the
resulting audio-visual message to the destined user.
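Expressed as data, the trained entries used in this example might look like the following; the phoneme symbols and file names are assumed purely for illustration.

```python
# Illustrative database content for the first user after training on "I LOVE YOU".
I_LOVE_YOU_MOVEMENTS = {
    "AY": "opening of the mouth for pronouncing 'I'",
    "L":  "lowering the chin while opening the mouth for pronouncing 'LOVE'",
    "UW": "tightening of the lips for pronouncing 'U'",
}
# Appearance retrieved because the extracted emotion information is "happy".
happy_avatar = "avatar_happy_smiling.png"
```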
[0045] Figure 3 discloses a process 300 for generating an animated visual
appearance representing a user based on the received audio message. At step
302, the audio message may be received by the server 110 via the receiver 200.
At step 304, the processing unit 202 of the server 110 may process the audio
message to extract one or more phonemes and emotion information
associated with the audio message. At step 306, the processing unit 202 may
retrieve a visual appearance of the first user of the first user device from the database
unit 204, based on the emotion information. According to an exemplary
embodiment, the visual appearance of the first user comprises any one of: an
emoji of the first user, an avatar of the first user, and a facial image of the first
user, but is not limited thereto. The database unit 204 stores different visual
appearances of the first user in association with different emotion information,
wherein the emotion information represents the emotion of the first user while
sending the audio message.
[0046] At step 308, the processing unit 202 may retrieve movement data from the
database unit 204 based on the extracted one or more phonemes. According to an
exemplary embodiment, retrieving the movement data from the database unit
comprises mapping the extracted one or more phonemes with one or more
predefined phonemes stored in the database unit 204, and fetching movement data
associated with the mapped one or more predefined phonemes in the database unit
204.
[0047] At step 310, the processing unit 202 may process the visual appearance of
the first user, based on the movement data for each of the one or more phonemes,
to generate one or more animated visual appearances corresponding to the one or
more phonemes. In order to generate one or more animated visual appearances,
the processing unit 202 may modify the visual appearance of the first user
according to the fetched movement data corresponding to the phonemes.
[0048] At step 312, the integrating unit 206 may integrate the one or more
animated visual appearances with the audio message to generate the audio-visual
message corresponding to the audio message. According to an exemplary
embodiment, integrating the one or more animated visual appearances with the
received audio message comprises synthesizing the one or more animated visual
appearances corresponding to the one or more phonemes based on time stamps
stored over a blockchain in association with the one or more phonemes, and
synchronising the synthesized animated visual appearances with the audio
message. Lastly, at step 314, the generated audio-visual message may be
transmitted by the transmitter 208 to the second user device 120B.
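Put together, steps 302-314 of process 300 can be summarised in a single sketch; the analysis in step 304 is stubbed, and the dictionary database and all names are illustrative assumptions rather than the claimed implementation.

```python
def process_300(audio: bytes, database: dict) -> dict:
    """End-to-end sketch of process 300 (steps 302-314), with stubbed analysis."""
    # Step 304: extract phonemes and emotion information (stubbed values).
    phonemes, emotion = ["AY", "L", "AH", "V", "Y", "UW"], "happy"
    # Step 306: retrieve the visual appearance of the first user for that emotion.
    appearance = database["appearances"].get(emotion)
    # Step 308: retrieve movement data for the extracted phonemes.
    movements = [database["movements"].get(p, {}) for p in phonemes]
    # Step 310: generate one animated visual appearance per phoneme.
    frames = [{"base": appearance, **m} for m in movements]
    # Step 312: integrate the animated appearances with the audio message.
    return {"audio": audio, "video": frames}

database = {
    "appearances": {"happy": "avatar_happy.png"},
    "movements": {"AY": {"mouth": "open_wide"}, "UW": {"lips": "tightened"}},
}
audio_visual_message = process_300(b"...audio...", database)
# Step 314: the audio_visual_message would then be transmitted to the second user device.
```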
[0049] The illustrated steps are set out to explain the exemplary embodiments
shown, and it should be anticipated that ongoing technological development will
change the manner in which particular functions are performed. These examples
are presented herein for purposes of illustration, and not limitation. Further, the
boundaries of the functional building blocks have been arbitrarily defined herein
for the convenience of the description. Alternative boundaries can be defined so
long as the specified functions and relationships thereof are appropriately
performed.
[0050] In this manner, the system and method described in the present disclosure
may generate an animated visual appearance representing the user based on an
audio message. The system and method enable efficient and accurate display
of the emotions and expressions of a user who has sent the audio message to
another user's device. Further, the system and method provide visual effects
along with audio effects while playing a received audio message. The visual
effects may represent emotions and sentiments associated with the received audio
message.
[0051] Alternatives (including equivalents, extensions, variations, deviations,
etc., of those described herein) will be apparent to persons skilled in the relevant
art(s) based on the teachings contained herein. Such alternatives fall within the
scope and spirit of the disclosed embodiments.
[0052] Furthermore, one or more computer-readable storage media may be
utilized in implementing embodiments consistent with the present disclosure. A
computer-readable storage medium refers to any type of physical memory on
which information or data readable by a processor may be stored. Thus, a
computer-readable storage medium may store instructions for execution by one
or more processors, including instructions for causing the processor(s) to perform
steps or stages consistent with the embodiments described herein. The term
“computer-readable medium” should be understood to include tangible items and
exclude carrier waves and transient signals, i.e., to be non-transitory. Examples
include random access memory (RAM), read-only memory (ROM), volatile
memory, non-volatile memory, hard drives, CD-ROMs, DVDs, flash drives, disks,
and any other known physical storage media.
[0053] Suitable processors include, by way of example, a general purpose
processor, a special purpose processor, a conventional processor, a digital signal
processor (DSP), a plurality of microprocessors, one or more microprocessors in
association with a DSP core, a controller, a microcontroller, Application Specific
Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) circuits,
any other type of integrated circuit (IC), and/or a state machine.
[0054] Although the present invention has been described in considerable detail
with reference to figures and certain preferred embodiments thereof, other
versions are possible. Therefore, the spirit and scope of the present invention
should not be limited to the description of the preferred versions contained herein.
Referral Numerals:
| Reference Numeral | Description |
|---|---|
| 100 | Exemplary system/environment for generating an animated visual appearance representing a user based on the received audio message |
| 110 | Server |
| 120A | First user device |
| 120B | Second user device |
| 200 | Receiver |
| 202 | Processing Unit |
| 204 | Database Unit |
| 206 | Integrating Unit |
| 208 | Transmitter |
| 300 | Method for generating an animated visual appearance representing a user based on the received audio message |
We claim:
1. A method for generating an animated visual appearance of a user from an
audio message, the method comprising:
receiving an audio message to be sent, from a first user device, to a second
user device;
processing the audio message to extract one or more phonemes and
emotion information associated with the audio message;
retrieving a visual appearance of a first user of the first user device from a
database unit, based on the emotion information;
retrieving movement data from the database unit based on the extracted
one or more phonemes;
processing the visual appearance of the first user, based on the movement
data for each of the one or more phonemes, to generate one or more animated
visual appearances corresponding to the one or more phonemes;
integrating the one or more animated visual appearances with the audio
message to generate an audio-visual message corresponding to the audio
message; and
transmitting the generated audio-visual message to the second user device.
2. The method as claimed in claim 1, wherein the movement data comprises
at least one of mouth expressions, lip movements, head movements, eye
expressions, eyebrow movements, or a combination thereof.
3. The method as claimed in claim 1, wherein integrating the one or more
animated visual appearances with the received audio message comprises:
synthesizing the one or more animated visual appearances corresponding
to the one or more phonemes based on time stamps stored over a blockchain in
association with the one or more phonemes; and
synchronising the synthesized animated visual appearances with the audio
message.
4. The method as claimed in claim 1, wherein retrieving the movement data
from the database unit comprises:
mapping the extracted one or more phonemes with one or more predefined
phonemes stored in the database unit; and
fetching movement data associated with the mapped one or more
predefined phonemes in the database unit.
5. The method as claimed in claim 1, wherein the visual appearance of the
first user comprises any one of: an emoji of the first user, an avatar of the first
user, and a facial image of the first user;
wherein different visual appearances of the first user are stored in the
database unit in association with different emotion information; and
wherein the emotion information represents emotion of the first user while
sending the audio message.
6. A system for generating an animated visual appearance of a user from an
audio message, the system comprising:
a first user device communicatively coupled to a second user device via a
server, wherein the server comprises:
a database unit;
a receiver configured to receive an audio message to be sent, from the first
user device, to the second user device;
a processing unit configured to:
process the audio message to extract one or more phonemes
and emotion information associated with the audio message,
retrieve a visual appearance of a first user of the first user device
from the database unit based on the emotion information,
retrieve movement data from the database unit based on the
extracted one or more phonemes, and
process the visual appearance of the first user, based on the
movement data for each of the one or more phonemes, to generate one or
more animated visual appearances corresponding to the one or more
phonemes;
an integrating unit configured to integrate the one or more animated visual
appearances with the audio message to generate the audio-visual message
corresponding to the audio message; and
a transmitter configured to transmit the generated audio-visual message to
the second user device.
7. The system as claimed in claim 6, wherein the movement data is stored in
the database unit, and wherein the movement data comprises at least one of mouth
expressions, lip movements, head movements, eye expressions, eyebrow
movements, or a combination thereof.
8. The system as claimed in claim 6, wherein for integrating the one or more
animated visual appearances with the received audio message, the integrating unit
is configured to:
synthesize the one or more animated visual appearances corresponding to
the one or more phonemes based on time stamps stored over a blockchain in
association with the one or more phonemes; and
synchronise the synthesized animated visual appearances with the audio
message.
9. The system as claimed in claim 6, wherein for retrieving the movement
data from the database unit, the processing unit is configured to:
map the extracted one or more phonemes with one or more predefined
phonemes stored in the database unit; and
fetch movement data associated with the mapped one or more predefined
phonemes in the database unit.
10. The system as claimed in claim 6, wherein the visual appearance of the
first user comprises any one of: an emoji of the first user, an avatar of the first
user, and a facial image of the first user, and
wherein the database unit stores different visual appearances of the first
user in association with different emotion information; and
wherein the emotion information represents emotion of the first user while
sending the audio message.