Abstract: Disclosed is a device for processing a speech delivered by a speaker. A voice analyzer receives an audio stream, comprising multiple pronounced words, indicating a speech delivered by a speaker. The audio stream may be received from an audio input device and is associated to a predefined speech comprising a plurality of words along with metadata associated to a set of words of the plurality of words. The voice analyzer converts the multiple pronounced words into a textual format in real-time. A comparator matches each pronounced word with a word, of the plurality of words, corresponding to a pronounced word, of the multiple pronounced words, in order to identify one or more additional words unmatched with the plurality of words and at least one pronounced word matched with at least one word, of the set of words, having the metadata. A voice synthesizer and a gesture synthesizer process the speech by eliminating the one or more additional words and by generating a trigger, indicating the speaker to perform a gesture, respectively.
PRIORITY INFORMATION
[001] This patent application does not claim priority from any other application.
TECHNICAL FIELD
[002] The present subject matter described herein, in general, relates to processing a speech delivered by a speaker, and more particularly relates to a device and a method for processing the speech by using a voice synthesizer and a gesture synthesizer.
BACKGROUND
[003] The pervasiveness of electronic forms of communication has greatly reduced the need for one-to-one and live communication. Still, every person, now and then, prefers vocalization for communicating with an audience at large in order to persuade, inspire, or inform. Various ways of such communication may include delivering a speech to the audience or a presentation to stakeholders in an organization. The effectiveness of the vocalization increases manifold when the content of the speech or the presentation is supported by an equally good way of delivering the content to the audience.
[004] However, it has been observed that people generally use filler words (such as “um”, “ah”, “you know”, and “I think”) during their conversation with the audience. The filler words indicate that the speaker has paused to think but has not yet finished speaking. Though the filler words may be spoken unconsciously by the speaker, they may weaken the impact of the message to be conveyed to the audience through the speech. In contrast to the filler words, a gesture performed by the speaker, corresponding to a specific word or a phrase, may assist the speaker to place more emphasis on that specific word or phrase. Thus, the gesture performed may strengthen the impact of the message to be conveyed. However, sometimes the speaker is so engrossed in the content of the speech that he/she may forget to perform a gesture while delivering the speech. Thus, it becomes of utmost importance for the speaker to keep track of when and which gesture is to be performed.
SUMMARY
[005] Before the present systems and methods are described, it is to be understood that this application is not limited to the particular systems and methodologies described, as there can be multiple possible embodiments which are not expressly illustrated in the present disclosure. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present application. This summary is provided to introduce concepts related to devices and methods for processing a speech delivered by a speaker, and the concepts are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining or limiting the scope of the claimed subject matter.
[006] In one implementation, a device for processing a speech delivered by a speaker is disclosed. The device comprises a control unit and a memory unit coupled to the control unit. The control unit further comprises a voice analyzer, a comparator, a voice synthesizer and a gesture synthesizer. The voice analyzer may be configured to receive an audio stream, comprising multiple pronounced words, indicating a speech delivered by a speaker. The audio stream may be received from an audio input device. In one aspect, the audio stream may be associated to a predefined speech stored in the memory unit. The predefined speech further comprises a plurality of words along with metadata associated to a set of words of the plurality of words. The voice analyzer may further be configured to convert the multiple pronounced words into a textual format in real-time. The comparator may be configured to match each pronounced word with a word, of the plurality of words, corresponding to a pronounced word of the multiple pronounced words. Each pronounced word may be matched with a word in order to identify one or more additional words, from the multiple pronounced words, unmatched with the plurality of words and at least one pronounced word matched with at least one word, of the set of words, having the metadata. The voice synthesizer and the gesture synthesizer may be configured to process the speech. In one aspect, the voice synthesizer may process the speech by eliminating the one or more additional words by suppressing an audio pertaining to the one or more additional words. The gesture synthesizer, on the other hand, may process the speech by generating a trigger, indicating the speaker to perform a gesture, based on the metadata of the at least one word matched with the at least one pronounced word.
[007] In another implementation, a method for processing a speech delivered by a speaker is disclosed. In order to process the speech, initially, an audio stream may be received. The audio stream comprises multiple pronounced words indicating the speech delivered by a speaker. The audio stream may be received from an audio input device. In one aspect, the audio stream may be associated to a predefined speech stored in a memory unit. After receiving the audio stream, the multiple pronounced words may be converted into a textual format in real-time by using a voice analyzer of a control unit coupled with the memory unit. Subsequently, each pronounced word may be matched with a word, of the plurality of words, corresponding to a pronounced word of the multiple pronounced words. In one aspect, each pronounced word may be matched, using a comparator of the control unit, with the word in order to identify one or more additional words, from the multiple pronounced words, unmatched with the plurality of words and at least one pronounced word matched with at least one word, of the set of words, having the metadata. After matching each pronounced word with the word, the speech may be processed by eliminating one or more additional words by suppressing an audio pertaining to the one or more additional words using a voice synthesizer of the control unit. Further, the speech is processed by generating a trigger, via a gesture synthesizer of the control unit, indicating the speaker to perform a gesture. In one aspect, the trigger may be generated based on the metadata of the at least one word matched with the at least one pronounced word.
BRIEF DESCRIPTION OF THE DRAWINGS
[008] The foregoing detailed description of embodiments is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the disclosure, example constructions of the disclosure are shown in the present document; however, the disclosure is not limited to the specific methods and apparatus disclosed in the document and the drawings.
[009] The detailed description is given with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
[0010] Figure 1 illustrates an implementation of a device for processing a speech delivered by a speaker, in accordance with an embodiment of the present subject matter.
[0011] Figure 2 illustrates the device, in accordance with an embodiment of the present subject matter.
[0012] Figure 3 illustrates a method for processing a speech delivered by a speaker, in accordance with an embodiment of the present subject matter.
[0013] Figure 4 illustrates a method for processing the speech in accordance with an embodiment of the present subject matter.
DETAILED DESCRIPTION
[0014] Some embodiments of this disclosure, illustrating all its features, will now be discussed in detail. The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present disclosure, the exemplary systems and methods are now described. The disclosed embodiments are merely exemplary of the disclosure, which may be embodied in various forms.
[0015] Various modifications to the embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. However, one of ordinary skill in the art will readily recognize that the present disclosure is not intended to be limited to the embodiments illustrated, but is to be accorded the widest scope consistent with the principles and features described herein.
[0016] It has been observed that a speech is a foremost tool to convey a message to an audience at large. The speech becomes a joy to listen to, as the audience easily gets engrossed in the speech delivered by a speaker when it is appropriately presented to the audience. It has been noted that sometimes filler words (such as “um”, “ah”, “you know”, “I think”) are spoken unconsciously by the speaker while delivering the speech and hence weaken the impact of the message to be conveyed. The speech might easily sway the listeners to the speaker's point of view in case it is devoid of the filler words. This becomes all the more important in case of long speeches where the speaker needs time to recollect what he/she has to say next. Thus, in order to overcome these shortcomings, the present subject matter facilitates processing of the speech delivered by the speaker.
[0017] The processing of the speech facilitates elimination of speech impediments, like the filler words, in real-time. The processing may involve analysis of the voice samples received via a microphone or any audio input device (such as a smartphone or a Dictaphone). After analyzing the voice samples, the filler words, pronounced by the speaker, that were not part of a predefined speech may be eliminated. In one aspect, the filler words may be eliminated by suppressing an audio pertaining to the filler words in real-time. After eliminating the audio of the filler words, the audio pertaining to the speech may be produced by a sound producing unit coupled with the device for generating sound of the speech delivered by the speaker. In one aspect, the present invention further facilitates enhancement of the quality of the speech by introducing artificial vocal variety that alters the volume or pitch as needed at appropriate instances of the speech. This may enable the speaker to focus on the content of the speech.
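As an illustration only, the following minimal sketch shows one way such vocal variety could be introduced by scaling the volume of a marked segment; the segment boundaries, gain value, and the use of word-level timestamps are assumptions made for the sketch, not details prescribed by the disclosure.

```python
import numpy as np

def emphasize_segment(audio, sample_rate, start_s, end_s, gain=1.4):
    """Boost the volume of the samples between start_s and end_s so that a
    phrase marked for emphasis is reproduced slightly louder (pitch changes
    would need a more elaborate technique and are omitted here)."""
    out = audio.astype(np.float32).copy()
    start, end = int(start_s * sample_rate), int(end_s * sample_rate)
    out[start:end] = np.clip(out[start:end] * gain, -1.0, 1.0)
    return out

# One second of synthetic audio at 16 kHz, emphasized between 0.25 s and 0.50 s.
rate = 16000
signal = np.random.uniform(-0.5, 0.5, rate).astype(np.float32)
louder = emphasize_segment(signal, rate, 0.25, 0.50)
```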
[0018] In addition to the above, body language also plays a vital role in enhancing the effectiveness of the speech. Gestures make the speech more interactive; however, sometimes the speaker becomes so engrossed in the speech that he/she forgets to make the appropriate gesture corresponding to a specific word or a phrase used in the speech. In order to indicate to the speaker to perform the gesture, the present subject matter further facilitates generation of a trigger. The trigger may then actuate vibrator sensors (in the form of a smart wearable device) coupled with different parts of the body of the speaker. The vibrator sensors, when vibrated, signal the speaker to perform the gesture as required by the script of the speech. Thus, in this manner, the speech may be processed in order to enhance the effectiveness of the speech.
[0019] The capabilities of the present subject matter, as aforementioned, i.e. suppression of the audio pertaining to the filler words and generation of the trigger indicating the speaker to perform the gesture, may also be used by the speaker for self-training. The present subject matter further discloses a counter that counts the occurrences where the filler words are pronounced or where the speaker fails to use appropriate gestures while delivering the speech. The count of the occurrences indicates to the speaker whether he/she is improving in his/her speaking skills.
[0020] While aspects of the described device and method for processing the speech delivered by a speaker may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.
[0021] Referring now to Figure 1, a network implementation 100 of a device 104 for processing a speech delivered by a speaker is disclosed. The device 104 comprises a control unit and a memory unit coupled to the control unit. In order to process the speech, initially, the device 104 receives an audio stream, comprising multiple pronounced words, indicating a speech delivered by a speaker. The audio stream may be received from an audio input device. The audio input device may include, but is not limited to, a microphone 102. In one aspect, the audio stream may be associated to a predefined speech stored in the memory unit. The predefined speech further comprises a plurality of words along with metadata associated to a set of words of the plurality of words. Subsequently, the device 104 matches each pronounced word with a word, of the plurality of words, corresponding to a pronounced word of the multiple pronounced words. In one aspect, the device 104 matches each pronounced word with the word to identify one or more additional words, from the multiple pronounced words, unmatched with the plurality of words. The device 104 further identifies at least one pronounced word matched with at least one word, of the set of words, having the metadata. Upon identification, the device 104 eliminates the one or more additional words by suppressing an audio pertaining to the one or more additional words and generates a trigger, indicating the speaker to perform a gesture. The trigger may be generated based on the metadata of the at least one word matched with the at least one pronounced word. In one embodiment, the trigger may actuate one or more vibrators 108-1, 108-2, 108-3, 108-N coupled with one or more body parts of the speaker.
[0022] Although the present disclosure is explained considering that the device 104 is implemented on a server, it may be understood that the device 104 may also be implemented in a variety of computing systems, such as a smart microphone, a Dictaphone, a smartphone, a laptop computer, a desktop computer, and a notebook. The device may be connected with a sound producing unit 108 for generating sound of the speech delivered by the speaker. It will be understood that the device 104 may be accessed via an I/O interface by multiple users or stakeholders, through one or more user devices, for associating the metadata to a set of words of the predefined speech. Examples of the user devices may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The I/O interface may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface may allow the user device to associate the metadata to the set of words. Further, the I/O interface may enable the device 104 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface may include one or more ports for connecting a number of devices to one another or to another server.
[0023] In one embodiment, the device may be communicatively coupled with the one or more vibrators (108-1, 108-2, 108-3, 108-N), hereinafter referred to as the one or more vibrators 108, coupled with one or more body parts of the speaker. The one or more vibrators 108 are communicatively coupled to the device 104 through a network 106. In one implementation, the network 106 may be a wireless network, a wired network or a combination thereof. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, Personal Area Network (PAN) and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), Wi-Fi®, Bluetooth®, ZigBee®, and the like, to communicate with one another. Further the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
[0024] Referring now to Figure 2, the device 104 is illustrated in accordance with an embodiment of the present subject matter. In one embodiment, the device 104 may include a control unit 206 and a memory unit 208. The control unit 206 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. The control unit 206 may include a voice analyzer 210, a comparator 212, a voice synthesizer 214, a gesture synthesizer 216, and a counter 218. The memory unit 208 may include any computer-readable medium or computer program product known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
[0025] As there are challenges observed in the existing art, the challenges necessitate processing the speech delivered by the speaker. The processing of the speech facilitates elimination of speech impediments, like the filler words, in real-time. The processing may involve analysis of the voice samples received, via a microphone 102 (shown in Fig. 1), as an audio stream. The audio stream is then transmitted to the device 104 for processing. The device 104 may employ the control unit 206 and the memory unit 208 coupled to the control unit 206 in order to process the audio stream received from the microphone 102. The control unit 206 further comprises the voice analyzer 210, the comparator 212, the voice synthesizer 214, the gesture synthesizer 216, and the counter 218. In one embodiment, the device 104 may be present inside the microphone 102 for processing the speech. In another embodiment, the device 104 may be present outside the microphone 102. In a scenario where the device 104 is present outside the microphone 102, the device 104 may be coupled with the microphone 102 as shown in Figure 1. The detailed functioning of the components present in the control unit 206 of the device 104 is described below.
[0026] In one embodiment, the device 104 supports two modes for processing the speech. The two modes comprise an Active mode and a Training mode. The Active mode facilitates processing of the speech by eliminating the filler words present in the speech. Upon eliminating the filler words, the device 104 indicates to the speaker to perform a gesture for a specific word in the speech in order to make the speech more interactive. The detailed functioning of the Active mode and the Training mode is explained below.
[0027] Further referring to Figure 2, in the Active mode, the voice analyzer 210 is configured to receive the audio stream from the microphone 102. The audio stream comprises multiple pronounced words pronounced by the speaker. The multiple pronounced words indicate the speech delivered by the speaker. In one aspect, the audio stream may be associated to a predefined speech stored in the memory unit 208. The predefined speech further comprises a plurality of words along with metadata associated to a set of words of the plurality of words. In one aspect, the metadata associated to the set of words contains the instances, in the predefined speech, where a trigger is to be generated indicating the speaker to perform the gesture for each respective word of the set of words.
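For illustration only, the following is a minimal sketch of how such a predefined speech and its metadata might be represented in the memory unit 208 (the example content mirrors the illustration discussed further below); the field names, gesture labels, and vibrator identifiers are hypothetical assumptions and are not prescribed by the disclosure.

```python
# Hypothetical representation of a predefined speech in the memory unit 208.
predefined_speech = {
    "words": (
        "The computer is one of the most important innovations of Science and "
        "Technology. Computers are constantly being updated to make our lives "
        "better. In fact the computer is a wonderful electronic brain that we "
        "have come to rely on in our everyday life"
    ).split(),
    # Metadata for the set of words on which a gesture trigger is required.
    "metadata": {
        "important": {"gesture": "raise_right_hand", "vibrator_id": 1},
        "brain": {"gesture": "point_to_head", "vibrator_id": 2},
        "everyday": {"gesture": "open_both_palms", "vibrator_id": 3},
    },
}
```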
[0028] After receiving the multiple pronounced words, the voice analyzer 210 converts the multiple pronounced words into a textual format in real-time. Once the voice analyzer 210 converts the multiple pronounced words, the comparator 212 matches each pronounced word with a word, of the plurality of words, corresponding to a pronounced word of the multiple pronounced words. In order to elucidate the functioning of the voice analyzer 210 and the comparator 212, consider an example where the voice analyzer 210 receives the speech, in the form of the audio stream, from the speaker via the microphone 102. The speaker pronounces the multiple pronounced words based on the predefined speech stored in the memory unit 208. The plurality of words of the predefined speech are as follows:
[0029] “The computer is one of the most important innovations of Science and Technology. Computers are constantly being updated to make our lives better. In fact the computer is a wonderful electronic brain that we have come to rely on in our everyday life” … (1)
[0030] It is to be noted that the memory unit 208 stores the metadata associated to the set of words including ‘important’, ‘brain’, and ‘everyday’.
[0031] Based on the predefined speech, the multiple pronounced words pronounced by the speaker are:
[0032] “The computer is one of the most important innovations of Science and Technology. Amm.…. Computers are constantly being updated to make our lives better. Uhh… In fact the computer is a wonderful electronic brain that we have come to rely on in our everyday life” … (2)
[0033] After receiving the multiple pronounced words, the voice analyzer 210 then converts the multiple pronounced words into the textual format in real-time. The textual format of the multiple pronounced words is mentioned in the para (2) above. Subsequently, the comparator 212 matches each pronounced word of the para (2), in a sequence, with the plurality of words mentioned in the para (1). For example, ‘The’ is matched with ‘The’. Similarly, ‘computer’ is matched with ‘computer’, ‘is’ is matched with ‘is’, ‘one’ is matched with ‘one’, and likewise.
[0034] Upon matching, the comparator 212 identifies one or more additional words, from the multiple pronounced words, unmatched with the plurality of words. In one aspect, the one or more additional words indicate words that are not present in the memory unit 208. It is observed in the above example that the speaker has pronounced additional words like ‘Amm.….’ and ‘Uhh…’ that are not present in the plurality of words of the predefined speech. In addition to identifying the additional words, the comparator 212 further identifies at least one pronounced word matched with at least one word, of the set of words, having the metadata. In the above example, since the memory unit 208 stores the metadata corresponding to the set of words, the comparator 212 identifies ‘important’, ‘brain’, and ‘everyday’ as the pronounced words, having the metadata, matched with the set of words.
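A minimal sketch of such comparator logic is given below, assuming a simple sequential alignment of the pronounced words against the script and a naive punctuation/case normalization; a practical comparator working on recognized speech would typically be more tolerant.

```python
def compare_pronounced_words(pronounced_words, script_words, metadata):
    """Walk the pronounced words against the predefined script in sequence,
    collecting (a) additional words that do not appear at the expected
    position in the script and (b) pronounced words that carry metadata."""
    additional_words = []   # filler words such as "Amm.…." or "Uhh…"
    metadata_hits = []      # pronounced words for which a gesture trigger is needed
    index = 0               # current position in the predefined script
    for word in pronounced_words:
        normalized = word.strip(".…,").lower()
        if index < len(script_words) and normalized == script_words[index].strip(".,").lower():
            if normalized in metadata:
                metadata_hits.append(normalized)
            index += 1
        else:
            additional_words.append(word)
    return additional_words, metadata_hits

script = "In fact the computer is a wonderful electronic brain".split()
meta = {"brain": {"gesture": "point_to_head"}}
fillers, hits = compare_pronounced_words(
    "Uhh… In fact the computer is a wonderful electronic brain".split(), script, meta)
# fillers -> ["Uhh…"], hits -> ["brain"]
```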
[0035] Subsequent to the identification of the one or more additional words, the voice synthesizer 214 eliminates the one or more additional words from the multiple pronounced words. In one aspect, the voice synthesizer 214 eliminates the one or more additional words by suppressing an audio pertaining to the one or more additional words. In the above example, the audio pertaining to ‘Amm.….’ and ‘Uhh…’, which are identified as the additional words, is suppressed by the voice synthesizer 214. Thus, in this manner, the speech may be processed by the device 104. In one embodiment, after processing the speech, the audio pertaining to the speech may be produced by a sound producing unit 108 (shown in Fig. 1). The sound producing unit 108 is coupled with the voice synthesizer 214 for generating the audio of the speech delivered by the speaker.
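The following sketch illustrates one way the suppression step could be realized, assuming the voice analyzer 210 provides word-level timestamps and the audio is available as an array of samples; both are assumptions made for the example.

```python
import numpy as np

def suppress_additional_words(audio, sample_rate, word_timings, additional_words):
    """Zero out the audio samples that fall inside the time span of each
    identified additional word, so the filler is not reproduced by the
    sound producing unit. Word-level timestamps are assumed to be available
    from the voice analysis step."""
    cleaned = audio.copy()
    for word, start_s, end_s in word_timings:
        if word in additional_words:
            start, end = int(start_s * sample_rate), int(end_s * sample_rate)
            cleaned[start:end] = 0.0   # mute the filler segment
    return cleaned

# One second of synthetic audio with a filler ("Uhh…") between 0.2 s and 0.4 s.
rate = 16000
audio = np.random.uniform(-0.5, 0.5, rate).astype(np.float32)
timings = [("Uhh…", 0.2, 0.4), ("In", 0.5, 0.6)]
clean = suppress_additional_words(audio, rate, timings, {"Uhh…"})
```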
[0036] On the other hand, upon matching of the at least one pronounced word with the at least one word having the metadata, the gesture synthesizer 216 generates a trigger indicating the speaker to perform a gesture. In one aspect, the trigger may be generated based on the metadata of the at least one word matched with the at least one pronounced word. Since the pronounced words, i.e. ‘important’, ‘brain’, and ‘everyday’, have the metadata, the gesture synthesizer 216 generates the trigger on pronunciation of each of ‘important’, ‘brain’, and ‘everyday’ by the speaker in his/her speech. Upon generation of the trigger, the speaker may perform the gesture on actuation of one or more vibrators, coupled with one or more body parts of the speaker. In one aspect, the trigger transmits a signal for actuating the one or more vibrators.
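As a rough illustration, the trigger could be realized as a small message sent toward the wearable vibrator; the UDP transport, the address, and the message format in the sketch below are assumptions, since the disclosure only states that the trigger actuates the vibrators over the network.

```python
import json
import socket

def send_gesture_trigger(word, metadata, vibrator_address=("192.168.0.50", 9000)):
    """Send a small datagram toward the wearable vibrator when a pronounced
    word carries metadata, prompting the speaker to perform the associated
    gesture. Transport, address, and payload layout are illustrative only."""
    payload = json.dumps({
        "word": word,
        "gesture": metadata[word]["gesture"],
        "vibrator_id": metadata[word].get("vibrator_id", 0),
    }).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, vibrator_address)

send_gesture_trigger("brain", {"brain": {"gesture": "point_to_head", "vibrator_id": 2}})
```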
[0037] The device 104 further comprises the Training mode. The Training mode enables the speaker to self-train as the device 104 monitors, by using the counter 218, a count of instances where the one or more additional words have been pronounced by the speaker. In addition, the device 104, in the Training mode, further monitors the speaker to know, by using the counter 218, a count of instances where the speaker should have performed the gesture but forgot to perform the gesture. The counter 218 counts each additional word pronounced by the speaker. The counter 218 further counts each gesture performed by the speaker while the speaker is delivering the speech. The speaker may then revisit his/her progress using time series trends via a dashboard in order to improve his/her speaking habits by checking the count of instances where the one or more additional words have been pronounced and where the speaker should have performed the gesture but forgot to perform it.
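For illustration, a minimal sketch of such a counter is shown below; the per-session record layout and the dashboard-oriented fields are assumptions made for the example.

```python
from collections import Counter
from datetime import date

class TrainingCounter:
    """Accumulates, per delivered speech, how many additional (filler) words
    were pronounced and how many expected gestures were missed, so progress
    can be reviewed as a time series on a dashboard."""

    def __init__(self):
        self.history = []   # one record per delivered speech

    def record_session(self, additional_words, expected_gestures, performed_gestures):
        record = {
            "date": date.today().isoformat(),
            "filler_count": len(additional_words),
            "filler_breakdown": dict(Counter(additional_words)),
            "missed_gestures": len(set(expected_gestures) - set(performed_gestures)),
        }
        self.history.append(record)
        return record

counter = TrainingCounter()
counter.record_session(["Amm", "Uhh", "Amm"], ["important", "brain", "everyday"], ["important"])
```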
[0038] Referring now to Figure 3, a method 300 for processing a speech delivered by a speaker is shown, in accordance with an embodiment of the present subject matter. The order in which the method 300 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 300 or alternate methods. Additionally, individual blocks may be deleted from the method 300 without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method 300 may be considered to be implemented as described in the device 104.
[0039] At block 302, an audio stream, comprising multiple pronounced words, indicating a speech delivered by a speaker, may be received. The audio stream may be received from an audio input device. The audio stream may be associated to a predefined speech stored in a memory unit 208 coupled with the control unit 206. In one aspect, the predefined speech comprises a plurality of words along with metadata associated to a set of words of the plurality of words. In one implementation, the audio stream may be received by the voice analyzer 210.
[0040] At block 304, the multiple pronounced words may be converted into a textual format in real-time. In one implementation, the multiple pronounced words may be converted by the voice analyzer 210.
[0041] At block 306, each pronounced word may be matched with a word, of the plurality of words, corresponding to a pronounced word of the multiple pronounced words. In one aspect, each pronounced word may be matched to identify one or more additional words, from the multiple pronounced words, unmatched with the plurality of words and at least one pronounced word matched with at least one word, of the set of words having the metadata. In one implementation, each pronounced word may be matched by the comparator 212.
[0042] At block 308, the speech may be processed. In one implementation, the speech may be processed by the voice synthesizer 214 and the gesture synthesizer 216.
[0043] Referring now to Figure 4, a method 308 for processing the speech is shown, in accordance with an embodiment of the present subject matter.
[0044] At block 402, one or more additional words may be eliminated by suppressing an audio pertaining to the one or more additional words. In one implementation, the one or more additional words may be eliminated by the voice synthesizer 214.
[0045] At block 404, a trigger, indicating the speaker to perform a gesture may be generated. In one aspect, the trigger may be generated based on the metadata of the at least one word matched with the at least one pronounced word. In one implementation, the trigger may be generated by the gesture synthesizer 216.
[0046] Exemplary embodiments discussed above may provide certain advantages. Though not required to practice aspects of the disclosure, these advantages may include those provided by the following features.
[0047] Some embodiments of a device and a method enable a speaker to enhance the effectiveness of a speech by eliminating additional words, pronounced by the speaker, before the speech is delivered to the audience/listeners.
[0048] Some embodiments of a device and a method enable the speaker to perform certain gestures necessitated by the context of the speech, thereby enhancing the importance of the message to be conveyed to the audience at large.
[0049] Some embodiments of a device and a method enable the speaker to self-train as the device monitors the number of times the speaker pronounced the additional words and also the number of occasions on which the speaker should have made a gesture but forgot to do so.
[0050] Although implementations for methods and devices for processing a speech delivered by a speaker have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for processing the speech.
CLAIMS
WE CLAIM:
1. A method for processing a speech delivered by a speaker, the method comprising:
receiving, by a control unit 206, an audio stream, comprising multiple pronounced words, indicating a speech delivered by a speaker, wherein the audio stream is received from an audio input device 102, and wherein the audio stream is associated to a predefined speech stored in a memory unit 208 coupled with the control unit 206, wherein the predefined speech further comprises a plurality of words along with metadata associated to a set of words of the plurality of words;
converting, by the control unit 206, the multiple pronounced words into a textual format in real-time by using a voice analyzer 210;
matching, by the control unit 206, each pronounced word with a word, of the plurality of words, corresponding to a pronounced word, of the multiple pronounced words, in order to identify
one or more additional words, from the multiple pronounced words, unmatched with the plurality of words, and
at least one pronounced word matched with at least one word, of the set of words, having the metadata; and
processing, by the control unit 206, the speech by
eliminating one or more additional words by suppressing an audio pertaining to the one or more additional words, and
generating a trigger, indicating the speaker to perform a gesture, based on the metadata of the at least one word matched with the at least one pronounced word.
2. The method of claim 1, wherein the one or more additional words indicate words not present in the memory unit 208.
3. The method of claim 1, wherein the audio pertaining to the one or more additional words is suppressed by using a voice synthesizer 214.
4. The method of claim 1, wherein the speaker performs the gesture on actuation of one or more vibrators 108 coupled with one or more body parts of the speaker, and wherein the one or more vibrators 108 are actuated based on the generation of the trigger.
5. The method of claim 1 further comprising counting the one or more additional words and one or more gestures performed by the speaker while the speaker is delivering the speech.
6. A device 104 for processing a speech delivered by a speaker, the device 104 comprising:
a control unit 206; and
a memory unit 208 coupled to the control unit 206, wherein the control unit 206 further comprises:
a voice analyzer 210 for
receiving an audio stream, comprising multiple pronounced words, indicating a speech delivered by a speaker, wherein the audio stream is received from an audio input device 102, and wherein the audio stream is associated to a predefined speech stored in the memory unit 208, wherein the predefined speech further comprises a plurality of words along with metadata associated to a set of words of the plurality of words, and
converting the multiple pronounced words into a textual format in real-time;
a comparator 212 for matching each pronounced word with a word, of the plurality of words, corresponding to a pronounced word, of the multiple pronounced words, in order to identify
one or more additional words, from the multiple pronounced words, unmatched with the plurality of words, and
at least one pronounced word matched with at least one word, of the set of words, having the metadata; and
a voice synthesizer 214 and a gesture synthesizer 216 for processing the speech, wherein the voice synthesizer 214 eliminates one or more additional words by suppressing an audio pertaining to the one or more additional words, and wherein the gesture synthesizer 216 generates a trigger, indicating the speaker to perform a gesture, based on the metadata of the at least one word matched with the at least one pronounced word.
7. The device 104 of claim 6, wherein the one or more additional words indicate words not present in the memory unit 208.
8. The device 104 of claim 6, wherein the speaker performs the gesture on actuation of one or more vibrators 108 coupled with one or more body parts of the speaker, and wherein the one or more vibrators 108 are actuated based on the generation of the trigger.
9. The device 104 of claim 6 further comprising a counter 218 for counting each additional word pronounced by the speaker and a gesture counter for counting each gesture performed by the speaker while the speaker is delivering the speech.
10. The device 104 of claim 6 further comprising a sound producing unit 108 coupled with the voice synthesizer 214 for generating sound of the speech delivered by the speaker.