Abstract: A system and a method for controlling electrical and electronic devices based on a human voice generated cue are disclosed. Particularly, the present invention discloses the use of the universality of the "Shhh" sound for controlling a device, more specifically for mute control of audio/video devices.
FORM-2
THE PATENTS ACT, 1970
(39 of 1970) &
THE PATENTS RULES, 2003
COMPLETE SPECIFICATION
(See Section 10; Rule 13)
A SYSTEM AND METHOD FOR CONTROLLING ELECTRONIC
AND ELECTRICAL DEVICES
TATA CONSULTANCY SERVICES LTD.,
an Indian Company of Nirmal Building, 9th Floor, Nariman Point, Mumbai - 400 021,
Maharashtra, India.
The following specification particularly describes the invention and the manner in which it is to be performed.
FIELD OF INVENTION
The present invention relates to the field of telecommunication.
Particularly, the present invention relates to the field of controlling electronic and electrical devices.
DEFINITIONS OF TERMS USED IN THE SPECIFICATION
• Sampling: Sampling is defined as the reduction of a continuous signal to a discrete signal.
• Frame: Frames are samples of a signal of a pre-determined length which are grouped together.
• Spectral maxima: A spectral maximum is the highest peak that is located in the Fourier magnitude spectrum of a signal.
• Location in frequency: Location in frequency is the position of the spectral maxima in the frequency spectrum.
• Amplitude: Amplitude is the magnitude of the peaks in the frequency spectrum.
• Roll off value: Roll off value is the point of the spectrum at which a fixed percentage of the total sum of the spectral energies of the frame is reached.
• Discrete Fourier Transform: Discrete Fourier Transform (DFT) is a technique which allows the computation of spectra from discrete-time data.
• Five point averaging technique: In the five point averaging technique, each peak in a particular frame is averaged with five other neighboring peaks to produce a smooth signal representation with less abrupt changes in value.
BACKGROUND OF THE INVENTION AND PRIOR ART
Electronic devices such as televisions and audio players are typically controlled by means of a remote control. Remote controls are infrared based devices which work efficiently only when there is a line of sight between the remote and the receiver of the electronic device. Hence, when users are not in the line of sight and need to control the operations of the electronic device, they need to try various permutations of the remote position/angle to control the device.
Moreover, when the user is away from the remote or the remote has been misplaced the users need to either hunt for the remote control or control the operations from the control buttons provided on the device itself. This makes the remote control based operations inconvenient. Additionally, the remote controls are battery operated and the operations are dependent on the life of the battery.
Hence there is felt a need for a system which is independent of infrared technology for controlling the electronic/electrical devices. In addition, there is felt a need for a system which liberates the users from the restrictive operations of remote controls.
Therefore, to overcome these shortcomings there was felt a need for voice activated control of devices. Broadly speaking, detection of sounds is important for a variety of tasks, as diverse as audio surveillance and human-machine interfaces, to name a few.
In the Prior Art several patents disclose a variety of sound activated systems for controlling devices:
United States Patent 3892920 illustrates an acoustic activated switch which includes a combination of tuned transducers, bistable switch and monostable switch in a predetermined frequency range in order to control the two functions of an auxiliary device coupled to the switch. The auxiliary device may be a television receiver and the functions controlled are the electrical power to the receiver and the channel selection.
United States Patent 4471683 illustrates a voice command weapons launching system for use by a pilot of an aircraft against a plurality of simultaneously appearing (i.e. existing) targets in a manner such that the pilot can keep his hands engaged on maneuvering the aircraft, whilst using his eyes and voice to engage the weapon command system, thus easing the multitasking process. Voice controlled input devices used in this system may be selected from a range which includes two types, i.e. discrete word recognition type, and connected word recognition type; and they can be speaker dependent (i.e. they must recognize the voice of who is talking to be operative) or speaker independent (i.e. any voice can make the devices operative).
United States Patent 4641292 illustrates a voice controlled welding system for permitting a human operator to adjust the welding power supply through verbal commands. This includes a recognition unit and a computer, which is electrically connected to deliver power control signals to the welding power supply to thereby adjust the power delivered to the welding torch.
United States Patent 5209695 discloses a sound controllable apparatus particularly useful in controlling toys and robots, said apparatus including a microphone, a processor for analyzing the received sound commands and for determining the number of space-separated words or other interrupted sounds, such as beeps, hand-claps and whistles in a received sound command.
United States Patent 5226090 discloses a voice-operated remote control system which transmits a remote control signal in response to a voice command and has a detector for detecting whether a voice command is received or not.
United States Patent 7426414 discloses a method for deriving a carrier that may be used to stimulate a cochlea based on a signal representative of sound, the method comprising: computing a frequency spectrum of the signal representative of sound; arranging the frequency spectrum into a plurality of channels such that each channel corresponds to a range of frequencies that lie within the frequency spectrum; selecting a frequency from within the range of frequencies that corresponds to a channel of the plurality of channels; and deriving a waveform for the carrier, the waveform having a frequency that corresponds to the selected frequency, the waveform having a modulation depth that decreases as the selected frequency increases; wherein the modulation depth is based on a rate at which each of the plurality of channels is updated, wherein selecting the frequency comprises deriving a waveform having a frequency that is based on the frequency at which a spectral peak is located within the range of frequencies that corresponds to the channel.
United States Patent 5095904 illustrates a multi-peak speech processor, i.e. an improved pulsatile system for a cochlear prosthesis. A multi-spectral peak coding strategy is employed and presented to various electrodes of the implant, thus determining the stimulus intensity.
While each of the aforementioned systems and apparatus incorporates a voice-activated or sound-activated system for control of associated or auxiliary devices, none of them describes how inaccuracy is avoided, that is, how the system or apparatus is made receptive only to the sound, voice or sound/voice pattern on which it is intended to operate.
Hence, their working in a practical environment, which includes random sounds, interruptions and disturbances, is questionable. The existing systems can therefore be improved to reject erroneous signals and to work efficiently not only in the near-ideal scenarios in which most sound/voice activated systems operate but also in practical, random and noisy scenarios.
Hence, there is a need for a universal sound pattern, and a system adapted to recognize such a sound pattern, which overrides the fluctuations in tone, pitch, rhythm, scale and volume, and is unaffected by dialects or accents.
OBJECT OF THE INVENTION
An object of the present invention is to provide a user friendly system for controlling electrical and electronic devices using voice-activated commands.
Another object of the present invention is to provide a system whose operation is independent of the line-of-sight with the devices thus, enabling users to operate the devices from any angle.
Yet another object of the present invention is to provide a system which is easily installable and requires low maintenance.
Still another object of the present invention is to provide a language independent system for controlling electrical and electronic devices.
Further, another object of the present invention is to provide a system which can be used by the physically challenged to conveniently operate and control the electrical and electronic devices.
Still further, another object of the present invention is to provide a system which works efficiently in practical, random and noisy scenarios.
Another object of the present invention is to provide a voice-activated system which is unaffected by dialects or accents present in users' voices.
SUMMARY OF THE INVENTION
In accordance with the present invention, there is provided a system for mute control of electrical and electronic devices based on human voice generated cue, the system comprising:
• sensing means adapted to sense the voice cue;
• receiver means adapted to receive the voice cue signal from the sensing means;
• pre-processing means co-operating with the receiver means, the pre-processing comprising:
i. sampling means adapted to sample the voice cue signal at a sampling rate greater than 8000 samples/sec;
ii. frame creation means adapted to create a set of overlapping frames for the sampled signal;
iii. first conversion means adapted to convert each sampled frame to the spectral domain;
iv. computational means adapted to detect the spectral maxima and determine the amplitude and location in frequency for representing each of the converted frames and further adapted to compute the roll off value for each of the converted frames;
v. second conversion means adapted to smoothen the computed spectral roll off value across each of the converted frames;
• detection means adapted to receive the pre-processing attributes of the converted frames and further adapted to detect the voice cue 'Shhh' from the processed signal; and
• controlling means adapted to receive the detected voice cue and further adapted to mute the electrical/electronic device.
Typically, the sensing means is a directional microphone attached to the electrical/electronic device.
Alternatively, the sensing means is a microphone of a handheld device including remote controls and mobile phones.
Preferably, the sensing means includes a voice activity detector to conserve power.
Typically, the sensing means communicates the voice cue signal to the receiver means using radio frequency.
Preferably, the receiver means is hosted inside the electrical / electronic device.
Typically, the pre-processing means is selected from a group consisting of a microprocessor and a Digital Signal Processor.
Typically, the sampling means is an analog to digital converter.
Preferably, the frame creation means is further adapted to create frames which overlap each other typically by 50%.
Typically, each of the sampled frames is converted to the spectral domain using Discrete Fourier Transform.
Typically, the spectral roll off representation of each of the converted frames is smoothened using the five point averaging technique over a duration of 1.5 seconds.
Typically, the detection means detects the voice cue 'Shhh' using thresholding techniques.
In accordance with the present invention, there is provided a method for mute control of electrical and electronic devices based on human voice generated cue, the method comprising the following steps:
• sensing the voice cue;
• receiving the sensed voice cue signal;
• sampling the sensed voice cue signal at a sampling rate greater than 8000 samples/sec;
• creating overlapping frames for the sampled signal;
• converting each sampled frame to the spectral domain;
• detecting the spectral maxima for each of the converted frames and determining the amplitude and location in frequency for representing each of the converted frames;
• computing the roll off value for each of the converted frames;
• smoothening the computed spectral roll off value across each of the converted frames;
• checking if there are any overlapping frames remaining to be processed from the sampled signal, if yes then processing the frames;
• recognizing the voice cue 'Shhh' from the processed signal; and
• controlling the mute operation of the electrical/electronic devices.
In accordance with this invention, the step of sensing the voice cue includes the step of sensing the voice cue using a directional microphone transmitter attached to the electrical and/or electronic device.
Alternatively, the step of sensing the voice cue includes the step of sensing the voice cue using the microphone transmitter of hand-held devices including remote controls and mobile phones.
Typically, the step of creating overlapping frames includes the steps of converting the sampled audio cue segment into sets of frames overlapping each other by 50%.
In accordance with this invention, the step of converting each of the frames into the spectral domain includes the step of applying Discrete Fourier Transform to each frame from the set of frames.
Preferably, the step of detecting the spectral maxima and determining the amplitude and location in frequency for each of the converted frames includes the step of computing the maxima and its amplitude and location in frequency using the first and second derivative tests on the spectrum.
In accordance with this invention, the step of computing the roll off value for each of the converted frames includes the step of calculating a point in the spectrum at which a predetermined fixed percentage of the total sum of the spectral energies of the frame is reached.
Typically, the step of smoothening the computed spectral roll off value includes the step of smoothening the spectrum roll off value using five point averaging technique.
Preferably, the step of recognizing the voice cue 'Shhh' includes the steps of:
• determining if the amplitudes and locations of peaks of the signal are higher than a pre-computed threshold 'HTh';
• computing the width from the signal;
• comparing the immediate left and right neighbours of the peak with a given left threshold and right threshold 'HTL' and 'HTR' respectively;
• comparing the thresholds with the next two neighbours of the peak, if no further neighbours are present then going to the next step;
• calculating the width of the given peak as the distance between the last left and right neighbours thus obtained;
• finding the peak with the largest width; and
• detecting the audio cue 'Shhh' as the width which is greater than a pre-defined threshold 'WTh'.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
Other aspects of the invention will become apparent by consideration of the accompanying drawings and their description stated below, which is merely illustrative of a preferred embodiment of the invention and does not limit in any way the nature and scope of the invention.
FIGURE 1 illustrates a schematic of the system for mute control of electrical and electronic devices based on human voice generated cue;
FIGURE 2 illustrates a functional block diagram of the pre-processing means for detection of human voice generated cue;
FIGURE 3A illustrates a graph showing Roll off of spectral maxima computed from DFT magnitude spectrum per frame of speech/audio data;
FIGURE 3B illustrates a graph obtained by smoothening of Roll off of spectral maxima computed from DFT magnitude spectrum per frame of speech/audio data; and
FIGURE 4 is a flowchart showing the steps for mute control of electrical and electronic devices based on human voice generated cue.
DETAILED DESCRIPTION
The present invention envisages a system and a method for controlling electrical and electronic devices based on a human voice generated cue. Particularly, the present invention envisages the use of universality of the "Shhh" sound for controlling a device, more specifically for mute control of an audio/video device such as a television.
The human generated voice cue sounding "Shhh" is typically the least distortable of the sounds and is universally recognized in all languages and communities for asking someone to become quiet. The detection of the particular sound 'Shhh' is attractive as it is a natural sound produced by humans in a variety of situations. Further, the usefulness of the mentioned technique is demonstrated by a voice-activated control typically for use in the set-top box of a television set, wherein a user can remotely turn on/off the audio signal.
The present invention aims at developing a system and method which uses a spectral peak based representation of the audio, hence making the system unaffected by the dialects and/or accents of the users. Spectral peaks provide important information about the underlying speech in a given speech signal. Spectral peaks located in the Fourier magnitude spectrum of a speech signal are termed spectral maxima and are characterized by their amplitude and frequency location. Such spectral maxima information is utilized for robustly detecting an acoustic signature of the sound 'Shhh' in accordance with this invention.
The method comprises determining the spectral peak information of the current pattern data, wherein the current pattern data comprises speech/audio data, broken into small frames and processed frame wise. Further, the current speech/audio signal frame is converted to the Fourier spectral domain and the spectral maxima, which are the peaks of the Fourier magnitude spectrum, are computed and represented by their amplitude and frequency location. The method further determines the spectral roll off of the located spectral maxima within the current speech/audio frame. The spectral roll off across the frames is further converted to a smoothened representation, typically by using the five point averaging technique over a duration of 1.5 seconds. The method further detects the pattern of the sound 'Shhh' by applying a suitable thresholding technique on the above mentioned smoothened representation.
Neither the user nor the system requires any prior training. The method does not require that the user hold a microphone, thus facilitating natural speaking.
Referring to the accompanying drawings, FIGURE 1 is a block diagram of the system for mute control of electrical and electronic devices based on human voice generated cue.
The system consists of a sensing means 100, typically a microphone transmitter, that captures the audio, i.e. the voice cue. The sensing means 100 can be a microphone which is directional and is attached to the electronic device. Alternatively, the sensing means 100 can be the microphone of a handheld device, such as a remote control or a mobile phone having the capability to work as a remote control.
The sensing means 100 has the capability to distinguish between the human audio cue and false triggering from almost all other unintended sources by implementing the following approaches:
• one approach is to have a directional microphone attached to the device and facing the users hence it discards any similar sound generated by the electrical and electronic devices;
• second approach is to have the maximum and minimum duration of the 'Shhh' audio cue pre-set in such a way that it can filter out similar non human generated sounds, for instance, the whistle of a cooker; and
• one more approach proposed by the present invention is the use of two 'Shhh' audio cues in a sequence. For the first 'Shhh' audio cue the electronic device will pause for a confirmation and the second 'Shhh' audio cue will provide the confirmation.
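The second and third approaches above can be illustrated with a minimal Python sketch. The names `duration_ok` and `TwoCueConfirmer`, the duration limits and the confirmation window are all hypothetical values chosen for illustration; the specification does not state them.

```python
def duration_ok(duration_s, min_s=0.3, max_s=2.0):
    """Accept a cue only if its duration lies inside a pre-set window.

    min_s and max_s are assumed values, not taken from the specification.
    """
    return min_s <= duration_s <= max_s


class TwoCueConfirmer:
    """Require a second cue within a time window to confirm the first one."""

    def __init__(self, confirm_window_s=3.0):
        self.confirm_window_s = confirm_window_s  # assumed window length
        self.pending_since = None                 # time of the unconfirmed cue

    def on_cue(self, t_s):
        """Return True only when this cue confirms a pending earlier cue."""
        if (self.pending_since is not None
                and t_s - self.pending_since <= self.confirm_window_s):
            self.pending_since = None
            return True
        # First cue, or the earlier cue expired: wait for a confirmation.
        self.pending_since = t_s
        return False


confirmer = TwoCueConfirmer()
first = confirmer.on_cue(0.0)    # device pauses for confirmation
second = confirmer.on_cue(1.0)   # confirmation arrives within the window
```

A long, sustained sound such as a cooker whistle would fail `duration_ok`, while an isolated cue that is never repeated would never pass the confirmer.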
The sensing means 100 always stays in the listen mode. Whenever a 'Shhh' sound occurs it gets detected and an event for action is raised. To conserve power in a continuous listen mode, a voice activity detector can be used to trigger the microphone operation. This voice activity detector may be a simple audio intensity thresholding circuit.
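A simple audio intensity thresholding detector of the kind mentioned above could look like the following Python sketch. The specification describes a circuit; this software analogue, the function names and the threshold value of 0.05 are assumptions made purely for illustration.

```python
import math


def frame_rms(samples):
    """Root-mean-square level of a block of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def voice_activity(samples, threshold=0.05):
    """Return True when the block's RMS level exceeds the assumed threshold."""
    return frame_rms(samples) > threshold


# Synthetic test blocks (10 ms at an assumed 16000 samples/sec).
quiet = [0.001 * ((-1) ** n) for n in range(160)]
loud = [0.5 * math.sin(2 * math.pi * 1000 * n / 16000) for n in range(160)]
```

Only blocks for which `voice_activity` returns True would be passed on for further processing, so the more expensive spectral analysis runs only when there is sound to analyse.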
The sensing means 100 communicates the voice cue signal captured by the microphone to a receiver means 102 using radio frequency. The receiver means 102 is hosted inside the electronic and electrical devices and passes the sensed audio cue to the pre-processing means 104.
FIGURE 2 shows a schematic of the pre-processing means.
The pre-processing means 104 comprises a sampling means 200 for digitizing the audio cue using the conventional sampling rates for speech/audio processing typically, above 8000 samples/sec. A fixed length of audio (e.g. one second) is taken for processing at a time by the preprocessing means 104. The frame creation means 202 converts the digitized audio segment into a set of frames overlapped by 50%; the frame length is chosen in accordance with the conventional speech/audio processing techniques.
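The frame creation step can be sketched in Python as follows. The function name `make_frames` and the default frame length of 512 samples are assumptions; the specification only states that frames overlap by 50% and that the frame length follows conventional practice.

```python
def make_frames(samples, frame_len=512):
    """Split a digitized segment into frames that overlap by 50%.

    Trailing samples that do not fill a whole frame are dropped
    (an assumption; the specification does not say how they are handled).
    """
    hop = frame_len // 2  # 50% overlap: each frame starts half a frame later
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


frames = make_frames(list(range(16)), frame_len=8)
```

In this small example the second half of each frame is identical to the first half of the next one, which is what a 50% overlap means.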
The first conversion means 204 converts each frame from the set of the digitized audio cue to the spectral domain using Discrete Fourier Transform. The converted frames are further processed by the computational means 206. The computational means 206 detects the spectral maxima and determines the amplitude and location in frequency for representing each of the converted frames and further computes the roll off value for each of the converted frames.
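The conversion of one frame to the spectral domain can be sketched with a direct evaluation of the Discrete Fourier Transform. This Python sketch uses only the standard library and is quadratic in the frame length; a practical implementation would more likely use a fast Fourier transform. The function name `dft_magnitude` is hypothetical.

```python
import cmath
import math


def dft_magnitude(frame):
    """Magnitude of the DFT of a real frame for bins 0..N/2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


# A cosine with exactly 4 cycles in a 32-sample frame.
frame = [math.cos(2 * math.pi * 4 * t / 32) for t in range(32)]
mags = dft_magnitude(frame)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
```

For this test tone the magnitude spectrum has a single dominant peak at bin 4, whose amplitude and location are the kind of information the computational means 206 extracts.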
The spectral maxima are computed by the computational means 206 as follows. A function f(x) has a relative (or local) maximum at x0 if there is some interval (r,s) containing x0 for which f(x0) > f(x) for all x between r and s for which f(x) is defined. Similarly, f(x) has a relative (or local) minimum at x0 if there is an interval (r,s) containing x0 for which f(x0) < f(x) for all x between r and s for which f(x) is defined. Relative extremum means either a relative maximum or a relative minimum.
By the first derivative test, relative extrema occur where f'(x) changes sign. Once the extrema are found, each can be classified as a local maximum or minimum by the second derivative test. f(x) has a local minimum at x0 if f'(x0) = 0 and f''(x0) > 0.
Similarly, f(x) has a local maximum at x0 if f'(x0) = 0 and f''(x0) < 0.
For a given discrete sequence, the first and second derivatives are approximated by their first and second order differences.
The maxima computed from the Fourier magnitude spectrum are the spectral maxima corresponding to the given speech/audio spectral frame. They are characterized by their magnitude and location in frequency and are computed by the first and second derivative tests on the spectrum. Such spectral maxima provide important information of the underlying speech/audio and in addition are robust in the presence of additive noise, being high signal-to-noise ratio (SNR) points of the spectrum.
The location and amplitudes of these spectral maxima for every frame are used for further processing after discarding other values in the spectrum.
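On a discrete magnitude spectrum, the derivative tests can be approximated with first and second order differences, as in the following Python sketch. The function name `spectral_maxima` and the exact handling of flat tops are assumptions made for illustration.

```python
def spectral_maxima(mags):
    """Return (bin_index, amplitude) pairs for local maxima of a spectrum."""
    peaks = []
    for k in range(1, len(mags) - 1):
        d_left = mags[k] - mags[k - 1]    # first difference arriving at bin k
        d_right = mags[k + 1] - mags[k]   # first difference leaving bin k
        second = d_right - d_left         # second difference at bin k
        # The first difference changes sign from positive to non-positive
        # and the second difference is negative: a local maximum.
        if d_left > 0 and d_right <= 0 and second < 0:
            peaks.append((k, mags[k]))
    return peaks


peaks = spectral_maxima([0, 1, 3, 2, 1, 4, 6, 5, 0])
```

Only the returned locations and amplitudes are kept for further processing; all other spectrum values are discarded.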
For every frame, a roll off value is calculated, which is the point of the spectrum at which a fixed percentage (typically between 40% and 60%) of the total sum of the spectral energies of the frame is reached, summing from the zero frequency towards higher frequencies. The selection of the mentioned percentage value depends on the microphone characteristics and can be treated heuristically. The maximum of the roll off values is designated as the threshold 'DTh'. The graph of FIGURE 3A shows the computed roll-off value signal across consecutive frames. Further, the roll-off value signal is smoothened by the second conversion means 208 using five point averaging, as shown in FIGURE 3B.
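The roll off computation and the five point averaging can be sketched in Python as below. The function names are hypothetical, the energy of a bin is taken as the squared magnitude, and the smoothing window is assumed to be centred and to shrink at the ends of the sequence; none of these details is fixed by the specification.

```python
def rolloff_bin(mags, fraction=0.5):
    """Index at which the running energy first reaches `fraction` of the total."""
    energies = [m * m for m in mags]   # assumed: energy = squared magnitude
    total = sum(energies)
    if total == 0:
        return 0
    running = 0.0
    for k, e in enumerate(energies):   # sum from zero frequency upwards
        running += e
        if running >= fraction * total:
            return k
    return len(mags) - 1


def five_point_average(values):
    """Smooth a sequence with a centred window of up to five points."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - 2)
        hi = min(len(values), i + 3)
        window = values[lo:hi]         # shorter window at the edges (assumed)
        out.append(sum(window) / len(window))
    return out


example_rolloff = rolloff_bin([0, 0, 1, 0, 0, 0, 3, 0], fraction=0.5)
example_smooth = five_point_average([0, 0, 10, 0, 0])
```

In the example, most of the energy sits at index 6, so the 50% point is reached there; the smoothing spreads the isolated spike of 10 across its neighbours.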
The detection means 106 detects the audio cue "Shhh" from the smoothened signal as follows:
1. The amplitudes and locations of peaks of the signal as seen in FIGURE 3A are noted and, if a peak is higher than the pre-computed threshold 'HTh', then for each such peak the width is computed from the signal as seen in FIGURE 3B as below:
1a. Consider the immediate left and right neighbours of the peak. If and only if they are above the given thresholds 'HTL' and 'HTR' respectively, proceed to the next two neighbours of the peak; else stop.
1b. The width of the given peak is calculated as the distance between the last left and right neighbours thus obtained.
2. In the given segment, find the peak with the largest width.
3. If this width is greater than a pre-defined threshold 'WTh', flag the detection of the audio cue "Shhh".
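The width based detection steps can be sketched in Python as follows. This is one possible reading of the steps, in which the left and right neighbours are examined in pairs and the expansion stops as soon as either falls below its threshold. The function names and all threshold values in the example are assumptions.

```python
def peak_width(signal, peak, htl, htr):
    """Expand outwards from `peak` while both neighbours exceed HTL and HTR."""
    left = right = peak
    while (left - 1 >= 0 and right + 1 < len(signal)
           and signal[left - 1] > htl and signal[right + 1] > htr):
        left -= 1
        right += 1
    return right - left  # distance between the last accepted neighbours


def detect_shhh(signal, hth, htl, htr, wth):
    """Flag a detection if the widest peak above HTh is wider than WTh."""
    peaks = [i for i in range(1, len(signal) - 1)
             if signal[i] > hth
             and signal[i] >= signal[i - 1] and signal[i] >= signal[i + 1]]
    if not peaks:
        return False
    widest = max(peak_width(signal, p, htl, htr) for p in peaks)
    return widest > wth


broad = [0, 1, 5, 6, 7, 8, 7, 6, 5, 1, 0]    # sustained high roll off
narrow = [0, 0, 0, 0, 1, 8, 1, 0, 0, 0, 0]   # brief spike
```

With `hth=7.5`, `htl=htr=4` and `wth=4`, the broad peak has a width of 6 and is flagged, while the narrow spike has a width of 0 and is rejected.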
A signal processing module using the spectral peak information can be used to detect the sound 'Shhh' against the background of a typical TV channel being played. The sound 'Shhh' thus detected is used to trigger a flag/control in the set-top box which turns off the audio signal. The technique has been observed to work satisfactorily for all typical TV channel backgrounds like music, news, movie, sports and the like.
Although the system has been described with reference to a TV set it can be used to control a variety of devices, such as for example a lighting fixture or an air conditioner or even a computer terminal.
For un-muting the electrical and electronic device the following approaches have been proposed by the present invention:
1. The 'Shhh' audio cue will work in a manner similar to the 'Mute' button of a remote control in toggle mode of operation, which means a second 'Shhh' audio cue will un-mute the device.
2. Secondly 'Shhh' sound of two durations can be used. A 'Shhh' with longer duration mutes the device and subsequently a shorter 'Shh' will un-mute it.
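Both un-muting approaches amount to a small state machine, sketched below in Python. The class name `MuteController` and the one second boundary between a long and a short cue are assumptions; the specification does not give the durations.

```python
class MuteController:
    """Tracks the mute state of a device driven by detected 'Shhh' cues."""

    def __init__(self):
        self.muted = False

    def on_shhh_toggle(self):
        """Approach 1: every detected cue toggles the mute state."""
        self.muted = not self.muted
        return self.muted

    def on_shhh_duration(self, duration_s, long_threshold_s=1.0):
        """Approach 2: a long cue mutes, a subsequent short cue un-mutes."""
        if duration_s >= long_threshold_s:
            self.muted = True
        elif self.muted:
            self.muted = False
        return self.muted


controller = MuteController()
states = [controller.on_shhh_toggle(), controller.on_shhh_toggle()]
```

In the duration based variant a short cue received while the device is not muted is ignored, which is one reasonable interpretation of the second approach.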
In accordance with the present invention, there is provided a method for mute control of electrical and electronic devices based on human voice generated cue, the method comprising the following steps as seen in FIGURE 4:
• sensing the voice cue, 1000;
• receiving the sensed voice cue signal, 1002;
• sampling the sensed voice cue signal at a sampling rate greater than 8000 samples/sec, 1004;
• creating overlapping frames for the sampled signal, 1006;
• converting each sampled frame to the spectral domain, 1008;
• detecting the spectral maxima for each of the converted frames and determining the amplitude and location in frequency for representing each of the converted frames, 1010;
• computing the roll off value for each of the converted frames, 1012;
• smoothening the computed spectral roll off value across each of the converted frames, 1014;
• checking if there are any overlapping frames remaining to be processed from the sampled signal, if yes then processing the remaining frames, 1016;
• recognizing the voice cue 'Shhh' from the processed signal, 1018; and
• controlling the mute operation of the electrical/electronic devices, 1020.
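Steps 1004 to 1014 can be strung together as in the following compact Python sketch, which produces the smoothened roll off track on which the recognition step 1018 would operate. All function names, the frame length of 64 samples and the 50% roll off fraction are assumptions, and the input is a synthetic pair of pure tones rather than a recorded 'Shhh'.

```python
import cmath
import math


def frames_50(samples, frame_len):
    """Frames overlapping by 50% (step 1006)."""
    hop = frame_len // 2
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]


def dft_mag(frame):
    """DFT magnitude for bins 0..N/2 (step 1008)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def keep_maxima(mags):
    """Keep local maxima and zero every other bin (step 1010)."""
    out = [0.0] * len(mags)
    for k in range(1, len(mags) - 1):
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            out[k] = mags[k]
    return out


def rolloff(mags, fraction=0.5):
    """Bin at which `fraction` of the total energy is reached (step 1012)."""
    energies = [m * m for m in mags]
    target = fraction * sum(energies)
    running = 0.0
    for k, e in enumerate(energies):
        running += e
        if running >= target and running > 0:
            return k
    return 0


def smooth5(values):
    """Five point averaging with shorter windows at the ends (step 1014)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - 2):i + 3]
        out.append(sum(window) / len(window))
    return out


def rolloff_track(samples, frame_len=64, fraction=0.5):
    """Smoothened roll off value per frame for a sampled signal."""
    return smooth5([rolloff(keep_maxima(dft_mag(f)), fraction)
                    for f in frames_50(samples, frame_len)])


# Synthetic input: a low tone (bin 2) followed by a high tone (bin 24).
low = [math.cos(2 * math.pi * 2 * t / 64) for t in range(256)]
high = [math.cos(2 * math.pi * 24 * t / 64) for t in range(256)]
track = rolloff_track(low + high)
```

The track starts at the low bin and ends at the high bin, showing how a shift of energy towards high frequencies raises the roll off value.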
EXPERIMENTAL DETAILS
A personal computer with a microphone was used in the test setup. A video was running on the personal computer using a video player along with the system envisaged by the present invention. The system sampled the microphone signals and once a 'Shhh' sound was detected, it sent a control signal to the video player for muting the sound. Another 'Shhh' sound toggled the state to un-mute. This experiment was conducted with different types of video content such as news, movies, sports and music, and these contents were used to fine-tune the detection methodology so that it is not triggered by any undesired sound other than the human generated audio cue 'Shhh'.
In addition, the voice samples of various volunteers from different linguistic and cultural backgrounds were taken to provide a test utterance of 'Shhh' sound so that a good set of test data was available for fine tuning and testing the detection.
In a separate experiment, the methodology envisaged by the present invention was implemented on a set-top box and tested with the above scenarios. In this experiment as well, the 'mute control' functioned as desired.
TECHNICAL ADVANTAGES
The technical advancements of the present invention include:
• providing a system for mute control of electrical and electronic devices based on human voice generated cue;
• providing a user friendly system for controlling electrical and electronic devices using voice cues;
• providing a system whose operation is independent of the line-of-sight with the devices, thus enabling users to operate the devices from any angle within a solid angle determined by the directionality of the microphone;
• providing a system which is easily installable and requires low maintenance;
• providing a language independent system for controlling electrical and electronic devices;
• providing a system which can be used by the physically challenged to conveniently operate and control the electrical and electronic devices;
• providing a system which works efficiently in a practical, random, noisy scenario;
• providing a voice-activated system which is unaffected by dialects or accents present in users' voices;
• providing a spectral peak feature based robust detection system for detecting the designated audio cue;
• providing a system in which neither the user nor the system requires any prior training; and
• providing a system which does not require that the user hold a microphone, thus facilitating natural speaking.
While considerable emphasis has been placed herein on the components and component parts of the preferred embodiments, it will be appreciated that many embodiments can be made and that many changes can be made in the preferred embodiments without departing from the principles of the invention. These and other changes in the preferred embodiment as well as other embodiments of the invention will be apparent to those skilled in the art from the disclosure herein, whereby it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.
We Claim:
1. A system for mute control of electrical and electronic devices based on human voice generated cue, the system comprising:
• sensing means adapted to sense the voice cue;
• receiver means adapted to receive the voice cue signal from the sensing means;
• pre-processing means co-operating with the receiver means, the pre-processing comprising:
i. sampling means adapted to sample the voice cue signal at a sampling rate greater than 8000 samples/sec;
ii. frame creation means adapted to create overlapping frames for the sampled signal;
iii. first conversion means adapted to convert each sampled frame to the spectral domain;
iv. computational means adapted to detect the spectral maxima and determine the amplitude and location in frequency for representing each of the converted frames and further adapted to compute the roll off value for each of the converted frames;
v. second conversion means adapted to smoothen the computed spectral roll off value across each of the converted frames;
• detection means adapted to receive the pre-processing attributes of the converted frames and further adapted to detect the voice cue 'Shhh' from the processed signal; and
• controlling means adapted to receive the detected voice cue and further adapted to mute the electrical/electronic device.
2. A system as claimed in claim 1, wherein the sensing means is a directional microphone attached to the electrical/electronic device.
3. A system as claimed in claim 1, wherein the sensing means is a microphone of a handheld device including remote controls and mobile phones.
4. A system as claimed in claim 1, wherein said sensing means includes a voice activity detector to conserve power.
5. A system as claimed in claim 1, wherein the sensing means communicates the voice cue signal to the receiver means using radio frequency.
6. A system as claimed in claim 1, wherein the receiver means is hosted inside the electrical / electronic device.
7. A system as claimed in claim 1, wherein the pre-processing means is selected from a group consisting of a microprocessor and a Digital Signal Processor.
8. A system as claimed in claim 1, wherein the sampling means is an Analog to Digital converter.
9. A system as claimed in claim 1, wherein the frame creation means is further adapted to create frames which overlap each other typically by 50%.
10. A system as claimed in claim 1, wherein each of the sampled frames is converted to the spectral domain using Discrete Fourier Transform.
11. A system as claimed in claim 1, wherein the spectral roll off representation of each of the converted frames is smoothened, typically using a five point averaging technique over a duration of 1.5 seconds.
12. A system as claimed in claim 1, wherein the detection means detects the voice cue 'Shhh' using thresholding techniques.
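By way of illustration only, the frame creation of claim 9 and the spectral conversion of claim 10 may be sketched as follows. The function names `make_frames` and `dft_magnitude`, the frame length parameter and the use of a naive DFT loop are assumptions made for this sketch and are not taken from the specification.

```python
import cmath
import math
from typing import List, Sequence


def make_frames(samples: Sequence[float], frame_len: int,
                overlap: float = 0.5) -> List[List[float]]:
    """Group samples into frames of frame_len that overlap by `overlap` (0.5 = 50%)."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Hop size: how far the window advances between consecutive frames.
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    frames = []
    # Trailing samples that do not fill a whole frame are dropped (assumption).
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(list(samples[start:start + frame_len]))
    return frames


def dft_magnitude(frame: Sequence[float]) -> List[float]:
    """Naive O(N^2) Discrete Fourier Transform; returns magnitudes for bins 0..N//2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags
```

In practice a fast Fourier transform would normally replace the double loop; the naive form is used here only because it follows the DFT definition directly.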
13. A method for mute control of electrical and electronic devices based on a human voice generated cue, the method comprising the following steps:
• sensing the voice cue;
• receiving the sensed voice cue signal;
• sampling the sensed voice cue signal at a sampling rate greater than 8000 samples/sec;
• creating overlapping frames for the sampled signal;
• converting each sampled frame to the spectral domain;
• detecting the spectral maxima for each of the converted frames and determining the amplitude and location in frequency for representing each of the converted frames;
• computing the roll off value for each of the converted frames;
• smoothening the computed spectral roll off value across each of the converted frames;
• checking if there are any overlapping frames remaining to be processed from the sampled signal, if yes then processing the frames;
• recognizing the voice cue 'Shhh' from the processed signal; and
• controlling the mute operation of the electrical/electronic devices.
14. A method as claimed in claim 13, wherein the step of sensing the voice cue includes the step of sensing the voice cue using a directional microphone transmitter attached to the electrical and/or electronic device.
15. A method as claimed in claim 13, wherein the step of sensing the voice cue includes the step of sensing the voice cue using the microphone transmitter of hand-held devices including remote controls and mobile phones.
16. A method as claimed in claim 13, wherein the step of creating overlapping frames includes the steps of converting the sampled audio cue segment into sets of frames overlapping each other by 50%.
17. A method as claimed in claim 13, wherein the step of converting each of the frames into the spectral domain includes the step of applying Discrete Fourier Transform to each frame from the set of frames.
18. A method as claimed in claim 13, wherein the step of detecting the spectral maxima and determining the amplitude and location in frequency for each of the converted frames includes the step of computing the maxima and the amplitude and location in frequency using the first and second derivative tests on the spectrum.
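By way of illustration only, the derivative tests of claim 18 may be approximated on a sampled magnitude spectrum using finite differences. The function name `spectral_maxima`, the parameter `bin_hz` (frequency spacing between adjacent spectral bins) and the use of finite differences in place of analytic derivatives are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple


def spectral_maxima(mags: Sequence[float], bin_hz: float) -> List[Tuple[float, float]]:
    """Return (location in frequency, amplitude) for each local maximum of the spectrum."""
    peaks = []
    for k in range(1, len(mags) - 1):
        d_left = mags[k] - mags[k - 1]       # first difference approaching bin k
        d_right = mags[k + 1] - mags[k]      # first difference leaving bin k
        second = mags[k + 1] - 2 * mags[k] + mags[k - 1]  # second difference at bin k
        # First derivative test: slope changes from rising to falling.
        # Second derivative test: curvature is negative. For sampled data the
        # second condition follows from the first; it is kept to mirror both tests.
        if d_left > 0 and d_right < 0 and second < 0:
            peaks.append((k * bin_hz, mags[k]))
    return peaks
```

The spectral maximum of the frame is then the returned pair with the largest amplitude.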
19. A method as claimed in claim 13, wherein the step of computing the roll off value for each of the converted frames includes the step of calculating a point in the spectrum at which a predetermined fixed percentage of the total sum of the spectral energies of the frame is reached.
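By way of illustration only, the roll off computation of claim 19 may be sketched as follows. The function name `spectral_rolloff`, the default fraction of 0.85 and the use of squared magnitude as the spectral energy are assumptions made for this sketch; the specification refers only to a predetermined fixed percentage.

```python
from typing import Sequence


def spectral_rolloff(mags: Sequence[float], fraction: float = 0.85) -> int:
    """Return the lowest bin index at which the cumulative spectral energy
    reaches `fraction` of the total energy of the frame."""
    energies = [m * m for m in mags]  # squared magnitude taken as energy (assumption)
    total = sum(energies)
    if total == 0:
        return 0  # silent frame: no meaningful roll off point
    target = fraction * total
    cumulative = 0.0
    for k, energy in enumerate(energies):
        cumulative += energy
        if cumulative >= target:
            return k
    return len(energies) - 1
```

Multiplying the returned bin index by the frequency spacing of the bins gives the roll off value in hertz.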
20. A method as claimed in claim 13, wherein the step of smoothening the computed spectral roll off value includes the step of smoothening the spectral roll off value using a five point averaging technique.
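By way of illustration only, the smoothening of claim 20 may be sketched as a centred moving average over the per-frame roll off values. The function name `five_point_average`, the centred window (the value itself and two neighbours on each side) and the shortened window at the ends of the sequence are assumptions made for this sketch.

```python
from typing import List, Sequence


def five_point_average(values: Sequence[float]) -> List[float]:
    """Smooth a sequence by averaging each value with up to two neighbours on each side."""
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - 2)       # window is clipped at the start of the sequence
        hi = min(n, i + 3)       # and at the end of the sequence
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

An isolated spike in one frame is spread over its neighbours and reduced in height, which yields a representation with less abrupt changes from frame to frame.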
21. A method as claimed in claim 13, wherein the step of recognizing the voice cue 'Shhh' includes the steps of:
• determining if the amplitudes and locations of peaks of the signal are higher than a pre-computed threshold 'HTh';
• computing the width from the signal;
• comparing the immediate left and right neighbours of the peak with a given left threshold and right threshold 'HTL' and 'HTR' respectively;
• comparing the thresholds with the next two neighbours of the peak, if no further neighbours are present then going to the next step;
• calculating the width of the given peak as the distance between the last left and right neighbours thus obtained;
• finding the peak with the largest width; and
• detecting the audio cue 'Shhh' when that width is greater than a pre-defined threshold 'WTh'.
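By way of illustration only, the recognition steps of claim 21 may be sketched as follows. It is assumed here that the signal is a sequence of values, one per frame, that a peak is a local maximum exceeding 'HTh', and that the neighbour comparison proceeds outward one frame at a time for as long as the neighbour exceeds 'HTL' (to the left) or 'HTR' (to the right). The function names and parameter names are hypothetical.

```python
from typing import Sequence


def widest_peak_width(signal: Sequence[float], h_th: float,
                      ht_l: float, ht_r: float) -> int:
    """Return the largest width, in frames, among peaks higher than h_th."""
    n = len(signal)
    best = 0
    for i in range(1, n - 1):
        is_peak = (signal[i] > h_th
                   and signal[i] >= signal[i - 1]
                   and signal[i] >= signal[i + 1])
        if not is_peak:
            continue
        left = i
        # Walk left while the next neighbour still exceeds the left threshold.
        while left - 1 >= 0 and signal[left - 1] > ht_l:
            left -= 1
        right = i
        # Walk right while the next neighbour still exceeds the right threshold.
        while right + 1 < n and signal[right + 1] > ht_r:
            right += 1
        # Width is the distance between the last left and right neighbours.
        best = max(best, right - left)
    return best


def is_shhh(signal: Sequence[float], h_th: float, ht_l: float,
            ht_r: float, w_th: int) -> bool:
    """Report the cue when the widest qualifying peak exceeds the width threshold w_th."""
    return widest_peak_width(signal, h_th, ht_l, ht_r) > w_th
```

Under these assumptions a sustained sound produces a wide peak and is reported, whereas a short burst produces a narrow peak and is rejected.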
| Section | Controller | Decision Date |
|---|---|---|
| 15 | NALINI KANTA MOHANTY | 2018-02-19 |

| # | Name | Date |
|---|---|---|
| 1 | 2524-MUM-2008-FORM 18(18-11-2010).pdf | 2010-11-18 |
| 2 | 2524-MUM-2008-CORRESPONDENCE(18-11-2010).pdf | 2010-11-18 |
| 3 | 2524-MUM-2008-FORM-26 [05-01-2018(online)].pdf | 2018-01-05 |
| 4 | 2524-MUM-2008-Written submissions and relevant documents (MANDATORY) [19-01-2018(online)].pdf | 2018-01-19 |
| 5 | 2524-MUM-2008-PatentCertificate20-02-2018.pdf | 2018-02-20 |
| 6 | 2524-MUM-2008-IntimationOfGrant20-02-2018.pdf | 2018-02-20 |
| 7 | abstract1.jpg | 2018-08-09 |
| 8 | 2524-MUM-2008_EXAMREPORT.pdf | 2018-08-09 |
| 9 | 2524-MUM-2008-Power of Attorney-071215.pdf | 2018-08-09 |
| 10 | 2524-MUM-2008-Other Patent Document-071215.pdf | 2018-08-09 |
| 11 | 2524-MUM-2008-ORIGINAL UNDER RULE 6 (1A)-050118.pdf | 2018-08-09 |
| 12 | 2524-MUM-2008-MARKED COPY-071215.pdf | 2018-08-09 |
| 13 | 2524-MUM-2008-HearingNoticeLetter.pdf | 2018-08-09 |
| 14 | 2524-MUM-2008-FORM 5(1-12-2009).pdf | 2018-08-09 |
| 15 | 2524-mum-2008-form 3.pdf | 2018-08-09 |
| 16 | 2524-mum-2008-form 26.pdf | 2018-08-09 |
| 17 | 2524-mum-2008-form 2.pdf | 2018-08-09 |
| 19 | 2524-mum-2008-form 2(title page).pdf | 2018-08-09 |
| 20 | 2524-MUM-2008-FORM 2(TITLE PAGE)-(1-12-2009).pdf | 2018-08-09 |
| 21 | 2524-mum-2008-form 2(1-12-2009).pdf | 2018-08-09 |
| 22 | 2524-mum-2008-form 1.pdf | 2018-08-09 |
| 23 | 2524-MUM-2008-FORM 1(23-6-2010).pdf | 2018-08-09 |
| 24 | 2524-MUM-2008-Examination Report Reply Recieved-071215.pdf | 2018-08-09 |
| 25 | 2524-mum-2008-drawing.pdf | 2018-08-09 |
| 26 | 2524-MUM-2008-DRAWING(1-12-2009).pdf | 2018-08-09 |
| 27 | 2524-mum-2008-discription(provisional).pdf | 2018-08-09 |
| 29 | 2524-MUM-2008-DESCRIPTION(COMPLETE)-(1-12-2009).pdf | 2018-08-09 |
| 30 | 2524-mum-2008-correspondence.pdf | 2018-08-09 |
| 31 | 2524-MUM-2008-CORRESPONDENCE(23-6-2010).pdf | 2018-08-09 |
| 32 | 2524-MUM-2008-CORRESPONDENCE(1-12-2009).pdf | 2018-08-09 |
| 34 | 2524-MUM-2008-Claims-071215.pdf | 2018-08-09 |
| 35 | 2524-MUM-2008-CLAIMS(1-12-2009).pdf | 2018-08-09 |
| 37 | 2524-MUM-2008-ABSTRACT(1-12-2009).pdf | 2018-08-09 |
| 38 | 2524-MUM-2008-ORIGINAL UNDER RULE 6 (1A)-240118.pdf | 2019-03-01 |
| 39 | 2524-MUM-2008-RELEVANT DOCUMENTS [23-03-2019(online)].pdf | 2019-03-23 |
| 40 | 2524-MUM-2008-RELEVANT DOCUMENTS [29-03-2020(online)].pdf | 2020-03-29 |
| 41 | 2524-MUM-2008-RELEVANT DOCUMENTS [29-03-2020(online)]-1.pdf | 2020-03-29 |
| 42 | 2524-MUM-2008-RELEVANT DOCUMENTS [29-09-2021(online)].pdf | 2021-09-29 |
| 43 | 2524-MUM-2008-RELEVANT DOCUMENTS [26-09-2022(online)].pdf | 2022-09-26 |
| 44 | 2524-MUM-2008-RELEVANT DOCUMENTS [28-09-2023(online)].pdf | 2023-09-28 |
| 45 | 2524-MUM-2008-FORM 4 [22-05-2024(online)].pdf | 2024-05-22 |