The present invention relates to a method for audio encoding/decoding using non-uniform delta modulation, comprising: dividing a sampled signal into frames and sub-frames; computing the maximum of the signal in each of said sub-frames to obtain temporal envelope parameters, the temporal envelope being obtained by convolution; up-sampling the signal after reducing the dynamic range of its spectrum; computing gain parameters to obtain a normalized signal from the up-sampled signal; performing nonlinear mapping on said normalized signal using frequency modulation; and packing the binary information of the frequency-modulated signal, the temporal envelope parameters and the gain parameters into a binary file and thereafter compressing the file.
FIELD OF THE INVENTION
The present invention relates to Signal Compression Technology for high-quality multi-channel listening, compression of CD quality music/speech using the principles of non-uniform deltamodulation, noise-shaping, wideband frequency modulation and spectrum band replication.
BACKGROUND OF THE INVENTION AND PRIOR ART
The field of audio signal compression is about four decades old. The sampling rates are high (for
example: CD quality sampling rate is 44.1kHz) in order to preserve the bandwidth. In addition,
the precision (number of bits per sample) is also high in order to minimize the noise due to
quantization. Typically, for CD quality music, the number of bits per sample is 16. The large
bandwidth and the high precision make it expensive to store volumes of music. It is also
cumbersome to transmit over bandwidth-limited communication channels. Hence, there has been
a great demand for reducing the bit rate and thereby the storage space/required transmission rate.
However, it is desired that the reduction in bit rate be achieved with only a minimal loss of the
perceived audio quality. In applications where high compression factors are desired, a mild
reduction in the quality is tolerable. In general, it is desired that the quality degrade gracefully
with reduction in bit rate or increase in compression factor.
Yet another requirement of signal compression technology is the establishment of ownership of a
given piece of music. This can be achieved using proprietary audio coding techniques and not
open-source solutions.
The present invention relates to a new signal processing technique that is competitive with the
existing technologies. At the same time, its proprietary nature is a powerful means of retaining
ownership of music.
Existing technologies and their drawbacks:
Over the past few decades, several transform coders have been proposed and they have had
varying degrees of success - technologically and commercially. The notable ones are the
following:
(1) MP3
(2) MPEG-2 AAC
(3) Ogg Vorbis
(4) MP3 Surround
(5) AAC Plus
Of the above, MP3 has had the greatest global impact over the last decade, primarily due to the advances in peer-to-peer computer communication technology. However, the MP3 technology has also given rise to the wide-spread piracy of music, causing severe financial losses to recording industries world-wide. Recording industries have always been looking for controlled distribution of music with full ownership, a feature that only a competitive proprietary audio coding algorithm can provide. The other formats such as MPEG-2 AAC succumbed to the popularity of MP3 and could not gain wide recognition even though they were superior in performance. It was only recently that MPEG-2 AAC gained popularity, after M/s. Apple Computer Inc. adopted it as its iTunes music data format (with additional security).
The following references will be useful in understanding the invention.
References
[1] C. Faller and F. Baumgarte, "Binaural cue coding applied to stereo and multichannel audio compression", 112th Convention of the Audio Engineering Society, May 2002.
[2] C. Faller and F. Baumgarte, "Binaural cue coding - part I: psychoacoustic fundamentals and design principles", IEEE Transactions on Speech and Audio Processing, Vol. 11, No. 6, Nov. 2003.
[3] C. Faller and F. Baumgarte, "Binaural cue coding - part II: schemes and applications", IEEE Transactions on Speech and Audio Processing, Vol. 11, No. 6, Nov. 2003.
[4] S. Haykin, "Communication Systems", 3rd edition, John Wiley and Sons, New York, 2000.
[5] J. Herre, "Temporal noise shaping, quantization and coding methods in perceptual audio coding: A tutorial introduction", AES 17th International conference on high quality audio coding.
[6] Khalid Sayood, "Introduction to data compression", (2nd edition), Morgan Kaufmann Publishers Inc., San Francisco, USA, 2000.
[7] B. P. Lathi, Modern digital and analog communication systems, Third edition, Oxford University Press, 1998.
[8] J. Makhoul, "Linear prediction: a tutorial review", Proceedings of the IEEE, 63:561-580, April 1975.
[9] Mark Nelson and G. J-Loup, "The data compression book", (2nd edition), M&T Books, New York, 1995.
[10] TV Sreenivas, C. Thomas and K. Sandeep, "Audio spatialization through quasi-stereo", International conference on signal processing applications and technology (ICSPAT), San Diego 1997.
[11] TG Thomas and S Chandra Sekhar, Communication Theory, First edition, Tata McGraw Hill Publishing Co., Jan 2006.
[12] TG Thomas and S Chandra Sekhar, Analog Communication, First edition, Tata McGraw Hill Publishing Co., Dec 2006.
[13] R. Wiley, H. Schwarzlander and D. Weiner, "Demodulation procedure for very wideband FM", IEEE Transactions on Communications, Vol. 25, Issue 3, pp. 318-327, March 1977.
Prior art:
The 'representative' prior art in the area comprises:
(1) M. Dietz, L. Liljeryd, K. Kjorling and O. Kunz, Spectral Band Replication, a novel approach in audio coding, Proc. AES 112th Convention, 10-13 May, 2002.
(2) L. Liljeryd, R. A. Ekstrand, L. F. Henn and H. M. K. Kjorling, Source coding enhancement using spectral-band replication, United States Patent Application Publication, Pub. No. US 2004/0125878, 1 Jul. 2004.
(3) C. Faller, Parametric coding of spatial audio, Ph.D thesis, Ecole Polytechnique Federale de Lausanne, 2004.
(4) Bosi et al., ISO/IEC MPEG-2 Advanced Audio Coding, Papers, J. Audio Eng. Soc, vol. 45 (No. 10), p. 789-814 (Oct. 1997).
(5) James D. Johnston, Perceptual Transform Coding of Wideband Stereo Signals, ICASSP-89, vol. 3, 1989, pp. 1993-1996.
(6) James D. Johnston, Transform Coding of Audio Signals Using Perceptual Noise Criteria, IEEE Journal on Selected Areas in Communications, vol. 6, No. 2, Feb. 1988, pp. 314-323.
Representative patent search hits from the USPTO:
Keyword: Scalable audio coding
6,947,886 Scalable compression of audio and other signals
6,934,679 Error resilient scalable audio coding
6,876,623 Tuning scheme for code division multiplex broadcasting system
6,789,123 System and method for delivery of dynamically scalable audio/video content over a
network
6,735,339 Multi-stage encoding of signal components that are classified according to component
value
6,654,716 Perceptually improved enhancement of encoded acoustic signals.
6,529,604 Scalable stereo audio encoding/decoding method and apparatus
6,526,384 Method and device for limiting a stream of audio data with a scaleable bit rate
6,502,069 Method and a device for coding audio signals and a method and a device for decoding
a bit stream
6,446,037 Scalable coding method for high quality audio
6,438,525 Scalable audio coding/decoding method and apparatus
6,370,507 Frequency-domain scalable coding without upsampling filters
6,349,284 Scalable audio encoding/decoding method and apparatus
6,182,031 Scalable audio coding system
6,178,405 Concatenation compression method
6,148,288 Scalable audio coding/decoding method and apparatus
6,122,618 Scalable audio coding/decoding method and apparatus
6,115,688 Process and device for the scalable coding of audio signals
6,108,625 Scalable audio coding/decoding method and apparatus without overlap of information
between various layers
6,094,636 Scalable audio coding/decoding method and apparatus
6,058,361 Two-stage Hierarchical subband coding and decoding system, especially for a
digitized audio signal
5,953,506 Method and apparatus that provides a scalable media delivery system
Keyword: Deltamodulation
6,002,352 Method of sampling, downconverting, and digitizing a bandpass signal using a digital
predictive coder
4,071,825 Adaptive delta modulation system
4,208,740 Adaptive delta modulation system
Keyword: Audio Coding
6,980,933 Coding techniques using estimated spectral magnitude and phase derived from MDCT
coefficients
6,978,236 Efficient spectral envelope coding using variable time/frequency resolution and
time/frequency switching
6,977,877 Compressed audio data reproduction apparatus and compressed audio data
reproducing method
6,975,254 Methods and devices for coding or decoding an audio signal or bit stream
6,968,564 Multi-band spectral audio encoding
6,947,886 Scalable compression of audio and other signals
OBJECTS OF THE INVENTION
The primary objective of the invention is the development of a proprietary audio coding technique that is competitive with the existing technologies in terms of the perceived audio quality. The coding technique should also be fast and scalable in terms of the bit rate.
Yet another object of the invention is a High-speed audio compression that does not use an explicit perceptual/psychoacoustic model for quantization.
Still another object of the invention is the introduction of non-uniform delta-modulation as a technique for compressing music/speech signals.
Still another object of the invention is the development of a new technique for generating stereo signals from mono signals.
Still another object of the invention is a High-quality multi-channel surround-sound generation algorithm.
Still another object of the invention is to provide Quantization noise reduction through out-of-band noise shaping, Low complexity encoder without complex bit allocation, Scalable complexity encoder, Streaming music server-multiple client technology configuration using the invention and User-specified metadata incorporation into the audio stream.
STATEMENT OF THE INVENTION
The present invention relates to a method for audio encoding using non-uniform delta modulation comprising steps of: dividing sampled signal into frames and sub-frames, computing maximum of the signal from each of said sub-frames to obtain temporal envelope, the temporal envelope being obtained by convolution, up-sampling the signal after reducing dynamic range of spectrum, computing gain parameters to obtain normalized signal from the up-sampled signal, performing nonlinear mapping on said normalized signal using frequency modulation; and packing binary information of the frequency modulated signal, the temporal envelope parameter and gain parameter into binary file thereafter compressing the file;
A method for decoding a non-uniform delta modulation encoded signal, where the coded signal contains binary information of frequency modulated signal, gain parameters and quantized temporal envelope parameters, said method comprising steps of: decompressing and unpacking the temporal envelope and the gain parameters corresponding to each sub-frame, estimating zero-crossings from the binary information of the frequency modulated signal, demodulating the frequency modulated signal using estimated zero-crossings, filtering the demodulated signal to reject out-of-band noise, down-sampling the filtered signal thereafter restoring spectrum dynamic range, computing time-domain envelope of the restored signal using the temporal envelope and gain parameters and shaping temporal envelope of the signal;
A method to create Quasi-stereo sound from monophonic sound comprising steps of: converting mono to stereo, processing each stereo channel with reverberation and head-related transfer functions and feeding the processed output to multi-channel;
A method for audio encoding using non-uniform delta modulation comprising steps of: dividing sampled signal into frames and sub-frames, computing maximum of the signal from each of said sub-frames to obtain temporal envelope, the temporal envelope being obtained by convolution, applying spectral band replication algorithm to the convolved signal after reducing dynamic range of spectrum, computing gain parameters to obtain normalized signal from spectral band replicated signal, performing nonlinear mapping on said normalized signal using frequency modulation; and packing binary information of the frequency modulated signal and the temporal envelope and spectral band replication parameters into binary file thereafter compressing the file;
A method for decoding a non-uniform delta modulation encoded signal, where the coded signal contains binary information of frequency modulated signal, gain parameters, spectral band replication parameters and quantized temporal envelope parameters, said method comprising steps of: decompressing and unpacking the temporal envelope and gain parameters corresponding to each sub-frame, estimating zero-crossings of the binary information of the frequency modulated signal, demodulating the frequency modulated signal using estimated zero-crossings, filtering the demodulated signal to reject out-of-band noise, applying SBR algorithm to the filtered signal thereafter restoring spectrum dynamic range, computing time-domain envelope of the restored signal using the temporal envelope and gain parameters and shaping temporal envelope of the signal; and also
A method to create spectral band replication in time-domain comprising steps of: cosine modulation of low-pass filtered signal, removing base band by high-pass filtering of the cosine modulated signal and replacing the signal by scaled addition.
BRIEF DESCRIPTION OF ACCOMPANYING DRAWINGS
Figure 1 shows Block diagram of the TARANG encoder with fixed noise-shaping filter.
Figure 2 shows Maxima Selection technique for temporal envelope computation.
Figure 3 shows Temporal envelope computation from the sub-frame maxima.
Figure 4 shows Block diagram of the TARANG decoder with fixed noise-shaping and spectrum dynamic range restoration.
Figure 5 shows Block diagram of the TARANG encoder with signal-dependent noise-shaping filter.
Figure 6 shows Block diagram of the TARANG decoder with signal-dependent noise-shaping and spectrum dynamic range restoration.
Figure 7 shows Block diagram of the TARANG-BCC encoder.
Figure 8 shows Block diagram of the TARANG-BCC decoder.
Figure 9 shows Block diagram of the Q-stereo generator.
Figure 10 shows Multi-channel audio listening setup for TARANG Q-stereo surround music.
Figure 11 shows Mono to Q-stereo conversion and Q-stereo to stereo conversion.
Figure 12 shows The TARANG-QS encoder: TARANG Quasi-stereo surround system encoder for audio signals.
Figure 13 shows The TARANG-QS decoder: TARANG Quasi-stereo surround system decoder for audio signals.
Figure 14 shows The TARANG-SBR encoder: TARANG with spectrum band replication.
Figure 15 shows The TARANG-SBR decoder: TARANG with spectrum band replication.
Figure 16 shows The TARANG-QS-SBR encoder: TARANG with spectrum band replication.
Figure 17 shows The TARANG-QS-SBR decoder: TARANG with spectrum band replication and quasi-stereo surround listening.
Figure 18 illustrates linear interpolation of the autoregressive filter coefficients.
Figure 19 shows (a) Time-varying analysis filter and (b) Time-varying synthesis filter.
Figure 20 illustrates temporal envelope computation.
Figure 21 shows The spectra of the message signal and the FM carrier.
Figure 22 shows A frequency-modulated carrier signal.
Figure 23 shows The sign (+1/-1) of the frequency-modulated carrier signal.
Figure 24 shows The actual zero-crossing and the estimated zero-crossing.
Figure 25 illustrates the quantization error between the actual zero-crossing and the quantized zero-crossing.
Figure 26 illustrates the principle of Spectrum Band Replication (SBR); (a) Full spectrum, (b) Spectrum segment encoded, (c) Generation of high-frequency band at the decoder, (d) Adjustment of spectral envelope to full spectrum.
Figure 27 shows TARANG-SBR encoder: TARANG encoder modified for spectrum band replication (SBR) at the decoder. The WBFM encoder is the same as the TARANG encoder.
Figure 28 shows Spectrum plots at different stages of the TARANG encoder with SBR. f denotes frequency. X(f), R(f) and R_l(f) are the Fourier transforms of x(n), r(n) and r_l(n) respectively. See Figure 27 to understand the signals denoted by x(n), r(n) and r_l(n).
Figure 29 shows TARANG-SBR decoder: TARANG with spectrum band replication at the decoder. The de-emphasis filter is the time-varying synthesis filter (inverse of the time-varying analysis filter at the encoder). The de-emphasis filter restores the spectral envelope of the signal (see Figure 19).
Figure 30 shows Spectrum plots at different stages of the TARANG decoder with SBR. (a) WBFM decoder output signal, (b) Spectrum of the modulated signal, (c) highpass filter output, (d) band replication, (e) Spectrum after spectrum envelope restoration. f denotes frequency. The plotted spectra are the Fourier transforms of the corresponding time-domain signals at each of these five stages. See Figure 29 to understand the signals concerned.
Figure 31 shows real-time system for audio encoding/decoding of signal.
DETAILED DESCRIPTION OF THE INVENTION
The primary embodiment of the invention is a method for audio encoding using non-uniform delta modulation comprising steps of: dividing sampled signal into frames and sub-frames, computing maximum of the signal from each of said sub-frames to obtain temporal envelope, the temporal envelope being obtained by convolution, up-sampling the signal after reducing dynamic range of spectrum, computing gain parameters to obtain normalized signal from the up-sampled signal, performing nonlinear mapping on said normalized signal using frequency modulation; and packing binary information of the frequency modulated signal, the temporal envelope parameter and gain parameter into binary file thereafter compressing the file.
In yet another embodiment of the present invention, the sampling is done at 44.1 kHz with each frame having 1088 samples, wherein the number of sub-frames is 32.
In still another embodiment of the present invention, a normalized-sum window is used for the convolution.
In still another embodiment of the present invention, the dynamic range of the spectrum is reduced by using a fixed-order auto-regressive model.
In still another embodiment of the present invention, the parameters are obtained such that the normalized signal takes values in the range 0-0.45.
In still another embodiment of the present invention, the integer value of the interpolating factor for CD quality signal input during up-sampling is greater than or equal to 2.
In still another embodiment of the present invention, the frequency modulation is wideband.
In still another embodiment of the present invention, the frequency modulated signal is binary amplitude limited.
In still another embodiment of the present invention, zero-crossing information of the frequency modulated signal is retained.
In still another embodiment of the present invention, the file is losslessly compressed using the Lempel-Ziv compression algorithm.
In still another embodiment of the present invention, the compression factor is 1.4-1.7.
In still another embodiment of the present invention is a method for decoding a non-uniform delta modulation encoded signal, where the coded signal contains binary information of frequency modulated signal, gain parameters and quantized temporal envelope parameters, said method comprising steps of: decompressing and unpacking the temporal envelope and the gain parameters corresponding to each sub-frame, estimating zero-crossings from the binary information of the frequency modulated signal, demodulating the frequency modulated signal using estimated zero-crossings, filtering the demodulated signal to reject out-of-band noise, down-sampling the filtered signal thereafter restoring spectrum dynamic range, computing time-domain envelope of the restored signal using the temporal envelope and gain parameters and shaping temporal envelope of the signal.
In still another embodiment of the present invention, the spectrum dynamic range is restored by the inverse of the autoregressive IIR filter used at the encoder, with quantized spectrum envelope parameters.
In still another embodiment of the present invention is a method to create Quasi-stereo sound from monophonic sound comprising steps of: converting mono to stereo, processing each stereo channel with reverberation and head-related transfer functions and feeding the processed output to multi-channel.
In still another embodiment of the present invention, the multi-channel output is 4 or 2 channels, over loudspeakers or headphones.
In still another embodiment of the present invention, the encoding of the quasi-stereo is achieved.
In still another embodiment of the present invention, the decoding of the quasi-stereo is achieved.
In still another embodiment of the present invention is a method for audio encoding using non-uniform delta modulation comprising steps of: dividing sampled signal into frames and sub-frames, computing maximum of the signal from each of said sub-frames to obtain temporal envelope, the temporal envelope being obtained by convolution, applying spectral band replication algorithm to the convolved signal after reducing dynamic range of spectrum, computing gain parameters to obtain normalized signal from spectral band replicated signal, performing nonlinear mapping on said normalized signal using frequency modulation; and packing binary information of the frequency modulated signal and the temporal envelope and spectral band replication parameters into binary file thereafter compressing the file.
In still another embodiment of the present invention, the sampling is done at 44.1 kHz with each frame having 1088 samples, wherein the number of sub-frames is 32.
In still another embodiment of the present invention, a normalized-sum window is used for the convolution.
In still another embodiment of the present invention, the dynamic range of the spectrum is reduced by using a fixed-order auto-regressive model.
In still another embodiment of the present invention, the parameters are obtained such that the normalized signal takes values in the range 0-0.45.
In still another embodiment of the present invention, the frequency modulation is wideband.
In still another embodiment of the present invention, the frequency modulated signal is binary amplitude limited.
In still another embodiment of the present invention, zero-crossing information of the frequency modulated signal is retained.
In still another embodiment of the present invention, the file is losslessly compressed using the Lempel-Ziv compression algorithm.
In still another embodiment of the present invention, the total compression factor is about 2.8-3.
In still another embodiment of the present invention, the compression factor is about 5.6-6.
In still another embodiment of the present invention is a method for decoding a non-uniform delta modulation encoded signal, where the coded signal contains binary information of frequency modulated signal, gain parameters, spectral band replication parameters and quantized temporal envelope parameters, said method comprising steps of: decompressing and unpacking the temporal envelope and gain parameters corresponding to each sub-frame, estimating zero-crossings of the binary information of the frequency modulated signal, demodulating the frequency modulated signal using estimated zero-crossings, filtering the demodulated signal to reject out-of-band noise, applying SBR algorithm to the filtered signal thereafter restoring spectrum dynamic range, computing time-domain envelope of the restored signal using the temporal envelope and gain parameters and shaping temporal envelope of the signal.
In still another embodiment of the present invention, the spectrum dynamic range is restored by the inverse of the autoregressive IIR filter used at the encoder, with quantized spectrum envelope parameters.
In still another embodiment of the present invention, the audio encoding/decoding is used with binaural cue coding.
In still another embodiment of the present invention is a method to create spectral band replication in time-domain comprising steps of: cosine modulation of low-pass filtered signal, removing base band by high-pass filtering of the cosine modulated signal and replacing the signal by scaled addition.
In still another embodiment of the present invention is a system for audio encoding using non-uniform delta modulation comprising: means to divide sampled signal into frames and sub-frames, means to compute maxima of the signal from each of said sub-frames to obtain temporal envelope, means to obtain the temporal envelope by convolution, means to up-sample the signal after reducing dynamic range of spectrum, means to compute gain parameters to obtain normalized signal from the up-sampled signal, means to perform nonlinear mapping on said normalized signal using frequency modulation; and means to pack binary information of the frequency modulated signal, the temporal envelope parameter and gain parameter into binary file thereafter compressing the file.
In still another embodiment of the present invention is a system for decoding a non-uniform delta modulation encoded signal, where the coded signal contains binary information of frequency modulated signal, gain parameters and quantized temporal envelope parameters, said system comprising: means to decompress and unpack the temporal envelope and the gain parameters corresponding to each sub-frame, means to estimate zero-crossings from the binary information of the frequency modulated signal, means to demodulate the frequency modulated signal using estimated zero-crossings, means to filter the demodulated signal to reject out-of-band noise, means to down-sample the filtered signal thereafter restoring spectrum dynamic range, means to compute time-domain envelope of the restored signal using the temporal envelope and gain parameters and means to shape temporal envelope of the signal.
In still another embodiment of the present invention is a system to create Quasi-stereo sound from monophonic sound comprising: means to convert mono to stereo, means to process each stereo channel with reverberation and head-related transfer functions and means to feed the processed output to multi-channel.
In still another embodiment of the present invention is a system for audio encoding using non-uniform delta modulation comprising: means to divide sampled signal into frames and sub-frames, means to compute maximum of the signal from each of said sub-frames to obtain temporal envelope, means to obtain the temporal envelope by convolution, means to apply spectral band replication algorithm to the convolved signal after reducing dynamic range of spectrum, means to compute gain parameters to obtain normalized signal from spectral band replicated signal, means to perform nonlinear mapping on said normalized signal using frequency modulation; and means to pack binary information of the frequency modulated signal, the temporal envelope parameter and gain parameter into binary file thereafter compressing the file.
In still another embodiment of the present invention is a system for decoding a non-uniform delta modulation encoded signal, where the coded signal contains binary information of frequency modulated signal, gain parameters, spectral band replication parameters and quantized temporal envelope parameters, said system comprising: means to decompress and unpack the temporal envelope and gain parameters corresponding to each sub-frame, means to estimate zero-crossings of the binary information of the frequency modulated signal, means to demodulate the frequency modulated signal using estimated zero-crossings, means to filter the demodulated signal to reject out-of-band noise, means to apply SBR algorithm to the filtered signal thereafter restoring spectrum dynamic range, means to compute time-domain envelope of the restored signal using the temporal envelope and gain parameters and means to shape temporal envelope of the signal.
In still another embodiment of the present invention is a system to create spectral band replication in time-domain comprising: means for cosine modulation of low-pass filtered signal, means to remove base band by high-pass filtering of the cosine modulated signal and means to replace the signal by scaled addition.
List of abbreviations:
AR - auto-regressive
BCC - Binaural Cue Coding
CD - compact disc
dB - decibels (unit for measuring sound intensity)
FM - Frequency Modulation
GUI - graphical user interface
HRTF - Head-related Transfer Function
L - Left channel
Ls - Surround left channel
L+R - Sum of left and right channels
L-R - difference of left and right channels
M - mid-channel (L+R)
PWM - pulse width modulation
QS - Quasi-Stereo
R - Right channel
Rs - Surround right channel
SBR - Spectrum Band Replication
TARANG - Name of the ESQUBE proprietary audio coding solution (TARANG is NOT an
acronym)
WBFM - Wideband frequency modulation
TARANG with fixed noise shaping filter
In Figure 1, we show the block diagram of the novel algorithm for the TARANG system of encoding music. The basic TARANG technique takes a single channel input. The input is divided into frames of 1088 samples at 44.1 kHz sampling rate (CD quality music). The choice of the frame size is not crucial. However, a frame that is too long increases the encoding delay, while a frame that is too short increases the side information.
Encoder
Block 1.1: Temporal envelope computation block. The time-domain envelope of the frame is computed using a novel approach described in Figures 2 and 3. The input signal is divided into M (typically 32) sub-frames. The maximum of the signal is computed in each sub-frame. The M maxima are the temporal envelope parameters input to Block 1.6 by Block 1.1. The temporal envelope of the frame is obtained by replicating the maximum value over the sub-frame. This yields a staircase approximation to the actual temporal envelope. The staircase waveform thus obtained is smoothed by convolving with a normalized-sum (equal to unity) window. The resultant is a smoothed estimate of the temporal envelope of the frame (see Figure 20 and Section 8.3).
The need for incorporating the temporal envelope arises from the need to preserve temporal dynamics of transient signals (for preserving sharpness of the attacks, if any) accurately without smearing.
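The envelope computation of Block 1.1 can be sketched as follows. This is a minimal illustration and not the actual TARANG implementation: the function name `temporal_envelope`, the use of the absolute value when taking each sub-frame maximum, the rectangular shape and length of the unit-sum window, and the repetition of edge samples are all assumptions made for the example.

```python
def temporal_envelope(frame, num_subframes=32, window_len=9):
    """Return (sub-frame maxima, smoothed envelope) for one frame.

    Assumptions: maxima are taken on |x|, and the smoothing window is a
    rectangular window whose taps sum to one.
    """
    n = len(frame)
    if n % num_subframes != 0:
        raise ValueError("frame length must be a multiple of num_subframes")
    sub_len = n // num_subframes  # 1088 / 32 = 34 samples per sub-frame

    # Temporal envelope parameters: one maximum per sub-frame.
    maxima = [max(abs(s) for s in frame[i * sub_len:(i + 1) * sub_len])
              for i in range(num_subframes)]

    # Staircase approximation: hold each maximum over its sub-frame.
    staircase = [m for m in maxima for _ in range(sub_len)]

    # Smooth the staircase by convolving with a unit-sum window.
    window = [1.0 / window_len] * window_len
    half = window_len // 2
    envelope = []
    for k in range(n):
        acc = 0.0
        for j, w in enumerate(window):
            idx = min(max(k + j - half, 0), n - 1)  # repeat edge samples
            acc += w * staircase[idx]
        envelope.append(acc)
    return maxima, envelope
```

Only the 32 maxima need to be transmitted; the decoder can rebuild the same staircase and apply the same window.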
Block 1.2: The signals (music/speech) typically have a spectrum dynamic range of 40-50 dB which can even extend to 70 dB for some audio signals. The dynamic range of the spectrum is reduced by using a standard auto-regressive model (see Reference [8]). A variant of this technique to incorporate a forward-adaptive autoregressive model is shown in Figure 5.
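One common way to realise such an auto-regressive model is linear prediction by the autocorrelation method with the Levinson-Durbin recursion, as reviewed in Reference [8]. The sketch below assumes that approach; the function names and the choice of model order are hypothetical, and the source does not state which estimation method TARANG uses. Filtering the frame with the resulting polynomial A(z) flattens the spectrum.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations; returns a[0..order] with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) input
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err  # reflection coefficient
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a


def analysis_filter(x, a):
    """FIR filtering with A(z): e[n] = sum_k a[k] * x[n - k]."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]
```

The output of `analysis_filter` is the prediction residual, whose spectral dynamic range is smaller than that of the input.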
Block 1.3: The signal with a reduced dynamic range is up-sampled before further processing. We have experimentally found that the minimum value of the interpolating factor for CD quality signal input is 2.
Block 1.4: The maximum value and the mean value of the up-sampled and reduced spectrum dynamic range signal are the gain parameters conveyed to Block 1.6 for further processing. The parameters are obtained such that the normalized signal takes values in the range [0, 0.45]. Thus, the normalized signal is always positive.
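The exact mapping from the gain parameters to the range [0, 0.45] is not spelled out here, so the following is only one plausible sketch: the mean is removed, the result is scaled by the peak deviation from the mean, and the signal is re-centred at 0.225. The function names and this particular affine mapping are assumptions made for illustration.

```python
def normalize(signal, top=0.45):
    """Map a frame affinely into [0, top]; returns (normalized, mean, peak).

    Assumed mapping: centre on the mean, scale by the peak deviation.
    """
    mean = sum(signal) / len(signal)
    peak = max(abs(s - mean) for s in signal)
    if peak == 0.0:
        peak = 1.0  # constant frame: avoid division by zero
    half = top / 2.0
    normalized = [half + half * (s - mean) / peak for s in signal]
    return normalized, mean, peak


def denormalize(normalized, mean, peak, top=0.45):
    """Inverse mapping, as a decoder holding the gain parameters would apply."""
    half = top / 2.0
    return [(v - half) * peak / half + mean for v in normalized]
```

With this choice the two gain parameters are sufficient for the decoder to undo the normalization exactly, up to quantization of the parameters.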
The normalized signal is input to Block 1.5, wherein lies the crux of the TARANG technique.
Block 1.5: The basic principle of the technique is wideband frequency modulation (see Section 8.4 for the signal processing details; also consult references [7, 11 and 12]). The positive signal input is treated as a frequency waveform that modulates a digital carrier of normalized frequency 0.25. The output of the frequency modulator is hard-limited to binary amplitude (+1/-1), so only one bit per sample is needed to represent it. Thus, only the real zero-crossings of the frequency modulated carrier are retained in the process. The zero-crossings contain the modulating signal information. The sequence of operations is nonlinear in nature and hence the block is called the nonlinear mapping block.
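The nonlinear mapping can be sketched as a phase accumulator followed by a hard limiter. The sketch below is illustrative only: the way the normalized signal deviates the carrier (`center` and `sensitivity`), the initial phase and the function name are assumptions, chosen so that inputs in [0, 0.45] keep the instantaneous frequency between 0.025 and 0.475 cycles per sample.

```python
import math


def fm_sign_bits(normalized, carrier=0.25, center=0.225,
                 sensitivity=1.0, phase0=math.pi / 4):
    """Frequency-modulate a digital carrier and keep only its sign.

    Returns one bit per input sample: 1 for +1 and 0 for -1.
    The deviation law carrier + sensitivity * (m - center) is an assumption.
    """
    phase = phase0
    bits = []
    for m in normalized:
        inst_freq = carrier + sensitivity * (m - center)  # cycles per sample
        phase += 2.0 * math.pi * inst_freq                # integrate frequency
        bits.append(1 if math.cos(phase) >= 0.0 else 0)   # hard limiter
    return bits
```

A larger input value raises the instantaneous frequency and therefore packs the sign changes more closely together, which is how the zero-crossings carry the message.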
Block 1.6: The information that is required for recovering the signal at the receiver/decoder comprises the following:
(a) Main information: Binary information of the frequency modulated carrier.
(b) Side information: Quantized temporal envelope parameters (sub-frame maxima) and gain parameters.
The main information and side information bits are packed into a binary file.
Block 1.7: The binary file is subjected to lossless compression. We used the Lempel-Ziv compression algorithm (see references [6, 9]). The typical compression factor that can be achieved using the LZW algorithm is about 1.4-1.5.
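The packing and lossless compression steps of Blocks 1.6 and 1.7 can be sketched as follows. This is only an illustration under stated assumptions: the bit order (most significant bit first), the mapping of +1 to bit 1, and the function names `pack_bits`, `lzw_encode` and `lzw_decode` are hypothetical, and the encoder is a textbook LZW that emits integer codes rather than the actual TARANG bit stream.

```python
def pack_bits(bits):
    """Pack +1/-1 samples into bytes, MSB first (+1 -> 1, -1 -> 0). Assumed layout."""
    out = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        byte = 0
        for b in chunk:
            byte = (byte << 1) | (1 if b > 0 else 0)
        byte <<= 8 - len(chunk)  # zero-pad a final partial byte
        out.append(byte)
    return bytes(out)


def lzw_encode(data):
    """Textbook LZW: return a list of integer codes for a bytes object."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes):
    """Invert lzw_encode, rebuilding the dictionary on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = previous + previous[:1]  # code defined by this very step
        else:
            raise ValueError("invalid LZW code")
        out += entry
        table[len(table)] = previous + entry[:1]
        previous = entry
    return bytes(out)
```

The achievable compression factor depends on how repetitive the packed bit patterns are; the sketch makes no claim about reproducing the 1.4-1.5 figure quoted above.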
The resulting output is called the TARANG format audio file. The bit patterns and the parameters can be understood only by a compatible version of the TARANG decoder.
In Figure 4, we show the decoder (discussed below) for the encoder in Figure 1.
Decoder
Block 4.1: The bits in the TARANG format audio file are unpacked. The temporal envelope and gain parameters corresponding to each frame are taken separately and passed to appropriate blocks i.e. the temporal envelope parameters are passed to the temporal envelope restoration block (Block 4.7). The gain parameters are passed to the normalizing unit within this block.
Block 4.2: The zero-crossings of the binary-amplitude-limited carrier wave are estimated by noting the sign changes between successive samples.
Block 4.3: The zero-crossings are used to demodulate the frequency modulated signal. This is termed inverse non-linear mapping.
Block 4.4: The signal from Block 4.3 is low-pass filtered to reject out-of-band noise.
Block 4.5: The low-pass filtered signal is down-sampled by the same factor with which it was interpolated at the encoder (typically 2).
Block 4.6: The down-sampled signal is passed through the spectrum dynamic range restoration block. This is nothing but an IIR filtering of the signal using the autoregressive model coefficients used at the encoder.
Block 4.7: The temporal envelope restoration block computes the time-domain envelope as depicted in Figure 3.
The spectrum dynamic range is restored by the inverse of the autoregressive filter used at the encoder. The effect of this filter is to shape the quantization noise in the frequency domain whereas the temporal dynamic range restoration filter shapes the quantization noise in the time domain.
TARANG with adaptive noise-shaping
We also propose a variation of the above encoding-decoding system which contains a signal-dependent spectrum dynamic range restoration filter. This is achieved by using an AR model for every frame. Typically, we use a 20th-order model. The associated encoder and decoder block diagrams are shown in Figures 5 and 6 respectively.
Except for the newly incorporated adaptive autoregressive model, the structure of the encoder and decoder is the same as that in Figures 1 and 4. The spectrum envelope parameters are quantized and conveyed to Block 5.6 for the purpose of transmission and spectrum dynamic range restoration at the decoder. Updating the spectrum envelope parameters in every frame requires high side-information (the AR parameters). The updating can be done once in every 5 or 6 frames to reduce the side-information by the same factor. Typically, in many coding techniques, adaptive AR models call for overlap between successive frames. The overlap increases the bit rate. This is avoided by using a time-varying linearly interpolated analysis and synthesis filter implementation. Essentially, this involves linear interpolation of the AR filter coefficients between successive frames that do not overlap. The linear interpolation of coefficients for time-varying filter implementation is depicted pictorially in Figure 18. Also see Figure 19 and Section 8.1.
TARANG-BCC: TARANG with binaural cue coding (BCC)
In Sections 1 and 2, only single channel encoding/decoding techniques have been presented. Efficient two-channel encoding can be achieved by deriving a single channel signal from the stereo signal and binaural cues from the two channel signal. Using these, the stereo image of the sound can be built at the decoder. This can also be extended to multiple channels (typically five). The binaural cue estimation and quantization can be done by using the techniques described in References [1, 2, 3]. The single channel can be encoded by using the TARANG encoder described in Figures 1 and 5. The block diagram of the TARANG-BCC encoding system is shown in Figure 7. The mixing algorithm is essentially any technique that takes the left (L) and the right (R) channels and provides a composite output. The simplest and most commonly used technique is to add the L and R channels to produce the L+R signal. The L+R operation cancels out the out-of-phase components in L and R. Only the in-phase components are retained. The corresponding decoder is shown in Figure 8.
TARANG - BCC encoding mechanism is a novel way of encoding the down-mixed channel along with binaural cues to regenerate the stereo image at the decoder.
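The behaviour of the simple L+R mixing rule can be checked with a few lines of code. The function name `downmix` is hypothetical, and the sketch covers only the sample-wise sum, not the binaural cue estimation of References [1, 2, 3].

```python
def downmix(left, right):
    """Sample-wise L+R composite of two equal-length channels."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    return [l + r for l, r in zip(left, right)]


# In-phase content adds; out-of-phase content cancels.
in_phase = downmix([0.2, -0.4, 0.1], [0.2, -0.4, 0.1])
out_of_phase = downmix([0.2, -0.4, 0.1], [-0.2, 0.4, -0.1])
```

Any component that appears with opposite sign in the two channels is therefore absent from the composite signal and cannot be recovered from it alone.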
TARANG-QS: TARANG with quasi-stereo
For higher compression applications, or when the original signal itself is monophonic, it is possible to create quasi-stereo, i.e. we can artificially synthesize the stereo image from the single channel instead of quantizing two channels at the encoder. The synthesized stereo image would be different from that of the original. The created stereo is referred to as Quasi-stereo (Q-stereo). The algorithm for Q-stereo is ESQUBE-proprietary and is shown in the form of a block diagram in Figure 9.
The technical details can be found in reference [10].
The quasi-left and quasi-right channels can be processed with reverberation and head-related transfer functions to spatially spread the sound image. The resulting effect for a listener is that of immersion in a rich sound field. The four channel output is fed to a multi-channel listening setup as shown in Figure 10. The subwoofer is used to enhance the low frequency components since they require higher intensities for adding richness to the perceived quality. The input to the woofer can be derived by low-pass filtering (below 1 kHz) the L+R channel that is input to the Quasi-stereo generator.
In the absence of a multi-channel listening setup at the consumer end, the four channels can be coalesced to two channels as shown in Figure 11. The TARANG encode-decode system for Q-stereo listening is shown in Figures 12 and 13 respectively. The advantage of the TARANG-QS system is that it doubles the compression ratio achieved by the TARANG system from R to 2R. The TARANG-BCC system yields a factor that is greater than R but less than 2R because the binaural cues have to be encoded and transmitted, for which additional bits are required. A typical value of R is 8.
TARANG-SBR: TARANG with spectrum band replication
A technique that became popular in the recent past to enhance the performance of perceptual audio coders is that of spectral extrapolation [Prior Arts 1 and 2]. The spectrum band replication (SBR) technique essentially captures the redundancy in the spectral envelope. The long-term periodicities in the signal manifest in both low-frequency and high-frequency bands of the signal. One of the features of this invention is to show the utility of the SBR technique for non-perceptual coders. It has been mentioned in [Prior Art 2] that the SBR technique can be applied to any existing perceptual coding solution, but there exists no prior art showing its applicability to non-perceptual coding techniques. The present invention is the first instance demonstrating the application of SBR to non-perceptual coders. Also, since the TARANG coding mechanism operates in the time domain, a novel implementation of spectrum band replication is proposed. The signal processing elements of SBR implementation in TARANG are explained in Section 10 and illustrated in Figures 26, 27, 28, 29 and 30.
The TARANG encoder-decoder system with SBR incorporated is shown in Figures 14 and 15. We note that the TARANG-SBR system of coding audio will yield compression factors greater than R but less than 2R where R is the compression ratio with the TARANG system alone.
TARANG hybrid systems
At this point, it is not hard to imagine a combination of the several techniques that we have at hand to devise a novel technique for compression that can yield compression factors up to 4R where R is the compression factor offered by the basic TARANG codec. For example, TARANG-BCC can be combined with SBR to achieve compression factors close to 4R (but less than 4R). The combination is denoted by TARANG-BCC-SBR. Similarly, TARANG-QS can also be combined with SBR to achieve compression equal to 4R. The combination is denoted by TARANG-QS-SBR. For R typically in the range 7-8, the net compression factor thus achieved is about 30. The block diagram of the encoder and decoder is shown in Figures 16 and 17. The TARANG-QS-SBR decoder in Figure 17 is shown for multi-channel listening.
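The arithmetic behind the hybrid figures can be made explicit with a small sketch. It treats quasi-stereo and SBR as each contributing at most a factor of two, which is the upper bound stated above; the function name `upper_bound_factor` is hypothetical, and real systems that transmit side information (such as TARANG-BCC) fall below these bounds.

```python
def upper_bound_factor(base_factor, quasi_stereo=False, sbr=False):
    """Upper bound on the net compression factor for a base TARANG factor R."""
    factor = base_factor
    if quasi_stereo:
        factor *= 2  # one channel coded instead of two
    if sbr:
        factor *= 2  # upper half band not coded
    return factor


# For R in the range 7-8, TARANG-QS-SBR gives 28-32, i.e. "about 30".
bounds = [upper_bound_factor(r, quasi_stereo=True, sbr=True) for r in (7, 8)]
```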
Metadata
We can also incorporate metadata into the TARANG audio files. The novelty in our technique lies in the fact that the metadata can be entered and manipulated by the user, unlike the standard encoder-specified metadata. In several applications, the user would like to incorporate his/her description of the audio content. After the first few bytes of TARANG file header information, we allow 1024 bytes of text data. The TARANG graphical user interface (GUI) developed by M/s. ESQUBE Communication Solutions Pvt. Ltd. displays the content of the text (1024 byte capacity) in a separate window. The user has the option to edit the text, to add or delete a descriptor and save the description. When the user switches to the next TARANG file to play or quits the GUI, the TARANG file is saved with the updated metadata.
The advantages of the metadata update feature are many:
(a) To add descriptors to the TARANG files. The user can avail the option to add any kind of personalized information about music.
(b) The metadata can be used to store ownership and distributor information.
(c) The TARANG GUI system can be sold with a username and password. The music files that are purchased by one user can be prevented from being played by another TARANG subscriber who has not purchased the same music. This is a very high-level of secure access to personalized/proprietary distribution of music files. For this purpose, the segment of the metadata that contains the ownership information may be encrypted.
(d) The metadata can contain the name of the singer, music composer etc. The TARANG GUI can be modified to feature an option to search the metadata in the TARANG files. This will enable the user to segregate or group music files depending on the singer, music composer etc. This feature is lacking in the state-of-the art music players/managers.
(e) In a client-server configuration for internet-enabled music, we can use the metadata to search for a specific song by using any of its descriptors.
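A fixed-size, user-editable text field of this kind can be sketched as follows. The 1024-byte capacity comes from the description above; everything else is an assumption made for illustration: the header length `HEADER_LEN` (the text only says "the first few bytes"), zero-byte padding, UTF-8 encoding, and the function names.

```python
HEADER_LEN = 16       # hypothetical; the actual header size is not stated
METADATA_LEN = 1024   # capacity of the text field, as described above


def write_metadata(blob, text):
    """Return a copy of the file bytes with the metadata field replaced."""
    raw = text.encode("utf-8")
    if len(raw) > METADATA_LEN:
        raise ValueError("metadata exceeds 1024 bytes")
    field = raw.ljust(METADATA_LEN, b"\x00")  # assumed zero padding
    return blob[:HEADER_LEN] + field + blob[HEADER_LEN + METADATA_LEN:]


def read_metadata(blob):
    """Extract the text stored in the metadata field."""
    field = blob[HEADER_LEN:HEADER_LEN + METADATA_LEN]
    return field.rstrip(b"\x00").decode("utf-8")


def matches(blob, descriptor):
    """Case-insensitive search of the metadata, as in uses (d) and (e)."""
    return descriptor.lower() in read_metadata(blob).lower()
```

Because the field has a fixed size and position, editing the description never shifts the audio payload that follows it.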
The signal processing elements
We describe the various signal processing blocks that make up TARANG and its variants below. We give the signal processing models and equations that govern the encoding and decoding mechanism.
Encoder
The various elements of the encoder are explained below.
8.1 Time-varying analysis and synthesis filter
Let $x(n)$ be the discrete-time sequence to be compressed. Let $N$ be the analysis frame length. The $i$th frame of $x(n)$ is given by $\{x(n),\ iN + 1 \le n \le (i + 1)N\}$.
The $(i+1)$st frame of $x(n)$ is given by $\{x(n),\ (i + 1)N + 1 \le n \le (i + 2)N\}$.
Let $a_k^{i},\ 1 \le k \le p$, be the optimum $p$th-order autoregressive model coefficients for the $i$th frame.
Similarly, let $a_k^{i+1},\ 1 \le k \le p$, be the optimum $p$th-order autoregressive model coefficients for the $(i+1)$st frame.
The prediction residue $r(n)$ corresponding to the $i$th frame is computed as the output of a linear time-varying filter with coefficients $\{a_k^{i}(n),\ 1 \le k \le p\}$. These coefficients are obtained by linear interpolation of $\{a_k^{i},\ 1 \le k \le p\}$ and $\{a_k^{i+1},\ 1 \le k \le p\}$ (see Figure 18).
Note that the time-varying filter coefficients are not optimized directly but are computed by linear interpolation of the coefficients in two successive frames. Our experiments showed that this is a computationally efficient way of designing time-varying autoregressive filters. The time-varying analysis filter takes x(n) as input and gives r(n) as the output. The corresponding time-varying synthesis filter takes r(n) as input and gives x(n) as the output. In the absence of quantization of the residue, the time-varying analysis filter given above and its equivalent time-varying synthesis filter form a perfect-reconstruction pair. The coefficient interpolation is illustrated in Figure 18. The time-varying analysis and synthesis filter block diagrams are shown in Figure 19.
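The perfect-reconstruction property can be illustrated with a short sketch. Several details here are assumptions rather than the patented formulation: the residue is taken as r(n) = x(n) - sum_k a_k(n) x(n-k) (the sign convention may differ), the interpolation weight grows linearly from 0 at the start of a frame, the last frame reuses its own coefficients, samples are indexed from 0, and the function names are hypothetical.

```python
def coeffs_at(n, frame_coeffs, frame_len):
    """Linearly interpolated AR coefficients at sample n (assumed weighting)."""
    i, m = divmod(n, frame_len)
    current = frame_coeffs[i]
    following = frame_coeffs[min(i + 1, len(frame_coeffs) - 1)]
    lam = m / frame_len  # 0 at the frame start, close to 1 at the frame end
    return [(1.0 - lam) * c + lam * d for c, d in zip(current, following)]


def predict(history, a, n):
    """Weighted sum of the p most recent samples before index n."""
    return sum(a[k] * history[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)


def analysis_filter(x, frame_coeffs, frame_len):
    """Time-varying analysis filter: signal in, prediction residue out."""
    return [x[n] - predict(x, coeffs_at(n, frame_coeffs, frame_len), n)
            for n in range(len(x))]


def synthesis_filter(r, frame_coeffs, frame_len):
    """Time-varying synthesis filter: residue in, signal out."""
    x = []
    for n in range(len(r)):
        a = coeffs_at(n, frame_coeffs, frame_len)
        x.append(r[n] + predict(x, a, n))
    return x
```

Because the synthesis filter uses exactly the same interpolated coefficients as the analysis filter at every sample, an unquantized residue reproduces the input up to floating-point rounding, and no frame overlap is needed.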
Transient detection
If a particular signal frame is transient, we need to preserve its temporal structure too. To this end, we need to devise a simple test for detecting transient frames. The frame is divided into M sub-frames. Let the maximum of |r(n)| in each sub-frame be denoted by $\{m_l,\ 0 \le l \le M-1\}$. The number of sub-frames is best determined experimentally. In our experiments, we found that at a sampling rate of 44.1 kHz, and a frame size of 1024 samples, M = 32 sub-frames proved to be satisfactory. We perform transient detection by computing the ratio of the maximum of $\{m_l\}$ to the minimum, i.e., we compute the max-to-min ratio:
$$\frac{\max_{l} m_l}{\min_{l} m_l}$$
If the above ratio is greater than a preset threshold, then we consider the frame as transient. Otherwise, we declare it as a non-transient frame. The threshold of detection has to be determined experimentally. Typically, a threshold value of 20dB (on the logarithmic scale) was found to be an optimum value. Increasing the threshold value will enable us to detect only frames with large temporal dynamic range. If the frame is detected to be a transient frame, we need to compute and transmit the temporal envelope parameters. The technique for temporal envelope computation is described next.
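The transient test described above can be sketched as follows. The conversion of the ratio to decibels as 20 log10 of an amplitude ratio, the small floor that guards against division by zero in silent sub-frames, and the function names are assumptions; the frame length is assumed to be a multiple of the number of sub-frames.

```python
import math


def subframe_maxima(r, num_subframes):
    """Maximum of |r(n)| in each of M equal-length sub-frames."""
    size = len(r) // num_subframes
    return [max(abs(v) for v in r[l * size:(l + 1) * size])
            for l in range(num_subframes)]


def is_transient(r, num_subframes=32, threshold_db=20.0, floor=1e-12):
    """Flag a frame as transient if the max-to-min ratio exceeds the threshold."""
    maxima = subframe_maxima(r, num_subframes)
    ratio = max(maxima) / max(min(maxima), floor)
    return 20.0 * math.log10(max(ratio, floor)) > threshold_db
```

A steady tone gives nearly equal sub-frame maxima and a ratio near 0 dB, whereas a frame in which a quiet passage is followed by a loud attack gives a large ratio and is flagged.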
Temporal envelope computation
The temporal envelope is an important attribute in the temporal structure of the signal. To compute the temporal envelope, we propose a novel and computationally efficient technique that uses the temporal maxima as landmark points. As in Section 8.2, the frame is divided into M sub-frames. The sub-frame maximum value is replicated N/M times which is the sub-frame length. Repeating this for every sub-frame gives us a new sequence q(n). We recognize this sequence as the piecewise constant approximation to the envelope of the frame.
We smooth the piecewise constant envelope by convolving it with a Hamming window $w(n)$, normalized to have unity sum, i.e.,
$$\sum_{n} w(n) = 1.$$
Stating this mathematically, let $u(n)$ denote the unit step function. Define the rectangular window
$$u(n) - u(n - N/M).$$
The rectangular window is of length N/M and is needed for the definition of the piecewise constant envelope. The piecewise constant envelope is given by
$$q(n) = \sum_{l=0}^{M-1} m_l \left[ u(n - lN/M) - u(n - (l+1)N/M) \right],$$
with $n$ counted from the start of the frame.
The length of the smoothing Hamming window has to be determined experimentally. Typically a value of half the sub-frame size was found to be satisfactory. The temporal envelope e(n) is obtained by convolution of q(n) and w(n):
e(n) = w(n) * q(n) (6)
An example staircase envelope and the smoothed envelope are shown in Figure 20. In the literature, a standard technique to compute the temporal envelope is spectrum-domain linear prediction (see reference [5]). Spectrum-domain linear prediction is the dual of time-domain linear prediction (see reference [8]). The technique proposed above is novel and computationally efficient.
Next, we normalize the signal in the frame by the temporal envelope as follows:
$$\tilde{r}(n) = \frac{r(n)}{e(n)}.$$
We note that the normalized residue $\tilde{r}(n)$ that is obtained from each frame is the residue after spectral and temporal envelope normalization. The residue is quantized using the novel wideband frequency modulation technique.
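The envelope computation and normalization can be sketched in a few lines. The sub-frame maxima, the staircase, the unit-sum Hamming window of half the sub-frame length and the division by the envelope follow the description above; the centred alignment of the convolution, the replication of edge values at the frame boundaries, the small floor in the division, and the function names are assumptions. The frame length is assumed to be a multiple of the number of sub-frames.

```python
import math


def hamming_unit_sum(length):
    """Hamming window scaled so that its samples sum to one."""
    if length == 1:
        return [1.0]
    w = [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (length - 1))
         for k in range(length)]
    total = sum(w)
    return [v / total for v in w]


def temporal_envelope(r, num_subframes):
    """Return (sub-frame maxima, staircase q(n), smoothed envelope e(n))."""
    size = len(r) // num_subframes
    maxima = [max(abs(v) for v in r[l * size:(l + 1) * size])
              for l in range(num_subframes)]
    q = [m for m in maxima for _ in range(size)]  # piecewise constant envelope
    w = hamming_unit_sum(max(size // 2, 1))
    half = len(w) // 2
    e = []
    for n in range(len(q)):
        acc = 0.0
        for k, wk in enumerate(w):
            idx = min(max(n + half - k, 0), len(q) - 1)  # clamp at frame edges
            acc += wk * q[idx]
        e.append(acc)
    return maxima, q, e


def normalize_by_envelope(r, e, floor=1e-12):
    """Divide the residue by the envelope, guarding against zeros."""
    return [v / max(env, floor) for v, env in zip(r, e)]
```

Since the window sums to one, a frame of constant amplitude yields an envelope equal to that amplitude, and in general the smoothed envelope stays between the smallest and largest sub-frame maxima.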
Wideband frequency modulation technique
If $\tilde{r}(n)$ is used to modulate the frequency of the carrier signal $c(n) = \sin(2\pi f_c n)$, the information about $\tilde{r}(n)$ will be contained in the zero-crossings of the modulated carrier (see references [4, 7, 11, 12]). To recover $\tilde{r}(n)$ from the modulated carrier, its bandwidth must be much lower than the carrier frequency $f_c$. This is referred to as narrowband frequency modulation (FM). However, we consider wideband FM, in which the bandwidth of $\tilde{r}(n)$ is comparable to $f_c$. Wideband FM is realized by upsampling $\tilde{r}(n)$ by a factor $I$ to yield the signal $\tilde{r}(n/I)$. The carrier frequency is chosen as the mid-point of the normalized carrier frequency range [0, 0.5]. By selecting the carrier frequency in this manner, we can ensure symmetric positive and negative frequency swing. On the normalized scale, the signal $\tilde{r}(n/I)$ has the bandwidth $0.5/I$. For $\tilde{r}(n/I)$ to play the role of the instantaneous frequency (IF) of the carrier, it must be mean-shifted and scaled such that the resulting IF always lies between 0 and 0.5 (on the normalized frequency scale). This is shown in Figure 21. The instantaneous frequency of the frequency-modulated carrier is given by
$$f^{i}(n) = f_c + \beta^{i} + \alpha^{i}\,\tilde{r}(n/I),$$
where $i$ denotes the frame index, $\alpha^{i}$ is the frequency modulation scale factor and $\beta^{i}$ is the direct current (DC) offset term. The discrete-time approximation to the instantaneous phase is given by
$$\phi^{i}(n) = 2\pi \sum_{m \le n} f^{i}(m).$$
The frequency-modulated carrier signal of the $i$th frame is given by $\sin(\phi^{i}(n))$. An example is shown in Figure 22.
Zero-crossings of the wideband FM signal
From the classical theory of frequency modulation, we know that the zero-crossings of the frequency modulated carrier signal contain information about the underlying message signal. In principle, with the knowledge of exact zero-crossings and appropriate a priori information (such as bandlimitedness) about the message signal, one can compute the underlying message. However, from the point of view of compression, it suffices to give approximate zero-crossing information to the receiver (or decoder). We achieve this in a very simple manner, by a hard-threshold operation (achieved by using a binary-valued amplitude limiter) on the frequency modulated signal. To preserve the zero-crossing information, we use an amplitude limiter that outputs the sign of the input (+1 or -1 according as the input is positive or negative). The input to the amplitude limiter is $\sin(\phi^{i}(n))$ and the output is its sign. Denote the sign by $b^{i}(n)$. An example is shown in Figure 23. Thus, we see that the amplitude limiter converts the frequency-modulated sine wave to a square wave. The square wave has the same zero-crossings as the input FM signal. The sources of error in the zero-crossing are due to amplitude limiting (FM signal to square wave conversion) and sampling. The exact and quantized zero-crossings are illustrated in Figures 24 and 25 respectively. Note that the quantization of zero-crossing information has been achieved indirectly by quantizing the frequency modulated carrier using its sign information.
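The modulation and hard-limiting steps can be sketched as follows. The sketch assumes that the instantaneous frequency is the carrier frequency plus a DC offset plus a scaled message sample, that the phase starts at zero and is accumulated sample by sample, and that a limiter input of exactly zero maps to +1; the function name `fm_hard_limit` and its parameter names are hypothetical.

```python
import math


def fm_hard_limit(message, alpha, beta, carrier=0.25):
    """Frequency-modulate a carrier and keep only the sign of each sample."""
    phase = 0.0
    bits = []
    for value in message:
        inst_freq = carrier + beta + alpha * value
        if not 0.0 < inst_freq < 0.5:
            raise ValueError("instantaneous frequency must lie in (0, 0.5)")
        phase += 2.0 * math.pi * inst_freq  # running sum approximates the phase
        bits.append(1 if math.sin(phase) >= 0.0 else -1)
    return bits


def count_sign_changes(bits):
    """Number of zero-crossings implied by the binary sequence."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)
```

For a constant instantaneous frequency f, the binary sequence changes sign about 2f times per sample, which is how the one-bit output carries the message information.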
Encoder bit stream
The set of parameters that form the compressed representation of the input frame $\{x(n),\ iN + 1 \le n \le (i + 1)N\}$ is the following:
(a) $b^{i}(n)$: the binary sequence (the same as the square wave sequence),
(b) $\{\alpha^{i}, \beta^{i}\}$: the scale and DC parameters, collectively referred to as the gain parameters,
(c) $a_k^{i},\ 1 \le k \le p$: the autoregressive model parameters,
(d) the transient/non-transient flag,
(e) a conditional field, which exists only if the flag in (d) marks the frame as transient. If the frame is transient, then the sub-frame maxima $\{m_l,\ 1 \le l \le M\}$ are also included in addition to (a)-(d) above.
The above parameters are written into a file on a frame-by-frame basis, until all the input signal frames have been processed. To this file, at the start, we also need to write appropriate header information to enable proper decoding and play out. The header contains information about the original audio such as the sampling frequency, number of audio channels, and the information needed for decoding such as the number of frames, the frame size etc. The resulting file is subjected to lossless compression using the LZW technique to achieve further reduction in bit rate. To the compressed file, we add header information regarding the TARANG version number, user license information and the LZW look-up table. The resulting file is called the TARANG format file.
Decoder
The LZW compressed file is restored using the LZW decoder. Now, we need to decode each frame and reconstruct the signal. Since we have only quantized information, we can only reconstruct an approximation to the signal and not the signal itself. We use the hat on the signals to denote that the reconstruction is not error-free.
FM signal demodulation
The quantized zero-crossing information of $b^{i}(n)$, along with the scale factors, must be used to reconstruct the signal $\{x(n),\ iN+1 \le n \le (i+1)N\}$. The reconstructed signal is denoted by $\{\hat{x}(n),\ iN+1 \le n \le (i+1)N\}$. To describe the reconstruction of the sequence, we resort to the continuous-time signal representation for ease of analysis. We also drop the frame indices in the interest of brevity of notation.
Consider the frequency modulated signal $s(t) = \sin(\phi(t))$. The instantaneous frequency is given by $f(t) = f_c + \beta + \alpha\,\tilde{r}(t)$, where $\tilde{r}(t)$ is the message signal. The instantaneous phase is given by $\phi(t) = 2\pi \int_0^{t} f(\tau)\,d\tau$. Let $t_j$ and $t_{j+1}$ be two successive zero-crossings of the signal $s(t)$. We can write $\phi(t_j) = j\pi$ and $\phi(t_{j+1}) = (j+1)\pi$. Using the expression for $\phi(t)$ above, we can write:
$$2\pi (f_c + \beta)\,t_j + 2\pi\alpha \int_0^{t_j} \tilde{r}(\tau)\,d\tau = j\pi,$$
$$2\pi (f_c + \beta)\,t_{j+1} + 2\pi\alpha \int_0^{t_{j+1}} \tilde{r}(\tau)\,d\tau = (j+1)\pi.$$
Subtracting the first equation above from the second one and re-arranging the terms, we can write
$$\frac{1}{t_{j+1} - t_j} \int_{t_j}^{t_{j+1}} \tilde{r}(\tau)\,d\tau = \frac{1}{\alpha} \left[ \frac{1}{2\,(t_{j+1} - t_j)} - f_c - \beta \right].$$
Therefore, the right-hand side is a measure of the average value of $\tilde{r}(t)$ over the interval $(t_j, t_{j+1})$. We can thus obtain an average-value approximation over every zero-crossing interval. Thus, we have a staircase approximation to the signal $\tilde{r}(t)$. The staircase approximation can be filtered to yield a bandlimited estimate of $\tilde{r}(t)$. For more details of wideband FM signal demodulation, refer to [12] and [13]. In the discrete-time case, we compute the zero-crossings from $b^{i}(n)$. A change of sign between two successive samples of $b^{i}(n)$ indicates the presence of a zero-crossing between them. Assume that the two samples $b^{i}(m)$ and $b^{i}(m+1)$ have opposite signs. This indicates the presence of a zero-crossing in the interval $(m, m+1)$. The zero-crossing can be approximately computed as $m + 1/2$. Since the zero-crossings $t_j$ and $t_{j+1}$ have been computed approximately, the fraction $1/(2\,(t_{j+1} - t_j))$ cannot be computed exactly. Hence, we can compute only an approximation to $\tilde{r}(n/I)$. From the approximation, we can compute $\tilde{r}(n)$ approximately. Let us denote it by $\hat{\tilde{r}}(n)$.
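The discrete-time demodulation can be sketched as follows. It relies on the fact that successive zero-crossings of the carrier are separated by a phase advance of pi, so that for an instantaneous frequency of the form f_c + beta + alpha * message, the average message value over a crossing interval of length d is (1/(2d) - f_c - beta)/alpha. The function names, the zero fill before the first and after the last crossing, and the omission of the subsequent low-pass filter and down-sampler are simplifications made for illustration.

```python
def zero_crossings(bits):
    """Approximate crossing instants m + 1/2 wherever the sign changes."""
    return [m + 0.5 for m in range(len(bits) - 1) if bits[m] != bits[m + 1]]


def demodulate(bits, alpha, beta, carrier=0.25):
    """Staircase estimate of the message from the hard-limited FM signal."""
    crossings = zero_crossings(bits)
    estimate = [0.0] * len(bits)  # samples outside any interval stay at zero
    for t_j, t_next in zip(crossings, crossings[1:]):
        average = (1.0 / (2.0 * (t_next - t_j)) - carrier - beta) / alpha
        for n in range(int(t_j + 0.5), int(t_next + 0.5)):
            estimate[n] = average
    return estimate
```

Individual staircase values are coarse because the crossing instants are quantized to half-sample positions, but their time average tracks the message closely, which is why the staircase is low-pass filtered before further processing.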
Computation of r(n)
The sequence $\hat{x}(n)$ for the frame is computed according to the interpolated time-varying synthesis filter model (see Figure 19 and Section 8.1).
Incorporation of spectrum band replication
The spectrum band replication (SBR) technique proposed by Lars Liljeryd et al. (Prior Arts 1 and 2) is a transform domain technique. The principle behind SBR is that the upper half band of frequencies can be generated from the lower half band of frequencies to produce a full bandwidth signal. The resulting full bandwidth signal sounds richer than the signal obtained by retaining the lower half alone. The SBR procedure is illustrated in Figure 26. Thus, one can obtain about 50% reduction in bit rate for nearly the same quality, since the upper half band need not be encoded.
In a typical transform coder, the spectrum envelope parameters are quantized separately. However, in the WBFM codec, the LP parameters are used to represent the envelope. Since the wideband frequency modulation (WBFM) technique is a time-domain technique, we develop a time domain technique for spectrum band replication. The block diagram of the encoder to be compatible with a SBR decoder is shown in Figure 27. The generation of high-frequency band from the low frequency band is achieved by modulating with a cosine and subsequent highpass filtering. The de-emphasis filter is the time-varying synthesis filter (inverse of the time-varying AR filter at the encoder, see Figure 19). It restores the spectral envelope of the signal. The cosine modulation is a time-domain technique to implement spectrum replication. In many coders that implement SBR (such as the AAC Plus) the spectrum replication is done in the transform domain. The spectra of the signal at various points in the encoder are illustrated in Figure 28. For the sake of illustration, a stylized spectrum has been chosen. The SBR decoder for the compatible TARANG encoder is shown in Figure 29. Figure 30 shows the spectrum of the signal at various points in the TARANG-SBR decoder.
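The time-domain replication step (cosine modulation followed by high-pass filtering) can be sketched as follows. The shift frequency of 0.25, the cutoff of 0.25, the 101-tap windowed-sinc high-pass filter and the function names are assumptions chosen for illustration; the de-emphasis (time-varying synthesis) filter and the scaled addition to the low band are not shown.

```python
import math


def highpass_fir(cutoff, taps=101):
    """Windowed-sinc high-pass FIR obtained by spectral inversion of a low-pass."""
    mid = taps // 2
    h = []
    for n in range(taps):
        k = n - mid
        ideal = 2.0 * cutoff if k == 0 else math.sin(2.0 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    total = sum(h)
    h = [-v / total for v in h]  # negated low-pass with unity DC gain
    h[mid] += 1.0                # delta minus low-pass gives a high-pass
    return h


def fir_filter(x, h):
    """Direct-form FIR filtering with zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def replicate_high_band(low_band, shift=0.25, cutoff=0.25):
    """Shift the low band upward by cosine modulation, then remove the base band."""
    modulated = [2.0 * v * math.cos(2.0 * math.pi * shift * n)
                 for n, v in enumerate(low_band)]
    return fir_filter(modulated, highpass_fir(cutoff))
```

Modulating a component at frequency f by a cosine at 0.25 produces images at 0.25 + f and 0.25 - f; the high-pass filter keeps the upper image, so a low-band tone at 0.1 reappears at 0.35.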
Figure 31 describes a real-time system for implementing the present invention, in which an A/D converter converts the analog signal into discrete samples, which are then buffered to realize frames and sub-frames. The sampling rate could be 44.1 kHz. The CPU performs the envelope detection, spectral dynamic range reduction, normalization, and non-linear mapping using wideband frequency modulation. The CPU also determines the maxima of each sub-frame and provides the parameters required to decode the signal at the decoder. Information of the frequency modulated signal, temporal envelope parameters and gain parameters are given out for further compression. Further compression of the parameters is achieved by a Lempel-Ziv encoder. A Lempel-Ziv decoder is used at the receiving end to recover the information of the frequency modulated signal, the temporal envelope parameters and the gain parameters. The CPU unpacks the signal that was packed at the encoder. It estimates the zero-crossings of the binary information to demodulate the frequency modulated signal. This is used to reproduce the signal. A D/A converter converts the digital signal back to analog.
We claim:
1. A method for audio encoding using non-uniform delta modulation comprising steps of:
a) dividing sampled signal into frames and sub-frames,
b) computing maximum of the signal from each of said sub-frames to obtain temporal envelope,
c) temporal envelope is obtained by convolution,
d) up-sampling the signal after reducing dynamic range of spectrum,
e) computing gain parameters to obtain normalized signal from the up-sampled signal,
f) performing nonlinear mapping on said normalized signal using frequency modulation; and
g) packing binary information of the frequency modulated signal, the temporal envelope parameter and gain parameter into binary file thereafter compressing the file.
2. The method as claimed in claim 1, wherein the sampling is done at 44.1 kHz with each frame having 1088 samples wherein number of sub-frames is 32.
3. The method as claimed in claim 1, wherein normalized-sum window is used for the convolution.
4. The method as claimed in claim 1, wherein the dynamic range of the spectrum is reduced by using a fixed order auto-regressive model.
5. The method as claimed in claim 1, wherein the parameters are obtained such that normalized signal takes values in the range 0-0.45.
6. The method as claimed in claim 1, wherein the integer value of interpolating factor for CD quality signal input during up-sampling is greater than or equal to 2.
7. The method as claimed in claim 1, wherein the frequency modulation is wide band.
8. The method as claimed in claim 1, wherein the frequency modulated signal is binary amplitude limited.
9. The method as claimed in claims 1 and 8, wherein retaining zero-crossing information of the frequency modulated signal.
10. The method as claimed in claim 1, wherein the file is losslessly compressed using Lempel-Ziv compression algorithm.
11. The method as claimed in claims 1 and 10, wherein compression factor is 1.4-1.7.
12. A method for decoding a non-uniform delta modulation encoded signal, where the coded signal contains binary information of frequency modulated signal, gain parameters and quantized temporal envelope parameters, said method comprising steps of:
a) decompressing and unpacking the temporal envelope and the gain parameters corresponding to each sub-frame,
b) estimating zero-crossings from the binary information of the frequency modulated signal,
c) demodulating the frequency modulated signal using estimated zero-crossings,
d) filtering the demodulated signal to reject out-of-band noise,
e) down-sampling the filtered signal thereafter restoring spectrum dynamic range,
f) computing time-domain envelope of the restored signal using the temporal envelope and gain parameters and
g) shaping temporal envelope of the signal.
13. The method as claimed in claim 12, wherein the restoring spectrum dynamic range is by inverse of autoregressive IIR filter used at the encoder with quantized spectrum envelope parameters.
14. A method to create Quasi-stereo sound from monophonic sound comprising steps of:
a. converting mono to stereo,
b. processing each stereo channel with reverberation and head-related transfer functions and
c. feeding the processed output to multi-channel.
15. The method as claimed in claim 14 wherein, the multi-channel is 4 or 2, loudspeaker or headphone.
16. The method as claimed in claim 1 and 14, wherein the encoding of the quasi-stereo is achieved.
17. The method as claimed in claim 12 and 14, wherein the decoding of the quasi-stereo is achieved.
18. A method for audio encoding using non-uniform delta modulation comprising steps of:
a) dividing sampled signal into frames and sub-frames,
b) computing maximum of the signal from each of said sub-frames to obtain temporal envelope,
c) temporal envelope is obtained by convolution,
d) applying spectral band replication algorithm to the convolved signal after reducing dynamic range of spectrum,
e) computing gain parameters to obtain normalized signal from spectral band replicated signal,
f) performing nonlinear mapping on said normalized signal using frequency modulation; and
g) packing binary information of the frequency modulated and the temporal envelope and spectral band replication parameters into binary file thereafter compressing the file.
19. The method as claimed in claim 18, wherein the sampling is done at 44.1 kHz with each frame having 1088 samples wherein number of sub-frames is 32.
20. The method as claimed in claim 18, wherein normalized-sum window is used for the convolution.
21. The method as claimed in claim 18, wherein the dynamic range of the spectrum is reduced by using a fixed-order auto-regressive model.
22. The method as claimed in claim 18, wherein the parameters are obtained such that normalized signal takes values in the range 0-0.45.
23. The method as claimed in claim 18, wherein the frequency modulation is wide band.
24. The method as claimed in claim 18, wherein the frequency modulated signal is binary amplitude limited.
25. The method as claimed in claim 18 and 24, wherein retaining zero-crossing information of the frequency modulated signal.
26. The method as claimed in claim 18, wherein the file is losslessly compressed using Lempel-Ziv compression algorithm.
27. The method as claimed in claims 18 and 26, wherein the total compression factor is about 2.8-3.
28. The method as claimed in claim 21 and 26, compression factor is about 5.6-6.
29. A method for decoding a non-uniform delta modulation encoded signal, where the coded signal contains binary information of frequency modulated signal, gain parameters, spectral band replication parameters and quantized temporal envelope parameters, said method comprising steps of:
a) decompressing and unpacking the temporal envelope and gain parameters corresponding to each sub-frame,
b) estimating zero-crossings of the binary information of the frequency modulated signal,
c) demodulating the frequency modulated signal using estimated zero-crossings,
d) filtering the demodulated signal to reject out-of-band noise,
e) applying SBR algorithm to the filtered signal thereafter restoring spectrum dynamic range,
f) computing time-domain envelope of the restored signal using the temporal envelop and gain parameters and
g) shaping temporal envelope of the signal.
30. The method as claimed in claim 29, wherein the spectrum dynamic range is restored by the inverse of the autoregressive IIR filter used at the encoder, with quantized spectrum envelope parameters.
31. The method as claimed in claims 1 or 12, and 18 or 29, wherein the audio encoding/decoding is used with binaural cue coding.
32. A method to create spectral band replication in time-domain comprising steps of:
a. cosine modulation of a low-pass filtered signal,
b. removing the base band by high-pass filtering of the cosine modulated signal, and
c. replacing the signal by scaled addition.
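The three steps of claim 32 can be sketched as follows. This is an illustration under assumptions, not the filter design of the specification: the windowed-sinc filters, the choice of a modulating frequency equal to the low-pass cutoff, the delay alignment, and all function names are hypothetical, and "scaled addition" is read here as adding the scaled high band to the low-pass signal.

```python
import math


def windowed_sinc_lowpass(cutoff, fs, taps=101):
    """Hamming-windowed sinc low-pass FIR with unity DC gain (taps must be odd)."""
    mid = (taps - 1) / 2.0
    fc = cutoff / fs
    h = []
    for n in range(taps):
        t = n - mid
        ideal = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        h.append(ideal * (0.54 - 0.46 * math.cos(2.0 * math.pi * n / (taps - 1))))
    total = sum(h)
    return [v / total for v in h]


def fir(x, h):
    """Causal FIR filtering with zero initial state."""
    return [
        sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
        for n in range(len(x))
    ]


def time_domain_sbr(x, fs, cutoff, scale, taps=101):
    """Low band plus a scaled copy of it shifted up by `cutoff` Hz."""
    lp = windowed_sinc_lowpass(cutoff, fs, taps)
    half = taps // 2
    # High-pass by spectral inversion: unit impulse minus the low-pass.
    hp = [-v for v in lp]
    hp[half] += 1.0
    low = fir(x, lp)
    # a. cosine modulation creates images at cutoff - f and cutoff + f
    mod = [s * math.cos(2.0 * math.pi * cutoff * n / fs) for n, s in enumerate(low)]
    # b. the high-pass removes the image that falls back into the base band
    high = fir(mod, hp)
    # c. scaled addition, after delaying the low band to match the high band
    delayed_low = [0.0] * half + low[:len(low) - half]
    return [l + scale * h for l, h in zip(delayed_low, high)]
```

For a 1 kHz tone sampled at 16 kHz with a 4 kHz cutoff, the output keeps the 1 kHz component and gains a replicated component at 5 kHz, while the 3 kHz image is suppressed by the high-pass filter.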
33. A system for audio encoding using non-uniform delta modulation, comprising:
a) means to divide sampled signal into frames and sub-frames,
b) means to compute maxima of the signal from each of said sub-frames to obtain temporal envelope,
c) means to obtain temporal envelope by convolution,
d) means to up-sample the signal after reducing dynamic range of spectrum,
e) means to compute gain parameters to obtain normalized signal from the up-sampled signal,
f) means to perform nonlinear mapping on said normalized signal using frequency modulation; and
g) means to pack binary information of the frequency modulated signal, the temporal envelope parameter and gain parameter into binary file thereafter compressing the file.
34. A system for decoding a non-uniform delta modulation encoded signal, where the coded signal contains binary information of frequency modulated signal, gain parameters and quantized temporal envelope parameters, said system comprising:
a) means to decompress and unpack the temporal envelope and the gain parameters corresponding to each sub-frame,
b) means to estimate zero-crossings from the binary information of the frequency modulated signal,
c) means to demodulate the frequency modulated signal using estimated zero-crossings,
d) means to filter the demodulated signal to reject out-of-band noise,
e) means to down-sample the filtered signal thereafter restoring spectrum dynamic range,
f) means to compute time-domain envelope of the restored signal using the temporal envelope and gain parameters and
g) means to shape temporal envelope of the signal.
35. A system to create quasi-stereo sound from monophonic sound, comprising:
a. means to convert mono to stereo,
b. means to process each stereo channel with reverberation and head-related transfer functions and
c. means to feed the processed output to multi-channel.
36. A system for audio encoding using non-uniform delta modulation, comprising:
a) means to divide sampled signal into frames and sub-frames,
b) means to compute maximum of the signal from each of said sub-frames to obtain temporal envelope,
c) means to obtain temporal envelope by convolution,
d) means to apply spectral band replication algorithm to the convolved signal after reducing dynamic range of spectrum,
e) means to compute gain parameters to obtain normalized signal from spectral band replicated signal,
f) means to perform nonlinear mapping on said normalized signal using frequency modulation; and
g) means to pack binary information of the frequency modulated signal, the temporal envelope parameter and gain parameter into binary file thereafter compressing the file.
37. A system for decoding a non-uniform delta modulation encoded signal, where the coded signal contains binary information of frequency modulated signal, gain parameters, spectral band replication parameters and quantized temporal envelope parameters, said system comprising:
a) means to decompress and unpack the temporal envelope and gain parameters corresponding to each sub-frame,
b) means to estimate zero-crossings of the binary information of the frequency modulated signal,
c) means to demodulate the frequency modulated signal using estimated zero-crossings,
d) means to filter the demodulated signal to reject out-of-band noise,
e) means to apply SBR algorithm to the filtered signal thereafter restoring spectrum dynamic range,
f) means to compute time-domain envelope of the restored signal using the temporal envelope and gain parameters and
g) means to shape temporal envelope of the signal.
38. A system to create spectral band replication in time-domain, comprising:
a. means to cosine modulate a low-pass filtered signal,
b. means to remove the base band by high-pass filtering of the cosine modulated signal and
c. means to replace the signal by scaled addition.
39. A method and system for audio coding/decoding substantially as herein described with reference to the accompanying drawings.
| # | Name | Date |
|---|---|---|
| 1 | 96-CHE-2006 FORM-18 20-01-2010.pdf | 2010-01-20 |
| 2 | 096-che-2006-form 5.pdf | 2011-09-02 |
| 3 | 096-che-2006-form 3.pdf | 2011-09-02 |
| 4 | 096-che-2006-form 26.pdf | 2011-09-02 |
| 5 | 096-che-2006-form 1.pdf | 2011-09-02 |
| 6 | 096-che-2006-drawings.pdf | 2011-09-02 |
| 7 | 096-che-2006-description(provisional).pdf | 2011-09-02 |
| 8 | 096-che-2006-description(complete).pdf | 2011-09-02 |
| 9 | 096-che-2006-correspondnece-others.pdf | 2011-09-02 |
| 10 | 096-che-2006-claims.pdf | 2011-09-02 |
| 11 | 096-che-2006-abstract.pdf | 2011-09-02 |
| 12 | OTHERS [23-06-2016(online)].pdf | 2016-06-23 |
| 13 | Examination Report Reply Recieved [23-06-2016(online)].pdf | 2016-06-23 |
| 14 | Description(Complete) [23-06-2016(online)].pdf | 2016-06-23 |
| 15 | Claims [23-06-2016(online)].pdf | 2016-06-23 |
| 16 | Abstract [23-06-2016(online)].pdf | 2016-06-23 |
| 17 | 96-CHE-2006-AbandonedLetter.pdf | 2017-07-04 |
| 18 | 96-CHE-2006-HearingNoticeLetter.pdf | 2017-08-10 |
| 19 | 96-CHE-2006-REQUEST FOR ADJOURNMENT OF HEARING UNDER RULE 129A [05-09-2017(online)].pdf | 2017-09-05 |
| 20 | 96-che-2006-ExtendedHearingNoticeLetter_17Oct2017.pdf | 2017-09-11 |
| 21 | 96-CHE-2006-REQUEST FOR ADJOURNMENT OF HEARING UNDER RULE 129A [13-10-2017(online)].pdf | 2017-10-13 |
| 22 | 96-che-2006-ExtendedHearingNoticeLetter_17Nov2017.pdf | 2017-10-17 |
| 23 | 96-CHE-2006-Written submissions and relevant documents (MANDATORY) [01-12-2017(online)].pdf | 2017-12-01 |
| 24 | 96-CHE-2006-FORM-26 [01-12-2017(online)].pdf | 2017-12-01 |
| 25 | Correspondence by Agent_General Power of Attorney_08-12-2017.pdf | 2017-12-08 |
| 26 | Marked Up Copy_Granted 298255_29-06-2018.pdf | 2018-06-29 |
| 27 | Drawings_Granted 298255_29-06-2018.pdf | 2018-06-29 |
| 28 | Description_Granted 298255_29-06-2018.pdf | 2018-06-29 |
| 29 | Claims_Granted 298255_29-06-2018.pdf | 2018-06-29 |
| 30 | Abstract_Granted 298255_29-06-2018.pdf | 2018-06-29 |
| 31 | 96-CHE-2006-PatentCertificate29-06-2018.pdf | 2018-06-29 |
| 32 | 96-CHE-2006-IntimationOfGrant29-06-2018.pdf | 2018-06-29 |
| 33 | 96-CHE-2006-RELEVANT DOCUMENTS [28-03-2019(online)].pdf | 2019-03-28 |
| 34 | 96-CHE-2006-FORM 4 [28-03-2019(online)].pdf | 2019-03-28 |