
Encoding Device And Encoding Method

Abstract: Provided is an encoding device which can obtain perceptually good sound quality even when the number of information bits is small. The encoding device includes a shape quantization unit (111) having: a section search unit (121), which searches for a pulse in each of the bands into which a predetermined search section is divided; and a whole search unit (122), which searches for pulses over the entire search section. The shape of an input spectrum is quantized by a small number of pulse positions and polarities. A gain quantization unit (112) calculates the gains of the pulses found by the shape quantization unit (111) and quantizes the gains for each of the bands.


Patent Information

Application #
Filing Date
01 September 2009
Publication Number
52/2009
Publication Type
INA
Invention Field
ELECTRONICS
Status
Email
mail@lexorbis.com
Parent Application
Patent Number
Legal Status
Grant Date
2018-11-19
Renewal Date

Applicants

PANASONIC CORPORATION
1006, OAZA KADOMA, KADOMA-SHI, OSAKA 571-8501, JAPAN.

Inventors

1. MORII TOSHIYUKI
C/O. PANASONIC CORPORATION, 1006, OAZA KADOMA, KADOMA-SHI, OSAKA 571-8501, JAPAN.
2. OSHIKIRI MASAHIRO
C/O. PANASONIC CORPORATION, 1006, OAZA KADOMA, KADOMA-SHI, OSAKA 571-8501, JAPAN.
3. YAMANASHI TOMOFUMI
C/O. PANASONIC CORPORATION, 1006, OAZA KADOMA, KADOMA-SHI, OSAKA 571-8501, JAPAN.

Specification

FORM 2
THE PATENTS ACT, 1970 (39 of 1970) & THE PATENTS RULES, 2003
COMPLETE SPECIFICATION [See section 10, Rule 13]
ENCODING DEVICE AND ENCODING METHOD; PANASONIC CORPORATION, A COMPANY ORGANIZED AND EXISTING UNDER THE LAWS OF JAPAN, WHOSE ADDRESS IS 1006, OAZA KADOMA, KADOMA-SHI, OSAKA 571-8501, JAPAN. THE FOLLOWING SPECIFICATION PARTICULARLY DESCRIBES THE INVENTION AND THE MANNER IN WHICH IT IS TO BE PERFORMED.

Technical Field

The present invention relates to a coding apparatus and coding method for encoding speech signals and audio signals.

Background Art

In mobile communications, it is necessary to compress and encode digital information such as speech and images for efficient use of radio channel capacity and storage media, and many coding and decoding schemes have been developed so far. Among these, the performance of speech coding technology has been improved significantly by the fundamental scheme of "CELP (Code Excited Linear Prediction)," which skillfully adopts vector quantization by modeling the vocal tract system of speech. Further, the performance of sound coding technology such as audio coding has been improved significantly by transform coding techniques (such as MPEG-standard AAC and MP3). On the other hand, a scalable codec, the standardization of which is in progress by the ITU-T (International Telecommunication Union - Telecommunication Standardization Sector) and others, is designed to cover from the conventional speech band (300 Hz to 3.4 kHz) to wideband (up to 7 kHz), with its bit rate set as high as approximately 32 kbps. That is, a wideband codec also has to apply a certain degree of coding to audio, and therefore cannot be supported by conventional low-bit-rate speech coding methods based on the human voice model, such as CELP, alone. ITU-T standard G.729.1, declared earlier as a recommendation, therefore uses an audio-codec coding scheme of transform coding to encode speech of wideband and above.
Patent Document 1 discloses a coding scheme utilizing spectral parameters and pitch parameters, whereby an orthogonal transform and coding of a signal acquired by inverse-filtering a speech signal are performed based on the spectral parameters, and furthermore discloses, as an example of coding, a coding method based on codebooks of algebraic structures. Patent Document 2 discloses a coding scheme of dividing a signal into linear prediction parameters and residual components, performing an orthogonal transform of the residual components and normalizing the residual waveform by its power, and then quantizing the gain and the normalized residue. Further, Patent Document 2 discloses vector quantization as a quantization method for the normalized residue. Non-Patent Document 1 discloses a coding method based on an algebraic codebook formed with improved excitation spectrums in TCX (i.e. a fundamental coding scheme modeled with an excitation subjected to transform coding and filtering of spectral parameters), and this coding method is adopted in ITU-T standard G.729.1. Non-Patent Document 2 describes the scheme "TC-WVQ." This scheme also transforms linear prediction residue into a spectrum and performs vector quantization of the spectrum, using the DCT (Discrete Cosine Transform) as the orthogonal transform method. By means of the above four prior art techniques, it is possible to apply, to coding, quantization of spectral parameters such as linear prediction parameters, which is part of a useful speech-signal coding technique, thereby enabling efficient, low-bit-rate audio coding.
Patent Document 1: Japanese Patent Application Laid-Open No. HEI10-260698
Patent Document 2: Japanese Patent Application Laid-Open No. HEI07-261800
Non-Patent Document 1: Xie, Adoul, "Embedded Algebraic Vector Quantizers (EAVQ) with Application to Wideband Speech Coding," ICASSP'96
Non-Patent Document 2: Moriya, Honda, "Transform Coding of Speech Using a Weighted Vector Quantizer," IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988

Disclosure of Invention

Problems to be Solved by the Invention

However, the number of bits a scalable codec can assign is small, especially in the relatively lower layers, and, consequently, the performance of excitation transform coding is not sufficient. For example, in ITU-T standard G.729.1, although the bit rate up to the second layer, which supports the telephone band (300 Hz to 3.4 kHz), is 12 kbps, only 2 kbps is assigned to the next, third layer, which supports wideband (50 Hz to 7 kHz). Thus, when there are few information bits, it is not possible to achieve sufficient perceptual performance with a method that encodes a spectrum acquired by an orthogonal transform using vector quantization with a codebook. It is therefore an object of the present invention to provide a coding apparatus and coding method that can achieve good perceptual quality even when there are few information bits.

Means for Solving the Problem

The coding apparatus of the present invention employs a configuration having: a shape quantizing section that encodes a shape of a frequency spectrum; and a gain quantizing section that encodes a gain of the frequency spectrum, in which the shape quantizing section includes: an interval search section that searches for a first fixed waveform in each of a plurality of bands dividing a predetermined search interval; and a thorough search section that searches for second fixed waveforms over the entirety of the predetermined search interval.
The coding method of the present invention includes: a shape quantizing step of encoding a shape of a frequency spectrum; and a gain quantizing step of encoding a gain of the frequency spectrum, in which the shape quantizing step includes: an interval searching step of searching for a first fixed waveform in each of a plurality of bands dividing a predetermined search interval; and a thorough searching step of searching for second fixed waveforms over the entirety of the predetermined search interval.

Advantageous Effects of Invention

According to the present invention, it is possible to accurately encode the frequencies (positions) where energy is present, so that it is possible to improve the qualitative performance unique to spectrum coding and produce good sound quality even at low bit rates.

Brief Description of Drawings

FIG.1 is a block diagram showing the configuration of a speech coding apparatus according to an embodiment of the present invention;
FIG.2 is a block diagram showing the configuration of a speech decoding apparatus according to an embodiment of the present invention;
FIG.3 is a flowchart showing the search algorithm in an interval search section according to an embodiment of the present invention;
FIG.4 is a diagram showing an example of a spectrum represented by pulses searched in an interval search section according to an embodiment of the present invention;
FIG.5 is a flowchart showing the search algorithm in a thorough search section according to an embodiment of the present invention;
FIG.6 is a flowchart showing the search algorithm in a thorough search section according to an embodiment of the present invention;
FIG.7 is a diagram showing an example of a spectrum represented by pulses searched in an interval search section and a thorough search section according to an embodiment of the present invention;
FIG.8 is a flowchart showing the decoding algorithm in a spectrum decoding section according to the
present invention.

Best Mode for Carrying Out the Invention

In speech signal coding based on the CELP scheme and others, a speech signal is often represented by an excitation and a synthesis filter. If a vector having a shape similar to the excitation signal, which is a time-domain vector sequence, can be decoded, it is possible to produce a waveform similar to the input speech through the synthesis filter and achieve good perceptual quality. This is the qualitative characteristic that has led to the success of the algebraic codebook used in CELP. On the other hand, in the case of frequency spectrum (vector) coding, the synthesis filter has spectral gains as its components, and therefore the distortion of the frequencies (i.e. positions) of components of large power is more significant than the distortion of those gains. That is, rather than decoding a vector having a shape similar to the input spectrum, searching for positions of high energy and decoding pulses at those positions is more likely to achieve good perceptual quality. The present inventors focused on this point and arrived at the present invention. That is, based on a model of encoding a frequency spectrum by a small number of pulses, the present invention transforms the speech signal to encode (i.e. a time-domain vector sequence) into a frequency-domain signal by an orthogonal transform, divides the frequency interval of the coding target into a plurality of bands, searches for one pulse in each band, and, in addition, searches for several pulses over the entire frequency interval of the coding target. Further, the present invention separates shape (form) quantization and gain (amount) quantization, and, in shape quantization, assumes an ideal gain and searches, in an open loop, for pulses having an amplitude of "1" and a polarity of "+" or "-."
Here, especially upon a search over the entire frequency interval of the coding target, the present invention does not allow two pulses to occur at the same position, and allows combinations of the positions of a plurality of pulses to be encoded as transmission information about the pulse positions. An embodiment of the present invention will be explained below using the accompanying drawings.

FIG.1 is a block diagram showing the configuration of the speech coding apparatus according to the present embodiment. The speech coding apparatus shown in FIG.1 is provided with LPC analyzing section 101, LPC quantizing section 102, inverse filter 103, orthogonal transform section 104, spectrum coding section 105 and multiplexing section 106. Spectrum coding section 105 is provided with shape quantizing section 111 and gain quantizing section 112. LPC analyzing section 101 performs a linear prediction analysis of an input speech signal and outputs a spectral envelope parameter to LPC quantizing section 102 as the analysis result. LPC quantizing section 102 performs quantization of the spectral envelope parameter (LPC: Linear Prediction Coefficient) outputted from LPC analyzing section 101, and outputs a code representing the quantized LPC to multiplexing section 106. Further, LPC quantizing section 102 outputs decoded parameters, acquired by decoding the code representing the quantized LPC, to inverse filter 103. Here, the parameter quantization may employ vector quantization ("VQ"), predictive quantization, multi-stage VQ, split VQ and other modes. Inverse filter 103 inverse-filters the input speech using the decoded parameters and outputs the resulting residual component to orthogonal transform section 104.
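The joint encoding of distinct pulse positions introduced above (no two pulses at the same position, with position combinations encoded together) can be sketched with the combinatorial number system, which maps any set of k distinct positions out of n to a single index in 0..C(n,k)-1. This is one standard way to realize such combination coding, offered here as an illustration; the function names are not taken from the specification.

```python
from math import comb

def positions_to_index(positions):
    """Encode a set of distinct pulse positions as one integer via the
    combinatorial number system: sorted positions p1 < p2 < ... < pk
    map to C(p1,1) + C(p2,2) + ... + C(pk,k)."""
    index = 0
    for i, p in enumerate(sorted(positions)):
        index += comb(p, i + 1)
    return index

def index_to_positions(index, k):
    """Inverse mapping: recover the k sorted positions from the index."""
    positions = []
    for i in range(k, 0, -1):
        p = i - 1                       # smallest valid position for slot i
        while comb(p + 1, i) <= index:  # largest p with C(p, i) <= index
            p += 1
        positions.append(p)
        index -= comb(p, i)
    return sorted(positions)
```

With three pulses over eighty positions, as in the example given later in the specification, C(80,3) = 82160 combinations fit in 17 bits, versus 21 bits for three independently coded 7-bit positions.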
Orthogonal transform section 104 applies a suitable window, such as a sine window, to the residual component, performs an orthogonal transform using the MDCT, and outputs the spectrum transformed into the frequency domain (hereinafter "input spectrum") to spectrum coding section 105. Here, the orthogonal transform may employ other transforms such as the FFT, KLT and wavelet transform, and, although their usage varies, it is possible to transform the residual component into an input spectrum using any of these. Note that the order of processing between inverse filter 103 and orthogonal transform section 104 may be reversed. That is, by dividing the orthogonally transformed input speech by the frequency spectrum of the inverse filter (i.e. subtraction on a logarithmic axis), it is possible to produce the same input spectrum. Spectrum coding section 105 encodes the input spectrum by quantizing the shape and gain of the spectrum separately, and outputs the resulting quantization codes to multiplexing section 106. Shape quantizing section 111 quantizes the shape of the input spectrum using a small number of pulse positions and polarities, and gain quantizing section 112 calculates and quantizes the gains of the pulses found by shape quantizing section 111 on a per-band basis. Shape quantizing section 111 and gain quantizing section 112 will be described later in detail. Multiplexing section 106 receives as input the code representing the quantized LPC from LPC quantizing section 102 and the code representing the quantized input spectrum from spectrum coding section 105, multiplexes this information, and outputs the result to the transmission channel as coding information.

FIG.2 is a block diagram showing the configuration of the speech decoding apparatus according to the present embodiment.
The speech decoding apparatus shown in FIG.2 is provided with demultiplexing section 201, parameter decoding section 202, spectrum decoding section 203, orthogonal transform section 204 and synthesis filter 205. In FIG.2, the coding information is demultiplexed into individual codes in demultiplexing section 201. The code representing the quantized LPC is outputted to parameter decoding section 202, and the code of the input spectrum is outputted to spectrum decoding section 203. Parameter decoding section 202 decodes the spectral envelope parameter and outputs the resulting decoded parameter to synthesis filter 205. Spectrum decoding section 203 decodes the shape vector and gain by the method corresponding to the coding method in spectrum coding section 105 shown in FIG.1, acquires a decoded spectrum by multiplying the decoded shape vector by the decoded gain, and outputs the decoded spectrum to orthogonal transform section 204. Orthogonal transform section 204 performs, on the decoded spectrum outputted from spectrum decoding section 203, the inverse of the transform in orthogonal transform section 104 shown in FIG.1, and outputs the resulting time-series decoded residual signal to synthesis filter 205. Synthesis filter 205 produces output speech by applying synthesis filtering to the decoded residual signal outputted from orthogonal transform section 204, using the decoded parameter outputted from parameter decoding section 202. Here, when the order of processing between inverse filter 103 and orthogonal transform section 104 shown in FIG.1 is reversed, the speech decoding apparatus in FIG.2 multiplies the decoded spectrum by the frequency spectrum of the decoded parameter (i.e. addition on the logarithmic axis) and then performs the orthogonal transform of the resulting spectrum. Next, shape quantizing section 111 and gain quantizing section 112 will be explained in detail.
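The operation of spectrum decoding section 203 described above, multiplying the decoded shape vector by the per-band decoded gain, can be sketched as follows. This is an illustrative reconstruction for the single-pulse-per-band part of the shape; the function name and the fixed band layout (five bands of sixteen samples) are assumptions taken from the worked example, not an API from the specification.

```python
def decode_spectrum(pos, pol, gains, num_bands=5, band_len=16):
    """Rebuild a decoded spectrum: place each band's unit pulse at its
    decoded in-band position with its decoded polarity, scaled by that
    band's decoded gain. All other samples stay zero."""
    spec = [0.0] * (num_bands * band_len)
    for b in range(num_bands):
        spec[b * band_len + pos[b]] = pol[b] * gains[b]
    return spec
```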
Shape quantizing section 111 is provided with interval search section 121, which searches for a pulse in each of a plurality of bands into which a predetermined search interval is divided, and thorough search section 122, which searches for pulses over the entire search interval. The following equation 1 provides the reference for the search. Here, in equation 1, E is the coding distortion, s_i is the input spectrum, g is the optimal gain, δ is the delta function, and p is the pulse position.

E = Σ_i ( s_i − g·δ(i − p) )²   ... (Equation 1)

From equation 1 above, the pulse position that minimizes the cost function is the position at which the absolute value |s_p| of the input spectrum in each band is maximum, and the pulse polarity is the polarity of the input spectrum value at that position. An example case will be explained below where the vector length of the input spectrum is eighty samples, the number of bands is five, and the spectrum is encoded using eight pulses: one pulse from each band and three pulses from the entire interval. In this case, the length of each band is sixteen samples. Further, the amplitude of the pulses to search for is fixed to "1," and their polarity is "+" or "-." Interval search section 121 searches for the position of the maximum energy and the polarity (+/-) in each band, and allows one pulse to occur per band. In this example, the number of bands is five, and each band requires four bits to show the pulse position (sixteen position candidates) and one bit to show the polarity (+/-), requiring twenty-five information bits in total. The flow of the search algorithm of interval search section 121 is shown in FIG.3. Here, the symbols used in the flowchart of FIG.3 stand for the following:

i: position
b: band number
max: maximum value
c: counter
pos[b]: search result (position)
pol[b]: search result (polarity)
s[i]: input spectrum

As shown in FIG.3, interval search section 121 calculates the input spectrum s[i] of each sample (0
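The per-band search that FIG.3 and Equation 1 describe reduces, in code, to an arg-max of |s[i]| within each band. The sketch below assumes the worked example above (five bands of sixteen samples); the function name and tie-breaking behavior are illustrative, not taken from the specification.

```python
def interval_search(s, num_bands=5):
    """Per-band pulse search: in each band, pick the sample with the
    largest absolute spectral value |s[i]|; the pulse polarity is the
    sign of the spectrum there (the minimizer of Equation 1)."""
    band_len = len(s) // num_bands
    pos, pol = [], []
    for b in range(num_bands):
        start = b * band_len
        # position of the maximum |s[i]| within this band
        best = max(range(start, start + band_len), key=lambda i: abs(s[i]))
        pos.append(best - start)               # 4 bits: offset within the band
        pol.append(1 if s[best] >= 0 else -1)  # 1 bit: polarity
    return pos, pol
```

With five bands this yields the 5 × (4 + 1) = 25 information bits noted above.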

Documents

Application Documents

# Name Date
1 1655-MUMNP-2009-ENGLISH TRANSLATION-(29-03-2016).pdf 2016-03-29
2 1655-MUMNP-2009-CORRESPONDENCE-(29-03-2016).pdf 2016-03-29
3 Petition Under Rule 137 [09-06-2016(online)].pdf 2016-06-09
4 Form 3 [09-06-2016(online)].pdf 2016-06-09
4 1655-mumnp-2009-abstract.doc 2018-08-10
5 Examination Report Reply Recieved [09-06-2016(online)].pdf 2016-06-09
6 Description(Complete) [09-06-2016(online)].pdf 2016-06-09
6 1655-mumnp-2009-claims.doc 2018-08-10
7 Correspondence [09-06-2016(online)].pdf 2016-06-09
8 OTHERS [20-06-2016(online)].pdf 2016-06-20
9 Examination Report Reply Recieved [20-06-2016(online)].pdf 2016-06-20
10 Description(Complete) [20-06-2016(online)].pdf 2016-06-20
11 Claims [20-06-2016(online)].pdf 2016-06-20
12 Abstract [20-06-2016(online)].pdf 2016-06-20
13 Other Patent Document [05-10-2016(online)].pdf 2016-10-05
14 Power of Attorney [18-05-2017(online)].pdf 2017-05-18
15 Other Document [18-05-2017(online)].pdf 2017-05-18
16 Form 6 [18-05-2017(online)].pdf 2017-05-18
17 Form 13 [18-05-2017(online)].pdf 2017-05-18
18 Assignment [18-05-2017(online)].pdf 2017-05-18
19 1655-MUMNP-2009-ORIGINAL UNDER RULE 6 (1A)-23-05-2017.pdf 2017-05-23
20 Form 3 [28-06-2017(online)].pdf 2017-06-28
21 1655-MUMNP-2009-Correspondence to notify the Controller (Mandatory) [15-01-2018(online)].pdf 2018-01-15
22 1655-MUMNP-2009-Written submissions and relevant documents (MANDATORY) [01-02-2018(online)].pdf 2018-02-01
23 1655-MUMNP-2009-PETITION UNDER RULE 137 [01-02-2018(online)].pdf 2018-02-01
24 1655-MUMNP-2009-Response to office action (Mandatory) [18-05-2018(online)].pdf 2018-05-18
25 Specification - Final - compressed.pdf_4.pdf 2018-08-10
26 Specification - Final - compressed.pdf 2018-08-10
27 POA,FORM-1,2.pdf 2018-08-10
28 Office actions & examination reports-1.pdf_1.pdf 2018-08-10
29 Office actions & examination reports-1.pdf 2018-08-10
30 Marked copies - Stacked.pdf_5.pdf 2018-08-10
31 Marked copies - Stacked.pdf 2018-08-10
32 letter.pdf_2.pdf 2018-08-10
33 letter.pdf 2018-08-10
34 FORM-6.pdf 2018-08-10
35 FER Response.pdf_8.pdf 2018-08-10
36 FER Response.pdf 2018-08-10
37 Comp spec.pdf_3.pdf 2018-08-10
38 Comp spec.pdf 2018-08-10
39 Claims - Clean copy.pdf_6.pdf 2018-08-10
40 Claims - Clean copy.pdf 2018-08-10
41 ASSIGNMENT.pdf 2018-08-10
42 abstract1.jpg 2018-08-10
43 Abstract - Clean copy.pdf_7.pdf 2018-08-10
44 Abstract - Clean copy.pdf 2018-08-10
45 1655-MUMNP-2009_EXAMREPORT.pdf 2018-08-10
46 1655-mumnp-2009-wo international publication report a1.pdf 2018-08-10
47 1655-MUMNP-2009-POWER OF ATTORNEY(4-9-2009).pdf 2018-08-10
48 1655-mumnp-2009-pct-isa-210.pdf 2018-08-10
49 1655-mumnp-2009-pct-ib-306.pdf 2018-08-10
50 1655-mumnp-2009-pct-ib-304.pdf 2018-08-10
51 1655-mumnp-2009-pct other.pdf 2018-08-10
52 1655-MUMNP-2009-HearingNoticeLetter.pdf 2018-08-10
53 1655-mumnp-2009-form 5.pdf 2018-08-10
54 1655-mumnp-2009-form 3.pdf 2018-08-10
55 1655-MUMNP-2009-FORM 3(16-2-2010).pdf 2018-08-10
56 1655-mumnp-2009-form 2.pdf 2018-08-10
58 1655-mumnp-2009-form 2(title page).pdf 2018-08-10
59 1655-MUMNP-2009-FORM 18(8-2-2011).pdf 2018-08-10
60 1655-mumnp-2009-form 1.pdf 2018-08-10
61 1655-MUMNP-2009-English Translation-220316.pdf 2018-08-10
62 1655-mumnp-2009-drawing.pdf 2018-08-10
63 1655-mumnp-2009-description(complete).pdf 2018-08-10
65 1655-mumnp-2009-correspondence.pdf 2018-08-10
66 1655-MUMNP-2009-Correspondence-220316.pdf 2018-08-10
67 1655-MUMNP-2009-CORRESPONDENCE(8-2-2011).pdf 2018-08-10
68 1655-MUMNP-2009-CORRESPONDENCE(4-9-2009).pdf 2018-08-10
69 1655-MUMNP-2009-CORRESPONDENCE(16-2-2010).pdf 2018-08-10
70 1655-mumnp-2009-claims.pdf 2018-08-10
72 1655-mumnp-2009-abstract.pdf 2018-08-10
74 1655-MUMNP-2009-PatentCertificate19-11-2018.pdf 2018-11-19
75 1655-MUMNP-2009-IntimationOfGrant19-11-2018.pdf 2018-11-19
76 1655-MUMNP-2009-RELEVANT DOCUMENTS [28-01-2019(online)].pdf 2019-01-28

ERegister / Renewals

3rd: 02 Feb 2019

From 28/02/2010 - To 28/02/2011

4th: 02 Feb 2019

From 28/02/2011 - To 28/02/2012

5th: 02 Feb 2019

From 28/02/2012 - To 28/02/2013

6th: 02 Feb 2019

From 28/02/2013 - To 28/02/2014

7th: 02 Feb 2019

From 28/02/2014 - To 28/02/2015

8th: 02 Feb 2019

From 28/02/2015 - To 28/02/2016

9th: 02 Feb 2019

From 28/02/2016 - To 28/02/2017

10th: 02 Feb 2019

From 28/02/2017 - To 28/02/2018

11th: 02 Feb 2019

From 28/02/2018 - To 28/02/2019

12th: 02 Feb 2019

From 28/02/2019 - To 28/02/2020

13th: 17 Feb 2020

From 28/02/2020 - To 28/02/2021

14th: 22 Feb 2021

From 28/02/2021 - To 28/02/2022

15th: 15 Feb 2022

From 28/02/2022 - To 28/02/2023