Abstract: Techniques (e.g., apparatus, methods, programs) for selecting a pitch lag are proposed. An apparatus (10, 60a, 110) is provided for encoding an information signal including a plurality of frames. The apparatus may comprise a first estimator (11) configured to obtain a first estimate (14, T1), the first estimate being an estimate of a pitch lag for a current frame (13). The apparatus may comprise a second estimator (12) configured to obtain a second estimate (16, T2), the second estimate being another estimate of a pitch lag for the current frame (13). A selector (17) may be configured to choose (S103) a selected value (19, Tbest) by performing a selection between the first estimate (14, T1) and the second estimate (16, T2) on the basis of first and second correlation measurements (23, 25).
Description
Examples of methods and apparatus are provided here that can perform a low-complexity pitch detection procedure, e.g., for long term postfiltering (LTPF) encoding.
In particular, the examples can select a pitch lag for an information signal (e.g., an audio signal), for instance for performing LTPF.
1.1. Background
Transform-based audio codecs generally introduce inter-harmonic noise when processing harmonic audio signals, particularly at low delay and low bitrate. This inter-harmonic noise is generally perceived as a very annoying artefact, significantly reducing the performance of the transform-based audio codec when subjectively evaluated on highly tonal audio material.
Long Term Post Filtering (LTPF) is a tool for transform-based audio coding that helps reduce this inter-harmonic noise. It relies on a post-filter that is applied to the time-domain signal after transform decoding. This post-filter is essentially an infinite impulse response (IIR) filter with a comb-like frequency response controlled by two parameters: a pitch lag and a gain.
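As a minimal illustration of how a pitch lag and a gain shape a comb-like response, the following Python sketch implements a textbook first-order IIR comb filter, y[n] = x[n] + g*y[n - T], where T is the pitch lag in samples and g the gain. This is an assumption for illustration only: the function name `comb_postfilter` is hypothetical, and the actual LTPF post-filter of [1] is more elaborate and is not reproduced here.

```python
import numpy as np


def comb_postfilter(x: np.ndarray, pitch_lag: int, gain: float) -> np.ndarray:
    """Simplified IIR comb filter: y[n] = x[n] + gain * y[n - pitch_lag]."""
    if pitch_lag < 1:
        raise ValueError("pitch_lag must be a positive integer")
    if not 0.0 <= gain < 1.0:
        raise ValueError("gain must be in [0, 1) for stability")
    y = np.zeros(len(x), dtype=float)
    for n in range(len(x)):
        # Feedback from one pitch period earlier (zero before the signal starts).
        feedback = y[n - pitch_lag] if n >= pitch_lag else 0.0
        y[n] = x[n] + gain * feedback
    return y


impulse = np.array([1.0, 0, 0, 0, 0, 0, 0])
print(comb_postfilter(impulse, pitch_lag=2, gain=0.5))  # [1. 0. 0.5 0. 0.25 0. 0.125]
```

With 0 <= g < 1, the magnitude response peaks at multiples of fs/T (the harmonics of a tone whose period is T samples) and dips in between, which is the behaviour exploited to attenuate inter-harmonic noise; g = 0 leaves the signal unchanged, which corresponds to an inactive post-filter.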
For better robustness, the post-filter parameters (a pitch lag and/or a gain per frame) are estimated at the encoder side and encoded in a bitstream when the gain is non-zero. The zero-gain case is signalled with one bit and corresponds to an inactive post-filter, used when the signal does not contain a harmonic part.
LTPF was first introduced in the 3GPP EVS standard [1] and later integrated into the MPEG-H 3D audio standard [2]. Corresponding patents are [3] and [4].
A pitch detection algorithm estimates one pitch lag per frame. It is usually performed at a low sampling rate (e.g., 6.4 kHz) to reduce complexity. It should ideally provide an accurate, stable and continuous estimation.
When used for LTPF encoding, it is most important to have a continuous pitch contour; otherwise, instability artefacts could be heard in the LTPF-filtered output signal. Not obtaining the true fundamental frequency F0 (for example, obtaining a multiple of it instead) is of less importance, because it does not cause severe artefacts but only a slight degradation of the LTPF performance.
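A pitch lag T (in samples) at sampling rate fs corresponds to F0 = fs/T. The short Python sketch below (the helper name `lag_to_f0` is hypothetical) uses the 6.4 kHz rate mentioned above to show why a lag that is a multiple of the true period is only a mild error: doubling the lag halves the implied F0, so a comb filter tuned to that lag still has peaks at every true harmonic, plus extra peaks between them.

```python
def lag_to_f0(lag_samples: float, fs_hz: float = 6400.0) -> float:
    """Fundamental frequency in Hz implied by a pitch lag given in samples."""
    if lag_samples <= 0:
        raise ValueError("lag must be positive")
    return fs_hz / lag_samples


true_lag = 64                      # one period of a 100 Hz tone at 6.4 kHz
print(lag_to_f0(true_lag))         # 100.0
print(lag_to_f0(2 * true_lag))     # 50.0: doubled lag, implied F0 halved
```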
Another important characteristic of a pitch detection algorithm is its computational complexity. When implemented in an audio codec targeting low power devices or even ultra-low power devices, its computational complexity should be as low as possible.
1.2. Prior art
One publicly available example of an LTPF encoder is described in the 3GPP EVS standard [1]. This implementation uses the pitch detection algorithm described in Sec. 5.1.10 of the standard specification. This algorithm performs well and works nicely with LTPF because it gives a very stable and continuous pitch contour. Its main drawback, however, is its relatively high complexity.
Although they have never been used for LTPF encoding, other existing pitch detection algorithms could in theory be used for LTPF. One example is YIN [6], a pitch detection algorithm often recognized as one of the most accurate. YIN, however, is very complex, significantly more so than the algorithm in [1].
Another example worth mentioning is the pitch detection algorithm used in the 3GPP AMR-WB standard [7], which has a significantly lower complexity than the one in [1] but also a worse performance; in particular, it gives a less stable and less continuous pitch contour. The prior art comprises the following disclosures:
[1] 3GPP TS 26.445; Codec for Enhanced Voice Services (EVS); Detailed algorithmic description.
[2] ISO/IEC 23008-3:2015; Information technology -- High efficiency coding and media delivery in heterogeneous environments -- Part 3: 3D audio.
[3] Ravelli et al. "Apparatus and method for processing an audio signal using a harmonic post-filter." U.S. Patent Application No. 2017/0140769 A1. 18 May 2017.
[4] Markovic et al. "Harmonicity-dependent controlling of a harmonic filter tool." U.S. Patent Application No. 2017/0133029 A1. 11 May 2017.
[5] ITU-T G.718: Frame error robust narrow-band and wideband embedded variable bit-rate coding of speech and audio from 8-32 kbit/s.
[6] de Cheveigné, Alain, and Hideki Kawahara. "YIN, a fundamental frequency estimator for speech and music." The Journal of the Acoustical Society of America 111.4 (2002): 1917-1930.
[7] 3GPP TS 26.190; Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions.
In some cases, however, the pitch lag estimation should be improved. Current low-complexity pitch detection algorithms (like the one in [7]) do not perform satisfactorily for LTPF, particularly for complex signals such as polyphonic music. The pitch contour can be very unstable, even during stationary tones. This is due to the estimate jumping between local maxima of the weighted autocorrelation function.
Therefore, there is a need for pitch lag estimates that better adapt to complex signals, with the same or lower complexity than the prior art.
2. Summary of the invention
In accordance with examples, there is provided an apparatus for encoding an information signal including a plurality of frames, the apparatus comprising:
a first estimator configured to obtain a first estimate, the first estimate being an estimate of a pitch lag for a current frame;
a second estimator configured to obtain a second estimate, the second estimate being another estimate of a pitch lag for the current frame;
a selector configured to choose a selected value by performing a selection between the first estimate and the second estimate on the basis of first and second correlation measurements,
wherein the second estimator is conditioned by the pitch lag selected at the previous frame so as to obtain the second estimate for the current frame, characterized in that the selector is configured to:
perform a comparison between:
a downscaled version of a first correlation measurement associated with the current frame and obtained at a lag corresponding to the first estimate; and
a second correlation measurement associated with the current frame and obtained at a lag corresponding to the second estimate,
so as to select the first estimate when the second correlation measurement is less than the downscaled version of the first correlation measurement, and/or
to select the second estimate when the second correlation measurement is greater than the downscaled version of the first correlation measurement,
wherein at least one of the first and second correlation measurements is an autocorrelation measurement and/or a normalized autocorrelation measurement.
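The selection rule above can be sketched in Python as follows. The sketch assumes that the correlation measurement is the normalized autocorrelation `normcorr(x, N, T)` (the notation of claim 17), computed over the last N samples of a buffer `x` that also holds at least T samples of history. The downscaling coefficient `alpha = 0.85` is an illustrative value, not one stated in this document, and the equality case, which the text leaves open, is resolved here in favour of the second estimate.

```python
import numpy as np


def normcorr(x: np.ndarray, n: int, lag: int) -> float:
    """Normalized correlation of the last n samples of x with the same
    segment delayed by `lag` samples (x must hold at least n + lag samples)."""
    if lag < 0 or len(x) < n + lag:
        raise ValueError("x must contain at least n + lag samples")
    end = len(x)
    cur = x[end - n:end]
    past = x[end - n - lag:end - lag]
    denom = np.sqrt(np.dot(cur, cur) * np.dot(past, past))
    return float(np.dot(cur, past) / denom) if denom > 0 else 0.0


def select_pitch_lag(x: np.ndarray, n: int, t1: int, t2: int, alpha: float = 0.85) -> int:
    """Keep T2 (tracked from the previous frame) unless its correlation falls
    below the downscaled correlation at T1; ties go to T2 (an assumption)."""
    c1 = normcorr(x, n, t1)
    c2 = normcorr(x, n, t2)
    return t1 if c2 < alpha * c1 else t2


x = np.sin(2 * np.pi * np.arange(400) / 40)      # test tone, period 40 samples
print(select_pitch_lag(x, 200, t1=40, t2=37))    # 37: normcorr ~0.89 >= 0.85
print(select_pitch_lag(x, 200, t1=40, t2=35))    # 40: normcorr ~0.71 < 0.85
```

For this 40-sample periodic tone, the tracked estimate T2 = 37 (correlation about 0.89) is kept over T1 = 40 (correlation 1.0), whereas T2 = 35 (about 0.71) falls below the downscaled threshold of 0.85 and T1 is chosen. The downscaling thus trades a little correlation for a more continuous pitch contour.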
In accordance with examples, there is provided an apparatus for encoding an information signal into a bitstream (63) including a plurality of frames, the apparatus (60a) comprising:
a detection unit comprising:
a first estimator configured to obtain a first estimate, the first estimate being an estimate of a pitch lag for a current frame;
a second estimator configured to obtain a second estimate, the second estimate being another estimate of a pitch lag for the current frame, wherein the second estimator is conditioned by the pitch lag selected at the previous frame so as to obtain the second estimate for the current frame;
a selector configured to choose a selected value by performing a selection between the first estimate and the second estimate on the basis of at least one correlation measurement, wherein the selector is configured to perform a comparison between:
a second correlation measurement associated with the current frame and obtained at a lag corresponding to the second estimate; and
a pitch lag selection threshold,
so as to select the second estimate when the second correlation measurement is greater than the pitch lag selection threshold; and/or to select the first estimate when the second correlation measurement is lower than the pitch lag selection threshold; and
a long-term post filtering, LTPF, tool configured to encode data useful for performing LTPF at the decoder, the data useful for performing LTPF including the selected value.
In accordance with examples, there is provided an apparatus for encoding an information signal including a plurality of frames, the apparatus comprising:
a first estimator configured to obtain a first estimate, the first estimate being an estimate of a pitch lag for a current frame;
a second estimator configured to obtain a second estimate, the second estimate being another estimate of a pitch lag for the current frame;
a selector configured to choose a selected value by performing a selection between the first estimate and the second estimate on the basis of at least one correlation measurement,
wherein the second estimator is conditioned by the pitch lag selected at the previous frame so as to obtain the second estimate for the current frame.
In accordance with examples, the selector is configured to perform a comparison between:
a second correlation measurement associated with the current frame and obtained at a lag corresponding to the second estimate; and
a pitch lag selection threshold,
so as to select the second estimate when the second correlation measurement is greater than the pitch lag selection threshold; and/or
to select the first estimate when the second correlation measurement is lower than the pitch lag selection threshold.
In accordance with examples, the selector is configured to perform a comparison between:
a first correlation measurement associated with the current frame and obtained at a lag corresponding to the first estimate; and
a second correlation measurement associated with the current frame and obtained at a lag corresponding to the second estimate,
so as to select the first estimate when the first correlation measurement is at least greater than the second correlation measurement, and/or
to select the second estimate when the first correlation measurement is at least lower than the second correlation measurement.
In accordance with examples, the selector is configured to:
perform a comparison between:
a downscaled version of a first correlation measurement associated with the current frame and obtained at a lag corresponding to the first estimate; and
a second correlation measurement associated with the current frame and obtained at a lag corresponding to the second estimate,
so as to select the first estimate when the second correlation measurement is less than the downscaled version of the first correlation measurement, and/or
to select the second estimate when the second correlation measurement is greater than the downscaled version of the first correlation measurement.
In accordance with examples, at least one of the first and second correlation measurements is an autocorrelation measurement and/or a normalized autocorrelation measurement.
In accordance with examples, a transform coder may be implemented to generate a representation of the information signal or a processed version thereof.
In accordance with examples, the second estimator is configured to:
obtain the second estimate by searching for the lag which maximizes a second correlation function in a second subinterval which contains the pitch lag selected for the previous frame.
In accordance with examples, the second subinterval contains lags within a distance less than a pre-defined lag number threshold from the pitch lag selected for the previous frame.
In accordance with examples, the second estimator is configured to:
search for a maximum value among the second correlation function values and associate the second estimate with the lag at which that maximum value is obtained.
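A minimal Python sketch of the second estimator, assuming the correlation function values have already been computed and are stored in an array `corr` indexed by lag (so `corr[k]` is the value at lag `k`). The function name and arguments are hypothetical; the search is restricted to lags in `[k_min, k_max]` whose distance from the previously selected lag `t_prev` is less than `lag_threshold`, as described above.

```python
import numpy as np


def second_estimate(corr: np.ndarray, t_prev: int, lag_threshold: int,
                    k_min: int, k_max: int) -> int:
    """Lag maximizing corr among lags k in [k_min, k_max] with |k - t_prev| < lag_threshold."""
    if lag_threshold < 1 or len(corr) <= k_max:
        raise ValueError("need lag_threshold >= 1 and corr values up to lag k_max")
    lo = max(k_min, t_prev - lag_threshold + 1)   # second subinterval, lower end
    hi = min(k_max, t_prev + lag_threshold - 1)   # second subinterval, upper end
    if lo > hi:
        raise ValueError("second subinterval is empty")
    return lo + int(np.argmax(corr[lo:hi + 1]))


corr = np.zeros(121)
corr[30], corr[50], corr[52] = 0.9, 0.5, 0.6   # global maximum at lag 30
print(second_estimate(corr, t_prev=50, lag_threshold=5, k_min=20, k_max=120))  # 52
```

The global maximum at lag 30 lies outside the subinterval around `t_prev = 50`, so the estimator returns 52; restricting the search in this way is what keeps the second estimate close to the previous frame's pitch.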
In accordance with examples, the first estimator is configured to:
obtain the first estimate as the lag that maximizes a first correlation function associated with the current frame.
In accordance with examples, the first correlation function is restricted to lags in a first subinterval.
In accordance with examples, the first subinterval contains a greater number of lags than the second subinterval, and/or at least some of the lags in the second subinterval are comprised in the first subinterval.
In accordance with examples, the first estimator is configured to:
weight the correlation measurement values of the first correlation function using a monotonically decreasing weight function before searching for the lag that maximizes the first correlation function.
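The first estimator can be sketched in Python as below. Assumptions: the correlation is a plain autocorrelation over the last `n` samples of a buffer `x` holding at least `n + k_max` samples, and the monotonically decreasing weight is a hypothetical linear ramp from 1.0 at `k_min` to 0.5 at `k_max`; the document does not fix the shape of the weight function. Function names are illustrative.

```python
import numpy as np


def autocorr(x: np.ndarray, n: int, k_min: int, k_max: int) -> np.ndarray:
    """R(k) for k = k_min..k_max: dot product of the last n samples of x
    with the same segment delayed by k samples."""
    if len(x) < n + k_max:
        raise ValueError("x must contain at least n + k_max samples")
    end = len(x)
    cur = x[end - n:end]
    return np.array([np.dot(cur, x[end - n - k:end - k]) for k in range(k_min, k_max + 1)])


def first_estimate(x: np.ndarray, n: int, k_min: int, k_max: int) -> int:
    """T1 = argmax over k of R(k) * w(k), with a decreasing weight w."""
    if k_max <= k_min:
        raise ValueError("k_max must be greater than k_min")
    lags = np.arange(k_min, k_max + 1)
    # Hypothetical weight: linear ramp from 1.0 (k_min) down to 0.5 (k_max).
    w = 1.0 - 0.5 * (lags - k_min) / (k_max - k_min)
    r_w = autocorr(x, n, k_min, k_max) * w
    return int(lags[np.argmax(r_w)])


x = np.sin(2 * np.pi * np.arange(400) / 40)             # period: 40 samples
print(first_estimate(x, n=200, k_min=20, k_max=120))   # 40
```

For this tone, the unweighted autocorrelation has equal peaks at lags 40, 80 and 120; the decreasing weight favours the shortest of them, which reduces the risk of picking a multiple of the period.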
In accordance with examples, at least one of the second and first correlation functions is an autocorrelation function and/or a normalized autocorrelation function.
In accordance with examples, the first estimator is configured to obtain the first estimate T1 by performing at least some of the following operations:
Claims
1. An apparatus (10, 60a, 110) for encoding an information signal including a plurality of frames, the apparatus comprising:
a first estimator (11) configured to obtain a first estimate (14, T1), the first estimate being an estimate of a pitch lag for a current frame (13);
a second estimator (12) configured to obtain a second estimate (16, T2), the second estimate being another estimate of a pitch lag for the current frame (13);
a selector (17) configured to choose (S103) a selected value (19, Tbest) by performing a selection between the first estimate (14, T1) and the second estimate (16, T2) on the basis of first and second correlation measurements (23, 25), wherein the second estimator (12) is conditioned by the pitch lag (51, 19") selected at the previous frame so as to obtain the second estimate (16, T2) for the current frame (13),
characterized in that the selector (17) is configured to:
perform a comparison between:
a downscaled version (24) of a first correlation measurement (23) associated with the current frame (13) and obtained at a lag corresponding to the first estimate (14, T1); and
a second correlation measurement (25) associated with the current frame (13) and obtained at a lag corresponding to the second estimate (16, T2),
so as to select the first estimate (14, T1) when the second correlation measurement (25) is less than the downscaled version (24) of the first correlation measurement (23), and/or
to select the second estimate (16, T2) when the second correlation measurement (25) is greater than the downscaled version (24) of the first correlation measurement (23),
wherein at least one of the first and second correlation measurements (23, 25) is an autocorrelation measurement and/or a normalized autocorrelation measurement.
2. An apparatus (60a) for encoding an information signal into a bitstream (63) including a plurality of frames, the apparatus (60a) comprising:
a detection unit (10, 65) comprising:
a first estimator (11) configured to obtain a first estimate (14, T1), the first estimate being an estimate of a pitch lag for a current frame (13);
a second estimator (12) configured to obtain a second estimate (16, T2), the second estimate being another estimate of a pitch lag for the current frame (13), wherein the second estimator (12) is conditioned by the pitch lag (51, 19") selected at the previous frame so as to obtain the second estimate (16, T2) for the current frame (13);
a selector (17) configured to choose (S103) a selected value (19, Tbest) by performing a selection between the first estimate (14, T1) and the second estimate (16, T2) on the basis of at least one correlation measurement (23, 25), wherein the selector (17) is configured to perform a comparison (27) between:
a second correlation measurement (25) associated with the current frame (13) and obtained at a lag corresponding to the second estimate (16, T2); and
a pitch lag selection threshold (24),
so as to select (S103) the second estimate (16, T2) when the second correlation measurement (25) is greater than the pitch lag selection threshold (24); and/or
to select (S103) the first estimate (14, T1) when the second correlation measurement (25) is lower than the pitch lag selection threshold (24); and
a long-term post filtering, LTPF, tool (66) configured to encode data useful for performing LTPF at the decoder (60b), the data useful for performing LTPF including the selected value (19, Tbest).
3. The apparatus of claim 2, wherein the comparison is between:
a first correlation measurement (23) associated with the current frame (13) and obtained at a lag corresponding to the first estimate (14, T1), which represents the pitch lag selection threshold (24); and
the second correlation measurement (25).
4. The apparatus of claim 2 or 3, wherein the comparison is between:
a downscaled version (24) of a first correlation measurement (23) associated with the current frame (13) and obtained at a lag corresponding to the first estimate (14, T1), which represents the pitch lag selection threshold (24); and
the second correlation measurement (25).
5. The apparatus of any of claims 2-4, wherein:
at least one of the first and second correlation measurements (23, 25) is an autocorrelation measurement and/or a normalized autocorrelation measurement.
6. The apparatus of any of claims 2-5, configured to compare the selected value (19, Tbest) with a predetermined LTPF threshold, so as to avoid encoding the selected value (19, Tbest) in case the selected value (19, Tbest) is below the predetermined threshold.
7. The apparatus of any of the preceding claims, wherein the second estimator (12) is configured to:
obtain the second estimate (16) by searching the lag which maximizes a second correlation function in a second subinterval (52) which contains the pitch lag (51, 19") selected for the previous frame.
8. The apparatus of claim 7, wherein:
the second subinterval (52) contains lags (T) within a distance less than a pre-defined lag number threshold from the pitch lag (51, 19") selected for the previous frame.
9. The apparatus of any of claims 7 or 8, wherein the second estimator (12) is configured to:
search for a maximum value among the second correlation function values to associate the second estimate (16) with the lag (T2) associated with the maximum value among the second correlation function values.
10. The apparatus of any of the preceding claims, wherein the first estimator (11) is configured to:
obtain the first estimate (14) as the lag (T1) that maximizes a first correlation function associated to the current frame (13).
11. The apparatus of claim 10, wherein the first correlation function is restricted to lags in a first subinterval.
12. The apparatus of claim 11, wherein the first subinterval contains a number of lags greater than the second subinterval (52), and/or at least some of the lags in the second subinterval (52) are comprised in the first subinterval.
13. The apparatus of any of the preceding claims, wherein the first estimator (11) is configured to:
weight the correlation measurement values of a first correlation function using a monotonically decreasing weight function before searching for the lag (T1) that maximizes the first correlation function.
14. The apparatus of any of claims 7-13, wherein:
at least one of the second and first correlation function is an autocorrelation function and/or a normalized autocorrelation function.
15. The apparatus of any of the preceding claims, wherein the first estimator (11) is configured to obtain the first estimate T1 by performing at least some of the following operations:
$R_w(k) = R(k)\,w(k)$ for $k = k_{\min}, \dots, k_{\max}$
w(k) being a weighting function, kmin and kmax being associated with a minimum lag and a maximum lag, R being an autocorrelation measurement value estimated on the basis of the information signal or a processed version thereof, and N being the frame length.
16. The apparatus of any of the preceding claims, wherein the second estimator (12) is configured to obtain the second estimate T2 by performing:
with Tprev being the selected estimate in the preceding frame, δ being a distance from Tprev, and kmin and kmax being associated with a minimum lag and a maximum lag.
17. The apparatus of any of the preceding claims, wherein the selector (17) is configured to perform a selection of the pitch lag estimate Tcurr in terms of
with T1 being the first estimate, T2 being the second estimate, x being a value of the information signal or a processed version thereof, normcorr(x, N, T) being the normalized correlation measurement of the signal x of length N at lag T, α being a downscaling coefficient.
18. The apparatus of any of the preceding claims, further comprising, downstream of the selector (17), a long term postfiltering, LTPF, tool (66) for controlling a long term postfilter (67) at a decoder apparatus (60b).
19. The apparatus of any of the preceding claims, wherein the information signal is an audio signal.
20. The apparatus of any of the preceding claims, configured to obtain the first and second correlation measurements using the same correlation function up to a weighting function.
21. The apparatus of any of the preceding claims, configured to obtain the first correlation measurement as the normalized version of the first estimate up to a weighting function.
22. The apparatus of any of the preceding claims, configured to obtain the second correlation measurement as the normalized version of the second estimate.
23. The apparatus of any of the preceding claims, further comprising a transform coder (62) configured to generate a representation (63a) of the information signal (61) or a processed version thereof.
24. A system (60) comprising an encoder side (10, 60a) and a decoder side (60b), the encoder side comprising the apparatus according to any of the preceding claims, the decoder side comprising a long term postfiltering tool (67) controlled on the basis of the pitch lag estimate selected by the selector (17).
25. A method (100) for determining a pitch lag for a signal divided into frames, comprising:
performing a first estimation for a current frame (S101);
performing a second estimation for the current frame (S102); and
selecting between the first estimate (14, T1) obtained at the first estimation and the second estimate (16, T2) obtained at the second estimation on the basis of correlation measurements (S103),
wherein the second estimation is performed on the basis of the result of a selecting step performed at the previous frame,
characterized in that selecting includes performing a comparison between:
a downscaled version (24) of a first correlation measurement (23) associated with the current frame (13) and obtained at a lag corresponding to the first estimate (14, T1); and
a second correlation measurement (25) associated with the current frame (13) and obtained at a lag corresponding to the second estimate (16, T2); and
selecting the first estimate (14, T1) when the second correlation measurement (25) is less than the downscaled version of the first correlation measurement (23), and/or selecting the second estimate (16, T2) when the second correlation measurement (25) is greater than the downscaled version of the first correlation measurement (23),
wherein at least one of the first and second correlation measurements (23, 25) is an autocorrelation measurement and/or a normalized autocorrelation measurement.
26. The method of claim 25, further comprising using the selected lag for long term postfiltering, LTPF.
27. A method (100) for encoding a bitstream for a signal divided into frames, comprising:
performing a first estimation for a current frame (S101);
performing a second estimation for the current frame (S102); and
selecting between the first estimate (14, T1) obtained at the first estimation and the second estimate (16, T2) obtained at the second estimation on the basis of at least one correlation measurement (S103),
wherein the second estimation is performed on the basis of the result of a selecting step performed at the previous frame,
wherein selecting includes performing a comparison (27) between:
a second correlation measurement (25) associated with the current frame (13) and obtained at a lag corresponding to the second estimate (16, T2); and
a pitch lag selection threshold (24),
selecting (S103) the second estimate (16, T2) when the second correlation measurement (25) is greater than the pitch lag selection threshold (24) and/or selecting (S103) the first estimate (14, T1) when the second correlation measurement (25) is lower than the pitch lag selection threshold (24); and
the method further comprising encoding data useful for performing LTPF at the decoder (60b), the data including the selected value (19, Tbest).
28. The method of any of claims 25-27, further comprising using the selected lag for packet loss concealment, PLC.
29. A program comprising instructions which, when executed by a processor (111), cause the processor to perform a method according to any of claims 25-28.
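Putting the pieces together, the method of claims 25 to 27 can be sketched as a per-frame loop in Python. All names and constants are hypothetical: the linear weight ramp, the maximum distance `DELTA` of the second search interval from the previous lag, and the coefficient `ALPHA` are illustrative values, and the first frame, which has no previously selected lag, simply takes the first estimate. The sketch covers only the pitch lag selection (S101 to S103), not LTPF parameter encoding or packet loss concealment.

```python
import numpy as np

ALPHA = 0.85   # hypothetical downscaling coefficient
DELTA = 4      # hypothetical max distance of T2 from the previous lag


def corr(x, end, n, k):
    """Correlation of x[end-n:end] with the same segment delayed by k."""
    return float(np.dot(x[end - n:end], x[end - n - k:end - k]))


def normcorr(x, end, n, k):
    """Normalized version of corr()."""
    a, b = x[end - n:end], x[end - n - k:end - k]
    d = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / d) if d > 0 else 0.0


def track_pitch(x, n, k_min, k_max):
    """Select one pitch lag per frame of n samples; the first frame ends at
    k_max + n so that every lag up to k_max has enough history."""
    lags = np.arange(k_min, k_max + 1)
    w = 1.0 - 0.5 * (lags - k_min) / (k_max - k_min)  # hypothetical decreasing weight
    t_prev = None
    selected = []
    for end in range(k_max + n, len(x) + 1, n):
        r = np.array([corr(x, end, n, int(k)) for k in lags])
        t1 = int(lags[np.argmax(r * w)])                 # S101: first estimate
        if t_prev is None:
            t_best = t1                                  # no previous frame yet
        else:
            lo = max(k_min, t_prev - DELTA)              # S102: search near t_prev
            hi = min(k_max, t_prev + DELTA)
            t2 = lo + int(np.argmax(r[lo - k_min:hi - k_min + 1]))
            # S103: keep t2 unless its correlation is below ALPHA * that of t1
            c1, c2 = normcorr(x, end, n, t1), normcorr(x, end, n, t2)
            t_best = t1 if c2 < ALPHA * c1 else t2
        selected.append(t_best)
        t_prev = t_best
    return selected


x = np.sin(2 * np.pi * np.arange(1000) / 40)    # stationary tone, period 40
print(track_pitch(x, n=160, k_min=20, k_max=120))  # [40, 40, 40, 40, 40]
```

The selected lag is fed back as `t_prev`, so the second estimate of each frame is conditioned by the selection made at the previous frame, as required by the method.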
| # | Name | Date |
|---|---|---|
| 1 | 202037018651.pdf | 2020-05-01 |
| 2 | 202037018651-STATEMENT OF UNDERTAKING (FORM 3) [01-05-2020(online)].pdf | 2020-05-01 |
| 3 | 202037018651-FORM 1 [01-05-2020(online)].pdf | 2020-05-01 |
| 4 | 202037018651-FIGURE OF ABSTRACT [01-05-2020(online)].pdf | 2020-05-01 |
| 5 | 202037018651-DRAWINGS [01-05-2020(online)].pdf | 2020-05-01 |
| 6 | 202037018651-DECLARATION OF INVENTORSHIP (FORM 5) [01-05-2020(online)].pdf | 2020-05-01 |
| 7 | 202037018651-COMPLETE SPECIFICATION [01-05-2020(online)].pdf | 2020-05-01 |
| 8 | 202037018651-FORM 18 [24-06-2020(online)].pdf | 2020-06-24 |
| 9 | 202037018651-FORM-26 [22-07-2020(online)].pdf | 2020-07-22 |
| 10 | 202037018651-FORM-26 [04-08-2020(online)].pdf | 2020-08-04 |
| 11 | 202037018651-Information under section 8(2) [18-09-2020(online)].pdf | 2020-09-18 |
| 12 | 202037018651-Proof of Right [02-11-2020(online)].pdf | 2020-11-02 |
| 13 | 202037018651-Information under section 8(2) [23-03-2021(online)].pdf | 2021-03-23 |
| 14 | 202037018651-Information under section 8(2) [07-09-2021(online)].pdf | 2021-09-07 |
| 15 | 202037018651-FORM 3 [07-09-2021(online)].pdf | 2021-09-07 |
| 16 | 202037018651-FER.pdf | 2021-10-18 |
| 17 | 202037018651-OTHERS [15-12-2021(online)].pdf | 2021-12-15 |
| 18 | 202037018651-Information under section 8(2) [15-12-2021(online)].pdf | 2021-12-15 |
| 19 | 202037018651-FER_SER_REPLY [15-12-2021(online)].pdf | 2021-12-15 |
| 20 | 202037018651-ENDORSEMENT BY INVENTORS [15-12-2021(online)].pdf | 2021-12-15 |
| 21 | 202037018651-DRAWING [15-12-2021(online)].pdf | 2021-12-15 |
| 22 | 202037018651-CLAIMS [15-12-2021(online)].pdf | 2021-12-15 |
| 23 | 202037018651-ABSTRACT [15-12-2021(online)].pdf | 2021-12-15 |
| 24 | 202037018651-Information under section 8(2) [21-01-2022(online)].pdf | 2022-01-21 |
| 25 | 202037018651-Information under section 8(2) [07-04-2022(online)].pdf | 2022-04-07 |
| 26 | 202037018651-FORM 3 [07-04-2022(online)].pdf | 2022-04-07 |
| 27 | 202037018651-Information under section 8(2) [05-08-2022(online)].pdf | 2022-08-05 |
| 28 | 202037018651-FORM 3 [05-08-2022(online)].pdf | 2022-08-05 |
| 29 | 202037018651-Information under section 8(2) [01-02-2023(online)].pdf | 2023-02-01 |
| 30 | 202037018651-FORM 3 [11-02-2023(online)].pdf | 2023-02-11 |
| 31 | 202037018651-Information under section 8(2) [13-03-2023(online)].pdf | 2023-03-13 |
| 32 | 202037018651-FORM 3 [08-08-2023(online)].pdf | 2023-08-08 |
| 33 | 202037018651-Information under section 8(2) [14-08-2023(online)].pdf | 2023-08-14 |
| 34 | 202037018651-US(14)-HearingNotice-(HearingDate-10-01-2024).pdf | 2023-12-22 |
| 35 | 202037018651-US(14)-ExtendedHearingNotice-(HearingDate-11-01-2024).pdf | 2023-12-22 |
| 36 | 202037018651-FORM-26 [05-01-2024(online)].pdf | 2024-01-05 |
| 37 | 202037018651-Correspondence to notify the Controller [05-01-2024(online)].pdf | 2024-01-05 |
| 38 | 202037018651-Written submissions and relevant documents [24-01-2024(online)].pdf | 2024-01-24 |
| 39 | 202037018651-FORM 3 [24-01-2024(online)].pdf | 2024-01-24 |
| 40 | 202037018651-PatentCertificate10-02-2024.pdf | 2024-02-10 |
| 41 | 202037018651-IntimationOfGrant10-02-2024.pdf | 2024-02-10 |

| # | Search strategy |
|---|---|
| 1 | Search_Strategy_202037018651E_09-03-2021.pdf |