ENGINEERING COMMITTEE
Data Standards Subcommittee

AMERICAN NATIONAL STANDARD
ANSI/SCTE 24-23 2007

BV32 Speech Codec Specification for Voice over IP Applications in Cable Telephony

NOTICE

The Society of Cable Telecommunications Engineers (SCTE) Standards are intended to serve the public interest by providing specifications, test methods, and procedures that promote uniformity of product, interchangeability, and ultimately the long-term reliability of broadband communications facilities. These documents shall not in any way preclude any member or non-member of SCTE from manufacturing or selling products not conforming to such documents, nor shall the existence of such standards preclude their voluntary use by those other than SCTE members, whether used domestically or internationally.

SCTE assumes no obligations or liability whatsoever to any party who may adopt the Standards. Such adopting party assumes all risks associated with adoption of these Standards, and accepts full responsibility for any damage and/or claims arising from the adoption of such Standards.

Attention is called to the possibility that implementation of this standard may require the use of subject matter covered by patent rights. By publication of this standard, no position is taken with respect to the existence or validity of any patent rights in connection therewith. SCTE shall not be responsible for identifying patents for which a license may be required or for conducting inquiries into the legal validity or scope of those patents that are brought to its attention. Patent holders who believe that they hold patents which are essential to the implementation of this standard have been requested to provide information about those patents and any related licensing terms and conditions. Any such declarations made before or after publication of this document are available on the SCTE web site at http://www.scte.org.

All Rights Reserved
© Society of Cable Telecommunications Engineers, Inc.
2007
140 Philips Road
Exton, PA 19341

Contents

1 INTRODUCTION
2 OVERVIEW OF THE BV32 SPEECH CODEC
2.1 Brief Introduction of Two-Stage Noise Feedback Coding (TSNFC)
2.2 Overview of the BV32 Codec
3 DETAILED DESCRIPTION OF THE BV32 ENCODER
3.1 High-Pass Pre-Filtering
3.2 Pre-emphasis Filtering
3.3 Short-Term Linear Predictive Analysis
3.4 Conversion to LSP
3.5 LSP Quantization
3.6 Conversion to Short-Term Predictor Coefficients
3.7 Short-Term Linear Prediction of Input Signal
3.8 Long-Term Linear Predictive Analysis (Pitch Extraction)
3.9 Long-Term Predictor Parameter Quantization
3.10 Excitation Gain Quantization
3.11 Excitation Vector Quantization
3.12 Bit Multiplexing
4 DETAILED DESCRIPTION OF THE BV32 DECODER
4.1 Bit De-multiplexing
4.2 Long-Term Predictor Parameter Decoding
4.3 Short-Term Predictor Parameter Decoding
4.4 Excitation Gain Decoding
4.5 Excitation VQ Decoding and Scaling
4.6 Long-Term Synthesis Filtering
4.7 Short-Term Synthesis Filtering
4.8 De-emphasis Filtering
4.9 Example Packet Loss Concealment
APPENDIX 1: GRID FOR LPC TO LSP CONVERSION
APPENDIX 2: FIRST-STAGE LSP CODEBOOK
APPENDIX 3: SECOND-STAGE LOWER SPLIT LSP CODEBOOK
APPENDIX 4: SECOND-STAGE UPPER SPLIT LSP CODEBOOK
APPENDIX 5: PITCH PREDICTOR TAP CODEBOOK
APPENDIX 6: GAIN CODEBOOK
APPENDIX 7: GAIN CHANGE THRESHOLD MATRIX
APPENDIX 8: EXCITATION VQ SHAPE CODEBOOK

Figures

Figure 1 Basic codec structure of Two-Stage Noise Feedback Coding (TSNFC)
Figure 2 Block diagram of the BV32 encoder
Figure 3 Block diagram of the BV32 decoder
Figure 4 BV32 short-term linear predictive analysis and quantization (block 10)
Figure 5 BV32 LSP quantizer (block 16)
Figure 6 BV32 long-term predictive analysis and quantization (block 20)
Figure 7 Prediction residual quantizer (block 30)
Figure 8 Filter structure used in BV32 excitation VQ codebook search
Figure 9 BV32 bit stream format
Figure 10 BV32 short-term predictor parameter decoder (block 120)
Figure 11 Excitation gain decoder

Tables

Table 1 Bit allocation of the BV32 codec

1 INTRODUCTION

This document contains the description of the BV32 speech codec [1].
BV32 compresses 16 kHz sampled wideband speech to a bit rate of 32 kb/s (kilobits per second) by employing a speech coding algorithm called Two-Stage Noise Feedback Coding (TSNFC), developed by Broadcom.

The rest of this document is organized as follows. Section 2 gives a high-level overview of TSNFC and BV32. Sections 3 and 4 give detailed descriptions of the BV32 encoder and decoder, respectively. The BV32 codec specification given in Sections 3 and 4 contains enough detail to allow those skilled in the art to implement a bit-stream-compatible and functionally equivalent BV32 encoder and decoder.

2 OVERVIEW OF THE BV32 SPEECH CODEC

In this section, the general principles of Two-Stage Noise Feedback Coding (TSNFC) are first introduced. Next, an overview of the BV32 algorithm is given.

2.1 Brief Introduction of Two-Stage Noise Feedback Coding (TSNFC)

In conventional Noise Feedback Coding (NFC), the encoder modifies a prediction residual signal by adding a noise feedback signal to it. A scalar quantizer quantizes this modified prediction residual signal. The difference between the quantizer input and output, i.e. the quantization error signal, is passed through a noise feedback filter. The output signal of this filter is the noise feedback signal added to the prediction residual. The noise feedback filter is used to control the spectrum of the coding noise in order to minimize the perceived coding noise; this is achieved by exploiting the masking properties of the human auditory system. Conventional NFC codecs typically use only a short-term noise feedback filter to shape the spectral envelope of the coding noise, and a scalar quantizer is used universally.
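The conventional single-stage NFC loop described above can be illustrated with a minimal sketch. This is illustrative only, not part of the BV32 algorithm; the first-order predictor coefficient, feedback coefficient, and quantizer step size below are arbitrary choices.

```python
def nfc_encode(s, a=0.9, f=0.5, step=0.25):
    """Toy single-stage noise feedback coder (illustrative only).

    a    : first-order short-term predictor coefficient (arbitrary)
    f    : noise feedback filter coefficient (arbitrary)
    step : scalar quantizer step size
    Returns the quantized residual sequence uq(n).
    """
    sq_prev = 0.0   # previous reconstructed output sample
    q_prev = 0.0    # previous quantization error
    uq = []
    for x in s:
        pred = a * sq_prev             # short-term prediction
        d = x - pred                   # prediction residual d(n)
        u = d + f * q_prev             # add noise feedback signal
        uq_n = step * round(u / step)  # scalar quantization
        q_prev = u - uq_n              # quantization error, fed back
        sq_prev = pred + uq_n          # decoder-side reconstruction
        uq.append(uq_n)
    return uq
```

Note how the quantization error q(n) of one sample shapes the quantizer input of the next sample; this is the feedback mechanism that moves the coding noise under the masking threshold.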
In contrast, Broadcom's Two-Stage Noise Feedback Coding (TSNFC) system uses a codec structure employing two stages of noise feedback coding in a nested loop: the first NFC stage performs short-term prediction and short-term noise spectral shaping (spectral envelope shaping), and the second, nested NFC stage performs long-term prediction and long-term noise spectral shaping (harmonic shaping). Such a nested two-stage NFC structure is shown in Figure 1 below.

[1] The "BV32 speech codec" specification is based on Broadcom Corporation's BroadVoice®32 speech codec. Implementation of this standard may require a license of Broadcom patents; information regarding these patents, and a declaration of licensing intent, may be found at the SCTE web site.

[Figure 1 is a block diagram showing the input signal s(n), the prediction residual d(n), the quantizer input signals u(n) and v(n), the quantizer, the short-term predictor Ps(z), the short-term noise feedback filter Fs(z), the long-term predictor Pl(z), the long-term noise feedback filter Nl(z) − 1, the quantized signals uq(n) and vq(n), the quantization errors q(n) and qs(n), and the output signal sq(n).]

Figure 1 Basic codec structure of Two-Stage Noise Feedback Coding (TSNFC)

In Figure 1 above, the outer layer (including the two short-term predictors and the short-term noise feedback filter) follows the structure of the conventional NFC codec. The TSNFC structure in Figure 1 is obtained by replacing the simple scalar quantizer in the conventional (single-stage) NFC structure with a "predictive quantizer" that employs long-term prediction and long-term noise spectral shaping. This "predictive quantizer" is represented by the inner feedback loop in Figure 1, including the long-term predictor and the long-term noise feedback filter. This inner feedback loop uses an alternative but equivalent conventional NFC structure, where Nl(z) represents the filter whose frequency response is the desired noise shape for long-term noise spectral shaping.
In the outer layer, the short-term noise feedback filter Fs(z) is usually chosen as a bandwidth-expanded version of the short-term predictor Ps(z). The choice of different NFC structures in the outer and inner layers is based on complexity considerations. By combining two stages of NFC in a nested loop, the TSNFC in Figure 1 can reap the benefits of both short-term and long-term prediction and also achieve short-term and long-term noise spectral shaping at the same time.

It is natural and straightforward to use a scalar quantizer in Figure 1. However, to achieve better coding efficiency, a vector quantizer is used in BV32. In the Vector Quantization (VQ) codebook search, the u(n) vector cannot be generated before the VQ codebook search starts: due to the feedback structure in Figure 1, the elements of u(n) from the second element on depend on the vector-quantized version of the earlier elements. Therefore, the VQ codebook search is performed by trying out each of the candidate codevectors in the VQ codebook (i.e. fixing a candidate uq(n) vector first), then calculating the corresponding u(n) vector and the corresponding VQ error q(n) = u(n) − uq(n). The VQ codevector that minimizes the energy of q(n) within the current vector time span is chosen as the winning codevector, and the corresponding codebook index becomes part of the encoder output bit stream for the current speech frame.

The TSNFC decoder structure is simply a quantizer decoder followed by the two feedback filter structures involving the long-term predictor and the short-term predictor, respectively, shown on the right half of Figure 1. Thus, the TSNFC decoder is similar to the decoders of other predictive coding techniques such as Adaptive Predictive Coding (APC), Multi-Pulse Linear Predictive Coding (MPLPC), and Code-Excited Linear Prediction (CELP).

2.2 Overview of the BV32 Codec

The BV32 codec is a purely forward-adaptive TSNFC codec.
It operates at an input sampling rate of 16 kHz and an encoding bit rate of 32 kb/s, or 2 bits per sample. BV32 uses a frame size of 5 ms, or 80 samples. There is no look-ahead; therefore, the total algorithmic buffering delay is just the frame size itself, or 5 ms. The main design goal of BV32 is to make the coding delay and the codec complexity as low as possible while maintaining essentially transparent output speech quality.

Due to the small frame size, the parameters of the short-term predictor (also called the "LPC predictor") and the long-term predictor (also called the "pitch predictor") are both transmitted and updated once per frame. Each 5 ms frame is divided equally into two 2.5 ms sub-frames (40 samples each). The gain of the excitation signal is transmitted once every sub-frame. The excitation VQ uses a vector dimension of 4 samples. Hence, there are 10 excitation vectors in a sub-frame, and 20 vectors in a frame.

Figure 2 shows a block diagram of the BV32 encoder. A more detailed description of each functional block will be given in Section 3.

[Figure 2 is a block diagram of the BV32 encoder. The input signal passes through the high-pass pre-filter (block 3) and the pre-emphasis filter (block 5) to produce s(n). Short-term predictive analysis and quantization (block 10) produces the index LSPI, and long-term predictive analysis and quantization (block 20) produces the indices PPI, PPTI, and GI. The prediction residual quantizer (block 30) produces the index CI. The short-term predictor (block 40), the short-term noise feedback filter (block 50), the long-term predictor, and the long-term noise feedback filter form the nested TSNFC feedback loops around the quantizer, with internal signals d(n), u(n), uq(n), dq(n), v(n), q(n), qs(n), stnf(n), ltnf(n), and ppv(n); the bit multiplexer (block 95) assembles the output bit stream.]

Figure 2 Block diagram of the BV32 encoder

The BV32 encoder first passes the input signal through a fixed pole-zero high-pass pre-filter to remove possible DC bias or low-frequency rumble. The filtered signal is then passed through a fixed pole-zero pre-emphasis filter that provides a general high-pass spectral tilt. The resulting pre-emphasized signal is then used to derive the LPC predictor coefficients.
To keep the complexity low, BV32 uses a relatively low LPC predictor order of 8, and the LPC analysis window is only 10 ms (160 samples) long. The LPC analysis window is asymmetric, with the peak of the window located at the center of the current frame and the end of the window coinciding with the last sample of the current frame. Autocorrelation LPC analysis based on the Levinson-Durbin recursion is used to derive the coefficients of the 8th-order LPC predictor.

The derived LPC predictor coefficients are converted to Line-Spectrum Pair (LSP) parameters, which are then quantized by an inter-frame predictive coding scheme. The inter-frame prediction of the LSP parameters uses an 8th-order moving-average (MA) predictor. The MA predictor coefficients are fixed. The time span that this MA predictor covers is 8 × 5 ms = 40 ms. The inter-frame LSP prediction residual is quantized by a two-stage vector quantizer. The first stage employs an 8-dimensional vector quantizer with a 7-bit codebook. The second stage uses a split vector quantizer with a 3-5 split and 5 bits for each split. That is, the first three elements are vector quantized to 5 bits, and the remaining five elements are also vector quantized to 5 bits.

For long-term prediction, a three-tap pitch predictor with an integer pitch period is used. To keep the complexity low, the pitch period and the pitch taps are both determined in an open-loop fashion. The three pitch predictor taps are jointly quantized using a 5-bit vector quantizer. The distortion measure used in the codebook search is the energy of the open-loop pitch prediction residual. The 32 codevectors in the pitch tap codebook have been "stabilized" to make sure that they will not give rise to an unstable pitch synthesis filter.

The excitation gain is also determined in an open-loop fashion to keep the complexity low. The average power of the open-loop pitch prediction residual within the current sub-frame is calculated and converted to the logarithmic domain.
The resulting log-gain is then quantized using inter-subframe MA predictive coding. The MA predictor order for the log-gain is 16, corresponding to a time span of 16 × 2.5 ms = 40 ms. Again, the log-gain MA predictor coefficients are fixed. The log-gain prediction residual is quantized by a 5-bit scalar quantizer.

The 4-dimensional excitation VQ codebook has a simple sign-shape structure, with 1 bit for the sign and 5 bits for the shape. In other words, only 32 four-dimensional codevectors are stored, but the mirror image of each codevector with respect to the origin is also a codevector.

In the BV32 decoder, the decoded excitation vectors are scaled by the excitation gain. The scaled excitation signal passes through a long-term synthesis filter and a short-term synthesis filter, and finally through a fixed pole-zero de-emphasis filter, which is the inverse of the pre-emphasis filter in the encoder. Figure 3 shows the block diagram of the BV32 decoder.

[Figure 3 is a block diagram of the BV32 decoder. The bit de-multiplexer (block 100) separates the indices GI, CI, PPI, PPTI, and LSPI from the input bit stream. The prediction residual quantizer decoder (block 110) produces the scaled excitation uq(n), which passes through the long-term synthesis filter (containing the long-term predictor) and the short-term synthesis filter (containing the short-term predictor), whose parameters come from the long-term predictive parameter decoder and the short-term predictive parameter decoder (block 120), and finally through the de-emphasis filter to give the output signal.]

Figure 3 Block diagram of the BV32 decoder

Table 1 shows the bit allocation of BV32 in each 5 ms frame. The LSP parameters are encoded with 17 bits per frame, including 7 bits for the first-stage VQ and 5 + 5 = 10 bits for the second-stage split VQ. The pitch period and the pitch predictor taps are encoded with 8 and 5 bits, respectively. The two excitation gains in each frame are encoded with 5 + 5 = 10 bits. The 20 excitation vectors are each encoded with 1 bit for the sign and 5 bits for the shape, resulting in 120 bits per frame for the excitation VQ.
Including the other 40 bits of side information, the grand total is 160 bits per 80-sample frame, which is 2 bits/sample, or 32 kb/s.

Parameter                  Bits per frame (80 samples)
LSP                        7 + (5 + 5) = 17
Pitch Period               8
3 Pitch Predictor Taps     5
2 Excitation Gains         5 + 5 = 10
20 Excitation Vectors      (1 + 5) × 20 = 120
Total                      160

Table 1 Bit allocation of the BV32 codec

3 DETAILED DESCRIPTION OF THE BV32 ENCODER

In this section, a detailed description of each functional block of the BV32 encoder in Figure 2 is given. When necessary, certain functional blocks are expanded into more detailed block diagrams. The description given in this section is in sufficient detail to allow those skilled in the art to implement a mathematically equivalent BV32 encoder.

3.1 High-Pass Pre-Filtering

Refer to Figure 2. The input signal is assumed to be represented by 16-bit linear PCM. Block 3 is a high-pass pre-filter with fixed coefficients. It is a first-order pole-zero filter with the following transfer function:

$$H_{hpf}(z) = \frac{(255/256)\,(1 - z^{-1})}{1 - (127/128)\,z^{-1}}$$

This high-pass pre-filter removes undesirable low-frequency components from the input signal and passes the filtered signal to the pre-emphasis filter (block 5).

3.2 Pre-emphasis Filtering

Block 5 is a first-order pole-zero pre-emphasis filter with fixed coefficients. It has the following transfer function:

$$H_{pe}(z) = \frac{1 + 0.5\,z^{-1}}{1 + 0.75\,z^{-1}}$$

It filters the high-pass pre-filtered signal (the output signal of block 3) and gives an output signal denoted as s(n) in Figure 2, where n is the sample index.

3.3 Short-Term Linear Predictive Analysis

The high-pass filtered and pre-emphasized signal s(n) is buffered at block 10, which performs short-term linear predictive analysis and quantization to obtain the coefficients for the short-term predictor (block 40) and the short-term noise feedback filter (block 50). Block 10 is further expanded in Figure 4.
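The two fixed first-order pole-zero filters of blocks 3 and 5 can be sketched as direct-form difference equations. This is a minimal sketch, assuming zero initial filter state; the function names are illustrative, not from the standard.

```python
def pole_zero_filter(x, b, a):
    """First-order pole-zero filter H(z) = (b0 + b1 z^-1) / (1 + a1 z^-1).

    b = (b0, b1) are the numerator coefficients, a is a1.
    Direct-form I realization: y(n) = b0 x(n) + b1 x(n-1) - a1 y(n-1).
    """
    b0, b1 = b
    x_prev = 0.0
    y_prev = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x_prev - a * y_prev
        x_prev, y_prev = xn, yn
        y.append(yn)
    return y

# High-pass pre-filter: H_hpf(z) = (255/256)(1 - z^-1) / (1 - (127/128) z^-1)
def high_pass_pre_filter(x):
    return pole_zero_filter(x, (255 / 256, -255 / 256), -127 / 128)

# Pre-emphasis filter: H_pe(z) = (1 + 0.5 z^-1) / (1 + 0.75 z^-1)
def pre_emphasis(x):
    return pole_zero_filter(x, (1.0, 0.5), 0.75)
```

As expected for a high-pass filter with a zero at z = 1, a constant (DC) input to `high_pass_pre_filter` produces an output that decays toward zero.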
[Figure 4 expands block 10. It shows: apply window and calculate autocorrelation (block 11) producing r(i); white noise correction and spectral smoothing (block 12) producing r̂(i); Levinson-Durbin recursion (block 13) producing âi; bandwidth expansion (block 14) producing ai; conversion to LSP (block 15) producing li; the LSP quantizer (block 16) producing the quantized LSPs and the index LSPI; conversion to predictor coefficients (block 17) producing ãi; and bandwidth expansion (block 18) producing a′i, with outputs to block 40, and to block 50 and block 21.]

Figure 4 BV32 short-term linear predictive analysis and quantization (block 10)

Refer to Figure 4. The input signal s(n) is buffered in block 11, where a 10 ms asymmetric analysis window is applied to the buffered s(n) signal array. The "left window" is 7.5 ms long, and the "right window" is 2.5 ms long. Let LWINSZ be the number of samples in the left window (LWINSZ = 120 for 16 kHz sampling); then the left window is given by

$$wl(n) = \frac{1}{2}\left[1 - \cos\!\left(\frac{n\pi}{LWINSZ + 1}\right)\right], \quad n = 1, 2, \ldots, LWINSZ.$$

Let RWINSZ be the number of samples in the right window. Then, RWINSZ = 40 for 16 kHz sampling. The right window is given by

$$wr(n) = \cos\!\left(\frac{(n-1)\pi}{2\,RWINSZ}\right), \quad n = 1, 2, \ldots, RWINSZ.$$

The concatenation of wl(n) and wr(n) gives the 10 ms asymmetric analysis window, with the peak of the window located at the center of the current frame. When applying this analysis window, the last sample of the window is lined up with the last sample of the current frame. Therefore, the codec does not use any look-ahead.

More specifically, without loss of generality, let the sampling time index range n = 1, 2, …, FRSZ correspond to the current frame, where the frame size FRSZ = 80 for BV32. Then, the s(n) signal buffer stored in block 11 is for n = −79, −78, …, −1, 0, 1, 2, …, 80. The asymmetric LPC analysis window function can be expressed as

$$w(n) = \begin{cases} wl(n + 80), & n = -79, -78, \ldots, 40 \\ wr(n - 40), & n = 41, 42, \ldots, 80 \end{cases}$$

The windowing operation is performed as follows:

$$sw(n) = s(n)\,w(n), \quad n = -79, -78, \ldots, 80.$$

Next, block 11 calculates the autocorrelation coefficients as follows.
$$r(i) = \sum_{n=-79+i}^{80} sw(n)\,sw(n-i), \quad i = 0, 1, 2, \ldots, 8.$$

The calculated autocorrelation coefficients are passed to block 12, which applies a Gaussian window to the autocorrelation coefficients to perform spectral smoothing. The Gaussian window function is given by

$$gw(i) = e^{-\frac{1}{2}\left(\frac{2\pi i \sigma}{f_s}\right)^{2}}, \quad i = 1, 2, \ldots, 8,$$

where $f_s$ is the sampling rate of the input signal, expressed in Hz, and $\sigma$ is 40 Hz. After multiplying the r(i) array by such a Gaussian window, block 12 then multiplies r(0) by a white noise correction factor of WNCF = 1 + ε, where ε = 0.0001. In summary, the output of block 12 is given by

$$\hat{r}(i) = \begin{cases} 1.0001 \times r(0), & i = 0 \\ gw(i)\,r(i), & i = 1, 2, \ldots, 8 \end{cases}$$

Block 13 performs the Levinson-Durbin recursion to convert the autocorrelation coefficients $\hat{r}(i)$ to the short-term predictor coefficients $\hat{a}_i$, i = 0, 1, …, 8. If the Levinson-Durbin recursion exits prematurely before the recursion is completed (for example, because the prediction residual energy E(i) is less than zero), then the short-term predictor coefficients of the last frame are used in the current frame. To handle this exception, the $\hat{a}_i$ array needs an initial value, which is set to $\hat{a}_0 = 1$ and $\hat{a}_i = 0$ for i = 1, 2, …, 8. The Levinson-Durbin recursion is performed by the following algorithm.

1. If $\hat{r}(0) \leq 0$, use the $\hat{a}_i$ array of the last frame, and exit the Levinson-Durbin recursion.
2. $E(0) = \hat{r}(0)$
3. $k_1 = -\hat{r}(1)/\hat{r}(0)$
4. $\hat{a}_1^{(1)} = k_1$
5. $E(1) = (1 - k_1^2)\,E(0)$
6. If $E(1) \leq 0$, use the $\hat{a}_i$ array of the last frame, and exit the Levinson-Durbin recursion.
7. For i = 2, 3, 4, …, 8, do the following:

$$k_i = \frac{-\hat{r}(i) - \sum_{j=1}^{i-1} \hat{a}_j^{(i-1)}\,\hat{r}(i-j)}{E(i-1)}$$

$$\hat{a}_i^{(i)} = k_i$$

$$\hat{a}_j^{(i)} = \hat{a}_j^{(i-1)} + k_i\,\hat{a}_{i-j}^{(i-1)}, \quad j = 1, 2, \ldots, i-1$$

$$E(i) = (1 - k_i^2)\,E(i-1)$$

If $E(i) \leq 0$, use the $\hat{a}_i$ array of the last frame, and exit the Levinson-Durbin recursion.
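The Levinson-Durbin recursion above, including the premature-exit fallback to the previous frame's coefficients, can be sketched as follows (a minimal sketch; the function name is illustrative):

```python
def levinson_durbin(r_hat, prev_a, order=8):
    """Levinson-Durbin recursion: autocorrelations r_hat[0..order] to
    predictor coefficients a[0..order], with a[0] = 1.

    Falls back to prev_a (last frame's coefficients) whenever the
    recursion must exit prematurely, as the text specifies.
    """
    if r_hat[0] <= 0:
        return list(prev_a)
    a = [0.0] * (order + 1)
    a[0] = 1.0
    E = r_hat[0]
    k = -r_hat[1] / r_hat[0]          # first reflection coefficient
    a[1] = k
    E *= (1.0 - k * k)
    if E <= 0:
        return list(prev_a)
    for i in range(2, order + 1):
        acc = r_hat[i] + sum(a[j] * r_hat[i - j] for j in range(1, i))
        k = -acc / E                  # i-th reflection coefficient
        new_a = a[:]                  # order-i update of the coefficients
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        E *= (1.0 - k * k)            # updated prediction residual energy
        if E <= 0:
            return list(prev_a)
    return a
```

For an autocorrelation sequence of a first-order process, r(i) = ρ^i, the recursion yields a1 = −ρ and zeros for all higher coefficients, which is a convenient sanity check.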
If the recursion exits prematurely, the $\hat{a}_i$ array of the last frame is used as the output of block 13. If the recursion is completed successfully (which is normally the case), then the final output of block 13 is taken as

$$\hat{a}_0 = 1, \quad \hat{a}_i = \hat{a}_i^{(8)}, \quad i = 1, 2, \ldots, 8.$$

Block 14 performs bandwidth expansion as follows:

$$a_i = (0.96852)^i\,\hat{a}_i, \quad i = 0, 1, \ldots, 8.$$

3.4 Conversion to LSP

In Figure 4, block 15 converts the LPC coefficients $a_i$, i = 1, 2, …, 8 of the prediction error filter given by

$$A(z) = 1 + \sum_{i=1}^{8} a_i z^{-i}$$

to a set of 8 Line-Spectrum Pair (LSP) coefficients $l_i$, i = 1, 2, …, 8. The LSP coefficients, also known as the Line Spectrum Frequencies (LSF), are the angular positions, normalized to 1 (i.e. 1.0 corresponds to the Nyquist frequency), of the roots of

$$A_p(z) = A(z) + z^{-9} A(z^{-1})$$

and

$$A_m(z) = A(z) - z^{-9} A(z^{-1})$$

on the upper half of the unit circle, $z = e^{j\omega}$, $0 \leq \omega \leq \pi$, less the trivial roots at z = −1 and z = 1 of $A_p(z)$ and $A_m(z)$, respectively. Due to the symmetry and anti-symmetry of $A_p(z)$ and $A_m(z)$, respectively, the roots of interest can be determined as the roots of

$$G_p(\omega) = \sum_{i=0}^{4} g_{p,i} \cos(i\omega)$$

and

$$G_m(\omega) = \sum_{i=0}^{4} g_{m,i} \cos(i\omega)$$

where

$$g_{p|m,i} = \begin{cases} f_{p|m,4}, & i = 0 \\ 2 f_{p|m,4-i}, & i = 1, \ldots, 4 \end{cases}$$

in which

$$f_{p,i} = \begin{cases} 1.0, & i = 0 \\ a_i + a_{9-i} - f_{p,i-1}, & i = 1, \ldots, 4 \end{cases}$$

and

$$f_{m,i} = \begin{cases} 1.0, & i = 0 \\ a_i - a_{9-i} + f_{m,i-1}, & i = 1, \ldots, 4 \end{cases}$$

The subscript "p|m" means that dual versions of the equation exist, with either subscript "p" or subscript "m". The roots of $A_p(z)$ and $A_m(z)$, and therefore the roots of $G_p(\omega)$ and $G_m(\omega)$, are interlaced, with the first root belonging to $G_p(\omega)$. The evaluation of the functions $G_p(\omega)$ and $G_m(\omega)$ is carried out efficiently using Chebyshev polynomial series. With the mapping $x = \cos(\omega)$ and $\cos(m\omega) = T_m(x)$, where $T_m(x)$ is the m-th order Chebyshev polynomial, the two functions $G_p(\omega)$ and $G_m(\omega)$ can be expressed as

$$G_{p|m}(x) = \sum_{i=0}^{4} g_{p|m,i}\,T_i(x).$$
Due to the recursive nature of Chebyshev polynomials, the functions can be evaluated as

$$G_{p|m}(x) = \frac{b_{p|m,0}(x) - b_{p|m,2}(x) + g_{p|m,0}}{2},$$

where $b_{p|m,0}(x)$ and $b_{p|m,2}(x)$ are calculated using the recurrence

$$b_{p|m,i}(x) = 2x\,b_{p|m,i+1}(x) - b_{p|m,i+2}(x) + g_{p|m,i}$$

with initial conditions $b_{p|m,5}(x) = b_{p|m,6}(x) = 0$.

The roots of $G_p(x)$ and $G_m(x)$ are determined in an alternating fashion, starting with a root in $G_p(x)$. Each root of $G_p(x)$ and $G_m(x)$ is located by identifying a sign change of the relevant function along a grid of 60 points, given in Appendix 1. The estimate of the root is then refined using 4 bisections followed by a final linear interpolation between the two points surrounding the root. It should be noted that the roots and grid points are in the cosine domain. Once the 8 roots $x_i = \cos(\omega_i)$, i = 1, 2, …, 8 are determined in the cosine domain, they are converted to the normalized frequency domain according to

$$l_i = \frac{\cos^{-1}(x_i)}{\pi}, \quad i = 1, 2, \ldots, 8$$

in order to obtain the LSP coefficients. In the rare event that fewer than 8 roots are found, the routine returns the LSP coefficients of the previous frame, $l_i(k-1)$, i = 1, 2, …, 8, where the additional parameter k represents the frame index of the current frame. At the very first frame, the LSP coefficients of the previous frame are initialized to

$$l_i(0) = i/9, \quad i = 1, 2, \ldots, 8.$$

3.5 LSP Quantization

Block 16 of Figure 4 vector quantizes and encodes the LSP coefficient vector, $\mathbf{l} = [l_1\; l_2\; \cdots\; l_8]^T$, to a total of 17 bits. The output LSP quantizer index array, LSPI = {LSPI1, LSPI2, LSPI3}, is passed to the bit multiplexer (block 95), while the quantized LSP coefficient vector, $\tilde{\mathbf{l}} = [\tilde{l}_1\; \tilde{l}_2\; \cdots\; \tilde{l}_8]^T$, is passed to block 17. The LSP quantizer is based on mean-removed inter-frame moving-average (MA) prediction with two-stage vector quantization (VQ) of the prediction error.
The quantizer enables bit-error detection at the decoder by constraining the codevector selection at the encoder. It should be noted that the encoder must perform the specified constrained VQ in order to maintain interoperability. The first-stage VQ is searched using the simple mean-squared error (MSE) distortion criterion, while both the lower and upper splits of the second-stage split VQ are searched using the weighted mean-squared error (WMSE) distortion criterion.

[Figure 5 expands block 16. It shows: calculation of the LSP weights w (block 161); subtraction of the mean LSP vector (adder 162); 8th-order MA prediction (block 163); adder 164 forming e2; the first-stage VQ, a regular 8-dimensional MSE VQ (block 165); adder 166 forming e22; the second-stage VQ (block 1615), which splits the vector into lower 3-element and upper 5-element sub-vectors (block 167), applies a constrained 3-dimensional WMSE VQ (block 168) and a regular 5-dimensional WMSE VQ (block 169), and appends the sub-vectors (block 1610); reconstruction adders 1611, 1612, and 1613; and an LSP spacing block (block 1614) producing the quantized LSP vector. The three sub-quantizer indices are LSPI1, LSPI2, and LSPI3.]

Figure 5 BV32 LSP quantizer (block 16)

Block 16 is further expanded in Figure 5. The first-stage VQ takes place in block 165, and the second-stage split VQ takes place in block 1615. Except for the LSP quantizer indices LSPI1, LSPI2, LSPI3 and the split vectors, all signal paths in Figure 5 are for vectors of dimension 8.

Block 161 uses the unquantized LSP coefficient vector to calculate the weights to be used later in the second-stage WMSE VQs. The weights are determined as

$$w_i = \begin{cases} 1/(l_2 - l_1), & i = 1 \\ 1/\min(l_i - l_{i-1},\; l_{i+1} - l_i), & 1 < i < 8 \\ 1/(l_8 - l_7), & i = 8 \end{cases}$$

Basically, the i-th weight is the inverse of the distance between the i-th LSP coefficient and its nearest-neighbor LSP coefficient.
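The weight calculation above can be sketched directly (a minimal sketch; the function name is illustrative, and the input is assumed to be the 8 LSP coefficients in ascending order):

```python
def lsp_weights(l):
    """Inverse-distance LSP weights for the second-stage WMSE VQs.

    l: LSP coefficients in ascending order (8 elements for BV32).
    Each weight is 1 / (distance to the nearest neighboring LSP).
    """
    n = len(l)
    w = []
    for i in range(n):
        if i == 0:
            d = l[1] - l[0]                          # only right neighbor
        elif i == n - 1:
            d = l[n - 1] - l[n - 2]                  # only left neighbor
        else:
            d = min(l[i] - l[i - 1], l[i + 1] - l[i])  # nearest neighbor
        w.append(1.0 / d)
    return w
```

Closely spaced LSPs (which correspond to sharp spectral peaks) therefore receive large weights, so quantization errors near formants are penalized more heavily.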
Adder 162 subtracts the constant LSP mean vector,

$$\bar{\mathbf{l}} = [0.0551453\;\; 0.1181030\;\; 0.2249756\;\; 0.3316040\;\; 0.4575806\;\; 0.5720825\;\; 0.7193298\;\; 0.8278198]^T,$$

from the unquantized LSP coefficient vector to get the mean-removed LSP vector, $\mathbf{e}_1 = \mathbf{l} - \bar{\mathbf{l}}$.

In Figure 5, block 163 performs 8th-order inter-frame MA prediction of the mean-removed LSP vector $\mathbf{e}_1$ based on the $\tilde{\mathbf{e}}_2$ vectors in the previous 8 frames, where $\tilde{\mathbf{e}}_2$ is the quantized version of the inter-frame LSP prediction error vector [2]. Let $\tilde{e}_{2,i}(k)$ denote the i-th element of the vector $\tilde{\mathbf{e}}_2$ in the frame that is k frames before the current frame. Let $\hat{e}_{1,i}$ be the i-th element of the inter-frame-predicted mean-removed LSP vector $\hat{\mathbf{e}}_1$. Then, block 163 calculates the predicted LSP vector according to

$$\hat{e}_{1,i} = \mathbf{p}_{LSP,i}^T \cdot [\tilde{e}_{2,i}(1)\;\; \tilde{e}_{2,i}(2)\;\; \tilde{e}_{2,i}(3)\;\; \tilde{e}_{2,i}(4)\;\; \tilde{e}_{2,i}(5)\;\; \tilde{e}_{2,i}(6)\;\; \tilde{e}_{2,i}(7)\;\; \tilde{e}_{2,i}(8)]^T, \quad i = 1, 2, \ldots, 8,$$

where $\mathbf{p}_{LSP,i}$ holds the 8 prediction coefficients for the i-th LSP coefficient and is given by

p_LSP,1^T = [0.7401123 0.6939697 0.6031494 0.5333862 0.4295044 0.3234253 0.2177124 0.1162720]
p_LSP,2^T = [0.7939453 0.7693481 0.6712036 0.5919189 0.4750366 0.3556519 0.2369385 0.1181030]
p_LSP,3^T = [0.7534180 0.7318115 0.6326294 0.5588379 0.4530029 0.3394775 0.2307739 0.1201172]
p_LSP,4^T = [0.7188110 0.6765747 0.5792847 0.5169067 0.4223022 0.3202515 0.2235718 0.1181030]
p_LSP,5^T = [0.6431885 0.6023560 0.5112305 0.4573364 0.3764038 0.2803345 0.2060547 0.1090698]
p_LSP,6^T = [0.5687866 0.5837402 0.4616089 0.4351196 0.3502808 0.2602539 0.1951294 0.0994263]
p_LSP,7^T = [0.5292969 0.4835205 0.3890381 0.3581543 0.2882080 0.2261353 0.1708984 0.0941162]
p_LSP,8^T = [0.5134277 0.4365845 0.3521729 0.3118896 0.2514038 0.1951294 0.1443481 0.0841064]

[2] At the first frame, the previous, non-existing, quantized inter-frame LSP prediction error vectors are set to zero vectors.

Adder 164 calculates the prediction error vector $\mathbf{e}_2 = \mathbf{e}_1 - \hat{\mathbf{e}}_1$, which is the input to the first-stage VQ.
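The mean removal and inter-frame MA prediction steps above can be sketched as follows. This is a minimal sketch with illustrative names; the history buffer is assumed to hold the quantized prediction error vectors of the previous 8 frames, most recent first, and the coefficient matrix rows stand for the p_LSP,i vectors.

```python
def predict_lsp(l, l_mean, p_lsp, e2_history):
    """Mean-removed 8th-order inter-frame MA prediction of the LSP vector.

    l          : current unquantized LSP vector (8 elements)
    l_mean     : constant LSP mean vector (8 elements)
    p_lsp      : 8x8 matrix; row i holds the 8 prediction coefficients
                 for the i-th LSP coefficient
    e2_history : previous 8 quantized prediction error vectors,
                 most recent first (zero vectors at the first frame)
    Returns (e1, e1_hat, e2): the mean-removed vector, its prediction,
    and the prediction error fed to the first-stage VQ.
    """
    e1 = [l[i] - l_mean[i] for i in range(8)]
    e1_hat = [sum(p_lsp[i][k] * e2_history[k][i] for k in range(8))
              for i in range(8)]
    e2 = [e1[i] - e1_hat[i] for i in range(8)]
    return e1, e1_hat, e2
```

Because the predictor is a pure MA over quantized quantities available at the decoder, encoder and decoder predictions stay in lockstep and channel-error propagation is limited to 8 frames.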
In block 165 the 8-dimensional prediction error vector e_2 is vector quantized with the 128-entry, 8-dimensional codebook CB1 = \{cb_1^{(0)}, cb_1^{(1)}, \ldots, cb_1^{(127)}\} listed in Appendix 2. The codevector minimizing the MSE is denoted \tilde{e}_{21} and the corresponding index is denoted LSPI_1:

LSPI_1 = \arg\min_{k \in \{0,1,\ldots,127\}} \left\{ (e_2 - cb_1^{(k)})^T (e_2 - cb_1^{(k)}) \right\}, \qquad \tilde{e}_{21} = cb_1^{(LSPI_1)},

where the notation I = \arg\min_i \{D(i)\} means that I is the argument that minimizes the entity D(i), i.e. D(I) \le D(i) for all i.

² At the first frame, the previous, non-existing, quantized inter-frame LSP prediction error vectors are set to zero vectors.

Adder 166 subtracts the first-stage codevector from the prediction error vector to form the quantization error vector of the first stage, e_{22} = e_2 - \tilde{e}_{21}. This is the input to the second-stage VQ, which is a two-split VQ. Block 167 splits the quantization error vector of the first stage into a lower sub-vector e_{22,1} with the first 3 elements of e_{22}, e_{22,1} = [e_{22,1}\ e_{22,2}\ e_{22,3}]^T, and an upper sub-vector e_{22,2} with the last 5 elements of e_{22}, e_{22,2} = [e_{22,4}\ e_{22,5}\ e_{22,6}\ e_{22,7}\ e_{22,8}]^T. The two sub-vectors are quantized independently into \tilde{e}_{22,1} and \tilde{e}_{22,2}, respectively.

Block 168 performs a constrained VQ of the 3-dimensional vector e_{22,1} using the 32-entry codebook CB21 = \{cb_{21}^{(0)}, cb_{21}^{(1)}, \ldots, cb_{21}^{(31)}\} of Appendix 3. The codevector that minimizes the WMSE, subject to the constraint that the first 3 elements of the intermediate quantized LSP vector,

\breve{l} = \hat{l} + \tilde{e}_2 = \bar{l} + \hat{e}_1 + \tilde{e}_{21} + \begin{bmatrix} \tilde{e}_{22,1} \\ \tilde{e}_{22,2} \end{bmatrix},

preserve the ordering property

\breve{l}_1 \ge 0, \quad \breve{l}_2 \ge \breve{l}_1, \quad \breve{l}_3 \ge \breve{l}_2,

is selected as \tilde{e}_{22,1}, and the corresponding index is denoted LSPI_2. In the inequalities above, the symbol \breve{l}_i represents the i-th element of the vector \breve{l}.
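The first-stage MSE codebook search can be sketched as follows. The helper name `mse_vq_search` is ours, and the codebook values are placeholders; the real CB1 entries are the 128 vectors of Appendix 2:

```python
def mse_vq_search(e2, codebook):
    """First-stage VQ (block 165): return the index and codevector that
    minimize the squared error (e2 - cb)^T (e2 - cb)."""
    best_idx, best_dist = 0, float("inf")
    for k, cv in enumerate(codebook):
        d = sum((x - y) ** 2 for x, y in zip(e2, cv))
        if d < best_dist:
            best_idx, best_dist = k, d
    return best_idx, codebook[best_idx]
```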
The constrained WMSE VQ is given by

LSPI_2 = \arg\min_{k \in \{ j \mid \breve{l}_1^{(j)} \ge 0,\ \breve{l}_2^{(j)} \ge \breve{l}_1^{(j)},\ \breve{l}_3^{(j)} \ge \breve{l}_2^{(j)},\ j \in \{0,1,\ldots,31\} \}} \left\{ (e_{22,1} - cb_{21}^{(k)})^T W_1 (e_{22,1} - cb_{21}^{(k)}) \right\}, \qquad \tilde{e}_{22,1} = cb_{21}^{(LSPI_2)},

where

W_1 = \begin{bmatrix} w_1 & 0 & 0 \\ 0 & w_2 & 0 \\ 0 & 0 & w_3 \end{bmatrix},

and \breve{l}_i^{(j)} is the i-th element of the reconstructed LSP vector \breve{l} that is generated by using the j-th codevector in CB21. In the highly unlikely, but possible, event that no codevector satisfies the ordering property of the intermediate quantized LSP vector, the quantizer selects the codevector cb_{21}^{(1)} and returns the index LSPI_2 = 1.

Block 169 performs an unconstrained WMSE VQ of the 5-dimensional vector e_{22,2} using the 32-entry codebook CB22 = \{cb_{22}^{(0)}, cb_{22}^{(1)}, \ldots, cb_{22}^{(31)}\} given in Appendix 4, according to

LSPI_3 = \arg\min_{k \in \{0,1,\ldots,31\}} \left\{ (e_{22,2} - cb_{22}^{(k)})^T W_2 (e_{22,2} - cb_{22}^{(k)}) \right\}, \qquad \tilde{e}_{22,2} = cb_{22}^{(LSPI_3)},

where

W_2 = \mathrm{diag}(w_4, w_5, w_6, w_7, w_8).

The quantization is complete, and the remaining operations of block 16 construct the quantized LSP vector from the codevectors, the LSP mean, and the MA prediction. Block 1610 concatenates the two quantized split vectors to obtain

\tilde{e}_{22} = \begin{bmatrix} \tilde{e}_{22,1} \\ \tilde{e}_{22,2} \end{bmatrix},

the quantized version of the quantization error vector of the first-stage VQ. Adder 1611 calculates the quantized prediction error vector by adding the stage 1 and stage 2 quantized vectors, \tilde{e}_2 = \tilde{e}_{21} + \tilde{e}_{22}. Adder 1612 adds the mean LSP vector and the predicted mean-removed LSP vector to obtain the predicted LSP vector, \hat{l} = \bar{l} + \hat{e}_1. Adder 1613 adds the predicted LSP vector and the quantized prediction error vector to get the intermediate reconstructed LSP vector, \breve{l} = \hat{l} + \tilde{e}_2.
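The constrained search of block 168 can be sketched as below. This is a simplified illustration: `constrained_wmse_vq` is a hypothetical helper, `lbase` stands for the first three elements of \bar{l} + \hat{e}_1 + \tilde{e}_{21}, and the codebook values are placeholders for the Appendix 3 entries:

```python
def constrained_wmse_vq(e221, codebook, w, lbase):
    """Block 168 sketch: weighted MSE search over 3-dimensional
    codevectors, keeping only candidates whose reconstructed first three
    LSPs (lbase + cv) are non-negative and ordered; falls back to
    index 1 when no candidate qualifies, as the spec prescribes."""
    best_j, best_d = -1, float("inf")
    for j, cv in enumerate(codebook):
        l = [lbase[i] + cv[i] for i in range(3)]
        if not (l[0] >= 0.0 and l[1] >= l[0] and l[2] >= l[1]):
            continue  # ordering property violated: disqualify
        d = sum(w[i] * (e221[i] - cv[i]) ** 2 for i in range(3))
        if d < best_d:
            best_j, best_d = j, d
    if best_j < 0:
        best_j = 1  # no codevector preserves the ordering property
    return best_j, codebook[best_j]
```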
Block 1614 calculates the final quantized LSP coefficients by enforcing a minimum spacing of 100 Hz between adjacent LSP coefficients, as well as an absolute minimum of 12 Hz for the first LSP coefficient and an absolute maximum of 7982 Hz for the eighth LSP coefficient. The spacing constraints are given by

\tilde{l}_1 \ge 0.0015
\tilde{l}_{i+1} - \tilde{l}_i \ge 0.0125, \quad i = 1, 2, \ldots, 7
\tilde{l}_8 \le 0.99775

The spacing is carried out as follows:

(i) The elements of the intermediate reconstructed LSP vector are sorted such that \breve{l}_1 \le \breve{l}_2 \le \ldots \le \breve{l}_8.
(ii) Set lmax = 0.91025.
(iii) If \breve{l}_1 < 0.0015, set \tilde{l}_1 = 0.0015; else if \breve{l}_1 > lmax, set \tilde{l}_1 = lmax; else set \tilde{l}_1 = \breve{l}_1.
(iv) For i = 2, 3, ..., 8, do the following:
    1. Set lmin = \tilde{l}_{i-1} + 0.0125.
    2. Set lmax ← lmax + 0.0125.
    3. If \breve{l}_i < lmin, set \tilde{l}_i = lmin; else if \breve{l}_i > lmax, set \tilde{l}_i = lmax; else set \tilde{l}_i = \breve{l}_i.

3.6 Conversion to Short-Term Predictor Coefficients

Refer back to Figure 4. In block 17, the quantized set of LSP coefficients \{\tilde{l}_i\}, which is determined once a frame, is converted to the corresponding set of linear prediction coefficients \{\tilde{a}_i\}, the quantized linear prediction coefficients for the current frame. With the notation

x_{p,i} = \cos(\pi \tilde{l}_{2i-1}), \quad i = 1, 2, 3, 4
x_{m,i} = \cos(\pi \tilde{l}_{2i}), \quad i = 1, 2, 3, 4

the 4 unique coefficients of each of the two polynomials A_p^{\Delta}(z) = A_p(z)/(1 + z^{-1}) and A_m^{\Delta}(z) = A_m(z)/(1 - z^{-1}) can be determined using the following recursion:

For i = 1, 2, 3, 4, do the following:
    a^{\Delta}_{p|m,i} = 2 \left( a^{\Delta}_{p|m,i-2} - x_{p|m,i}\, a^{\Delta}_{p|m,i-1} \right)
    a^{\Delta}_{p|m,j} = a^{\Delta}_{p|m,j} + a^{\Delta}_{p|m,j-2} - 2 x_{p|m,i}\, a^{\Delta}_{p|m,j-1}, \quad j = i-1, i-2, \ldots, 1

with initial conditions a^{\Delta}_{p|m,0} = 1 and a^{\Delta}_{p|m,-1} = 0. In the recursion above, \{a^{\Delta}_{p,i}\} and \{a^{\Delta}_{m,i}\} are the sets of four unique coefficients of the polynomials A_p^{\Delta}(z) and A_m^{\Delta}(z), respectively.
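The spacing procedure of block 1614 can be sketched in Python. The function name `enforce_lsp_spacing` is ours; the defaults are the BV32 constants from the steps above:

```python
def enforce_lsp_spacing(l_in, min_first=0.0015, spacing=0.0125, max_init=0.91025):
    """Block 1614 sketch: sort the intermediate LSPs, clamp the first
    coefficient, then walk upward enforcing the minimum spacing while the
    running upper limit grows by `spacing` per coefficient."""
    l = sorted(l_in)                       # step (i)
    out = [0.0] * len(l)
    lmax = max_init                        # step (ii)
    out[0] = min(max(l[0], min_first), lmax)   # step (iii)
    for i in range(1, len(l)):             # step (iv)
        lmin = out[i - 1] + spacing
        lmax += spacing
        out[i] = min(max(l[i], lmin), lmax)
    return out
```

With 8 coefficients the final upper limit is 0.91025 + 7 × 0.0125 = 0.99775, matching the stated absolute maximum for the eighth LSP coefficient.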
Similarly, let the two sets of coefficients \{a_{p,i}\} and \{a_{m,i}\}, each of 4 unique coefficients except for a sign on \{a_{m,i}\}, represent the unique coefficients of the polynomials A_p(z) and A_m(z), respectively. Then \{a_{p,i}\} and \{a_{m,i}\} can be obtained from \{a^{\Delta}_{p,i}\} and \{a^{\Delta}_{m,i}\} as

a_{p,i} = a^{\Delta}_{p,i} + a^{\Delta}_{p,i-1}, \quad i = 1, 2, 3, 4
a_{m,i} = a^{\Delta}_{m,i} - a^{\Delta}_{m,i-1}, \quad i = 1, 2, 3, 4

From A_p(z) and A_m(z), the polynomial of the prediction error filter is obtained as

\tilde{A}(z) = \frac{A_p(z) + A_m(z)}{2}.

In terms of the unique coefficients of A_p(z) and A_m(z), the coefficients \{\tilde{a}_i\} of \tilde{A}(z) can be expressed as

\tilde{a}_i = \begin{cases} 1.0, & i = 0 \\ 0.5\,(a_{p,i} + a_{m,i}), & i = 1, 2, 3, 4 \\ 0.5\,(a_{p,9-i} - a_{m,9-i}), & i = 5, 6, 7, 8 \end{cases}

where the tilde signifies that the coefficients correspond to the quantized LSP coefficients. Note that

\tilde{A}(z) = 1 - P_s(z) = 1 + \sum_{i=1}^{8} \tilde{a}_i z^{-i},

where

P_s(z) = -\sum_{i=1}^{8} \tilde{a}_i z^{-i}

is the transfer function of the short-term predictor block 40 in Figure 2.

Block 18 performs further bandwidth expansion on the set of predictor coefficients \{\tilde{a}_i\} using a bandwidth expansion factor of γ_1 = 0.75. The resulting bandwidth-expanded set of filter coefficients is given by

a'_i = γ_1^i\, \tilde{a}_i, \quad i = 1, 2, \ldots, 8.

This bandwidth-expanded set of filter coefficients \{a'_i\} is used to update the coefficients of the short-term noise feedback filter block 50 in Figure 2 and the coefficients of the weighted short-term synthesis filter block 21 in Figure 6 (to be discussed later). This completes the description of the short-term predictive analysis and quantization block 10 in Figure 2 and Figure 4.

3.7 Short-Term Linear Prediction of Input Signal

Now refer to Figure 2. The short-term predictor block 40 predicts the input signal sample s(n) based on a linear combination of the preceding 8 samples. The adder 45 subtracts the resulting predicted value from s(n) to obtain the short-term prediction residual signal, or the difference signal, d(n).
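The Section 3.6 conversion can be sketched end to end in Python. This is an illustrative plain-float version (the helper name `lsp_to_lpc` is ours; any fixed-point considerations of a real implementation are omitted):

```python
import math

def lsp_to_lpc(lsp):
    """Convert 8 LSP coefficients (fractions of half the sampling rate,
    0 < l1 < ... < l8 < 1) to the 9 coefficients a~0..a~8 of A~(z),
    following the symmetric/antisymmetric polynomial recursion."""
    xp = [math.cos(math.pi * lsp[2 * i]) for i in range(4)]      # l1, l3, l5, l7
    xm = [math.cos(math.pi * lsp[2 * i + 1]) for i in range(4)]  # l2, l4, l6, l8

    def unique_coeffs(x):
        # a[j] = a^Delta_j with a[0] = 1; a^Delta_{-1} = 0 handled explicitly
        a = [1.0, 0.0, 0.0, 0.0, 0.0]
        for i in range(1, 5):
            am2 = a[i - 2] if i >= 2 else 0.0
            a[i] = 2.0 * (am2 - x[i - 1] * a[i - 1])
            for j in range(i - 1, 0, -1):     # descending, updated in place
                ajm2 = a[j - 2] if j >= 2 else 0.0
                a[j] = a[j] + ajm2 - 2.0 * x[i - 1] * a[j - 1]
        return a

    ad_p = unique_coeffs(xp)
    ad_m = unique_coeffs(xm)
    a_p = [ad_p[i] + ad_p[i - 1] for i in range(1, 5)]  # A_p = A_p^Delta (1 + z^-1)
    a_m = [ad_m[i] - ad_m[i - 1] for i in range(1, 5)]  # A_m = A_m^Delta (1 - z^-1)
    a = [1.0] + [0.0] * 8
    for i in range(1, 5):
        a[i] = 0.5 * (a_p[i - 1] + a_m[i - 1])
    for i in range(5, 9):
        a[i] = 0.5 * (a_p[8 - i] - a_m[8 - i])  # uses a_{p,9-i}, a_{m,9-i}
    return a
```

A useful sanity check: A~(1) must equal the product of the factors (2 - 2 cos(π l)) over the odd-indexed LSPs, since A_m(z) vanishes at z = 1.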
The combined operation of blocks 40 and 45 is summarized in the following difference equation:

d(n) = s(n) + \sum_{i=1}^{8} \tilde{a}_i\, s(n-i)

3.8 Long-Term Linear Predictive Analysis (Pitch Extraction)

In Figure 2, the long-term predictive analysis and quantization block 20 uses the short-term prediction residual signal d(n) of the current frame and its quantized version dq(n) in the previous frames to determine the quantized values of the pitch period and the pitch predictor taps. This block 20 is further expanded in Figure 6 below.

[Figure 6 BV32 long-term predictive analysis and quantization (block 20): d(n) passes through the weighted short-term synthesis filter (block 21), a low-pass filter to 800 Hz (block 22), and decimation to a 2 kHz sampling rate (block 23) to produce dw(n) and dwd(n); the first-stage pitch period search (block 24) produces cpp, the second-stage pitch period search (block 25) produces pp (index PPI, to blocks 30, 60, and 65) and ppt1; the pitch predictor taps quantizer (block 26) produces ppt (index PPTI, to blocks 30 and 60); block 27 calculates the long-term noise feedback filter coefficient λ (to block 65).]

Now refer to Figure 6. The short-term prediction residual signal d(n) passes through the weighted short-term synthesis filter block 21, whose output is calculated as

dw(n) = d(n) - \sum_{i=1}^{8} a'_i\, dw(n-i)

The signal dw(n) is passed through a fixed low-pass filter block 22, which has a -3 dB cutoff frequency at about 800 Hz. A 4th-order elliptic filter is used for this purpose. The transfer function of this low-pass filter is

H_{lpf}(z) = \frac{0.0322952 - 0.1028824 z^{-1} + 0.1446838 z^{-2} - 0.1028824 z^{-3} + 0.0322952 z^{-4}}{1 - 3.5602306 z^{-1} + 4.8558478 z^{-2} - 2.9988298 z^{-3} + 0.7069277 z^{-4}}

Block 23 down-samples the low-pass filtered signal to a sampling rate of 2 kHz. This represents an 8:1 decimation for the BV32 codec. The first-stage pitch search block 24 then uses the decimated 2 kHz sampled signal dwd(n) to find a "coarse pitch period", denoted cpp in Figure 6. The time lag represented by cpp is in terms of the number of samples in the 2 kHz down-sampled signal dwd(n).
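The short-term residual computation above can be sketched in Python. This is a toy illustration (the helper name `short_term_residual` is ours), assuming zero signal history before the first sample:

```python
def short_term_residual(s, a):
    """d(n) = s(n) + sum_i a[i-1] * s(n-i): short-term prediction residual
    (blocks 40/45), with a = [a~1, ..., a~8] and zero history."""
    d = []
    for n in range(len(s)):
        acc = s[n]
        for i in range(1, len(a) + 1):
            acc += a[i - 1] * (s[n - i] if n - i >= 0 else 0.0)
        d.append(acc)
    return d
```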
A pitch analysis window of 15 ms is used. The end of the pitch analysis window is lined up with the end of the current frame. At a sampling rate of 2 kHz, 15 ms correspond to 30 samples. Without loss of generality, let the index range n = 1 to n = 30 correspond to the pitch analysis window for dwd(n). Block 24 first calculates the following values:

c(k) = \sum_{n=1}^{30} dwd(n)\, dwd(n-k),

E(k) = \sum_{n=1}^{30} [dwd(n-k)]^2,

c2(k) = \begin{cases} c^2(k), & \text{if } c(k) \ge 0 \\ -c^2(k), & \text{if } c(k) < 0 \end{cases}

for all integers from k = MINPPD - 1 to k = MAXPPD + 1, where MINPPD and MAXPPD are the minimum and maximum pitch periods in the decimated domain, respectively. For BV32, MINPPD = 1 sample and MAXPPD = 33 samples. Block 24 then searches through the range k = MINPPD, MINPPD + 1, MINPPD + 2, ..., MAXPPD to find all local peaks³ of the array \{c2(k)/E(k)\} for which c(k) > 0. Let N_p denote the number of such positive local peaks. Let k_p(j), j = 1, 2, ..., N_p, be the indices where c2(k_p(j))/E(k_p(j)) is a local peak and c(k_p(j)) > 0, and let k_p(1) < k_p(2) < ... < k_p(N_p). For convenience, the term c2(k)/E(k) will be referred to as the "normalized correlation square".

If N_p = 0, the output coarse pitch period is set to cpp = MINPPD, and the processing of block 24 is terminated. If N_p = 1, the block 24 output is set to cpp = k_p(1), and the processing of block 24 is terminated. If there are two or more local peaks (N_p \ge 2), then block 24 uses Algorithms 3.8.1, 3.8.2, 3.8.3, and 3.8.4 (to be described below), in that order, to determine the output coarse pitch period cpp. Variables calculated in the earlier algorithms are carried over and used in the later algorithms.

Block 24 first uses Algorithm 3.8.1 below to identify the largest quadratically interpolated peak around the local peaks of the normalized correlation square c2(k_p)/E(k_p).
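The correlation terms and positive-local-peak picking can be sketched as below. This is an illustrative helper (the name `coarse_pitch_candidates` and the history/window calling convention are ours); cross-multiplied comparisons are used so no division is needed, assuming the signal has nonzero energy at the lags examined:

```python
def coarse_pitch_candidates(history, window, minppd=1, maxppd=33):
    """Compute c(k), E(k), and the signed square c2(k) over the pitch
    analysis window, then list the positive local peaks of c2(k)/E(k).
    `window` holds samples n = 1..len(window); `history` holds at least
    maxppd + 1 preceding samples, most recent last."""
    def x(n):  # n >= 1 maps into the window, n <= 0 into the history
        return window[n - 1] if n >= 1 else history[len(history) + n - 1]
    c, E, c2 = {}, {}, {}
    win = len(window)
    for k in range(minppd - 1, maxppd + 2):
        ck = sum(x(n) * x(n - k) for n in range(1, win + 1))
        Ek = sum(x(n - k) ** 2 for n in range(1, win + 1))
        c[k], E[k] = ck, Ek
        c2[k] = ck * ck if ck >= 0 else -(ck * ck)
    peaks = [k for k in range(minppd, maxppd + 1)
             if c[k] > 0
             and c2[k] * E[k - 1] > c2[k - 1] * E[k]
             and c2[k] * E[k + 1] > c2[k + 1] * E[k]]
    return c, E, c2, peaks
```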
Quadratic interpolation is performed for c(k_p), while linear interpolation is performed for E(k_p). Such interpolation is performed with the time resolution of the sampling rate of the input speech (16 kHz for BV32). In the algorithm below, D denotes the decimation factor used when decimating dw(n) to dwd(n); thus, D = 8 for BV32.

³ A value is characterized as a local peak if both of the adjacent values are smaller.

Algorithm 3.8.1 Find the largest quadratically interpolated peak around c2(k_p)/E(k_p):

(i) Set c2max = -1, Emax = 1, and jmax = 0.
(ii) For j = 1, 2, ..., N_p, do the following 12 steps:
    1. Set a = 0.5 [c(k_p(j)+1) + c(k_p(j)-1)] - c(k_p(j))
    2. Set b = 0.5 [c(k_p(j)+1) - c(k_p(j)-1)]
    3. Set ji = 0
    4. Set ei = E(k_p(j))
    5. Set c2m = c2(k_p(j))
    6. Set Em = E(k_p(j))
    7. If c2(k_p(j)+1) E(k_p(j)-1) > c2(k_p(j)-1) E(k_p(j)+1), do the remaining part of step 7:
        Δ = [E(k_p(j)+1) - ei] / D
        For k = 1, 2, ..., D/2, do the following indented part of step 7:
            ci = a (k/D)² + b (k/D) + c(k_p(j))
            ei ← ei + Δ
            If (ci)² Em > (c2m) ei, do the next three indented lines:
                ji = k
                c2m = (ci)²
                Em = ei
    8. If c2(k_p(j)+1) E(k_p(j)-1) ≤ c2(k_p(j)-1) E(k_p(j)+1), do the remaining part of step 8:
        Δ = [E(k_p(j)-1) - ei] / D
        For k = -1, -2, ..., -D/2, do the following indented part of step 8:
            ci = a (k/D)² + b (k/D) + c(k_p(j))
            ei ← ei + Δ
            If (ci)² Em > (c2m) ei, do the next three indented lines:
                ji = k
                c2m = (ci)²
                Em = ei
    9. Set lag(j) = k_p(j) + ji/D
    10. Set c2i(j) = c2m
    11. Set Ei(j) = Em
    12. If c2m × Emax > c2max × Em, do the following three indented lines:
        jmax = j
        c2max = c2m
        Emax = Em
(iii) Set the first candidate for the coarse pitch period as cpp = k_p(jmax).

The symbol ← indicates that the parameter on the left-hand side is updated with the value on the right-hand side.⁴
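The parabola fit at the heart of Algorithm 3.8.1 can be illustrated in isolation. The sketch below is deliberately simplified: it evaluates only the positive-offset branch (step 7) and drops the interpolated-energy normalization that the full algorithm carries along; the helper name `interp_peak` is ours:

```python
def interp_peak(cm1, c0, cp1, D=8):
    """Fit a parabola through c(k-1), c(k), c(k+1) using the a, b of
    steps 1-2, evaluate it at fractional offsets k/D for k = 0..D/2,
    and return the offset (in 1/D units) with the largest fitted value."""
    a = 0.5 * (cp1 + cm1) - c0
    b = 0.5 * (cp1 - cm1)
    best_k, best_v = 0, c0
    for k in range(1, D // 2 + 1):
        v = a * (k / D) ** 2 + b * (k / D) + c0
        if v > best_v:
            best_k, best_v = k, v
    return best_k, best_v
```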
To avoid picking a coarse pitch period that is around an integer multiple of the true coarse pitch period, a search through the time lags corresponding to the local peaks of c2(k_p)/E(k_p) is performed to see if any such time lag is close enough to the output coarse pitch period of block 24 in the last frame, denoted cpplast.⁵ If a time lag is within 25% of cpplast, it is considered close enough. For all such time lags within 25% of cpplast, the corresponding quadratically interpolated peak values of the normalized correlation square are compared, and the interpolated time lag corresponding to the maximum normalized correlation square is selected for further consideration. The following algorithm performs this task; the interpolated arrays c2i(j) and Ei(j) calculated in Algorithm 3.8.1 above are used in it.

Algorithm 3.8.2 Find the time lag maximizing the interpolated c2(k_p)/E(k_p) among all time lags close to the output coarse pitch period of the last frame:

(i) Set index im = -1
(ii) Set c2m = -1
(iii) Set Em = 1
(iv) For j = 1, 2, ..., N_p, do the following:
    If |k_p(j) - cpplast| ≤ 0.25 × cpplast, do the following:
        If c2i(j) × Em > c2m × Ei(j), do the following three lines:
            im = j
            c2m = c2i(j)
            Em = Ei(j)

Note that if there is no time lag k_p(j) within 25% of cpplast, then the value of the index im will remain at -1 after Algorithm 3.8.2 is performed. If there are one or more time lags within 25% of cpplast, the index im corresponds to the largest normalized correlation square among such time lags. Next, block 24 determines whether an alternative time lag in the first half of the pitch range should be chosen as the output coarse pitch period.
Basically, block 24 searches through all interpolated time lags lag(j) that are less than 16, and checks whether each of them has a large enough local peak of the normalized correlation square near every integer multiple of it (including itself) up to 32. If there are one or more such time lags satisfying this condition, the smallest of the qualified time lags is chosen as the output coarse pitch period of block 24. Again, variables calculated in Algorithms 3.8.1 and 3.8.2 above carry their final values over to Algorithm 3.8.3 below. In the following, the parameter MPDTH is 0.06, and the threshold array MPTH(k) is given as MPTH(2) = 0.7, MPTH(3) = 0.55, MPTH(4) = 0.48, MPTH(5) = 0.37, and MPTH(k) = 0.30 for k > 5.

⁴ An equal sign is not applicable due to a potential mathematical conflict.
⁵ For the first frame, cpplast is initialized to 12.

Algorithm 3.8.3 Check whether an alternative time lag in the first half of the range of the coarse pitch period should be chosen as the output coarse pitch period:

For j = 1, 2, 3, ..., N_p, in that order, do the following while lag(j) < 16:

(i) If j ≠ im, set threshold = 0.73; otherwise, set threshold = 0.4.
(ii) If c2i(j) × Emax ≤ threshold × c2max × Ei(j), disqualify this j, skip step (iii) for this j, increment j by 1, and go back to step (i).
(iii) If c2i(j) × Emax > threshold × c2max × Ei(j), do the following:
    a) For k = 2, 3, 4, ..., do the following while k × lag(j) < 32:
        1. s = k × lag(j)
        2. a = (1 - MPDTH) s
        3. b = (1 + MPDTH) s
        4. Go through m = j+1, j+2, j+3, ..., N_p, in that order, and see if any of the time lags lag(m) is between a and b. If none of them is between a and b, disqualify this j, stop step (iii), increment j by 1, and go back to step (i).
        If there is at least one such m that satisfies a < lag(m) ≤ b and c2i(m) × Emax > MPTH(k) × c2max × Ei(m), then it is considered that a large enough peak of the normalized correlation square has been found in the neighborhood of the k-th integer multiple of lag(j); in this case, stop step (iii) a) 4, increment k by 1, and go back to step (iii) a) 1.
    b) If step (iii) a) is completed without stopping prematurely, that is, if there is a large enough interpolated peak of the normalized correlation square within ±100×MPDTH% of every integer multiple of lag(j) that is less than 32, then stop this algorithm and the operation of block 24, and set cpp = k_p(j) as the final output coarse pitch period of block 24.

If Algorithm 3.8.3 above is completed without finding a qualified output coarse pitch period cpp, then block 24 examines the largest local peak of the normalized correlation square around the coarse pitch period of the last frame, found in Algorithm 3.8.2 above, and makes a final decision on the output coarse pitch period cpp using Algorithm 3.8.4 below. Again, variables calculated in Algorithms 3.8.1 and 3.8.2 above carry their final values over to Algorithm 3.8.4. In the following, the parameters are SMDTH = 0.095 and LPTH1 = 0.78.

Algorithm 3.8.4 Final decision of the output coarse pitch period:

(i) If im = -1, that is, if there is no large enough local peak of the normalized correlation square around the coarse pitch period of the last frame, then use the cpp calculated at the end of Algorithm 3.8.1 as the final output coarse pitch period of block 24, and exit this algorithm.
(ii) If im = jmax, that is, if the largest local peak of the normalized correlation square around the coarse pitch period of the last frame is also the global maximum of all interpolated peaks of the normalized correlation square within this frame, then use the cpp calculated at the end of Algorithm 3.8.1 as the final output coarse pitch period of block 24, and exit this algorithm.
(iii) If im < jmax, do the following:
    If c2m × Emax > 0.43 × c2max × Em, do the following indented part of step (iii):
    a) If lag(im) > MAXPPD/2, set the block 24 output to cpp = k_p(im) and exit this algorithm.
    b) Otherwise, for k = 2, 3, 4, 5, do the following:
        1. s = lag(jmax) / k
        2. a = (1 - SMDTH) s
        3. b = (1 + SMDTH) s
        4. If lag(im) > a and lag(im) < b, set the block 24 output to cpp = k_p(im) and exit this algorithm.
(iv) If im > jmax, do the following:
    If c2m × Emax > LPTH1 × c2max × Em, set the block 24 output to cpp = k_p(im) and exit this algorithm.
(v) If the algorithm execution proceeds to here, none of the steps above has selected a final output coarse pitch period. In this case, just accept the cpp calculated at the end of Algorithm 3.8.1 as the final output coarse pitch period of block 24.

Block 25 takes cpp as its input and performs a second-stage pitch period search in the undecimated signal domain to get a refined pitch period pp. Block 25 first converts the coarse pitch period cpp to the undecimated signal domain by multiplying it by the decimation factor D, where D = 8 for BV32. Then it determines a search range for the refined pitch period around the value cpp × D. The lower bound of the search range is lb = max(MINPP, cpp × D - D + 1), where MINPP = 10 samples is the minimum pitch period. The upper bound of the search range is ub = min(MAXPP, cpp × D + D - 1), where MAXPP is the maximum pitch period, which is 264 samples for BV32.
Block 25 maintains a signal buffer with a total of MAXPP + 1 + FRSZ samples, where FRSZ is the frame size, which is 80 samples for BV32. The last FRSZ samples of this buffer are populated with the open-loop short-term prediction residual signal d(n) in the current frame. The first MAXPP + 1 samples are populated with the MAXPP + 1 samples of the quantized version of d(n), denoted dq(n), immediately preceding the current frame. For convenience of writing the equations below, the symbol dq(n) will be used to denote the entire buffer of MAXPP + 1 + FRSZ samples, even though the last FRSZ samples are really d(n) samples. Again, let the index range n = 1 to n = FRSZ denote the samples in the current frame.

After the lower bound lb and upper bound ub of the pitch period search range are determined, block 25 calculates the following correlation and energy terms in the undecimated dq(n) signal domain for time lags k within the search range [lb, ub]:

\tilde{c}(k) = \sum_{n=1}^{FRSZ} dq(n)\, dq(n-k)

\tilde{E}(k) = \sum_{n=1}^{FRSZ} dq(n-k)^2

The time lag k \in [lb, ub] that maximizes the ratio \tilde{c}^2(k)/\tilde{E}(k) is chosen as the final refined pitch period. That is,

pp = \arg\max_{k \in [lb, ub]} \left[ \frac{\tilde{c}^2(k)}{\tilde{E}(k)} \right].

Once the refined pitch period pp is determined, it is encoded into the corresponding output pitch period index PPI, calculated as PPI = pp - 10. Possible values of PPI are all integers from 0 to 254 for the BV32 codec. Therefore, the refined pitch period pp is encoded into 8 bits without any distortion. The value PPI = 255 is reserved for signaling purposes and therefore is not used by the BV32 codec.

Block 25 also calculates ppt1, the optimal tap weight for a single-tap pitch predictor, as follows:

ppt1 = \frac{\tilde{c}(pp)}{\tilde{E}(pp)}.

In the degenerate case where \tilde{E}(pp) = 0, ppt1 is set to zero. Block 27 calculates the long-term noise feedback filter coefficient λ as follows.
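The second-stage search of block 25 can be sketched in Python. This is an illustrative helper (the name `refine_pitch` and the flat-list buffer indexing are ours); `dq` is the MAXPP + 1 + FRSZ buffer described above, with sample n mapped to `dq[n + maxpp]` so that n = 1..FRSZ is the current frame:

```python
def refine_pitch(dq, cpp, D=8, minpp=10, maxpp=264, frsz=80):
    """Block 25 sketch: search lags around cpp*D in the undecimated
    domain, maximizing c~^2(k)/E~(k) via cross-multiplication, then
    compute PPI and the single-tap weight ppt1."""
    lb = max(minpp, cpp * D - D + 1)
    ub = min(maxpp, cpp * D + D - 1)
    def x(n):
        return dq[n + maxpp]
    best_k, best_num, best_den = lb, -1.0, 1.0
    for k in range(lb, ub + 1):
        c = sum(x(n) * x(n - k) for n in range(1, frsz + 1))
        E = sum(x(n - k) ** 2 for n in range(1, frsz + 1))
        # compare c^2/E against the best so far without dividing
        if E > 0 and c * c * best_den > best_num * E:
            best_k, best_num, best_den = k, c * c, E
    pp = best_k
    c_pp = sum(x(n) * x(n - pp) for n in range(1, frsz + 1))
    E_pp = sum(x(n - pp) ** 2 for n in range(1, frsz + 1))
    ppt1 = c_pp / E_pp if E_pp > 0 else 0.0  # degenerate case: zero
    return pp, pp - 10, ppt1  # refined period, PPI, single-tap weight
```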
λ = \begin{cases} 0.5, & ppt1 \ge 1 \\ 0.5 \times ppt1, & 0 < ppt1 < 1 \\ 0, & ppt1 \le 0 \end{cases}

3.9 Long-Term Predictor Parameter Quantization

The pitch predictor taps quantizer block 26 quantizes the three pitch predictor taps to 5 bits using vector quantization. The pitch predictor has a transfer function of

P_l(z) = \sum_{i=1}^{3} b_i z^{-pp+2-i},

where pp is the pitch period calculated in Section 3.8. Rather than minimizing the mean-square error of the three taps b_1, b_2, and b_3 as in a conventional VQ codebook search, block 26 finds from the VQ codebook the set of candidate pitch predictor taps that minimizes the pitch prediction residual energy in the current frame. Using the same dq(n) buffer and time index convention as in block 25, and denoting the set of three taps corresponding to the j-th codevector b_j = [b_{j1}\ b_{j2}\ b_{j3}]^T as \{b_{j1}, b_{j2}, b_{j3}\}, we can express this pitch prediction residual energy as

E_j = \sum_{n=1}^{FRSZ} \left[ dq(n) - \sum_{i=1}^{3} b_{ji}\, dq(n - pp + 2 - i) \right]^2 .

The codevector is selected from a 3-dimensional codebook of 32 codevectors, \{b_0, b_1, \ldots, b_{31}\}, listed in Appendix 5. The codevector that minimizes the pitch prediction residual energy is selected; its index is given by

PPTI = j^* = \arg\min_{j \in \{0,1,\ldots,31\}} \{E_j\},

and the corresponding set of three quantized pitch predictor taps, denoted ppt = \{b_1, b_2, b_3\} in Figure 6, is given by

[b_1\ b_2\ b_3]^T = b_{j^*}.

This completes the description of block 20, long-term predictive analysis and quantization.

3.10 Excitation Gain Quantization

In BV32 coding, there are two 2.5 ms sub-frames in each 5 ms frame, with one residual gain for each sub-frame. The sub-frame size is therefore SFRSZ = FRSZ/2 = 40 samples. The unquantized residual gains are based on the pitch prediction residuals of the respective sub-frames and are quantized open-loop in the base-2 logarithmic domain. The quantization of the residual gain is part of the prediction residual quantizer block 30 in Figure 2.
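The analysis-by-synthesis tap search of block 26 can be sketched as below. The helper name `quantize_pitch_taps` is ours, and the codebook passed in is a placeholder for the 32 three-tap vectors of Appendix 5; the buffer convention matches the block 25 sketch above:

```python
def quantize_pitch_taps(dq_buf, pp, codebook, frsz=80, maxpp=264):
    """Block 26 sketch: pick the 3-tap codevector b_j minimizing the
    pitch prediction residual energy E_j over the current frame."""
    def x(n):
        return dq_buf[n + maxpp]  # same buffer convention as block 25
    best_j, best_E = 0, float("inf")
    for j, b in enumerate(codebook):
        E = 0.0
        for n in range(1, frsz + 1):
            # taps b[0..2] correspond to b_1..b_3, lags pp-1, pp, pp+1
            pred = sum(b[i] * x(n - pp + 2 - (i + 1)) for i in range(3))
            E += (x(n) - pred) ** 2
        if E < best_E:
            best_j, best_E = j, E
    return best_j, codebook[best_j]
```

For a perfectly periodic signal with period pp, the taps (0, 1, 0) cancel the residual exactly, so that codevector must win.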
Block 30 is further expanded in Figure 7. All the operations in Figure 7 are performed sub-frame by sub-frame.

[Figure 7 Prediction residual quantizer (block 30): block 300 calculates the pitch prediction residual e_m(n) from dq(n), ppt, and pp; block 301 calculates the logarithmic gain lg(m); block 302 holds the log-gain mean value lgmean and block 304 the MA log-gain predictor output elg(m); adders 303 and 305 form mrlg(m) and lge(m); block 306 is the scalar quantizer of the log-gain prediction error lgeq(m), controlled by the compare-with-threshold block 310; adders 307 and 308 reconstruct the quantized log-gain lgq(m); block 309 estimates the signal level lv(m - 1); block 311 converts lgq(m) to the linear gain gq(m); block 312 scales the residual quantizer codebook; block 313 performs the residual quantizer codebook search on u(n), producing uq(n) and the indices GI and CI.]

Block 300 in Figure 7 calculates the pitch prediction residual signal. For the first sub-frame the pitch prediction residual signal is given by

e_1(n) = dq(n) - \sum_{i=1}^{3} b_i\, dq(n - pp + 2 - i), \quad n = 1, 2, \ldots, SFRSZ,

where the same dq(n) buffer and time index convention of block 25 is used. Hence, the first sub-frame of dq(n) for n = 1, 2, ..., SFRSZ is the unquantized open-loop short-term prediction residual signal d(n). For the second sub-frame the pitch prediction residual signal is given by

e_2(n) = dq(SFRSZ + n) - \sum_{i=1}^{3} b_i\, dq(SFRSZ + n - pp + 2 - i), \quad n = 1, 2, \ldots, SFRSZ.

Again, the second sub-frame of dq(n), n = SFRSZ+1, SFRSZ+2, ..., FRSZ, is the unquantized open-loop short-term prediction residual signal d(n). However, by the time the pitch prediction residual of the second sub-frame is calculated, the excitation of the first sub-frame has been fully quantized and is located at dq(n), n = 1, 2, ..., SFRSZ, which is then no longer the unquantized open-loop short-term prediction residual signal.

Block 301 calculates the residual gain in the base-2 logarithmic domain. First, the average power of the pitch prediction residual signal in the m-th sub-frame is calculated as

P_e(m) = \frac{1}{SFRSZ} \sum_{n=1}^{SFRSZ} e_m^2(n),

where m = 1 and 2 for the first and second sub-frames of the current frame, respectively.
Then, to avoid taking the logarithm of zero or a very small number, the logarithmic gain (log-gain) of the m-th sub-frame is calculated as

lg(m) = \begin{cases} \log_2 P_e(m), & \text{if } P_e(m) > 1/4 \\ -2, & \text{if } P_e(m) \le 1/4 \end{cases}

The long-term mean value of the log-gain is calculated off-line and stored in block 302. This log-gain mean value for BV32 is lgmean = 11.82031. The adder 303 calculates the mean-removed version of the log-gain as mrlg(m) = lg(m) - lgmean. The MA log-gain predictor block 304 is a 16th-order FIR filter with its memory initialized to zero at the very first frame. The coefficients of this log-gain predictor lgp(k), k = 1, 2, 3, ..., 16, are fixed, as given below:

lgp(1) = 0.5913086
lgp(2) = 0.5251160
lgp(3) = 0.5724792
lgp(4) = 0.5977783
lgp(5) = 0.4800720
lgp(6) = 0.4939270
lgp(7) = 0.4729614
lgp(8) = 0.4158936
lgp(9) = 0.3805847
lgp(10) = 0.3395081
lgp(11) = 0.2780151
lgp(12) = 0.2455139
lgp(13) = 0.1916199
lgp(14) = 0.1470032
lgp(15) = 0.1138611
lgp(16) = 0.0664673

Block 304 calculates its output, the estimated log-gain, as

elg(m) = \sum_{k=1}^{GPO} lgp(k)\, lgeq(m-k),

where GPO = 16 is the gain predictor order for BV32, and lgeq(m - k) is the quantized version of the log-gain prediction error at sub-frame m - k. Here it is assumed that the sub-frame indices of different frames form a continuous sequence of integers. For example, if the sub-frame indices in the current frame are 1 and 2, then the sub-frame indices of the last frame are -1 and 0, and the sub-frame indices of the frame before that are -3 and -2. The adder 305 calculates the log-gain prediction error as lge(m) = mrlg(m) - elg(m).

The scalar quantizer block 306 performs 5-bit scalar quantization of the resulting log-gain prediction error lge(m). The codebook entries of this gain quantizer, along with the corresponding codebook indices, are listed in Appendix 6.
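The log-gain computation and MA prediction can be sketched as follows (helper names `log_gain` and `predict_log_gain` are ours; the coefficient list passed to the predictor would be lgp(1)..lgp(16) from above):

```python
import math

def log_gain(residual, sfrsz=40):
    """Block 301: base-2 log-gain of a sub-frame, clamped to -2 when the
    average power is at most 1/4."""
    pe = sum(x * x for x in residual) / sfrsz
    return math.log2(pe) if pe > 0.25 else -2.0

def predict_log_gain(lgeq_history, lgp):
    """Block 304: MA prediction from past quantized log-gain prediction
    errors, most recent first (lgeq_history[0] = lgeq(m-1))."""
    return sum(lgp[k] * lgeq_history[k] for k in range(len(lgp)))
```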
The operation of this quantizer is controlled by block 310, whose purpose is to achieve a good trade-off between the clear-channel and noisy-channel performance of the excitation gain quantizer. The operation of block 310 will be described later. For each temporarily quantized lgeq(m), the adders 307 and 308 together calculate the corresponding temporarily quantized log-gain as

lgq(m) = lgeq(m) + elg(m) + lgmean

Block 309 estimates the signal level based on the final quantized log-gain, to be determined later subject to the constraint imposed by block 310. Let lv(m) denote the output estimated signal level of block 309 at sub-frame m. Since the final value of lgq(m) has not been determined yet at this point, block 310 can only use the estimated signal level at the last sub-frame, namely lv(m - 1). One way to think of this situation is that block 309 has a one-sample delay unit for its input lgq(m).

At sub-frame m, block 310 controls the quantization operation of block 306 based on lv(m - 1), lgq(m - 1), and lgq(m - 2).⁶ It uses an NG × NGC gain change threshold matrix T(i, j), i = 1, 2, ..., NG, j = 1, 2, ..., NGC, to limit how high lgq(m) can go. For BV32, the parameter values are NG = 18 and NGC = 11. The threshold matrix T(i, j) is given in Appendix 7. Blocks 310 and 306 work together to perform the quantization of lge(m) in the following way. First, the row index into the threshold matrix T(i, j) is calculated as

i = \left\lceil \frac{lgq(m-1) - lv(m-1) - GLB}{2} \right\rceil,

where GLB = -24, and the symbol \lceil \cdot \rceil means "take the next larger integer", i.e., rounding toward infinity. If i > NG, i is clipped to NG. If i < 1, i is clipped to 1. Second, the column index into the threshold matrix T(i, j) is calculated as

j = \left\lceil \frac{lgq(m-1) - lgq(m-2) - GCLB}{2} \right\rceil,

where GCLB = -8. If j > NGC, j is clipped to NGC. If j < 1, j is clipped to 1.
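The index computation can be sketched directly (the helper name `threshold_indices` is ours; it returns 1-based indices into T(i, j)):

```python
import math

def threshold_indices(lgq1, lgq2, lv1, NG=18, NGC=11, GLB=-24, GCLB=-8):
    """Row and column indices into the gain change threshold matrix,
    with ceiling and clipping as specified (lgq1 = lgq(m-1),
    lgq2 = lgq(m-2), lv1 = lv(m-1))."""
    i = math.ceil((lgq1 - lv1 - GLB) / 2)
    j = math.ceil((lgq1 - lgq2 - GCLB) / 2)
    return min(max(i, 1), NG), min(max(j, 1), NGC)
```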
Third, with the row and column indices i and j calculated above, a gain quantization limit is calculated as

GL = lgq(m - 1) + T(i, j) - elg(m) - lgmean

Fourth, block 306 performs normal scalar quantization of lge(m) into its nearest neighbor in the quantizer codebook. If the resulting quantized value is not greater than GL, this quantized value is accepted as the final quantized log-gain prediction error lgeq(m), and the corresponding codebook index is the output gain index GI_m. On the other hand, if the quantized value is greater than GL, the next smaller gain quantizer codebook entry is compared with GL. If it is not greater than GL, it is accepted as the final output lgeq(m) of block 306, and the corresponding codebook index is accepted as GI_m. However, if it is still greater than GL, then block 306 keeps looking for the next smaller quantizer codebook entry (in descending order of codebook entry value) until it finds one that is not greater than GL. In such a search, the first entry (that is, the largest one) found to be no greater than GL is chosen as the final output lgeq(m) of block 306, and the corresponding codebook index is accepted as GI_m. In the rare occasion when all the gain quantizer codebook entries are greater than GL, the smallest gain quantizer codebook entry is chosen as the final output lgeq(m) of block 306, and the corresponding codebook index (0 in this case) is chosen as the output GI_m. The final gain quantizer codebook index GI_m is passed to the bit multiplexer block 95 of Figure 2.

⁶ The initial values of lgq(m - 1) and lgq(m - 2) are -2, i.e. lgq(0) = -2 and lgq(-1) = -2.
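The constrained quantization just described can be sketched as below. The helper name `quantize_lge` is ours; the codebook is assumed sorted in ascending order with index 0 the smallest entry, consistent with the fallback index being 0:

```python
def quantize_lge(lge, codebook, GL):
    """Block 306/310 interaction sketch: quantize lge to its nearest
    codebook entry; if that entry exceeds the limit GL, fall back to the
    largest entry not greater than GL, or to the smallest entry (index 0)
    when every entry exceeds GL."""
    # normal nearest-neighbor scalar quantization
    gi = min(range(len(codebook)), key=lambda k: abs(lge - codebook[k]))
    if codebook[gi] <= GL:
        return gi, codebook[gi]
    # descending search for the largest entry not greater than GL
    for k in range(gi - 1, -1, -1):
        if codebook[k] <= GL:
            return k, codebook[k]
    return 0, codebook[0]  # all entries exceed GL
```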
Once the quantized log-gain prediction error lgeq(m) is determined in this way, adders 307 and 308 add elg(m) and lgmean to lgeq(m) to obtain the quantized log-gain

lgq(m) = lgeq(m) + elg(m) + lgmean

After this final quantized log-gain lgq(m), subject to the constraint imposed by block 310, is calculated, it is used by block 309 to update the estimated signal level lv(m). This value lv(m) is used by block 310 in the next sub-frame (the (m + 1)-th sub-frame).

At sub-frame m, after the final quantized log-gain lgq(m) is calculated, block 309 estimates the signal level using the following algorithm. The parameter values used are α = 8191/8192, β = 1023/1024, and γ = 511/512. At codec initialization, the related variables are initialized as lmax(m - 1) = -100, lmin(m - 1) = 100, lmean(m - 1) = 8, lv(m - 1) = 13.5, and x(m - 1) = 13.5.

Algorithm for updating the estimated long-term average signal level:

(i) If lgq(m) > lmax(m - 1), set lmax(m) = lgq(m); otherwise, set lmax(m) = lmean(m - 1) + α [lmax(m - 1) - lmean(m - 1)].
(ii) If lgq(m) < lmin(m - 1), set lmin(m) = lgq(m); otherwise, set lmin(m) = lmean(m - 1) + α [lmin(m - 1) - lmean(m - 1)].
(iii) Set lmean(m) = β × lmean(m - 1) + (1 - β) [lmax(m) + lmin(m)]/2.
(iv) Set lth = lmean(m) + 0.2 [lmax(m) - lmean(m)].
(v) If lgq(m) > lth, set x(m) = γ × x(m - 1) + (1 - γ) lgq(m) and set lv(m) = γ × lv(m - 1) + (1 - γ) x(m); otherwise, set x(m) = x(m - 1) and lv(m) = lv(m - 1).

Block 311 converts the quantized log-gain lgq(m) to the quantized gain gq(m) in the linear domain as follows:

gq(m) = 2^{lgq(m)/2}

Block 312 scales the residual vector quantization (also called excitation VQ) codebook by simply multiplying every element of every codevector in the excitation VQ codebook by gq(m). The resulting scaled codebook is then used by block 313 to perform the excitation VQ codebook search, as described in the next section.
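The level-tracking algorithm of block 309 maps directly to code. This sketch (the helper name `update_signal_level` and the dict-based state are ours) performs one per-sub-frame update:

```python
def update_signal_level(lgq, state,
                        alpha=8191/8192, beta=1023/1024, gamma=511/512):
    """Block 309 sketch: one update of the long-term average signal level.
    `state` holds lmax, lmin, lmean, lv, x (initialized to -100, 100, 8,
    13.5, 13.5 per the spec). Returns the updated state."""
    lmax = lgq if lgq > state['lmax'] else \
        state['lmean'] + alpha * (state['lmax'] - state['lmean'])
    lmin = lgq if lgq < state['lmin'] else \
        state['lmean'] + alpha * (state['lmin'] - state['lmean'])
    lmean = beta * state['lmean'] + (1 - beta) * 0.5 * (lmax + lmin)
    lth = lmean + 0.2 * (lmax - lmean)       # only gains above lth count
    if lgq > lth:
        x = gamma * state['x'] + (1 - gamma) * lgq
        lv = gamma * state['lv'] + (1 - gamma) * x
    else:
        x, lv = state['x'], state['lv']
    return {'lmax': lmax, 'lmin': lmin, 'lmean': lmean, 'x': x, 'lv': lv}
```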
3.11 Excitation Vector Quantization

The excitation VQ codebook of BV32 has a sign-shape structure, with 1 bit for the sign and 5 bits for the shape. The vector dimension is 4. Thus, there are 32 independent shape codevectors stored in the codebook, but the negated version of each shape codevector (i.e., its mirror image with respect to the origin) is also a valid codevector for excitation VQ. The 32 shape codevectors, along with the corresponding codebook indices, are listed in Appendix 8.

Block 313 in Figure 7 performs the excitation VQ codebook search using the filter structure shown in Figure 8, which is essentially a subset of the BV32 encoder shown in Figure 2. The only difference is that the prediction residual quantizer (block 30) in Figure 2 is replaced by block 48 in Figure 8, which is labeled "scaled VQ codebook". This scaled VQ codebook is calculated as described in Section 3.10 above.

[Figure 8: Filter structure used in BV32 excitation VQ codebook search. The scaled VQ codebook (block 48) outputs uq(n); blocks 50, 60, and 65 implement Fs(z), Pl(z), and Nl(z) − 1; adders 55, 70, 75, 80, 85, and 90 combine the signals d(n), v(n), uq(n), dq(n), ppv(n), ltfv(n), ltnf(n), stnf(n), qs(n), and the quantization error q(n).]

The three filters of blocks 50, 60, and 65 have transfer functions given by

Fs(z) = − Σ_{i=1..8} a'_i z^(−i), where a'_i = (0.75)^i ã_i, and ã_i is the i-th coefficient of the short-term prediction error filter;

Pl(z) = Σ_{i=1..3} b_i z^(−pp+2−i), where pp is the pitch period and b_i is the i-th long-term predictor coefficient;

Nl(z) − 1 = λ z^(−pp), where λ is the long-term noise feedback filter coefficient calculated in Section 3.8.

Using the filter structure in Figure 8, block 313 in Figure 7 performs the excitation VQ codebook search one excitation vector at a time. Each excitation vector contains four samples. The excitation gain gq(m) is updated once per sub-frame. Each sub-frame contains 10 excitation vectors.
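As a small aside, the weighting a'_i = (0.75)^i ã_i used in Fs(z) can be computed as below. This is a minimal Python sketch; the function name is illustrative.

```python
def weight_coefficients(a_tilde, gamma=0.75):
    """Scale the short-term prediction error filter coefficients
    a~_1..a~_8 by gamma^i, yielding the a'_i used in Fs(z). The scaling
    moves the filter's roots toward the origin (bandwidth expansion)."""
    return [(gamma ** (i + 1)) * c for i, c in enumerate(a_tilde)]
```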
Therefore, for each sub-frame, the same scaled VQ codebook is used in 10 separate VQ codebook searches corresponding to the 10 excitation vectors in that sub-frame. Let n = 1, 2, 3, 4 denote the sample time indices corresponding to the current four-dimensional excitation vector. Before the excitation VQ codebook search for the current excitation vector starts, the open-loop short-term prediction residual d(n), n = 1, 2, 3, 4 has already been calculated as described in Section 3.7. In addition, before the VQ codebook search starts, the initial filter states (also called the "filter memory") of the three filters in Figure 8 (blocks 50, 60, and 65) are also known. None of the other signals in Figure 8 has been determined yet for n = 1, 2, 3, 4.

The basic ideas of the excitation VQ codebook search are explained below. Refer to Figure 8. Block 48 stores the N scaled shape codevectors, where N = 32. Counting also the negated version of each scaled shape codevector, there are effectively 2N scaled codevectors available for excitation VQ. From these 2N scaled codevectors, block 48 puts out one scaled codevector at a time as uq(n), n = 1, 2, 3, 4. With the initial filter memories in blocks 50, 60, and 65 set to what was left after vector-quantizing the last excitation vector, this uq(n) vector then "drives" the rest of the filter structure until the corresponding quantization error vector q(n), n = 1, 2, 3, 4 is obtained. The energy of this q(n) vector is calculated and stored. This process is repeated for each of the 2N scaled codevectors, with the filter memories reset to their initial values before each repetition. After all 2N codevectors have been tried, 2N corresponding quantization error energy values have been calculated. The scaled codevector that minimizes the energy of the quantization error vector q(n), n = 1, 2, 3, 4 is the winning scaled codevector and is used as the VQ output vector.
The corresponding output VQ codebook index is a 6-bit index consisting of a sign bit as the most significant bit (MSB), followed by 5 shape bits. If the winning scaled codevector is a negated version of a scaled shape codevector, the sign bit is 1; otherwise, the sign bit is 0. The 5 shape bits are simply the binary representation of the codebook index of the winning shape codevector, as defined in Appendix 8. Note that there are 20 such excitation codebook indices in a frame, since each frame has 20 excitation vectors. These 20 indices are grouped into an excitation codebook index array, denoted as CI = {CI(1), CI(2), ..., CI(20)}, where CI(k) is the excitation codebook index for the k-th excitation vector in the current frame. This excitation codebook index array CI is passed to the bit multiplexer block 95.

Given a uq(n) vector (taking the value of one of the 2N scaled codevectors), the way to derive the corresponding energy of the q(n) vector is now described in more detail. First, block 60 performs pitch prediction to produce the pitch-predicted vector ppv(n) as

ppv(n) = Σ_{i=1..3} b_i dq(n − pp + 2 − i), n = 1, 2, 3, 4.

Adder 85 then updates the dq(n) vector as

dq(n) = uq(n) + ppv(n), n = 1, 2, 3, 4.

Next, block 50 and adders 90 and 55 work together to update the v(n) vector as

v(n) = d(n) − Σ_{i=1..8} a'_i [v(n − i) − dq(n − i)], n = 1, 2, 3, 4.

Finally, the corresponding q(n) vector is calculated as

q(n) = v(n) − ppv(n) − λ q(n − pp) − uq(n), n = 1, 2, 3, 4.

The energy of the q(n) vector is calculated as

Eq = Σ_{n=1..4} q²(n).

This calculation from a given uq(n) vector to the corresponding energy term Eq is repeated 2N times, once for each of the 2N scaled VQ codevectors.
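The complete search loop over the 2N scaled codevectors, using the four update equations above, can be sketched as follows. This is an illustrative Python rendering, not the normative fixed-point code; the history lists (where x(n − k) is the k-th element from the end just before sample n is appended) and the toy test codebook are conveniences of the sketch.

```python
def search_excitation(d, shapes, gq, a, b, lam, pp, dq_hist, v_hist, q_hist):
    """d: open-loop residual (4 samples); shapes: shape codevectors;
    gq: quantized gain gq(m); a: a'_1..a'_8; b: pitch taps b_1..b_3;
    lam: long-term noise feedback coefficient; pp: pitch period;
    *_hist: past samples of dq, v, q. Returns the 6-bit index (sign bit
    as MSB) and the winning scaled codevector."""
    best_idx, best_uq, best_E = -1, None, float('inf')
    for j, shape in enumerate(shapes):
        for sign_bit in (0, 1):            # 0: +codevector, 1: -codevector
            uq = [(-gq if sign_bit else gq) * c for c in shape]
            dq, v, q = list(dq_hist), list(v_hist), list(q_hist)
            E = 0.0
            for n in range(4):
                # ppv(n) = sum_{i=1..3} b_i dq(n - pp + 2 - i)
                ppv = sum(b[i - 1] * dq[len(dq) - pp + 2 - i] for i in (1, 2, 3))
                dq_n = uq[n] + ppv
                # v(n) = d(n) - sum_{i=1..8} a'_i [v(n-i) - dq(n-i)]
                v_n = d[n] - sum(a[i - 1] * (v[len(v) - i] - dq[len(dq) - i])
                                 for i in range(1, 9))
                # q(n) = v(n) - ppv(n) - lambda q(n - pp) - uq(n)
                q_n = v_n - ppv - lam * q[len(q) - pp] - uq[n]
                dq.append(dq_n); v.append(v_n); q.append(q_n)
                E += q_n * q_n
            if E < best_E:                 # keep the minimum-energy candidate
                best_E, best_idx, best_uq = E, sign_bit * 32 + j, uq
    return best_idx, best_uq
```

With all filter memories zeroed and zero predictor coefficients, the search degenerates to picking the scaled codevector closest to d(n), which is a convenient sanity check on the index layout (sign bit worth 32).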
After the winning scaled codevector that minimizes the Eq term is selected, the filter memories of blocks 50, 60, and 65 are updated to the filter memories that were left after the calculation of the Eq term for that particular winning codevector. These updated filter memories become the initial filter memories used in the excitation VQ codebook search for the next excitation vector.

3.12 Bit Multiplexing

The bit multiplexer block 95 in Figure 2 packs the five sets of indices LSPI, PPI, PPTI, GI, and CI into a single bit stream. This bit stream is the output of the BroadVoice32 encoder. It is passed to the communication channel.

Figure 9 shows the BV32 bit stream format in each frame. In Figure 9, the bit stream for the current frame is the shaded area in the middle. The bit stream for the last frame is on the left, while the bit stream for the next frame is on the right. Although the bit streams of different frames may not be sent next to each other in a packet voice system, this illustration is meant to show that time goes from left to right, and that the 40 side information bits consisting of LSPI, PPI, PPTI, and GI go before the excitation codebook indices CI(k), k = 1, 2, ..., 20 when the bit stream is transmitted serially. Note that for each index, the most significant bit (MSB) goes first (on the left), while the least significant bit (LSB) goes last.

[Figure 9: BV32 bit stream format. Each 160-bit frame M carries, in order: LSPI1 (7 bits), LSPI2 (5 bits), LSPI3 (5 bits), PPI (8 bits), PPTI (5 bits), GI1 (5 bits), GI2 (5 bits), followed by CI(1) through CI(20) (6 bits each).]

This completes the detailed description of the BV32 encoder.

4 DETAILED DESCRIPTION OF THE BV32 DECODER

This section gives a detailed description of each functional block of the BV32 decoder shown in Figure 3.
Those blocks or signals that have the same labels as their counterparts in the encoder of Figure 2 have the same meaning as those counterparts.

4.1 Bit De-multiplexing

The bit de-multiplexer block 100 takes one frame of the input bit stream at a time and de-multiplexes, or separates, the five sets of indices LSPI, PPI, PPTI, GI, and CI from the current frame of the input bit stream. As described in Section 3 above, LSPI contains three indices: a 7-bit first-stage VQ index, a 5-bit second-stage lower VQ index, and a 5-bit second-stage upper VQ index. PPI is an 8-bit pitch period index. PPTI is a 5-bit pitch predictor tap VQ index. GI contains two 5-bit gain indices, and CI contains twenty 6-bit excitation VQ indices, each with 1 sign bit and 5 shape bits.

4.2 Long-Term Predictor Parameter Decoding

The long-term predictor parameter decoder (block 110) decodes the indices PPI and PPTI. The pitch period is decoded from PPI as

pp = PPI + 10

Let {b_0, b_1, ..., b_31} be the 3-dimensional, 32-entry codebook used for pitch predictor tap VQ, as listed in Appendix 5. Let b_j be the j-th codevector in this codebook, where the subscript j is the codebook index listed in the first column of the table in Appendix 5. The three pitch predictor taps b1, b2, and b3 are decoded from PPTI as

[b1, b2, b3]^T = b_PPTI.

4.3 Short-Term Predictor Parameter Decoding

The short-term predictor parameter decoding takes place in block 120 of Figure 3. Block 120 receives the set of decoded LSP indices, LSPI = {LSPI1, LSPI2, LSPI3}, from the bit de-multiplexer, block 100 in Figure 3. First, block 120 reconstructs the LSP coefficients, {l̃_i}, from the LSP indices, and then it produces the coefficients of the short-term prediction error filter, {ã_i}, from the LSP coefficients according to the conversion procedure specified in Section 3.6.
[Figure 10: BV32 short-term predictor parameter decoder (block 120). It comprises the index subquantizers for LSPI1, LSPI2, and LSPI3; the regular 8-dimensional first-stage VQ inverse subquantizer (1204); the regular 5-dimensional upper and 3-dimensional lower second-stage VQ inverse subquantizers (1201, 1202); the sub-vector appending block (1203); the 8th-order MA prediction (1206) and mean LSP vector; adders 1205 and 1207 through 1209; the ordering check on the lower 3 LSP coefficients (12010); the TEI-controlled switches (12011, 12012); LSP spacing (12013); a buffer (12014); and LSP-to-LP conversion (12015) producing the LP coefficients {ã_i}.]

Block 120 of Figure 3 is expanded in Figure 10. The reconstruction of the LSP coefficients from the LSP indices is the inverse of the LSP quantization, and many operations have equivalents in Section 3.5 and Figure 5. The first-stage VQ is decoded in block 1204, and the second-stage split VQ is decoded in block 12016.

Based on the index for the second-stage upper split VQ, block 1201 looks up the quantized upper split vector from the codebook CB22 = {cb22^(0), cb22^(1), ..., cb22^(31)}, producing

ẽ22,2 = cb22^(LSPI3).

Similarly, based on the index for the second-stage lower split VQ, block 1202 looks up the quantized lower split vector from the codebook CB21 = {cb21^(0), cb21^(1), ..., cb21^(31)}, producing

ẽ22,1 = cb21^(LSPI2).

Block 1203 performs the identical operation of block 1610 in Figure 5 and appends the two second-stage sub-vectors to form the second-stage output vector,

ẽ22 = [ẽ22,1; ẽ22,2].

From the index for the first-stage VQ, block 1204 looks up the quantized first-stage vector from the codebook CB1 = {cb1^(0), cb1^(1), ..., cb1^(127)},

ẽ21 = cb1^(LSPI1).

Adder 1205 performs the equivalent operation of Adder 1611 in Figure 5.
It adds the first-stage and second-stage vectors to obtain a first reconstructed prediction error vector,

ẽ2^(1) = ẽ21 + ẽ22.

Equivalent to block 163 in Figure 5, block 1206 performs the 8th-order MA prediction of the mean-removed LSP vector according to

ê1,i = p_LSP,i^T [ẽ2,i(1) ẽ2,i(2) ẽ2,i(3) ẽ2,i(4) ẽ2,i(5) ẽ2,i(6) ẽ2,i(7) ẽ2,i(8)]^T, i = 1, 2, ..., 8,

where ẽ2,i(k) and p_LSP,i are defined in Section 3.5. Adder 1207, equivalent to Adder 1612 in Figure 5, generates the predicted LSP vector by adding the mean LSP vector and the mean-removed predicted LSP vector,

l̂ = l̄ + ê1.

Subsequently, adder 1208 adds the predicted LSP vector to the first reconstructed prediction error vector to obtain a first intermediate reconstructed LSP vector,

l̆^(1) = l̂ + ẽ2^(1).

Adder 1209 subtracts the predicted LSP vector from a second intermediate reconstructed LSP vector, l̆^(2), to calculate a second reconstructed prediction error vector,

ẽ2^(2) = l̆^(2) − l̂,

to be used to update the MA predictor memory in the presence of bit-errors. Block 12010 checks the ordering property of the first 3 first intermediate reconstructed LSP coefficients,

l̆1^(1) ≥ 0, l̆2^(1) ≥ l̆1^(1), l̆3^(1) ≥ l̆2^(1).

This ordering property was enforced during the encoding operation of the constrained 3-dimensional VQ of the lower split vector, block 168 of Figure 5. If the ordering is found to be preserved, the Transmission-Error-Indicator, TEI, is set to 0 to indicate that no bit-errors in the LSP bits have been detected. Otherwise, if it is not preserved, the Transmission-Error-Indicator is set to 1 to indicate the likely presence of bit-errors in the LSP bits.
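The ordering check of block 12010 and the resulting TEI-driven substitution can be sketched as follows. This is illustrative Python; the function and variable names are hypothetical, and the substitution path shown is the bit-error fallback using the previous frame's buffered LSP vector.

```python
def select_lsp(l1, e1, l2_prev, lhat):
    """l1: first intermediate reconstructed LSP vector; e1: first
    reconstructed prediction error vector; l2_prev: second intermediate
    reconstructed LSP vector (previous frame's buffered LSP vector);
    lhat: predicted LSP vector. Returns (lsp, err, TEI)."""
    # Ordering property of the first three coefficients:
    # 0 <= l1[0] <= l1[1] <= l1[2]
    if 0.0 <= l1[0] <= l1[1] <= l1[2]:
        return l1, e1, 0      # no LSP bit-errors detected
    # Likely bit-errors: substitute the previous frame's LSP vector and
    # back-compute the matching prediction error for the MA predictor memory.
    e2 = [l2_prev[i] - lhat[i] for i in range(len(lhat))]
    return l2_prev, e2, 1
```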
If the Transmission-Error-Indicator is 0, switches 12011 and 12012 are in the left position, and they route the first reconstructed prediction error vector ẽ2^(1) and the first intermediate reconstructed LSP vector l̆^(1) to the reconstructed prediction error vector ẽ2 and the intermediate reconstructed LSP vector l̆, respectively. Otherwise, if the Transmission-Error-Indicator is 1, switches 12011 and 12012 are in the right position, and they route the second reconstructed prediction error vector ẽ2^(2) and the second intermediate reconstructed LSP vector l̆^(2) to the reconstructed prediction error vector ẽ2 and the intermediate reconstructed LSP vector l̆, respectively. Hence, the reconstructed prediction error vector and the intermediate reconstructed LSP vector are obtained as

ẽ2 = ẽ2^(1) if TEI = 0, or ẽ2^(2) if TEI = 1,

and

l̆ = l̆^(1) if TEI = 0, or l̆^(2) if TEI = 1,

respectively. Block 12013 enforces LSP spacing; it is functionally identical to block 1614 in Figure 5, as specified in Section 3.5. Block 12014 buffers the reconstructed LSP vector for future use in the presence of bit-errors. The reconstructed LSP vector of the current frame becomes the second intermediate reconstructed LSP vector of the next frame,

l̆^(2)(k + 1) = l̃(k),

where the additional parameter k here represents the frame index of the current frame. For the very first frame, the second intermediate reconstructed LSP vector is initialized to

l̆^(2) = [1/9 2/9 ... 8/9]^T.

The final step of the short-term predictor parameter decoding is to convert the reconstructed LSP coefficients to linear prediction coefficients. This operation takes place in block 12015, which is functionally identical to block 17 of Figure 4, described in Section 3.6.

4.4 Excitation Gain Decoding

The excitation gain decoder is shown in Figure 11. It is part of block 130 in Figure 3.
It decodes the two gain indices in GI into the two corresponding decoded sub-frame excitation gains gq(m), m = 1, 2 in the linear domain. All operations in Figure 11 are performed sub-frame by sub-frame.

[Figure 11: Excitation gain decoder. The gain prediction error decoder (501) produces lgeq(m); switch 502, the MA log-gain predictor (503), and the log-gain mean value (504) feed adders 505 and 506, which produce lgq′(m); block 507 estimates the signal level lv(m − 1), block 508 compares with the threshold, block 509 determines the final decoded log-gain lgq(m), and block 510 converts it to the linear gain gq(m); adders 511 and 512 back-compute lgeq′(m).]

Refer to Figure 11. Let m be the sub-frame index of the current sub-frame, and assume the same convention for the sub-frame index m as in Section 3.10. Block 501 decodes the 5-bit gain index GI_m into the log-gain prediction error lgeq(m) using the codebook in Appendix 6. Switch 502 is normally in the upper position, connecting the output of block 501 to the input of block 503. Then, the MA log-gain predictor (block 503) calculates the estimated log-gain for the current sub-frame as

elg(m) = Σ_{k=1..GPO} lgp(k) lgeq(m − k),

where GPO = 16, and lgp(k), k = 1, 2, ..., GPO are the MA log-gain predictor coefficients given in Section 3.10. Block 504 holds the long-term average log-gain value lgmean = 11.82031. Adders 505 and 506 add elg(m) and lgmean, respectively, to lgeq(m), resulting in the temporarily decoded log-gain

lgq′(m) = lgeq(m) + elg(m) + lgmean.

Block 507 is functionally identical to block 309 in Figure 7, described in Section 3.10. It is important to note that, as in the encoder, the log-gain value passed to block 507 for updating its estimate of the long-term average signal level is the final value of the decoded log-gain lgq(m), i.e. after the threshold check of block 508 and the potential log-gain extrapolation and substitution of block 509, as described below. Block 508 calculates the row and column indices i and j into the threshold matrix T(i, j) in the same way as block 310 in Figure 7.
Namely, the row index is calculated as

i = ⌈ [lgq(m − 1) − lv(m − 1) − GLB] / 2 ⌉,

where GLB = −24. If i > NG, i is clipped to NG. If i < 1, i is clipped to 1. The column index is calculated as

j = ⌈ [lgq(m − 1) − lgq(m − 2) − GCLB] / 2 ⌉,

where GCLB = −8. If j > NGC, j is clipped to NGC. If j < 1, j is clipped to 1.

Block 508 controls the actions of block 509 and switch 502 in the following way. If GI_m = 0 or lgq′(m) ≤ T(i, j) + lgq(m − 1), then switch 502 is in the upper position, block 509 determines the final decoded log-gain as

lgq(m) = lgq′(m),

and the filter memory in the MA log-gain predictor (block 503) is updated by shifting the old memory values by one position and then assigning lgeq(m) to the newest position of the filter memory.

If, on the other hand, GI_m > 0 and lgq′(m) > T(i, j) + lgq(m − 1), then the temporarily decoded log-gain lgq′(m) is discarded, and block 509 determines the final decoded log-gain as

lgq(m) = lgq(m − 1)

(by extrapolating the decoded log-gain of the last sub-frame); furthermore, switch 502 is moved to the lower position, and adders 511 and 512 subtract lgmean and elg(m), respectively, from lgq(m) to get

lgeq′(m) = lgq(m) − lgmean − elg(m),

and this lgeq′(m) is used to update the newest position of the filter memory of block 503, after the old memory values are shifted by one position.

Once the final decoded log-gain lgq(m), subject to the constraint imposed by block 509, is determined as described above, it is used by block 507 to update the estimated signal level lv(m). This value lv(m) is then used by block 508 in the next sub-frame (the (m + 1)-th sub-frame). Block 510 converts the final decoded log-gain lgq(m) to the linear domain as

gq(m) = 2^(lgq(m)/2).

4.5 Excitation VQ Decoding and Scaling

The excitation codebook index array CI of each frame contains 20 excitation codebook indices, CI(k), k = 1, ..., 20, each containing 1 sign bit and 5 shape bits.
The excitation vectors are decoded vector by vector, and sub-frame by sub-frame, since the excitation gain is updated once per sub-frame. Suppose the current excitation vector that needs to be decoded is in the m-th sub-frame and has a corresponding excitation codebook index of CI(k). This index assumes a value between 0 and 63. The most significant bit of this index is the sign bit. Therefore, if CI(k) < 32, the sign bit is 0; otherwise, the sign bit is 1. Let c_j(n), n = 1, 2, 3, 4 represent the j-th shape codevector in Appendix 8, with a shape codebook index of j. Furthermore, without loss of generality, let n = 1, 2, 3, 4 correspond to the sample time indices of the current vector. Then, in Figure 3, the decoded and scaled excitation vector uq(n), n = 1, 2, 3, 4, is obtained as

uq(n) = gq(m) c_CI(k)(n), n = 1, 2, 3, 4, if CI(k) < 32,
uq(n) = −gq(m) c_(CI(k)−32)(n), n = 1, 2, 3, 4, if CI(k) ≥ 32.

4.6 Long-Term Synthesis Filtering

Let n = 1, 2, ..., FRSZ correspond to the sample time indices of the current frame. In Figure 3, the long-term synthesis filter (block 155, consisting of block 140 and adder 150 in a feedback loop) performs sample-by-sample long-term synthesis filtering as follows:

dq(n) = uq(n) + Σ_{i=1..3} b_i dq(n − pp + 2 − i), n = 1, 2, ..., FRSZ.

4.7 Short-Term Synthesis Filtering

The short-term synthesis filter (block 175, consisting of block 160 and adder 170 in a feedback loop) performs sample-by-sample short-term synthesis filtering as follows:

sq(n) = dq(n) − Σ_{i=1..8} ã_i sq(n − i), n = 1, 2, ..., FRSZ.

4.8 De-emphasis Filtering

The de-emphasis filter (block 180) is a first-order pole-zero filter with fixed coefficients. It is exactly the inverse filter of the pre-emphasis filter H_pe(z) described in Section 3.2. This de-emphasis filter has the following transfer function:
H_de(z) = (1 + 0.75 z^(−1)) / (1 + 0.5 z^(−1))

Block 180 filters the short-term synthesis filter output signal sq(n) to produce the output signal of the entire decoder in Figure 3. This completes the detailed description of the BV32 decoder.

4.9 Example Packet Loss Concealment

Packet loss concealment is not a mandatory component of this BV32 Codec Specification, since packet loss concealment does not affect bit-stream compatibility or encoder-decoder interoperability. However, an example packet loss concealment technique is described in this section for reference purposes only. An implementer of BV32 can utilize other packet loss concealment techniques without affecting interoperability.

The example packet loss concealment technique utilizes the synthesis model of the decoder. In principle, all side information of the previous frame is repeated, while the excitation of the cascaded long-term and short-term synthesis filters comes from a random source, scaled to a proper level. Hence, with the additional index m denoting the m-th frame, during packet loss:

• The pitch period, pp, is set to the pitch period of the last frame [7]: pp = pp_(m−1).
• The pitch taps, b1, b2, and b3, are set to the pitch taps of the last frame [8]: b_i = b_(m−1),i, i = 1, 2, 3.
• The short-term synthesis filter coefficients, ã_i, i = 1, ..., 8, are set to those of the last frame [9]: ã_i = ã_(m−1),i, i = 1, ..., 8.
• A properly scaled random sequence is used as the long-term synthesis filter excitation, uq(n), n = 1, 2, ..., FRSZ.

[7] If the first frame is lost, a value of 100 is used for the pitch period.
[8] If the first frame is lost, the pitch taps are set to zero.
[9] If the first frame is lost, the short-term filter coefficients are set to zero.

The speech synthesis of the bad frame (part of a lost packet) now takes place exactly as specified in Sections 4.6, 4.7, and 4.8.
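The repeat-and-randomize excitation generation described above can be sketched as follows. This is illustrative Python, not part of the specification: it assumes the random sequence is scaled so that its energy matches E_(m−1) (via the square root of the energy ratio, which the scaling equation in this section expresses), applies the periodicity-driven factor g_plc with its 0.1 to 0.9 clipping, and uses Python's random module purely as a stand-in random source.

```python
import math
import random

def plc_excitation(E_prev, per_prev, frsz=80, rng=None):
    """E_prev: long-term synthesis filter excitation energy of the previous
    frame; per_prev: periodicity estimate per_(m-1); frsz: frame size in
    samples (80 = 20 excitation vectors of 4 samples each)."""
    rng = rng or random.Random(0)
    r = [rng.uniform(-1.0, 1.0) for _ in range(frsz)]
    # g_plc = -2 per + 1.9, clipped to the range [0.1, 0.9].
    g_plc = min(max(-2.0 * per_prev + 1.9, 0.1), 0.9)
    # Scale r(n) so its energy becomes E_prev, then apply g_plc
    # (energy-matching square root is an assumption of this sketch).
    energy_r = sum(x * x for x in r)
    scale = g_plc * math.sqrt(E_prev / energy_r) if energy_r > 0 else 0.0
    return [scale * x for x in r], g_plc
```

A highly periodic previous frame (per near 1) drives g_plc toward the 0.1 floor, keeping the random substitute quiet; a noise-like frame (per near 0) allows nearly full-level substitution.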
The random sequence is scaled according to

uq(n) = g_plc · sqrt( E_(m−1) / Σ_{n=1..FRSZ} r²(n) ) · r(n), n = 1, 2, ..., FRSZ,

where r(n), n = 1, 2, ..., FRSZ, is a random sequence, E_(m−1) is in principle the energy of the long-term synthesis filter excitation of the previous frame [10], and the scaling factor g_plc is calculated as detailed below. During good frames, an estimate of periodicity is updated as

per_m = 0.5 per_(m−1) + 0.5 bs,

where bs is the sum of the three pitch taps clipped at a lower threshold of zero and an upper threshold of one [11], while it is held constant during bad frames: per_m = per_(m−1). Based on the periodicity, the scaling factor is calculated as

g_plc = −2 per_(m−1) + 1.9,

with g_plc clipped at a lower threshold of 0.1 and an upper threshold of 0.9.

After synthesis of the signal output of a lost frame, the memories of the predictive quantizers are updated. The memory of the inverse LSP quantizer is updated with

ẽ2,i = l̃_(m−1),i − ê1,i − l̄_i, i = 1, 2, ..., 8,

where ê1,i is given in Section 4.3, l̄_i in Section 3.5, and l̃_(m−1),i denotes the i-th LSP coefficient of the (m−1)-th frame (as decoded according to Section 4.3 for a good frame, or repeated for a bad frame). The memory of the inverse gain quantizer is updated with

lgeq(m) = lgq(m) − lgmean − elg(m),

where elg(m) is given in Section 4.4, lgmean in Section 3.10, and lgq(m) is calculated as

lgq(m) = log2( E_(m−1) / FRSZ ), if E_(m−1)/FRSZ > 1/4,
lgq(m) = −2, if E_(m−1)/FRSZ ≤ 1/4.

[10] The energy is initialized to zero, i.e. E_0 = 0.
[11] The estimate of periodicity is initialized to zero, i.e. per_0 = 0.

The level estimation for a bad frame is updated exactly as for a good frame; see Section 4.4. At the end of a good frame (after synthesis of the output), the estimate of periodicity is updated as explained above, and the energy of the long-term synthesis filter excitation is updated as

E_m = Σ_{n=1..FRSZ} uq²(n).

At the end of the processing of a bad frame (after synthesis of the output and update of the predictive quantizers), the energy of the long-term synthesis filter excitation and the long-term synthesis filter coefficients are scaled down when 8 or more consecutive frames are lost:

E_m = E_(m−1), if Nclf < 8; E_m = (β_Nclf)² E_(m−1), if Nclf ≥ 8,

b_m,i = b_(m−1),i, if Nclf < 8; b_m,i = β_Nclf b_(m−1),i, if Nclf ≥ 8; i = 1, 2, 3,

where Nclf is the number of consecutive lost frames, and the scaling factor β_Nclf is given by

β_Nclf = 1 − 0.02 (Nclf − 7), if 8 ≤ Nclf ≤ 57; β_Nclf = 0, if Nclf > 57.

This will gradually mute the output signal when consecutive packets are lost for an extended period of time.

APPENDIX 1: GRID FOR LPC TO LSP CONVERSION

Grid point: Grid value
 0:  0.9999390    1:  0.9935608    2:  0.9848633    3:  0.9725342
 4:  0.9577942    5:  0.9409180    6:  0.9215393    7:  0.8995972
 8:  0.8753662    9:  0.8487854   10:  0.8198242   11:  0.7887573
12:  0.7558899   13:  0.7213440   14:  0.6853943   15:  0.6481323
16:  0.6101379   17:  0.5709839   18:  0.5300903   19:  0.4882507
20:  0.4447632   21:  0.3993530   22:  0.3531189   23:  0.3058167
24:  0.2585754   25:  0.2109680   26:  0.1630859   27:  0.1148682
28:  0.0657349   29:  0.0161438   30: -0.0335693   31: -0.0830994
32: -0.1319580   33: -0.1804199   34: -0.2279663   35: -0.2751465
36: -0.3224487   37: -0.3693237   38: -0.4155884   39: -0.4604187
40: -0.5034180   41: -0.5446472   42: -0.5848999   43: -0.6235962
44: -0.6612244   45: -0.6979980   46: -0.7336731   47: -0.7675781
48: -0.7998962   49: -0.8302002   50: -0.8584290   51: -0.8842468
52: -0.9077148   53: -0.9288635   54: -0.9472046   55: -0.9635010
56: -0.9772034   57: -0.9883118   58: -0.9955139   59: -0.9999390

APPENDIX 2: FIRST-STAGE LSP CODEBOOK

Index 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
Element 1 -0.00384521 -0.00511169 -0.00367737 -0.00312805 -0.00250244 -0.00090027 0.00399780 -0.00151062 -0.00547791 -0.00393677 -0.00331116 -0.00222778 -0.00325012 -0.00054932 -0.00546265 -0.00254822 -0.00204468 -0.00289917 -0.00028992
-0.00028992 -0.00030518 0.00292969 0.01701355 0.00061035 -0.00398254 -0.00177002 0.00256348 0.00088501 0.00035095 0.00041199 0.01571655 0.00881958 -0.00672913 -0.00515747 -0.00303650 -0.00221252 -0.00231934 -0.00123596 0.00738525 0.00349426 -0.00619507 -0.00459290 -0.00405884 -0.00209045 -0.00489807 -0.00341797 -0.00016785 -0.00109863 -0.00527954 -0.00317383 0.00453186 0.00018311 -0.00143433 0.00213623 0.01060486 0.00221252 -0.00749207 -0.02087402 -0.00173950 -0.00424194 Element 2 -0.00849915 -0.01313782 -0.00166321 -0.00488281 -0.00323486 -0.00347900 0.01086426 -0.00581360 -0.00958252 -0.00848389 -0.00723267 -0.00709534 -0.00445557 -0.00219727 0.00070190 0.00099182 0.00265503 -0.00740051 0.00064087 0.00001526 0.00170898 0.00251770 0.02578735 -0.00015259 0.00350952 -0.00981140 0.01017761 -0.00016785 0.00717163 0.00004578 0.02601624 0.02149963 -0.01612854 -0.01365662 -0.00975037 -0.00933838 -0.00257874 -0.00584412 0.02700806 0.00294495 -0.01358032 -0.00839233 -0.00538635 -0.00204468 -0.00955200 -0.00909424 0.01191711 0.00473022 -0.00999451 -0.00744629 0.01782227 -0.00355530 0.00292969 0.00561523 0.05717468 0.00817871 -0.00627136 -0.04931641 0.01293945 -0.00158691 Element 3 -0.01591492 -0.01698303 0.00045776 0.00282288 0.00154114 -0.00909424 0.00677490 -0.00186157 0.00094604 -0.01943970 0.00175476 -0.00581360 0.00651550 -0.00631714 0.02934265 0.02000427 -0.00135803 -0.01710510 0.00022888 -0.00805664 -0.00651550 -0.00447083 -0.00593567 0.00686646 0.01591492 -0.03118896 0.01966858 -0.00163269 0.00427246 -0.01815796 0.01066589 0.01010132 -0.02481079 -0.01542664 -0.02221680 -0.02006531 0.00263977 -0.01034546 0.01812744 -0.00387573 -0.01676941 -0.02026367 0.00645447 -0.00219727 -0.00572205 -0.00500488 0.03486633 0.01737976 -0.00939941 -0.00877380 0.00762939 -0.01539612 0.01277161 0.00642395 0.03829956 0.01704407 0.02369690 0.00619507 0.04112244 0.02459717 Element 4 -0.00360107 -0.00103760 -0.00309753 -0.00173950 0.00422668 -0.00746155 0.00090027 -0.00430298 0.01203918 
-0.01473999 0.03128052 0.01132202 0.00497437 -0.01139832 0.01412964 -0.00164795 -0.02322388 -0.02655029 -0.00819397 -0.02310181 -0.01683044 -0.01782227 0.00595093 0.00129700 -0.00076294 -0.01042175 0.01533508 -0.00199890 0.00279236 -0.03132629 0.03164673 0.00360107 -0.00184631 -0.01049805 0.01498413 0.00033569 -0.00134277 -0.01982117 0.02203369 -0.01075745 0.01498413 -0.02606201 0.03422546 0.00228882 0.00482178 -0.00860596 0.03454590 0.00859070 -0.00805664 -0.02050781 -0.00749207 -0.02656555 0.00936890 -0.00889587 0.03216553 -0.00007629 0.02711487 0.00404358 0.03024292 0.01078796 49 Element 5 -0.00013733 -0.01216125 0.01814270 0.00004578 -0.00964355 -0.00656128 0.00244141 -0.01788330 0.00695801 0.01364136 0.00772095 -0.00482178 -0.01744080 -0.01916504 0.00656128 -0.01643372 0.00332642 -0.01350403 0.00061035 -0.00082397 0.00083923 -0.02940369 0.01370239 -0.00637817 0.02429199 -0.00013733 0.01405334 -0.00700378 0.02046204 -0.00378418 0.03356934 0.00122070 0.00761414 -0.01742554 0.02423096 0.00292969 -0.00151062 -0.02880859 0.00323486 -0.02171326 0.02687073 0.02151489 0.03749084 0.02597046 -0.00778198 -0.04263306 0.02195740 -0.00253296 -0.00268555 -0.03236389 0.03543091 -0.00277710 0.00128174 -0.03330994 0.02561951 -0.00616455 0.03462219 0.01080322 0.03976440 0.00611877 Element 6 0.00610352 -0.00427246 -0.00053406 -0.00094604 -0.01895142 -0.02726746 -0.00988770 -0.01603699 0.02105713 -0.00468445 -0.00163269 -0.00050354 0.01000977 -0.00711060 0.00003052 -0.00813293 0.01715088 0.00151062 0.02536011 -0.00106812 -0.00955200 -0.02981567 0.01223755 -0.02079773 0.02890015 0.00044250 0.01646423 -0.00726318 0.00689697 -0.02220154 0.02770996 -0.00657654 0.01754761 0.02040100 0.00935364 -0.01268005 -0.00566101 -0.02052307 -0.00514221 -0.03224182 0.02645874 0.02061462 0.02166748 0.00415039 0.01531982 -0.00547791 0.01472473 -0.03044128 0.04862976 0.01905823 0.01852417 -0.01931763 -0.00985718 -0.05546570 0.02203369 -0.04737854 0.04241943 0.00926208 0.03063965 0.00105286 Element 7 
0.01640320 -0.00271606 0.00256348 -0.01976013 0.01704407 -0.00769043 0.00549316 -0.03099060 0.00720215 -0.00344849 0.00566101 -0.01037598 0.01194763 0.00106812 0.01229858 -0.00671387 0.01350403 -0.00038147 -0.00822449 -0.02081299 0.02677917 0.00372314 0.00622559 -0.05078125 0.01559448 -0.00659180 -0.00257874 -0.02569580 0.02848816 0.00140381 0.01812744 -0.01893616 0.00720215 -0.00880432 -0.00544739 -0.02940369 0.00665283 -0.01663208 0.01075745 -0.02403259 0.01818848 -0.00651550 0.00497437 -0.02684021 0.03317261 0.00357056 0.03034973 -0.00776672 0.01870728 0.01884460 -0.00367737 -0.03083801 0.04154968 0.00897217 0.01969910 -0.03558350 0.02859497 0.00779724 0.00881958 -0.02471924 Element 8 -0.00166321 0.00846863 -0.00833130 0.00306702 0.00219727 -0.00224304 -0.00628662 -0.00659180 0.00140381 0.00566101 -0.00460815 -0.01887512 -0.00160217 -0.01481628 0.00367737 -0.01013184 0.00199890 0.00778198 -0.02096558 -0.01762390 0.00958252 -0.00421143 -0.00111389 -0.01544189 0.00701904 -0.01545715 -0.01338196 -0.03907776 0.01043701 -0.00294495 0.00709534 -0.02380371 0.01480103 -0.00152588 -0.00675964 -0.00543213 0.03112793 0.00572205 0.00660706 -0.02343750 0.01010132 -0.00538635 -0.00592041 -0.01873779 0.01727295 0.00357056 0.02073669 -0.01104736 0.00442505 0.00524902 -0.01086426 0.00360107 0.02775574 0.00265503 0.00923157 0.00561523 0.01635742 0.00225830 -0.00358582 -0.02410889 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 -0.00451660 -0.00369263 0.01309204 0.00236511 -0.00433350 -0.00448608 -0.00320435 -0.00141907 -0.00105286 -0.00238037 0.01274109 0.00193787 -0.00361633 -0.00132751 -0.00360107 -0.00175476 -0.00105286 0.00044250 0.00265503 0.00163269 -0.00450134 -0.00181580 -0.00350952 0.00189209 -0.00234985 0.00024414 0.05004883 0.00430298 0.00041199 -0.00491333 0.01916504 0.00370789 0.01432800 0.00350952 
0.05522156 0.03974915 -0.00840759 -0.00660706 -0.00529480 -0.00257874 -0.00048828 0.00151062 0.01791382 0.00483704 -0.00576782 -0.00526428 -0.00202942 -0.00016785 -0.00460815 0.00132751 -0.00267029 0.00160217 -0.00639343 -0.00352478 -0.00077820 0.00099182 -0.00256348 0.00869751 0.02838135 0.04316711 -0.00311279 -0.02593994 0.00602722 0.00640869 -0.00239563 -0.00415039 -0.01776123 0.03527832 0.01925659 -0.01188660 -0.01176453 0.00030518 -0.00477600 -0.00151062 -0.00828552 0.02935791 -0.00151062 -0.00816345 -0.00799561 -0.00260925 -0.00695801 0.00723267 -0.00314331 0.02203369 0.00399780 -0.00598145 -0.00743103 -0.00422668 -0.00177002 -0.00527954 -0.00541687 0.03166199 0.00444031 0.00929260 -0.02757263 0.03500366 0.00387573 0.03143311 0.00082397 0.04231262 0.03291321 -0.02593994 -0.02162170 -0.00663757 -0.00895691 0.00041199 -0.00248718 0.06657410 0.01110840 -0.01533508 -0.01495361 -0.00065613 0.00070190 -0.00889587 -0.00230408 0.01330566 0.01959229 -0.01054382 -0.01866150 0.01048279 -0.01753235 -0.00033569 0.00711060 0.09507751 0.05152893 0.00325012 -0.07890320 0.03062439 0.01506042 0.00444031 0.00253296 -0.03298950 0.04226685 0.04072571 -0.02235413 -0.02374268 0.00944519 -0.00032043 -0.00180054 -0.00988770 0.00981140 -0.00468445 0.00148010 -0.02526855 0.03428650 0.00030518 0.02352905 -0.00833130 0.05549622 0.01559448 -0.01719666 -0.02114868 -0.00115967 -0.01432800 -0.01350403 -0.01794434 0.02220154 0.00691223 0.00347900 -0.06730652 0.02929688 0.00061035 0.01612854 -0.03111267 0.04219055 0.01431274 -0.04820251 -0.03446960 -0.01538086 -0.00971985 -0.00028992 -0.01004028 0.02952576 0.01173401 -0.01855469 -0.03617859 0.02978516 0.00903320 -0.00535583 -0.00685120 0.07746887 0.05360413 -0.00729370 -0.02894592 0.01577759 -0.02914429 0.00053406 -0.00775146 0.06649780 0.01130676 0.02406311 -0.02648926 0.07060242 0.02645874 0.01907349 0.01228333 -0.01219177 0.04809570 0.02778625 0.01066589 -0.01464844 0.01014709 -0.00436401 -0.00811768 0.00376892 -0.00921631 -0.01261902 
0.03401184 -0.03221130 0.04959106 0.02726746 0.00462341 -0.02253723 0.02410889 -0.00083923 -0.02134705 -0.03652954 0.00111389 -0.02612305 -0.01855469 -0.02980042 0.01562500 -0.00653076 0.00259399 -0.02465820 0.03329468 -0.00419617 0.00932312 -0.06707764 0.03793335 0.00024414 0.00361633 -0.01261902 -0.00068665 -0.00666809 -0.00938416 -0.02021790 0.01698303 -0.00868225 0.03782654 -0.05659485 0.07008362 0.03486633 -0.00605774 -0.03175354 0.05206299 0.01110840 0.00303650 -0.03585815 0.02217102 -0.03764343 0.00070190 -0.03376770 0.05419922 -0.02204895 0.04458618 -0.02957153 0.06431580 0.01609802 0.03089905 50 0.02276611 -0.03230286 0.04991150 0.01647949 0.01145935 -0.01629639 0.03031921 -0.00563049 -0.02941895 -0.02708435 -0.01629639 -0.02470398 0.01333618 0.00328064 0.01815796 -0.00277710 -0.01211548 -0.03590393 0.00866699 -0.03933716 0.02500916 -0.03193665 0.03088379 -0.01161194 -0.00726318 -0.03829956 0.00930786 -0.01719666 0.05375671 -0.00869751 0.02725220 -0.01568604 0.01620483 -0.02024841 0.03443909 0.00086975 0.01782227 -0.02426147 0.05569458 0.00314331 -0.00831604 -0.03338623 0.00154114 -0.03361511 0.04870605 0.00096130 0.05216980 0.03285217 -0.01533508 -0.06939697 0.03462219 -0.02023315 0.02307129 -0.06474304 0.07875061 -0.03930664 -0.00105286 -0.06314087 0.04470825 -0.04406738 0.07960510 -0.01586914 0.04623413 0.00148010 0.03352356 0.02371216 -0.02035522 0.04533386 -0.01173401 -0.00656128 -0.01852417 0.00007629 -0.02128601 -0.01837158 -0.03489685 -0.00587463 -0.03384399 0.01911926 0.00810242 0.00881958 -0.01660156 0.02276611 0.00534058 0.00965881 -0.01277161 0.02310181 -0.00167847 0.04490662 -0.01190186 -0.00196838 -0.04582214 0.00764465 -0.04112244 0.03878784 -0.00566101 0.01902771 -0.02262878 0.02969360 -0.01860046 0.03150940 -0.01142883 0.03044128 0.01382446 0.01844788 -0.00125122 -0.03677368 -0.05041504 -0.00361633 -0.06233215 0.04002380 0.01994324 0.03585815 0.00875854 0.03904724 -0.02249146 0.02912903 -0.03753662 0.07855225 -0.00245667 0.04440308 
-0.04081726 0.00088501 -0.08934021 0.03926086 -0.06632996 0.07987976 -0.01681519 0.02545166 -0.01939392 0.05075073 0.05001831 -0.01049805 0.03337097 -0.02360535 0.02409363 0.01446533 -0.00328064 -0.03314209 0.03617859 -0.00431824 0.00247192 -0.04949951 0.02272034 0.00950623 -0.00042725 -0.02694702 0.02523804 -0.01576233 0.00958252 -0.02479553 0.02972412 0.00451660 -0.00390625 -0.04681396 0.04997253 0.01480103 0.00833130 -0.09020996 0.02937317 -0.00590515 -0.00694275 -0.05206299 0.03417969 -0.00958252 0.02209473 -0.03588867 0.02810669 0.01550293 0.00303650 -0.05572510 0.01962280 -0.02108765 0.00166321 -0.03771973 0.02944946 0.01362610 0.01889038 -0.03100586 0.05235291 -0.01206970 0.02680969 -0.01402283 0.04028320 0.02102661 0.00306702 -0.06845093 0.08264160 -0.01795959 0.02592468 -0.08439636 0.04357910 -0.00686646 -0.00128174 -0.05572510 0.07539368 0.02963257 -0.00700378 0.01974487 -0.01696777 0.01565552 0.01126099 0.01599121 0.02626038 0.01126099 0.00047302 0.00064087 -0.00338745 0.01939392 0.01345825 -0.00680542 -0.01084900 0.01177979 -0.01797485 -0.00190735 -0.01690674 0.01644897 0.00935364 -0.01063538 -0.02130127 0.02980042 0.01237488 0.00251770 -0.00898743 0.01449585 0.00354004 -0.01644897 -0.04679871 0.02700806 -0.00173950 0.01277161 -0.01281738 0.02386475 0.01689148 0.00178528 0.00030518 0.03395081 0.00358582 -0.00538635 -0.02009583 0.01617432 0.00981140 0.00332642 -0.03533936 0.02601624 -0.00375366 0.01100159 -0.01716614 0.01892090 0.01512146 -0.00975037 -0.01873779 0.03260803 0.00088501 0.01179504 -0.05546570 0.01593018 -0.00469971 -0.01522827 -0.06500244 0.03486633 125 126 127 -0.00079346 0.00588989 0.02717590 -0.03021240 0.03402710 0.07472229 -0.05854797 0.08795166 0.08680725 -0.07080078 0.09323120 0.03575134 51 -0.06494141 0.07124329 0.00018311 -0.05015564 0.05776978 -0.03523254 -0.02285767 0.03340149 -0.05368042 -0.00508118 0.01075745 -0.04931641 APPENDIX 3: SECOND-STAGE LOWER SPLIT LSP CODEBOOK Index 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
21 22 23 24 25 26 27 28 29 30 31

(32 entries, indices 0-31; values listed per element, in index order)

Element 1:
 0.00281525 -0.00021553  0.00709152 -0.00341034 -0.00196075 -0.00179482 -0.00576019 -0.00498390
 0.00724030 -0.00100517  0.01622772 -0.01317978  0.00139236  0.00160599  0.00048065  0.00121498
 0.01221657  0.00564766  0.02144051 -0.01160431 -0.00497437 -0.00357437 -0.01611328 -0.01193810
 0.01710129  0.00753784  0.03960609 -0.03484535 -0.00045013 -0.00150681  0.00778198 -0.01263237

Element 2:
 0.00292778 -0.00037766 -0.00558853 -0.00456047  0.00144005 -0.00482559  0.00680923 -0.01045990
 0.00892258  0.00750542  0.00503349 -0.00148201  0.01294518 -0.00276566  0.02153206 -0.01841927
 0.00114632  0.00059319 -0.01291847 -0.01168442 -0.00429916 -0.01308441  0.01459503 -0.02121544
 0.01618958  0.01832008  0.01548195  0.00230217  0.01565170 -0.01651573  0.04269028 -0.04002953

Element 3:
 0.00433731 -0.00252151 -0.00040245  0.00535393  0.01340103 -0.00926208  0.00318718 -0.00181580
-0.00010681 -0.01124763 -0.00928497 -0.00485039  0.01284790 -0.02051735 -0.00239372  0.00706482
 0.01258469 -0.00907707 -0.00042725  0.01208878  0.02562332 -0.01529694  0.00725365 -0.00399017
 0.00624657 -0.02398491 -0.00556374  0.00053406  0.03667641 -0.03601646  0.00644302  0.00638008

APPENDIX 4: SECOND-STAGE UPPER SPLIT LSP CODEBOOK

(32 entries, indices 0-31; values listed per element, in index order)

Element 1:
 0.00223160  0.00498199 -0.00000954  0.00362587 -0.01116562  0.00366402 -0.00282288  0.01284599
-0.00849152 -0.00639915 -0.00438499  0.01028252 -0.01620674 -0.00172234 -0.00300217  0.01217651
-0.00471497  0.00894547 -0.00122452 -0.00556946 -0.01784134 -0.00508881 -0.00472450  0.02260780
-0.00877571  0.00730515  0.00929070  0.01049805 -0.02701187  0.00286293  0.00362587  0.02854538

Element 2:
-0.00800133  0.00384903  0.00230217  0.01415634  0.00059700  0.00034904 -0.00809288  0.00154495
-0.00714302  0.00654030  0.00685120  0.00627327  0.00895309  0.00682259 -0.00821686 -0.00773621
-0.01052666 -0.00356674  0.00730324  0.02675247  0.00078583  0.00965881 -0.01339912  0.01769447
-0.00870895  0.00027847 -0.00706673  0.01000977 -0.01168251 -0.00534248 -0.02618980 -0.00962830

Element 3:
-0.00899124 -0.00713539  0.00827026  0.00111580 -0.01137161 -0.00654984  0.00408554  0.00731087
 0.00018120 -0.00492859 -0.00248146 -0.00315285  0.00953102  0.00998497  0.00954819  0.00847435
-0.02195930 -0.00493240  0.01606369 -0.00582695 -0.00429535  0.00708389  0.00592613  0.00827408
-0.01420212 -0.00198555 -0.00564384 -0.02177620  0.01052856  0.02644157  0.00177765 -0.00597000

Element 4:
 0.00006485 -0.00961494  0.00367355  0.00265884  0.00316811  0.00271797 -0.00595474  0.00330925
 0.00532913 -0.00344276  0.01663589  0.00683403  0.00367737 -0.01184273  0.01287270 -0.00031281
-0.01058769 -0.02550888  0.01205063 -0.00326729 -0.01312637 -0.01148987 -0.01262474 -0.00707054
 0.01482201 -0.01367950  0.01904678  0.00494194  0.00321388 -0.00658035  0.00383186 -0.00085640

Element 5:
 0.00058365 -0.00307274  0.00186920 -0.00458145 -0.00823975 -0.01940155 -0.00964355 -0.00998116
 0.00732613  0.01243401  0.00031281  0.00990868 -0.00362778  0.00318718 -0.00807762  0.00645638
 0.00412560 -0.00962448  0.01569366  0.00189209 -0.00244522 -0.02126884  0.00816154 -0.00349998
 0.01783562  0.02097321  0.00018692  0.00013351  0.00094223 -0.00415039 -0.00398064 -0.00148964

APPENDIX 5: PITCH PREDICTOR TAP CODEBOOK

(32 entries, indices 0-31; values listed per element, in index order)

Element 1:
-0.1450195 -0.1288755 -0.0344850 -0.0050965 -0.0058290  0.2434080 -0.0683900  0.0208435
 0.0831910  0.1194460  0.0783080  0.1614380  0.1701965  0.3446655  0.1160280  0.2894895
-0.1333315 -0.0087890 -0.0806275 -0.0332030  0.0019225  0.2192995  0.0077210  0.1473085
 0.1340940  0.4174500  0.0624085  0.1651305  0.1640320  0.3888245  0.0041200  0.2724305

Element 2:
 0.2992860  0.6889955  0.2010195  0.5024415  0.2820435  0.4494935 -0.2305605  0.3207705
 0.3331300  0.6207885  0.1853335  0.4213255  0.2445375  0.4873655 -0.2495730  0.3384705
 0.3481750  0.8484800  0.2053835  0.4768980  0.2650755  0.7223815  0.0712585  0.2614745
 0.4643860  0.6805420  0.1940615  0.4480895  0.2704775  0.5142820 -0.2192080  0.3651735

Element 3:
 0.1412050  0.4095765  0.1164550  0.3855895  0.0539245  0.2871705  0.0762635  0.2186585
 0.0210265  0.2395935  0.0917360  0.3347170  0.0487670  0.1487425  0.0732725  0.2445070
-0.0697630  0.1400755 -0.0161745  0.1187745 -0.1307375  0.0311585  0.0126040  0.1786500
-0.1044920 -0.1161805 -0.0299375  0.1498110 -0.1030580 -0.0249940 -0.0823975  0.0523375

APPENDIX 6: GAIN CODEBOOK

(32 entries, indices 0-31; one element per entry, in index order)

Element:
 -4.91895  -3.75049  -3.09082  -2.59961  -2.22656  -1.46240  -0.88037  -0.34717
 -1.93408  -1.25635  -0.70117  -0.16650   0.20361   0.82568   1.59863   2.75684
 -1.68457  -1.06299  -0.52588   0.01563   0.39941   1.05664   1.91602   3.34326
  0.60693   1.31201   2.29736   4.11426   5.20996   6.70410   8.74316  10.92188

APPENDIX 7: GAIN CHANGE THRESHOLD MATRIX

Rows i=1 through i=18 correspond to the relative log-gain of the previous frame, [dB2], in 2 dB bins covering -24 to 12 (i=1 is -24 to -22, i=18 is 10 to 12). Columns j=1 through j=11 correspond to the log-gain change of the previous frame, [dB2], in 2 dB bins covering -8 to 14 (j=1 is -8 to -6, j=11 is 12 to 14).

Column j=1 (-8 to -6), i=1 through i=18:
 0.00000  0.00000  0.00000  6.31250  0.00000 -0.36523  5.51172  3.95703  7.37305
 7.37305  4.39844  0.58789  0.14453  0.00000  0.00000  0.00000  0.00000  0.00000

Column j=2 (-6 to -4), i=1 through i=18:
 0.13477  0.64453  0.33594  5.50977  5.04883  6.15625  6.31641 10.51172  8.93945
 8.12109  5.94336  5.10938  5.64844  5.54688  0.39258  0.00000  0.00000  0.00000

Column j=3 (-4 to -2), i=1 through i=18:
 2.26563  4.90039  7.27734  4.83984  5.09180  8.26953  9.66602  8.42969  8.57422
 6.66406  5.73047  5.41602  5.05859  5.15625  3.92188  1.15039  0.37695  0.07617

Columns j=4 to j=7 (-2 to 0, 0 to 2, 2 to 4, 4 to 6), one row of four values per i:
 i=1:   2.94336  4.71875  0.00000  0.00000
 i=2:   3.38281  4.58203  5.69336  0.00000
 i=3:   5.82422 11.66211 11.66211  0.00000
 i=4:   6.99023  8.22852 11.49805  1.89844
 i=5:   5.91406  6.92188  7.38086  4.13867
 i=6:   5.40430  5.88477 11.53906  5.31836
 i=7:   7.58594 10.63281 12.03906  8.79297
 i=8:   7.62891 11.45703 11.95898 10.85352
 i=9:   6.85742  9.67773 11.54492 10.98242
 i=10:  5.87891  7.59766 10.67969 10.42578
 i=11:  5.10742  5.69531  8.31641 10.05273
 i=12:  4.55273  4.32813  5.75586  7.42383
 i=13:  4.06836  3.51758  4.07617  4.56055
 i=14:  3.37891  2.90430  2.74805  2.82422
 i=15:  2.67383  2.66602  2.40039  4.65039
 i=16:  2.56641  3.98438  3.61133  4.66797
 i=17:  4.30664  7.07031  0.81641  2.86914
 i=18:  1.46875  3.49219  3.16992 -0.84180

Column j=8 (6 to 8), i=1 through i=18:
 0.00000  0.00000  0.00000  0.00000  0.00000 -4.97070  3.06836  2.83008 10.43359
 9.46875  8.23047  6.63867  4.99219  3.37500  3.29883  0.58398  1.19336  3.81250

Column j=9 (8 to 10), i=1 through i=18:
 0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  1.50000  2.53320
 6.85938  7.11328  6.81055  5.51953  4.02930  2.16016 -0.26563  0.69922 -0.50781

Column j=10 (10 to 12), i=1 through i=18:
 0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  5.05859
 3.06445  3.04102  4.14258  4.82227  4.49805  2.95703  0.09570 -1.23242  0.00000

Column j=11 (12 to 14), i=1 through i=18:
 0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
 0.00000 -1.27930  3.31641  5.19141  3.42188  0.40820  0.00000  0.00000  0.00000

APPENDIX 8: EXCITATION VQ SHAPE CODEBOOK

(32 entries, indices 0-31; values listed per element, in index order)

Element 1:
-0.537476  1.145142  1.174194  2.946899 -1.704102 -0.889038 -0.756958  1.373169
-0.573364  1.058716  0.432617  2.542358 -2.572266  0.432251  0.701538  0.857056
-1.474365 -0.361694  0.407104  1.670288 -1.860596 -0.845703  0.425415  0.208374
-1.022583  0.502075  0.270630  2.266357 -1.876343 -0.389771 -0.040771  0.448242

Element 2:
 0.974976  1.222534  1.399414  0.798096  0.098755 -0.337402 -0.061890 -0.413330
-0.463745 -0.566040  0.441895  0.207031 -2.758423 -2.303711 -1.355591 -1.842285
 1.636108  0.711914  1.661255  1.159668  0.592285  0.081421  0.641357  0.481567
 0.425781 -0.491455  0.005981 -1.128540 -0.895142 -1.818604 -1.141968 -0.755127

Element 3:
-0.631104 -1.252441  0.330933 -0.274658 -0.526001  0.784546  0.558960  0.690552
-0.606934 -1.677246 -0.630493 -1.611450 -0.499390 -2.016479 -0.861572 -0.006348
-0.683838 -0.136353  0.566406  1.760254  1.213379  2.197754  1.210205  1.808472
-0.168945 -0.296631  0.257813 -0.399414 -0.012207  1.185791  0.364258  1.767578

Element 4:
-0.617920  0.616211  0.823120 -0.027344 -0.395508  0.298462 -0.907227 -0.794067
-0.623535  0.752563 -1.445801  0.313354 -0.020142  0.228638 -0.243042  1.216919
 0.362915  1.619873 -0.559937  0.524780  0.719482  1.654785 -1.444580  0.685913
-1.642700 -0.068359 -0.466309  0.438477  0.886841  0.913452 -0.283691 -0.691406
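The scalar gain codebook of Appendix 6 holds candidate log-gain values that the encoder compares against the gain to be quantized. As a rough illustration of such a nearest-neighbor search (not the normative procedure, which Section 3.10 defines together with gain prediction), a minimal sketch using only the first eight Appendix 6 entries:

```python
# Minimal nearest-neighbor codebook search sketch. The helper name
# `quantize_log_gain` is illustrative, not from the standard; only the
# first eight Appendix 6 gain codebook entries are used here.

GAIN_CODEBOOK = [-4.91895, -3.75049, -3.09082, -2.59961,
                 -2.22656, -1.46240, -0.88037, -0.34717]

def quantize_log_gain(value, codebook=GAIN_CODEBOOK):
    """Return (index, entry) of the codebook entry nearest to `value`."""
    idx = min(range(len(codebook)), key=lambda i: (value - codebook[i]) ** 2)
    return idx, codebook[idx]

idx, q = quantize_log_gain(-2.5)
# idx == 3, q == -2.59961: entry 3 is the closest candidate to -2.5
```

The transmitted quantity is the index, so the decoder recovers the same gain by a simple table lookup.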
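Appendices 3 and 4 split the second-stage LSP refinement into a lower part (3 elements) and an upper part (5 elements), together covering the 8-dimensional LSP vector. Conceptually, the quantized vector adds one entry from each split codebook on top of the first-stage output. The sketch below shows only this combination step; the first-stage vector is a made-up placeholder, and the mean-removal and prediction steps of Section 3.5 are omitted:

```python
# Combining a first-stage LSP vector with second-stage split refinements.
# `first_stage` is a hypothetical value for illustration, NOT data from
# this standard; the two split entries are index 0 of Appendices 3 and 4.

first_stage = [0.05, 0.10, 0.16, 0.24, 0.33, 0.42, 0.51, 0.60]  # hypothetical
lower_split = [0.00281525, 0.00292778, 0.00433731]               # Appendix 3, index 0
upper_split = [0.00223160, -0.00800133, -0.00899124,
               0.00006485, 0.00058365]                           # Appendix 4, index 0

# Lower split refines elements 1-3, upper split elements 4-8.
lsp_q = [fs + r for fs, r in zip(first_stage, lower_split + upper_split)]
```

Splitting the second stage this way keeps each codebook at 32 entries (5 bits) while still refining all 8 dimensions.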
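Each entry of the Appendix 5 codebook supplies three predictor taps (Elements 1-3) for the long-term (pitch) predictor of Sections 3.8-3.9. A three-tap predictor forms its estimate from three consecutive samples around one pitch period in the past; the exact alignment below is an assumption for illustration, using entry 0 of the tap codebook, and the normative filter is defined in the body of this standard:

```python
# Three-tap long-term (pitch) prediction sketch. The tap-to-sample
# alignment is an illustrative assumption; the taps are entry 0 of the
# Appendix 5 codebook, read across Elements 1-3.

TAPS = (-0.1450195, 0.2992860, 0.1412050)  # Appendix 5, index 0

def ltp_predict(past, pitch, taps=TAPS):
    """Predict the next sample from past[-pitch-1], past[-pitch], past[-pitch+1]."""
    n = len(past)
    samples = (past[n - pitch - 1], past[n - pitch], past[n - pitch + 1])
    return sum(b * s for b, s in zip(taps, samples))

pred = ltp_predict([0.0, 1.0, 0.0, 1.0, 0.0, 1.0], pitch=2)
```

Quantizing all three taps jointly as one 5-bit codebook entry is cheaper than coding each tap separately and lets the trained entries capture typical tap correlations.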