ENGINEERING COMMITTEE
Data Standards Subcommittee

SCTE 24-21 2012

BV16 Speech Codec Specification for Voice over IP Applications in Cable Telephony

NOTICE

The Society of Cable Telecommunications Engineers (SCTE) Standards are intended to serve the public interest by providing specifications, test methods and procedures that promote uniformity of product, interchangeability and ultimately the long term reliability of broadband communications facilities. These documents shall not in any way preclude any member or non-member of SCTE from manufacturing or selling products not conforming to such documents, nor shall the existence of such standards preclude their voluntary use by those other than SCTE members, whether used domestically or internationally.

SCTE assumes no obligations or liability whatsoever to any party who may adopt the Standards. Such adopting party assumes all risks associated with adoption of these Standards, and accepts full responsibility for any damage and/or claims arising from the adoption of such Standards.

Attention is called to the possibility that implementation of this standard may require the use of subject matter covered by patent rights. By publication of this standard, no position is taken with respect to the existence or validity of any patent rights in connection therewith. SCTE shall not be responsible for identifying patents for which a license may be required or for conducting inquiries into the legal validity or scope of those patents that are brought to its attention.

Patent holders who believe that they hold patents which are essential to the implementation of this standard have been requested to provide information about those patents and any related licensing terms and conditions. Any such declarations made before or after publication of this document are available on the SCTE web site at http://www.scte.org.

All Rights Reserved © Society of Cable Telecommunications Engineers, Inc.
2012
140 Philips Road
Exton, PA 19341

Contents

1 INTRODUCTION
2 OVERVIEW OF THE BV16 SPEECH CODEC
2.1 Brief Introduction of Two-Stage Noise Feedback Coding (TSNFC)
2.2 Overview of the BV16 Codec
3 DETAILED DESCRIPTION OF THE BV16 ENCODER
3.1 High-Pass Pre-Filtering
3.2 Short-Term Linear Predictive Analysis
3.3 Conversion to LSP
3.4 LSP Quantization
3.5 Conversion to Short-Term Predictor Coefficients
3.6 Long-Term Linear Predictive Analysis (Pitch Extraction)
3.7 Long-Term Predictor Parameter Quantization
3.8 Excitation Gain Quantization
3.9 Excitation Vector Quantization
3.10 Bit Multiplexing
4 DETAILED DESCRIPTION OF THE BV16 DECODER
4.1 Bit De-multiplexing
4.2 Long-Term Predictor Parameter Decoding
4.3 Short-Term Predictor Parameter Decoding
4.4 Excitation Gain Decoding
4.5 Excitation VQ Decoding and Scaling
4.6 Long-Term Synthesis Filtering
4.7 Short-Term Synthesis Filtering
4.8 Example Postfilter
4.9 Example Packet Loss Concealment
APPENDIX 1: GRID FOR LPC TO LSP CONVERSION
APPENDIX 2: FIRST-STAGE LSP CODEBOOK
APPENDIX 3: SECOND-STAGE LSP SHAPE CODEBOOK
APPENDIX 4: PITCH PREDICTOR TAB CODEBOOK
APPENDIX 5: GAIN CODEBOOK
APPENDIX 6: GAIN CHANGE THRESHOLD MATRIX
APPENDIX 7: EXCITATION VQ SHAPE CODEBOOK

Figures

Figure 1 Basic codec structure of Two-Stage Noise Feedback Coding (TSNFC)
Figure 2 An alternative codec structure of Two-Stage Noise Feedback Coding (TSNFC)
Figure 3 Block diagram of the BV16 encoder
Figure 4 Block diagram of the BV16 decoder
Figure 5 Short-term linear predictive analysis and quantization (block 210)
Figure 6 LSP quantizer (block 216)
Figure 7 Long-term predictive analysis and quantization (block 220)
Figure 8 Prediction residual quantizer (block 230)
Figure 9 Filter structure used in excitation VQ codebook search
Figure 10 Bit stream format
Figure 11 Short-term predictor parameter decoder (block 420)
Figure 12 Excitation gain decoder

Tables

Table 1 Bit allocation of the BV16 codec

1 INTRODUCTION

This document describes the BV16 speech codec¹. BV16 compresses 8 kHz sampled narrowband speech to a bit rate of 16 kb/s. It does this with a speech coding algorithm called Two-Stage Noise Feedback Coding (TSNFC), developed by Broadcom.

The rest of this document is organized as follows. Section 2 gives a high-level overview of the TSNFC algorithm. Sections 3 and 4 give detailed descriptions of the BV16 encoder and decoder, respectively.
The BV16 codec specification given in Sections 3 and 4 contains sufficient detail to allow those skilled in the art to implement bit-stream compatible and functionally equivalent BV16 encoders and decoders.

2 OVERVIEW OF THE BV16 SPEECH CODEC

This section first introduces the general principles of Two-Stage Noise Feedback Coding (TSNFC). It then gives an overview of the BV16 algorithm.

2.1 Brief Introduction of Two-Stage Noise Feedback Coding (TSNFC)

In conventional Noise Feedback Coding (NFC), the encoder adds a noise feedback signal to a prediction residual signal. A scalar quantizer then quantizes this modified prediction residual. The difference between the quantizer input and output, that is, the quantization error signal, is passed through a noise feedback filter. The output of this filter is the noise feedback signal that is added to the prediction residual. The noise feedback filter controls the spectrum of the coding noise so as to minimize the perceived coding noise. It does this by exploiting the masking properties of the human auditory system.

Conventional NFC codecs typically use only a short-term noise feedback filter to shape the spectral envelope of the coding noise, and they invariably use a scalar quantizer. In contrast, Broadcom's Two-Stage Noise Feedback Coding (TSNFC) system uses a codec structure that employs two stages of noise feedback coding in a nested loop:

- The first NFC stage performs short-term prediction and short-term noise spectral shaping (spectral envelope shaping).
- The second, nested NFC stage performs long-term prediction and long-term noise spectral shaping (harmonic shaping).

Such a nested two-stage NFC structure is shown in Figure 1 below.

¹ The "BV16 speech codec" specification is based on Broadcom Corporation's BroadVoice®16 speech codec.
The BroadVoice® open source software is provided under the GNU Lesser General Public License ("LGPL"), version 2.1, as published by the Free Software Foundation.

[Figure 1 Basic codec structure of Two-Stage Noise Feedback Coding (TSNFC): quantizer, two short-term predictors $P_s(z)$, long-term predictor $P_l(z)$, long-term noise feedback filter $N_l(z) - 1$, short-term noise feedback filter $F_s(z)$; signals s(n), d(n), u(n), v(n), uq(n), q(n), vq(n), qs(n), output sq(n)]

In Figure 1, the outer layer (the two short-term predictors and the short-term noise feedback filter) follows the structure of the conventional NFC codec. The TSNFC structure in Figure 1 is obtained from the conventional (single-stage) NFC structure by replacing its simple scalar quantizer with a "predictive quantizer" that employs long-term prediction and long-term noise spectral shaping. This "predictive quantizer" is the inner feedback loop in Figure 1, which includes the long-term predictor and the long-term noise feedback filter.

The inner feedback loop uses an alternative but equivalent conventional NFC structure. In it, $N_l(z)$ is the filter whose frequency response is the desired noise shape for long-term noise spectral shaping. In the outer layer, the short-term noise feedback filter $F_s(z)$ is usually chosen as a bandwidth-expanded version of the short-term predictor $P_s(z)$. The outer and inner layers use different NFC structures for reasons of complexity.

By combining two stages of NFC in a nested loop, the TSNFC structure in Figure 1 reaps the benefits of both short-term and long-term prediction. At the same time, it achieves both short-term and long-term noise spectral shaping.

It is natural and straightforward to use a scalar quantizer in Figure 1. However, BV16 uses a vector quantizer to achieve better coding efficiency.
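The following Python sketch makes the noise feedback principle concrete. It implements a generic single-stage NFC loop with a uniform scalar quantizer, which is the conventional structure that TSNFC extends. It is not BV16 code. The predictor taps `p`, the feedback taps `f`, the step size, the zero initial state and the sign convention (filtered past errors are added to the residual) are all illustrative assumptions. Under these assumptions, the coding error e(n) = s(n) − sq(n) satisfies $E(z)(1 - P(z)) = Q(z)(1 - F(z))$. In other words, the feedback taps set the spectral shape of the coding noise.

```python
def nfc_loop(s, p, f, step):
    """Generic single-stage noise feedback coder plus matching decoder (illustrative only).

    s    : input samples
    p    : hypothetical short-term predictor taps, p[k] multiplies s(n-1-k)
    f    : hypothetical noise feedback taps, f[k] multiplies q(n-1-k)
    step : step size of the uniform scalar quantizer
    Returns (sq, q): decoded output and quantization error sequences.
    """
    def past(x, n, k):
        # x(n-1-k), with a zero initial state before the first sample
        return x[n - 1 - k] if n - 1 - k >= 0 else 0.0

    sq, q = [], []
    for n in range(len(s)):
        d = s[n] - sum(pk * past(s, n, k) for k, pk in enumerate(p))   # prediction residual
        u = d + sum(fk * past(q, n, k) for k, fk in enumerate(f))      # add noise feedback
        uq = step * round(u / step)                                    # uniform scalar quantizer
        q.append(u - uq)                                               # quantization error
        sq.append(uq + sum(pk * past(sq, n, k) for k, pk in enumerate(p)))  # decoder synthesis
    return sq, q
```

For example, with `p = [0.9]` and `f = [0.5]`, the sequence e(n) − 0.9 e(n−1) equals q(n) − 0.5 q(n−1) sample by sample. Changing `f` therefore reshapes the noise without affecting the decoder.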
In the Vector Quantization (VQ) codebook search, the u(n) vector cannot be generated before the search starts. Because of the feedback structure in Figure 1, every element of u(n) after the first depends on the vector-quantized versions of the earlier elements. The search therefore proceeds as follows:

1. Take each candidate codevector in the VQ codebook in turn, which fixes a candidate uq(n) vector.
2. Calculate the corresponding u(n) vector and the corresponding VQ error $q(n) = u(n) - uq(n)$.
3. Choose as the winner the codevector that minimizes the energy of q(n) within the current vector time span.

The index of the winning codevector becomes part of the encoder output bit stream for the current speech frame.

The TSNFC decoder is simply a quantizer decoder followed by two feedback filter structures, one involving the long-term predictor and one involving the short-term predictor. These are shown on the right half of Figure 1. The TSNFC decoder is therefore similar to the decoders of other predictive coding techniques, such as Adaptive Predictive Coding (APC), Multi-Pulse Linear Predictive Coding (MPLPC), and Code-Excited Linear Prediction (CELP).

Suppose the alternative NFC structure of the inner feedback loop in Figure 1 is also used in the outer feedback loop. The result is an alternative TSNFC codec structure, shown in Figure 2 below. Here $N_s(z)$ is a short-term filter whose frequency response is the desired noise shape for short-term noise spectral shaping. The structure in Figure 2 is mathematically equivalent to the structure in Figure 1. However, it allows the short-term noise spectral shape to be specified directly through $N_s(z)$, which can be an advantage in some applications.
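The sketch below illustrates why every candidate must be tried inside the feedback loop. It is a deliberately simplified, hypothetical model. A single FIR noise feedback filter with taps `fb` acts on past VQ errors and stands in for the nested short-term and long-term loops of Figure 1. The function and parameter names are assumptions and are not part of BV16.

```python
def vq_search_with_feedback(v, codebook, fb, q_hist):
    """Pick the codevector minimizing the energy of q(n) = u(n) - uq(n) for one vector.

    v        : target samples for this vector, before noise feedback
    codebook : candidate uq(n) vectors, each the same length as v
    fb       : hypothetical FIR noise feedback taps on q(n-1), q(n-2), ...
    q_hist   : VQ errors from earlier vectors, most recent last
    Returns (best_index, errors_of_best_candidate).
    """
    best_index, best_energy, best_q = -1, float("inf"), None
    for index, cand in enumerate(codebook):
        q = list(q_hist)  # error history, extended sample by sample for this candidate
        errors = []
        for n, vn in enumerate(v):
            # The feedback term uses errors produced by earlier elements of this same candidate,
            # so u(n) cannot be computed before the candidate is fixed.
            feedback = sum(c * q[-k] for k, c in enumerate(fb, start=1) if k <= len(q))
            u = vn + feedback
            e = u - cand[n]
            q.append(e)
            errors.append(e)
        energy = sum(e * e for e in errors)
        if energy < best_energy:
            best_index, best_energy, best_q = index, energy, errors
    return best_index, best_q
```

Consider `fb = [1.0]` and `v = [1.0, 1.0]`. The candidate `[0.5, 1.5]` wins over `[0.5, 1.0]`, even though `[0.5, 1.0]` would win if the feedback were ignored. The reason is that the first-sample error of 0.5 is fed back and raises u(n) for the second sample.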
[Figure 2 An alternative codec structure of Two-Stage Noise Feedback Coding (TSNFC): quantizer, short-term predictor $P_s(z)$, long-term predictor $P_l(z)$, long-term noise feedback filter $N_l(z) - 1$, short-term noise feedback filter $N_s(z) - 1$; signals s(n), u(n), v(n), uq(n), q(n), vq(n), qs(n), output sq(n)]

2.2 Overview of the BV16 Codec

The BV16 codec is a purely forward-adaptive TSNFC codec. It operates at an input sampling rate of 8 kHz and an encoding bit rate of 16 kb/s, or 2 bits per sample. BV16 uses a frame size of 5 ms (40 samples) with no look-ahead, so the total algorithmic buffering delay is just the frame size itself, 5 ms. The main design goal of BV16 is to keep both the coding delay and the codec complexity as low as possible. At the same time, it aims to provide toll-quality speech equivalent to or better than that of G.728 and G.729E.

The block diagram of the BV16 encoder is shown in Figure 3. It is based on the alternative TSNFC codec structure shown in Figure 2. The BV16 decoder block diagram is shown in Figure 4.

[Figure 3 Block diagram of the BV16 encoder: high-pass pre-filter (203), short-term predictive analysis & quantization (210), long-term predictive analysis & quantization (220), prediction residual quantizer (230), short-term predictor (240), short-term noise feedback filter (250), long-term predictor, long-term noise feedback filter, bit multiplexer (295); indices LSPI, PPI, PPTI, GI, CI go to the output bit stream; internal signals include v(n), u(n), uq(n), dq(n), sq(n), q(n), ppv(n), ltnf(n), stnf(n), qs(n)]

Because the frame size is small, the parameters of the short-term predictor (also called the "LPC predictor") and of the long-term predictor (also called the "pitch predictor") are both transmitted and updated once per frame. The gain of the excitation signal is also transmitted once per frame. The excitation VQ uses a vector dimension of 4 samples.
Hence, there are 10 excitation vectors in a frame.

The BV16 encoder first passes the input signal through a fixed pole-zero high-pass pre-filter. This removes possible DC bias or low-frequency rumble. The filtered signal is then used to derive the LPC predictor coefficients. To keep the complexity low, BV16 uses a relatively low LPC predictor order of 8, with an LPC analysis window 20 ms (160 samples) long. The window is asymmetric: its peak is located at the center of the current frame, and its end coincides with the last sample of the current frame. The coefficients of the 8th-order LPC predictor are derived by autocorrelation LPC analysis based on the Levinson-Durbin recursion.

The derived LPC predictor coefficients are converted to Line-Spectrum Pair (LSP) parameters, which are then quantized by an inter-frame predictive coding scheme:

- Inter-frame prediction of the LSP parameters uses an 8th-order moving-average (MA) predictor with fixed coefficients. This predictor covers a time span of 8 × 5 ms = 40 ms.
- The inter-frame LSP prediction residual is quantized by a two-stage vector quantizer. The first stage is an 8-dimensional vector quantizer with a 7-bit codebook. The second stage is an 8-dimensional sign-shape VQ with 1 bit for sign and 6 bits for shape.

For long-term prediction, BV16 uses a three-tap pitch predictor with an integer pitch period. To keep the complexity low, both the pitch period and the pitch taps are determined in an open-loop fashion. The three pitch predictor taps are jointly quantized with a 5-bit vector quantizer. The codebook search uses the energy of the open-loop pitch prediction residual as its distortion measure. The 32 codevectors in the pitch tap codebook have been "stabilized" to ensure that they do not give rise to an unstable pitch synthesis filter.

The excitation gain is also determined in an open-loop fashion to keep the complexity low.
The average power of the open-loop pitch prediction residual within the current frame is calculated and converted to the logarithmic domain. The resulting log-gain is then quantized using inter-subframe MA predictive coding. The MA predictor for the log-gain has order 8, corresponding to a time span of 8 × 5 ms = 40 ms, and its coefficients are again fixed. The log-gain prediction residual is quantized by a 4-bit scalar quantizer.

The 4-dimensional excitation VQ codebook has a simple sign-shape structure, with 1 bit for sign and 4 bits for shape. Only 16 four-dimensional codevectors are stored. The mirror image of each stored codevector with respect to the origin is also a codevector, so the codebook holds 32 codevectors in total.

In the BV16 decoder, the decoded excitation vectors are scaled by the excitation gain. The scaled excitation signal then passes through a long-term synthesis filter followed by a short-term synthesis filter. Figure 4 shows the block diagram of the BV16 decoder.

[Figure 4 Block diagram of the BV16 decoder: bit de-multiplexer (input bit stream); prediction residual quantizer decoder (indices GI, CI) producing uq(n); long-term synthesis filter with long-term predictor and long-term predictive parameter decoder (indices PPI, PPTI) producing dq(n); short-term synthesis filter with short-term predictor and short-term predictive parameter decoder (block 420, index LSPI) producing the output signal sq(n)]

Table 1 shows the bit allocation of BV16 in each 5 ms frame:

- LSP parameters: 14 bits per frame, made up of 7 bits for the first-stage VQ and 1 + 6 = 7 bits for the second-stage sign-shape VQ.
- Pitch period: 7 bits.
- Pitch predictor taps: 5 bits.
- Excitation gain: 4 bits per frame.
- Excitation VQ: 50 bits per frame, since each of the 10 excitation vectors is encoded with 1 bit for sign and 4 bits for shape.

The 50 excitation bits plus the other 30 bits of side information give a grand total of 80 bits per 40-sample frame. This is 2 bits/sample, or 16 kb/s.
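The bit budget above can be checked with a few lines of arithmetic. The Python sketch below is only a consistency check of Table 1. The constant and function names are illustrative and not taken from any reference implementation.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 5
VECTOR_DIM = 4  # excitation VQ dimension

# Side-information bits per 5 ms frame, as listed in Table 1.
SIDE_INFO_BITS = {
    "LSP (7 + 7)": 7 + 7,
    "pitch period": 7,
    "3 pitch predictor taps": 5,
    "excitation gain": 4,
}

def frame_budget():
    frame_len = SAMPLE_RATE_HZ * FRAME_MS // 1000   # samples per frame
    num_vectors = frame_len // VECTOR_DIM           # excitation vectors per frame
    excitation_bits = num_vectors * (1 + 4)         # 1 sign bit + 4 shape bits each
    side_info = sum(SIDE_INFO_BITS.values())
    total = side_info + excitation_bits             # bits per frame
    bit_rate = total * 1000 // FRAME_MS             # bits per second
    return frame_len, num_vectors, side_info, excitation_bits, total, bit_rate
```

Calling `frame_budget()` returns `(40, 10, 30, 50, 80, 16000)`: 40 samples, 10 vectors, 30 + 50 = 80 bits, and 16 kb/s.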
Table 1 Bit allocation of the BV16 codec

| Parameter | Bits per frame (40 samples) |
| --- | --- |
| LSP | 7 + 7 = 14 |
| Pitch Period | 7 |
| 3 Pitch Predictor Taps | 5 |
| Excitation Gain | 4 |
| 10 Excitation Vectors | (1 + 4) × 10 = 50 |
| Total | 80 |

3 DETAILED DESCRIPTION OF THE BV16 ENCODER

This section describes each functional block of the BV16 encoder in Figure 3 in detail. Where necessary, certain functional blocks are expanded into more detailed block diagrams. The description is detailed enough to allow those skilled in the art to implement a mathematically equivalent BV16 encoder.

3.1 High-Pass Pre-Filtering

Refer to Figure 3. The input signal is assumed to be represented by 16-bit linear PCM. Block 203 is a high-pass pre-filter with fixed coefficients. It is a second-order pole-zero filter with the following transfer function:

$$H_{hpf}(z) = \frac{0.924133 - 1.848267\,z^{-1} + 0.924133\,z^{-2}}{1 - 1.899109\,z^{-1} + 0.905396\,z^{-2}}$$

This high-pass pre-filter removes undesirable low-frequency components from the input signal.

3.2 Short-Term Linear Predictive Analysis

The high-pass filtered signal s(n) is buffered at block 210. This block performs short-term linear predictive analysis and quantization to obtain the coefficients for the short-term predictor 240 and the short-term noise feedback filter 250. Block 210 is further expanded in Figure 5.

Refer to Figure 5. The input signal s(n) is buffered in block 211, where a 20 ms asymmetric analysis window is applied to the buffered s(n) signal array. The "left window" is 17.5 ms long, and the "right window" is 2.5 ms long. Let LWINSZ be the number of samples in the left window (LWINSZ = 140 for 8 kHz sampling). Then the left window is given by

$$w_l(n) = \frac{1}{2}\left[1 - \cos\left(\frac{n\pi}{LWINSZ + 1}\right)\right], \quad n = 1, 2, \ldots, LWINSZ.$$

Let RWINSZ be the number of samples in the right window (RWINSZ = 20 for 8 kHz sampling). The right window is given by

$$w_r(n) = \cos\left(\frac{(n-1)\pi}{2\,RWINSZ}\right), \quad n = 1, 2, \ldots, RWINSZ.$$

Concatenating $w_l(n)$ and $w_r(n)$ gives the 20 ms asymmetric analysis window, whose peak is located at the center of the current frame. When the window is applied, its last sample is lined up with the last sample of the current frame. Therefore, the codec does not use any look-ahead.

[Figure 5 Short-term linear predictive analysis and quantization (block 210): s(n) → apply window & calculate autocorrelation (211) → r(i) → white noise correction & spectral smoothing (212) → r̂(i) → Levinson-Durbin recursion (213) → âi → bandwidth expansion (214) → ai → convert to LSP (215) → li → LSP quantizer (216, index LSPI) → l̃i → convert to predictor coefficients (217, output ãi) and bandwidth expansion (218, output a′i); block 214 also sends α̂i and β̂i to block 250; outputs go to blocks 221 and 240]

More specifically, and without loss of generality, let the sampling time indices n = 1, 2, …, FRSZ correspond to the current frame, where the frame size is FRSZ = 40. The s(n) signal buffer stored in block 211 then covers n = −119, −118, …, −1, 0, 1, 2, …, 40. The asymmetric LPC analysis window function can be expressed as

$$w(n) = \begin{cases} w_l(n + 120), & n = -119, -118, \ldots, 20 \\ w_r(n - 20), & n = 21, 22, \ldots, 40 \end{cases}$$

The windowing operation is performed as follows:

$$s_w(n) = s(n)\,w(n), \quad n = -119, -118, \ldots, -1, 0, 1, 2, \ldots, 40.$$

Next, block 211 calculates the autocorrelation coefficients as follows:

$$r(i) = \sum_{n=-119+i}^{40} s_w(n)\,s_w(n-i), \quad i = 0, 1, 2, \ldots, 8.$$

The calculated autocorrelation coefficients are passed to block 212, which applies a Gaussian window to them to perform spectral smoothing. The Gaussian window function is given by

$$gw(i) = \exp\left[-\frac{1}{2}\left(\frac{2\pi i \sigma}{f_s}\right)^2\right], \quad i = 1, 2, \ldots, 8,$$

where $f_s$ is the sampling rate of the input signal in Hz and σ = 40 Hz. After multiplying the r(i) array by this Gaussian window, block 212 multiplies r(0) by a white noise correction factor WNCF = 1 + ε, where ε = 0.0001.
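The front end of the short-term analysis (block 203, then blocks 211 and 212) can be sketched in plain Python as follows. This is a floating-point illustration of the equations above, not a bit-exact implementation. The function names are hypothetical, and the high-pass filter is assumed to start from a zero state. The buffer `s` is assumed to hold s(n) for n = −119, …, 40 with the oldest sample first, so array index m corresponds to n = m − 119.

```python
import math

LWINSZ, RWINSZ = 140, 20
LPC_ORDER = 8

def highpass(x):
    """Block 203 as a direct-form difference equation, zero initial state (assumption)."""
    b0, b1, b2 = 0.924133, -1.848267, 0.924133
    a1, a2 = -1.899109, 0.905396          # denominator 1 + a1 z^-1 + a2 z^-2
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def analysis_window():
    """160-sample asymmetric window: index 0 is n = -119, index 159 is n = 40."""
    left = [0.5 * (1.0 - math.cos(n * math.pi / (LWINSZ + 1))) for n in range(1, LWINSZ + 1)]
    right = [math.cos((n - 1) * math.pi / (2 * RWINSZ)) for n in range(1, RWINSZ + 1)]
    return left + right

def autocorrelation(s, order=LPC_ORDER):
    """r(i) = sum over n = -119+i..40 of sw(n) sw(n-i), using array indices m = n + 119."""
    sw = [x * w for x, w in zip(s, analysis_window())]
    return [sum(sw[m] * sw[m - i] for m in range(i, len(sw))) for i in range(order + 1)]

def smooth_autocorrelation(r, fs=8000.0, sigma=40.0, eps=0.0001):
    """Block 212: Gaussian lag window on r(1..8), white noise correction on r(0)."""
    out = [r[0] * (1.0 + eps)]
    for i in range(1, len(r)):
        gw = math.exp(-0.5 * (2.0 * math.pi * i * sigma / fs) ** 2)
        out.append(gw * r[i])
    return out
```

The window peaks where the left and right halves meet, at index 139 or 140 (n = 20 or 21). A constant input to `highpass` decays toward zero, because the numerator coefficients sum to almost exactly zero at z = 1.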
In summary, the output of block 212 is given by

$$\hat r(i) = \begin{cases} 1.0001 \times r(0), & i = 0 \\ gw(i)\,r(i), & i = 1, 2, \ldots, 8 \end{cases}$$

Block 213 performs the Levinson-Durbin recursion to convert the autocorrelation coefficients $\hat r(i)$ into the short-term predictor coefficients $\hat a_i$, i = 0, 1, …, 8. The recursion may exit prematurely, for example because the prediction residual energy E(i) is not positive. In that case, the short-term predictor coefficients of the last frame are reused in the current frame. This exception handling requires an initial value of the $\hat a_i$ array, which is set to $\hat a_0 = 1$ and $\hat a_i = 0$ for i = 1, 2, …, 8. The Levinson-Durbin recursion is performed by the following algorithm:

1. If $\hat r(0) \le 0$, use the $\hat a_i$ array of the last frame, and exit the Levinson-Durbin recursion.
2. $E(0) = \hat r(0)$
3. $k_1 = -\hat r(1) / \hat r(0)$
4. $\hat a_1^{(1)} = k_1$
5. $E(1) = (1 - k_1^2)\,E(0)$
6. If $E(1) \le 0$, use the $\hat a_i$ array of the last frame, and exit the Levinson-Durbin recursion.
7. For i = 2, 3, 4, …, 8, do the following:

$$k_i = \frac{-\hat r(i) - \sum_{j=1}^{i-1} \hat a_j^{(i-1)}\,\hat r(i-j)}{E(i-1)}$$
$$\hat a_i^{(i)} = k_i$$
$$\hat a_j^{(i)} = \hat a_j^{(i-1)} + k_i\,\hat a_{i-j}^{(i-1)}, \quad j = 1, 2, \ldots, i-1$$
$$E(i) = (1 - k_i^2)\,E(i-1)$$

If $E(i) \le 0$, use the $\hat a_i$ array of the last frame, and exit the Levinson-Durbin recursion.

If the recursion exits prematurely, the $\hat a_i$ array of the last frame is used as the output of block 213. If the recursion completes successfully, which is normally the case, the final output of block 213 is taken as

$$\hat a_0 = 1, \qquad \hat a_i = \hat a_i^{(8)}, \quad i = 1, 2, \ldots, 8.$$

Block 214 performs bandwidth expansion as follows:

$$a_i = (0.96852)^i\,\hat a_i, \quad i = 0, 1, \ldots, 8.$$

Block 214 also performs bandwidth expansion operations to derive the coefficients of the short-term noise feedback filter (block 250). Block 250 in Figure 3 has the transfer function

$$F_s(z) = N_s(z) - 1 = \frac{\sum_{i=1}^{8} \hat\beta_i\,z^{-i}}{\sum_{i=0}^{8} \hat\alpha_i\,z^{-i}}.$$

Block 214 calculates the coefficients of $F_s(z)$ as

$$\hat\alpha_i = (0.85)^i\,\hat a_i, \quad i = 0, 1, \ldots, 8,$$
$$\hat\beta_i = \left[(0.5)^i - (0.85)^i\right]\hat a_i, \quad i = 1, 2, \ldots, 8.$$

3.3 Conversion to LSP

In Figure 5, block 215 converts the LPC coefficients $a_i$, i = 1, 2, …, 8, of the prediction error filter

$$A(z) = 1 + \sum_{i=1}^{8} a_i\,z^{-i}$$

to a set of 8 Line-Spectrum Pair (LSP) coefficients $l_i$, i = 1, 2, …, 8. The LSP coefficients are also known as Line Spectrum Frequencies (LSF). They are the angular positions, normalized so that 1.0 corresponds to the Nyquist frequency, of the roots of

$$A_p(z) = A(z) + z^{-9} A(z^{-1}) \quad \text{and} \quad A_m(z) = A(z) - z^{-9} A(z^{-1})$$

on the upper half of the unit circle, $z = e^{j\omega}$, $0 \le \omega \le \pi$. The trivial roots at z = −1 of $A_p(z)$ and at z = 1 of $A_m(z)$ are excluded. Because $A_p(z)$ is symmetric and $A_m(z)$ is anti-symmetric, the roots of interest can be determined as the roots of

$$G_p(\omega) = \sum_{i=0}^{4} g_{p,i} \cos(i\omega) \quad \text{and} \quad G_m(\omega) = \sum_{i=0}^{4} g_{m,i} \cos(i\omega),$$

where

$$g_{p|m,i} = \begin{cases} f_{p|m,4}, & i = 0 \\ 2 f_{p|m,4-i}, & i = 1, \ldots, 4 \end{cases}$$

in which

$$f_{p,i} = \begin{cases} 1.0, & i = 0 \\ a_i + a_{9-i} - f_{p,i-1}, & i = 1, \ldots, 4 \end{cases}$$

and

$$f_{m,i} = \begin{cases} 1.0, & i = 0 \\ a_i - a_{9-i} + f_{m,i-1}, & i = 1, \ldots, 4 \end{cases}$$

The subscript "p|m" indicates that two versions of the equation exist, one with subscript "p" and one with subscript "m". The roots of $A_p(z)$ and $A_m(z)$, and therefore the roots of $G_p(\omega)$ and $G_m(\omega)$, are interlaced, with the first root belonging to $G_p(\omega)$.

The functions $G_p(\omega)$ and $G_m(\omega)$ are evaluated efficiently using Chebyshev polynomial series. With the mapping $x = \cos(\omega)$ and $\cos(m\omega) = T_m(x)$, where $T_m(x)$ is the m-th order Chebyshev polynomial, the two functions can be expressed as

$$G_{p|m}(x) = \sum_{i=0}^{4} g_{p|m,i}\,T_i(x).$$

Because Chebyshev polynomials are defined recursively, the functions can be evaluated as

$$G_{p|m}(x) = \frac{b_{p|m,0}(x) - b_{p|m,2}(x) + g_{p|m,0}}{2},$$

where $b_{p|m,0}(x)$ and $b_{p|m,2}(x)$ are calculated using the recurrence

$$b_{p|m,i}(x) = 2x\,b_{p|m,i+1}(x) - b_{p|m,i+2}(x) + g_{p|m,i}$$

with initial conditions $b_{p|m,5}(x) = b_{p|m,6}(x) = 0$.

The roots of $G_p(x)$ and $G_m(x)$ are determined in an alternating fashion, starting with a root of $G_p(x)$. Each root is located by finding a sign change of the relevant function along a grid of 60 points, given in Appendix 1. The root estimate is then refined with 4 bisections followed by a final linear interpolation between the two points surrounding the root. Note that both the roots and the grid points are in the cosine domain. Once the 8 roots $x_i = \cos(\omega_i)$, i = 1, 2, …, 8, have been found, they are converted to the normalized frequency domain to obtain the LSP coefficients:

$$l_i = \frac{\cos^{-1}(x_i)}{\pi}, \quad i = 1, 2, \ldots, 8.$$

In the rare event that fewer than 8 roots are found, block 215 returns the LSP coefficients of the previous frame, $l_i(k-1)$, i = 1, 2, …, 8, where k is the frame index of the current frame. For the very first frame, the previous-frame LSP coefficients are initialized to

$$l_i(0) = i/9, \quad i = 1, 2, \ldots, 8.$$

3.4 LSP Quantization

Block 216 of Figure 5 vector-quantizes and encodes the LSP coefficient vector, $\mathbf{l} = [l_1\ l_2\ \cdots\ l_8]^T$, into a total of 14 bits. The output LSP quantizer index array, LSPI = {LSPI1, LSPI2}, is passed to the bit multiplexer (block 295). The quantized LSP coefficient vector, $\tilde{\mathbf{l}} = [\tilde l_1\ \tilde l_2\ \cdots\ \tilde l_8]^T$, is passed to block 217.

The LSP quantizer is based on mean-removed inter-frame moving-average (MA) prediction with two-stage vector quantization (VQ) of the prediction error.
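Three steps described in Sections 3.2 and 3.3 are sketched below in floating-point Python: the Levinson-Durbin recursion of block 213 with its fallback rule, the bandwidth expansions of block 214, and the Chebyshev evaluation used in block 215. The function names are illustrative assumptions. Coefficient arrays are indexed from 0, with element 0 holding $\hat a_0 = 1$ (or $a_0 = 1$), and `prev_a` stands for the last frame's $\hat a_i$ array. The grid search and bisection over Appendix 1 are not reproduced.

```python
def levinson_durbin(r_hat, prev_a, order=8):
    """Block 213: return [1, a1, ..., a8] for A(z) = 1 + sum a_i z^-i,
    or a copy of prev_a if the recursion exits prematurely."""
    if r_hat[0] <= 0.0:
        return list(prev_a)
    a = [1.0] + [0.0] * order
    err = r_hat[0]                                       # E(0)
    for i in range(1, order + 1):
        acc = r_hat[i] + sum(a[j] * r_hat[i - j] for j in range(1, i))
        k = -acc / err                                   # k_i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)                             # E(i)
        if err <= 0.0:
            return list(prev_a)
    return a

def bandwidth_expand(a_hat, gamma):
    """Scale coefficient i by gamma**i (gamma = 0.96852 gives the a_i of block 214)."""
    return [(gamma ** i) * c for i, c in enumerate(a_hat)]

def noise_feedback_coeffs(a_hat):
    """Block 214: alpha_i for i = 0..8 and beta_i for i = 1..8 of F_s(z)."""
    alpha = bandwidth_expand(a_hat, 0.85)
    beta = [(0.5 ** i - 0.85 ** i) * a_hat[i] for i in range(1, len(a_hat))]
    return alpha, beta

def lsp_cheb_coeffs(a):
    """Block 215: g_p and g_m (i = 0..4) from a = [1, a1, ..., a8], with a_9 = 0."""
    ext = list(a) + [0.0]
    fp, fm = [1.0], [1.0]
    for i in range(1, 5):
        fp.append(a[i] + ext[9 - i] - fp[i - 1])
        fm.append(a[i] - ext[9 - i] + fm[i - 1])
    gp = [fp[4]] + [2.0 * fp[4 - i] for i in range(1, 5)]
    gm = [fm[4]] + [2.0 * fm[4 - i] for i in range(1, 5)]
    return gp, gm

def cheb_eval(g, x):
    """Evaluate sum_i g[i] T_i(x) via the b-recurrence with b_5 = b_6 = 0."""
    b = [0.0] * (len(g) + 2)
    for i in range(len(g) - 1, -1, -1):
        b[i] = 2.0 * x * b[i + 1] - b[i + 2] + g[i]
    return (b[0] - b[2] + g[0]) / 2.0
```

For a first-order autocorrelation sequence r(i) = 0.5^i, `levinson_durbin` returns $\hat a_1 = -0.5$ and zeros elsewhere. An autocorrelation of all ones drives E(1) to zero and triggers the fallback to `prev_a`.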
The quantizer enables bit-error detection at the decoder by constraining the codevector selection at the encoder. Note that the encoder must perform the specified constrained VQ to maintain interoperability. The first-stage VQ is searched using the simple mean-squared error (MSE) distortion criterion. The second-stage sign-shape VQ is searched using the weighted mean-squared error (WMSE) distortion criterion.

[Figure 6 LSP quantizer (block 216): calculate LSP weights (2161); adder 2162 removes the mean LSP vector; 8th order MA prediction (2163); adder 2164; first-stage regular 8-dimensional MSE VQ (2165, index sub-quantizer 1, LSPI1); adder 2166; second-stage 8-dimensional constrained WMSE VQ with signed codebook (21615, index sub-quantizer 2, LSPI2); adders 21611, 21612, 21613; LSP spacing (21614) producing the quantized LSP vector]

Block 216 is further expanded in Figure 6. The first-stage VQ takes place in block 2165, and the second-stage constrained sign-shape VQ takes place in block 21615. Apart from the LSP quantizer indices LSPI1 and LSPI2, all signal paths in Figure 6 carry vectors of dimension 8.

Block 2161 uses the unquantized LSP coefficient vector to calculate the weights for the second-stage WMSE VQ. The weights are determined as

$$w_i = \begin{cases} 1/(l_2 - l_1), & i = 1 \\ 1/\min(l_i - l_{i-1},\ l_{i+1} - l_i), & 1 < i < 8 \\ 1/(l_8 - l_7), & i = 8 \end{cases}$$

In other words, the i-th weight is the inverse of the distance between the i-th LSP coefficient and its nearest neighboring LSP coefficient.

Adder 2162 subtracts the constant LSP mean vector

$$\bar{\mathbf{l}} = [0.0950317\ \ 0.1489563\ \ 0.2513123\ \ 0.3629456\ \ 0.4780884\ \ 0.5877075\ \ 0.7058105\ \ 0.8007202]^T$$

from the unquantized LSP coefficient vector to obtain the mean-removed LSP vector, $\mathbf{e}_1 = \mathbf{l} - \bar{\mathbf{l}}$.
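The LSP weight calculation of block 2161 and the mean removal of adder 2162 are short enough to transcribe directly. The Python sketch below does so. The names `LSP_MEAN`, `lsp_weights` and `remove_mean` are illustrative, and LSPs are indexed from 0 in the code.

```python
LSP_MEAN = [0.0950317, 0.1489563, 0.2513123, 0.3629456,
            0.4780884, 0.5877075, 0.7058105, 0.8007202]

def lsp_weights(l):
    """Block 2161: inverse distance from each LSP to its nearest neighbouring LSP."""
    m = len(l)
    w = []
    for i in range(m):
        if i == 0:
            d = l[1] - l[0]                     # first LSP: only one neighbour
        elif i == m - 1:
            d = l[m - 1] - l[m - 2]             # last LSP: only one neighbour
        else:
            d = min(l[i] - l[i - 1], l[i + 1] - l[i])
        w.append(1.0 / d)
    return w

def remove_mean(l):
    """Adder 2162: e1 = l - l_mean, element by element."""
    return [x - mu for x, mu in zip(l, LSP_MEAN)]
```

For the first-frame initial values $l_i = i/9$, all neighbors are 1/9 apart, so every weight is 9. Closely spaced LSP pairs, which typically mark spectral peaks, receive larger weights.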
In Figure 6, block 2163 performs 8th-order inter-frame MA prediction of the mean-removed LSP vector $\mathbf{e}_1$ based on the $\tilde{\mathbf{e}}_2$ vectors of the previous 8 frames, where $\tilde{\mathbf{e}}_2$ is the quantized version of the inter-frame LSP prediction error vector $\mathbf{e}_2$ (see footnote 2). Let $\tilde{e}_{2,i}(k)$ denote the i-th element of the vector $\tilde{\mathbf{e}}_2$ in the frame that is k frames before the current frame, and let $\hat{e}_{1,i}$ be the i-th element of the inter-frame predicted mean-removed LSP vector $\hat{\mathbf{e}}_1$. Block 2163 then calculates the predicted mean-removed LSP vector according to

$$\hat{e}_{1,i} = \mathbf{p}_{LSP,i}^T \cdot \left[\tilde{e}_{2,i}(1)\ \ \tilde{e}_{2,i}(2)\ \ \tilde{e}_{2,i}(3)\ \ \tilde{e}_{2,i}(4)\ \ \tilde{e}_{2,i}(5)\ \ \tilde{e}_{2,i}(6)\ \ \tilde{e}_{2,i}(7)\ \ \tilde{e}_{2,i}(8)\right]^T, \quad i = 1, 2, \ldots, 8,$$

where $\mathbf{p}_{LSP,i}$ holds the 8 prediction coefficients for the i-th LSP coefficient and is given by

$$\begin{aligned}
\mathbf{p}_{LSP,1}^T &= [\,1.040710\ \ 0.844971\ \ 0.682922\ \ 0.575989\ \ 0.464600\ \ 0.346008\ \ 0.226074\ \ 0.103577\,] \\
\mathbf{p}_{LSP,2}^T &= [\,1.034851\ \ 0.884094\ \ 0.723816\ \ 0.609863\ \ 0.489563\ \ 0.366516\ \ 0.240234\ \ 0.109253\,] \\
\mathbf{p}_{LSP,3}^T &= [\,1.055237\ \ 0.922180\ \ 0.762695\ \ 0.644531\ \ 0.512695\ \ 0.373474\ \ 0.238037\ \ 0.108337\,] \\
\mathbf{p}_{LSP,4}^T &= [\,1.076843\ \ 0.935608\ \ 0.790771\ \ 0.673523\ \ 0.540588\ \ 0.399841\ \ 0.264221\ \ 0.118774\,] \\
\mathbf{p}_{LSP,5}^T &= [\,1.065552\ \ 0.901978\ \ 0.746155\ \ 0.636047\ \ 0.514282\ \ 0.386169\ \ 0.256165\ \ 0.117493\,] \\
\mathbf{p}_{LSP,6}^T &= [\,1.037476\ \ 0.848816\ \ 0.684326\ \ 0.577393\ \ 0.463684\ \ 0.347717\ \ 0.232666\ \ 0.107239\,] \\
\mathbf{p}_{LSP,7}^T &= [\,1.022278\ \ 0.809021\ \ 0.645081\ \ 0.535767\ \ 0.430481\ \ 0.325562\ \ 0.219055\ \ 0.099304\,] \\
\mathbf{p}_{LSP,8}^T &= [\,0.964844\ \ 0.743469\ \ 0.578125\ \ 0.484375\ \ 0.393250\ \ 0.297913\ \ 0.201416\ \ 0.091736\,]
\end{aligned}$$

Adder 2164 calculates the prediction error vector $\mathbf{e}_2 = \mathbf{e}_1 - \hat{\mathbf{e}}_1$, which is the input to the first-stage VQ. In block 2165, the 8-dimensional prediction error vector $\mathbf{e}_2$ is vector quantized with the 128-entry, 8-dimensional codebook $CB_1 = \{\mathbf{cb}_1^{(0)}, \mathbf{cb}_1^{(1)}, \ldots, \mathbf{cb}_1^{(127)}\}$, listed in Appendix 2. The codevector minimizing the MSE is denoted $\tilde{\mathbf{e}}_{21}$, and the corresponding index is denoted LSPI1:

$$LSPI_1 = \arg\min_{k \in \{0, 1, \ldots, 127\}} \left\{ \left(\mathbf{e}_2 - \mathbf{cb}_1^{(k)}\right)^T \left(\mathbf{e}_2 - \mathbf{cb}_1^{(k)}\right) \right\}, \qquad \tilde{\mathbf{e}}_{21} = \mathbf{cb}_1^{(LSPI_1)},$$

where the notation $I = \arg\min_i \{D(i)\}$ means that I is the argument that minimizes D(i), i.e., $D(I) \le D(i)$ for all i.

Footnote 2: At the first frame, the previous (non-existent) quantized inter-frame LSP prediction error vectors are set to zero vectors.

Adder 2166 subtracts the first-stage codevector from the prediction error vector to form the first-stage quantization error vector, $\mathbf{e}_{22} = \mathbf{e}_2 - \tilde{\mathbf{e}}_{21}$. This is the input to the second-stage VQ, which is a sign-shape VQ with a 2-entry, 1-dimensional sign codebook $S = \{s_0, s_1\} = \{-1, +1\}$ and a 64-entry, 8-dimensional shape codebook $CB_2 = \{\mathbf{cb}_2^{(0)}, \mathbf{cb}_2^{(1)}, \ldots, \mathbf{cb}_2^{(63)}\}$, listed in Appendix 3. The product codevector that minimizes the WMSE is selected, subject to the constraint that the first 3 elements of the intermediate quantized LSP vector

$$\breve{\mathbf{l}} = \hat{\mathbf{l}} + \tilde{\mathbf{e}}_2 = \bar{\mathbf{l}} + \hat{\mathbf{e}}_1 + \tilde{\mathbf{e}}_{21} + \tilde{\mathbf{e}}_{22}$$

preserve the ordering property $\breve{l}_3 \ge \breve{l}_2 \ge \breve{l}_1 \ge 0$. The selected codevector is

$$\tilde{\mathbf{e}}_{22} = s_{I_{sg}}\, \mathbf{cb}_2^{(I_{sh})},$$

where the indices are given by

$$\{I_{sg}, I_{sh}\} = \arg\min_{\{i, k\} \in \mathcal{C}} \left\{ \left(\mathbf{e}_{22} - s_i\, \mathbf{cb}_2^{(k)}\right)^T \mathbf{W} \left(\mathbf{e}_{22} - s_i\, \mathbf{cb}_2^{(k)}\right) \right\},$$

$$\mathcal{C} = \left\{ \{h, j\} :\ h \in \{0, 1\},\ j \in \{0, 1, \ldots, 63\},\ \breve{l}_1^{(h,j)} \ge 0,\ \breve{l}_2^{(h,j)} \ge \breve{l}_1^{(h,j)},\ \breve{l}_3^{(h,j)} \ge \breve{l}_2^{(h,j)} \right\},$$

and the weighting matrix is the diagonal matrix

$$\mathbf{W} = \mathrm{diag}(w_1, w_2, \ldots, w_8).$$

The symbol $\breve{l}_i^{(h,j)}$ is the i-th element of the intermediate reconstructed LSP vector $\breve{\mathbf{l}}$ generated using the sign index $I_{sg} = h$ and the j-th shape codevector in $CB_2$. From the sign index $I_{sg}$ and the shape index $I_{sh}$, the index of the second-stage VQ, LSPI2, is calculated as

$$LSPI_2 = \begin{cases} 127 - I_{sh}, & I_{sg} = 0 \\ I_{sh}, & I_{sg} = 1 \end{cases}$$

In the unlikely event that no product codevector fulfills the constraint, the product codevector $\tilde{\mathbf{e}}_{22} = \mathbf{cb}_2^{(0)}$ is selected, and the index LSPI2 = 0 is returned.
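The Python sketch below mirrors blocks 2163, 2165, and 21615, and also the spacing rules of block 21614 described in the next paragraph. Codebooks CB1 and CB2 (Appendices 2 and 3) are not reproduced, so they are passed in as parameters. The function names, the list-based layout, and the tie-breaking (the first minimum found wins) are assumptions of this sketch, not requirements of the specification.

```python
def ma_predict(p_lsp, e2_hist):
    """Block 2163: e1_hat[i] = sum_k p_lsp[i][k] * e2_hist[k][i].
    e2_hist[0] is the quantized e2 of the previous frame (k = 1), etc.;
    all entries are zero vectors at the first frame."""
    return [sum(p_lsp[i][k] * e2_hist[k][i] for k in range(len(e2_hist)))
            for i in range(len(p_lsp))]


def first_stage(e2, cb1):
    """Block 2165: plain MSE search. Returns (LSPI1, e21_tilde)."""
    def mse(k):
        return sum((x - c) ** 2 for x, c in zip(e2, cb1[k]))
    best = min(range(len(cb1)), key=mse)
    return best, list(cb1[best])


def second_stage(e22, base, cb2, w):
    """Block 21615: constrained sign-shape WMSE search.
    base = l_mean + e1_hat + e21_tilde, so a candidate intermediate LSP
    vector is base + s * cb2[j]. Returns (LSPI2, e22_tilde)."""
    best, best_d = None, float("inf")
    for h, s in enumerate((-1.0, 1.0)):          # sign codebook S = {-1, +1}
        for j, shape in enumerate(cb2):
            cand = [s * c for c in shape]
            l1, l2, l3 = (base[i] + cand[i] for i in range(3))
            if not (l1 >= 0.0 and l2 >= l1 and l3 >= l2):
                continue                        # ordering constraint violated
            d = sum(wi * (x - c) ** 2 for wi, x, c in zip(w, e22, cand))
            if d < best_d:
                best_d, best = d, (h, j, cand)
    if best is None:                            # nothing satisfies the constraint
        return 0, list(cb2[0])
    h, j, cand = best
    return (127 - j if h == 0 else j), cand


def space_lsp(lsp_int):
    """Block 21614: sort, then clamp each element into a window whose
    lower edge follows the previous output and whose upper edge grows."""
    l = sorted(lsp_int)                         # step (i)
    lmax = 0.91025                              # step (ii)
    if l[0] < 0.0015:                           # step (iii)
        out = [0.0015]
    elif l[0] > lmax:
        out = [lmax]
    else:
        out = [l[0]]
    for i in range(1, len(l)):                  # step (iv)
        lmin = out[i - 1] + 0.0125
        lmax += 0.0125
        if l[i] < lmin:
            out.append(lmin)
        elif l[i] > lmax:
            out.append(lmax)
        else:
            out.append(l[i])
    return out
```

After the last step the upper limit reaches 0.91025 + 7 × 0.0125 = 0.99775, which is exactly the bound on the largest LSP given in the spacing requirement.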
Once the quantization is complete, the remaining operations of block 216 construct the quantized LSP vector from the codevectors, the LSP mean, and the MA prediction. Adder 21611 calculates the quantized prediction error vector by adding the stage 1 and stage 2 quantized vectors, $\tilde{\mathbf{e}}_2 = \tilde{\mathbf{e}}_{21} + \tilde{\mathbf{e}}_{22}$. Adder 21612 adds the mean LSP vector $\bar{\mathbf{l}}$ and the predicted mean-removed LSP vector to obtain the predicted LSP vector, $\hat{\mathbf{l}} = \bar{\mathbf{l}} + \hat{\mathbf{e}}_1$. Adder 21613 adds the predicted LSP vector and the quantized prediction error vector to get the intermediate reconstructed LSP vector, $\breve{\mathbf{l}} = \hat{\mathbf{l}} + \tilde{\mathbf{e}}_2$. Block 21614 checks the elements of the intermediate reconstructed LSP vector to enforce minimum spacing rules. It enforces a minimum value of 6 Hz for the smallest LSP coefficient, a maximum value of 3991 Hz for the largest LSP coefficient, and a minimum distance of 50 Hz between neighboring LSP coefficients. In the normalized domain of the LSP coefficients (where 1.0 corresponds to 4000 Hz), the spacing requirement is given by

$$\tilde{l}_1 \ge 0.0015, \qquad \tilde{l}_{i+1} - \tilde{l}_i \ge 0.0125,\ \ i = 1, 2, \ldots, 7, \qquad \tilde{l}_8 \le 0.99775.$$

The spacing is carried out as follows:
(i) Sort the elements of the intermediate reconstructed LSP vector such that $\breve{l}_1 \le \breve{l}_2 \le \cdots \le \breve{l}_8$.
(ii) Set $l_{max} = 0.91025$.
(iii) If $\breve{l}_1 < 0.0015$, set $\tilde{l}_1 = 0.0015$; else if $\breve{l}_1 > l_{max}$, set $\tilde{l}_1 = l_{max}$; else set $\tilde{l}_1 = \breve{l}_1$.
(iv) For i = 2, 3, …, 8, do the following:
1. Set $l_{min} = \tilde{l}_{i-1} + 0.0125$.
2. Set $l_{max} \leftarrow l_{max} + 0.0125$.
3. If $\breve{l}_i < l_{min}$, set $\tilde{l}_i = l_{min}$; else if $\breve{l}_i > l_{max}$, set $\tilde{l}_i = l_{max}$; else set $\tilde{l}_i = \breve{l}_i$.

3.5 Conversion to Short-Term Predictor Coefficients

Refer back to Figure 5. In block 217, the quantized set of LSP coefficients $\{\tilde{l}_i\}$, which is determined once a frame, is converted to the corresponding set of linear prediction coefficients $\{\tilde{a}_i\}$, the quantized linear prediction coefficients for the current frame.
With the notation

$$x_{p,i} = \cos(\pi \tilde{l}_{2i-1}), \qquad x_{m,i} = \cos(\pi \tilde{l}_{2i}), \qquad i = 1, 2, 3, 4,$$

the 4 unique coefficients of each of the two polynomials $A_p^{\Delta}(z) = A_p(z)/(1 + z^{-1})$ and $A_m^{\Delta}(z) = A_m(z)/(1 - z^{-1})$ can be determined using the following recursion (run once with the subscript p and once with m):

For i = 1, 2, 3, 4, do the following:

$$a^{\Delta}_{p|m,i} = 2\left(a^{\Delta}_{p|m,i-2} - x_{p|m,i}\, a^{\Delta}_{p|m,i-1}\right)$$
$$a^{\Delta}_{p|m,j} = a^{\Delta}_{p|m,j} + a^{\Delta}_{p|m,j-2} - 2\, x_{p|m,i}\, a^{\Delta}_{p|m,j-1}, \qquad j = i-1, i-2, \ldots, 1$$

with initial conditions $a^{\Delta}_{p|m,0} = 1$ and $a^{\Delta}_{p|m,-1} = 0$. In the recursion above, $\{a^{\Delta}_{p,i}\}$ and $\{a^{\Delta}_{m,i}\}$ are the sets of four unique coefficients of the polynomials $A_p^{\Delta}(z)$ and $A_m^{\Delta}(z)$, respectively. Similarly, let the two sets of coefficients $\{a_{p,i}\}$ and $\{a_{m,i}\}$, each consisting of 4 unique coefficients (apart from a sign on $\{a_{m,i}\}$), represent the unique coefficients of the polynomials $A_p(z)$ and $A_m(z)$, respectively. Then $\{a_{p,i}\}$ and $\{a_{m,i}\}$ can be obtained from $\{a^{\Delta}_{p,i}\}$ and $\{a^{\Delta}_{m,i}\}$ as

$$a_{p,i} = a^{\Delta}_{p,i} + a^{\Delta}_{p,i-1}, \qquad a_{m,i} = a^{\Delta}_{m,i} - a^{\Delta}_{m,i-1}, \qquad i = 1, 2, 3, 4.$$

From $A_p(z)$ and $A_m(z)$, the polynomial of the prediction error filter is obtained as

$$\tilde{A}(z) = \frac{A_p(z) + A_m(z)}{2}.$$

In terms of the unique coefficients of $A_p(z)$ and $A_m(z)$, the coefficients $\{\tilde{a}_i\}$ of $\tilde{A}(z)$ can be expressed as

$$\tilde{a}_i = \begin{cases} 1.0, & i = 0 \\ 0.5\,(a_{p,i} + a_{m,i}), & i = 1, 2, 3, 4 \\ 0.5\,(a_{p,9-i} - a_{m,9-i}), & i = 5, 6, 7, 8 \end{cases}$$

where the tilde signifies that the coefficients correspond to the quantized LSP coefficients. Note that

$$\tilde{A}(z) = 1 - P_s(z) = 1 + \sum_{i=1}^{8} \tilde{a}_i z^{-i},$$

where

$$P_s(z) = -\sum_{i=1}^{8} \tilde{a}_i z^{-i}$$

is the transfer function of the short-term predictor block 240 in Figure 3. Block 218 performs further bandwidth expansion on the set of predictor coefficients $\{\tilde{a}_i\}$ using a bandwidth expansion factor of $\gamma_1 = 0.75$. The resulting bandwidth-expanded set of filter coefficients is given by

$$a_i' = \gamma_1^{\,i}\, \tilde{a}_i, \qquad i = 1, 2, \ldots, 8.$$

This bandwidth-expanded set of filter coefficients $\{a_i'\}$ is used to update the coefficients of the weighted short-term synthesis filter block 221 in Figure 7 (discussed later). This completes the description of the short-term predictive analysis and quantization block 210 in Figure 3 and Figure 5.

3.6 Long-Term Linear Predictive Analysis (Pitch Extraction)

In Figure 3, the long-term predictive analysis and quantization block 220 uses the short-term prediction residual signal d(n) of the current frame and its quantized version dq(n) in the previous frames to determine the quantized values of the pitch period and the pitch predictor taps. Block 220 is further expanded in Figure 7 below.

Figure 7 Long-term predictive analysis and quantization (block 220)

Now refer to Figure 7. Block 228 performs short-term prediction error filtering to get the short-term prediction residual d(n) as follows:

$$d(n) = s(n) + \sum_{i=1}^{8} \tilde{a}_i\, s(n-i).$$

The short-term prediction residual signal d(n) passes through the weighted short-term synthesis filter block 221, whose output is calculated as

$$dw(n) = d(n) - \sum_{i=1}^{8} a_i'\, dw(n-i).$$

The signal dw(n) is passed through a fixed low-pass filter block 222, which has a −3 dB cutoff frequency at about 800 Hz. A 4th-order elliptic filter is used for this purpose.
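The LSP-to-LPC conversion of Section 3.5 (blocks 217 and 218) can be sketched in Python as below. It assumes the 8 quantized LSPs are given in the normalized domain (1.0 corresponds to π radians); the function names are hypothetical. The inner loop runs j downward so each update still sees the previous iteration's coefficients, exactly as the recursion requires.

```python
import math


def _delta_coeffs(x):
    """Run the recursion for one polynomial (p or m). a[j + 1] stores
    a^Delta_j, so a[0] is the initial condition a^Delta_{-1} = 0."""
    a = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    for i in range(1, 5):
        a[i + 1] = 2.0 * (a[i - 1] - x[i - 1] * a[i])
        for j in range(i - 1, 0, -1):           # j = i-1, i-2, ..., 1
            a[j + 1] = a[j + 1] + a[j - 1] - 2.0 * x[i - 1] * a[j]
    return a[1:]                                # a^Delta_0 .. a^Delta_4


def lsp_to_lpc(lsp):
    """Block 217 (sketch): 8 normalized LSPs -> [1, a~1, ..., a~8]."""
    xp = [math.cos(math.pi * lsp[2 * i]) for i in range(4)]      # l~1, l~3, ...
    xm = [math.cos(math.pi * lsp[2 * i + 1]) for i in range(4)]  # l~2, l~4, ...
    dp, dm = _delta_coeffs(xp), _delta_coeffs(xm)
    ap = [dp[i] + dp[i - 1] for i in range(1, 5)]   # a_{p,1..4}
    am = [dm[i] - dm[i - 1] for i in range(1, 5)]   # a_{m,1..4}
    lpc = [1.0]
    lpc += [0.5 * (ap[i] + am[i]) for i in range(4)]             # i = 1..4
    lpc += [0.5 * (ap[8 - i] - am[8 - i]) for i in range(5, 9)]  # i = 5..8
    return lpc


def bandwidth_expand(lpc, gamma1=0.75):
    """Block 218 (sketch): a'_i = gamma1**i * a~_i for i >= 1."""
    return [lpc[0]] + [gamma1 ** i * lpc[i] for i in range(1, len(lpc))]
```

A convenient sanity check: the trivial filter A(z) = 1 has LSPs at k/9 (k = 1, …, 8), so converting those LSPs should return [1, 0, …, 0] up to rounding.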
The transfer function of the low-pass filter of block 222 is

$$H_{lpf}(z) = \frac{0.0433083 - 0.0687180\, z^{-1} + 0.0991097\, z^{-2} - 0.0687180\, z^{-3} + 0.0433083\, z^{-4}}{1 - 2.9580236\, z^{-1} + 3.6337313\, z^{-2} - 2.1249529\, z^{-3} + 0.5003969\, z^{-4}}.$$

Block 223 down-samples the low-pass filtered signal to a sampling rate of 2 kHz, which represents a 4:1 decimation. The first-stage pitch search block 224 then uses the decimated 2 kHz signal dwd(n) to find a "coarse pitch period", denoted cpp in Figure 7. The time lag represented by cpp is expressed as a number of samples of the 2 kHz down-sampled signal dwd(n). A pitch analysis window of 15 ms is used, and its end is aligned with the end of the current frame. At a sampling rate of 2 kHz, 15 ms corresponds to 30 samples. Without loss of generality, let the index range n = 1 to n = 30 correspond to the pitch analysis window for dwd(n). Block 224 first calculates the following values

$$c(k) = \sum_{n=1}^{30} dwd(n)\, dwd(n-k),$$
$$E(k) = \sum_{n=1}^{30} \left[dwd(n-k)\right]^2,$$
$$c2(k) = \begin{cases} c^2(k), & c(k) \ge 0 \\ -c^2(k), & c(k) < 0 \end{cases}$$

for all integers from k = MINPPD − 1 to k = MAXPPD + 1, where MINPPD and MAXPPD are the minimum and maximum pitch period in the decimated domain, respectively: MINPPD = 2 samples and MAXPPD = 34 samples. Block 224 then searches through the range k = MINPPD, MINPPD + 1, MINPPD + 2, …, MAXPPD to find all local peaks (see footnote 3) of the array $\{c2(k)/E(k)\}$ for which c(k) > 0. Let $N_p$ denote the number of such positive local peaks. Let $k_p(j)$, $j = 1, 2, \ldots, N_p$, be the indices where $c2(k_p(j))/E(k_p(j))$ is a local peak and $c(k_p(j)) > 0$, with $k_p(1) < k_p(2) < \cdots < k_p(N_p)$. For convenience, the term c2(k)/E(k) will be referred to as the "normalized correlation square". If $N_p = 0$, the output coarse pitch period is set to cpp = MINPPD, and the processing of block 224 is terminated.
If $N_p = 1$, the output of block 224 is set to $cpp = k_p(1)$, and the processing of block 224 is terminated.

Footnote 3: A value is characterized as a local peak if both of the adjacent values are smaller.

If there are two or more local peaks ($N_p \ge 2$), block 224 uses Algorithms 3.8.1, 3.8.2, 3.8.3, and 3.8.4 (described below), in that order, to determine the output coarse pitch period cpp. Variables calculated in the earlier algorithms are carried over and used in the later algorithms.

Block 224 first uses Algorithm 3.8.1 below to identify the largest quadratically interpolated peak around the local peaks of the normalized correlation square $c2(k_p)/E(k_p)$. Quadratic interpolation is performed for $c(k_p)$, while linear interpolation is performed for $E(k_p)$. The interpolation is performed with the time resolution of the input speech sampling rate (8 kHz). In the algorithm below, D denotes the decimation factor used when decimating dw(n) to dwd(n); thus, D = 4.

Algorithm 3.8.1 Find the largest quadratically interpolated peak around $c2(k_p)/E(k_p)$:
(i) Set c2max = −1, Emax = 1, and jmax = 0.
(ii) For $j = 1, 2, \ldots, N_p$, do the following 12 steps:
1. Set $a = 0.5\,[c(k_p(j)+1) + c(k_p(j)-1)] - c(k_p(j))$.
2. Set $b = 0.5\,[c(k_p(j)+1) - c(k_p(j)-1)]$.
3. Set ji = 0.
4. Set $ei = E(k_p(j))$.
5. Set $c2m = c2(k_p(j))$.
6. Set $Em = E(k_p(j))$.
7. If $c2(k_p(j)+1)\, E(k_p(j)-1) > c2(k_p(j)-1)\, E(k_p(j)+1)$, do the remaining part of step 7:
   $\Delta = [E(k_p(j)+1) - ei]/D$
   For k = 1, 2, …, D/2, do the following indented part of step 7:
      $ci = a\,(k/D)^2 + b\,(k/D) + c(k_p(j))$
      $ei \leftarrow ei + \Delta$
      If $(ci)^2\, Em > (c2m)\, ei$, do the next three indented lines:
         ji = k
         $c2m = (ci)^2$
         Em = ei
8. If $c2(k_p(j)+1)\, E(k_p(j)-1) \le c2(k_p(j)-1)\, E(k_p(j)+1)$, do the remaining part of step 8:
   $\Delta = [E(k_p(j)-1) - ei]/D$
   For k = −1, −2, …, −D/2, do the following indented part of step 8:
      $ci = a\,(k/D)^2 + b\,(k/D) + c(k_p(j))$
      $ei \leftarrow ei + \Delta$
      If $(ci)^2\, Em > (c2m)\, ei$, do the next three indented lines:
         ji = k
         $c2m = (ci)^2$
         Em = ei
9. Set $lag(j) = k_p(j) + ji/D$.
10. Set c2i(j) = c2m.
11. Set Ei(j) = Em.
12. If c2m × Emax > c2max × Em, do the following three indented lines:
   jmax = j
   c2max = c2m
   Emax = Em
(iii) Set the first candidate for the coarse pitch period as $cpp = k_p(jmax)$.

The symbol ← indicates that the parameter on the left-hand side is updated with the value on the right-hand side (see footnote 4).

To avoid picking a coarse pitch period that is around an integer multiple of the true coarse pitch period, a search through the time lags corresponding to the local peaks of $c2(k_p)/E(k_p)$ is performed to see whether any of these time lags is close enough to the output coarse pitch period of block 224 in the last frame, denoted cpplast (see footnote 5). A time lag is considered close enough if it is within 25% of cpplast. For all such time lags, the corresponding quadratically interpolated peak values of the normalized correlation square $c2(k_p)/E(k_p)$ are compared, and the interpolated time lag corresponding to the maximum normalized correlation square is selected for further consideration. Algorithm 3.8.2 below performs this task, using the interpolated arrays c2i(j) and Ei(j) calculated in Algorithm 3.8.1.
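The correlation terms, the positive-peak search, and Algorithm 3.8.1 can be sketched in Python as follows. The sketch assumes the decimated signal is supplied as a list covering n = −MAXPPD … 30 (so dwd(n − k) exists for every k up to MAXPPD + 1), stores the peak list 0-based, and treats E(k) = 0 as a zero ratio; these choices and the function names are assumptions, not part of the text.

```python
MINPPD, MAXPPD = 2, 34


def coarse_stats(buf):
    """c(k), E(k), c2(k) for k = MINPPD-1 .. MAXPPD+1.
    buf[n + MAXPPD] holds dwd(n) for n = -MAXPPD .. 30."""
    off = MAXPPD
    c, E, c2 = {}, {}, {}
    for k in range(MINPPD - 1, MAXPPD + 2):
        c[k] = sum(buf[n + off] * buf[n - k + off] for n in range(1, 31))
        E[k] = sum(buf[n - k + off] ** 2 for n in range(1, 31))
        c2[k] = c[k] ** 2 if c[k] >= 0 else -(c[k] ** 2)
    return c, E, c2


def positive_peaks(c, E, c2):
    """k_p(1) < k_p(2) < ...: strict local peaks of c2(k)/E(k) with c(k) > 0."""
    def ncs(k):
        return c2[k] / E[k] if E[k] > 0 else 0.0
    return [k for k in range(MINPPD, MAXPPD + 1)
            if c[k] > 0 and ncs(k) > ncs(k - 1) and ncs(k) > ncs(k + 1)]


def interpolate_peaks(c, E, c2, kp, D=4):
    """Algorithm 3.8.1. Returns (lag, c2i, Ei, jmax, c2max, Emax);
    the first candidate coarse pitch period is kp[jmax]."""
    c2max, Emax, jmax = -1.0, 1.0, 0
    lag, c2i, Ei = [], [], []
    for j, k0 in enumerate(kp):
        a = 0.5 * (c[k0 + 1] + c[k0 - 1]) - c[k0]        # step 1
        b = 0.5 * (c[k0 + 1] - c[k0 - 1])                # step 2
        ji, ei, c2m, Em = 0, E[k0], c2[k0], E[k0]        # steps 3 to 6
        # steps 7 and 8: search toward the side with the larger c2/E
        side = 1 if c2[k0 + 1] * E[k0 - 1] > c2[k0 - 1] * E[k0 + 1] else -1
        delta = (E[k0 + side] - ei) / D                  # linear step for E
        for step in range(1, D // 2 + 1):
            k = side * step
            ci = a * (k / D) ** 2 + b * (k / D) + c[k0]  # quadratic for c
            ei += delta
            if ci * ci * Em > c2m * ei:
                ji, c2m, Em = k, ci * ci, ei
        lag.append(k0 + ji / D)                          # step 9
        c2i.append(c2m)                                  # step 10
        Ei.append(Em)                                    # step 11
        if c2m * Emax > c2max * Em:                      # step 12
            jmax, c2max, Emax = j, c2m, Em
    return lag, c2i, Ei, jmax, c2max, Emax
```

All ratio comparisons inside the algorithm are cross-multiplied, as in the text, so no division is needed there.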
Algorithm 3.8.2 Find the time lag maximizing the interpolated $c2(k_p)/E(k_p)$ among all time lags close to the output coarse pitch period of the last frame:
(i) Set index im = −1.
(ii) Set c2m = −1.
(iii) Set Em = 1.
(iv) For $j = 1, 2, \ldots, N_p$, do the following:
   If $|k_p(j) - cpplast| \le 0.25 \times cpplast$, do the following:
      If c2i(j) × Em > c2m × Ei(j), do the following three lines:
         im = j
         c2m = c2i(j)
         Em = Ei(j)

Footnote 4: An equal sign is not applicable due to a potential mathematical conflict.
Footnote 5: For the first frame, cpplast is initialized to 12.

Note that if there is no time lag $k_p(j)$ within 25% of cpplast, the index im remains at −1 after Algorithm 3.8.2 is performed. If there are one or more time lags within 25% of cpplast, the index im corresponds to the largest normalized correlation square among those time lags.

Next, block 224 determines whether an alternative time lag in the first half of the pitch range should be chosen as the output coarse pitch period. Specifically, block 224 searches through all interpolated time lags lag(j) that are less than 16 and checks whether any of them has a large enough local peak of the normalized correlation square near every integer multiple of itself (including itself) up to 32. If one or more time lags satisfy this condition, the smallest such qualified time lag is chosen as the output coarse pitch period of block 224. Again, variables calculated in Algorithms 3.8.1 and 3.8.2 above carry their final values over to Algorithm 3.8.3 below. For Algorithm 3.8.3, the parameter MPDTH is 0.065, and the threshold array MPTH(k) is given as MPTH(2) = 0.63, MPTH(3) = 0.48, MPTH(4) = 0.42, MPTH(5) = 0.36, and MPTH(k) = 0.30 for k > 5.
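Algorithm 3.8.2 can be sketched in Python as follows, assuming 0-based indices (so im = −1 still signals that no peak was close enough) and the arrays produced by Algorithm 3.8.1; the function name is hypothetical.

```python
def near_last_pitch(kp, c2i, Ei, cpplast):
    """Algorithm 3.8.2 (sketch): among peaks within 25% of cpplast, pick
    the one with the largest interpolated c2i/Ei. Returns (im, c2m, Em);
    im == -1 means no peak was close enough."""
    im, c2m, Em = -1, -1.0, 1.0
    for j, k in enumerate(kp):
        if abs(k - cpplast) <= 0.25 * cpplast:
            if c2i[j] * Em > c2m * Ei[j]:   # cross-multiplied ratio test
                im, c2m, Em = j, c2i[j], Ei[j]
    return im, c2m, Em
```

Because c2m starts at −1, the first qualifying peak always replaces the initial values, since its interpolated value is positive.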
Algorithm 3.8.3 Check whether an alternative time lag in the first half of the coarse pitch period range should be chosen as the output coarse pitch period:
For $j = 1, 2, 3, \ldots, N_p$, in that order, do the following while lag(j) < 16:
(i) If j ≠ im, set threshold = 0.73; otherwise, set threshold = 0.4.
(ii) If c2i(j) × Emax ≤ threshold × c2max × Ei(j), disqualify this j, skip step (iii) for this j, increment j by 1, and go back to step (i).
(iii) If c2i(j) × Emax > threshold × c2max × Ei(j), do the following:
   a) For k = 2, 3, 4, …, do the following while k × lag(j) < 32:
      1. s = k × lag(j)
      2. a = (1 − MPDTH) s
      3. b = (1 + MPDTH) s
      4. Go through m = j+1, j+2, j+3, …, $N_p$, in that order, and check whether any of the time lags lag(m) is between a and b. If none of them is between a and b, disqualify this j, stop step (iii), increment j by 1, and go back to step (i). If there is at least one such m that satisfies a < lag(m) ≤ b and c2i(m) × Emax > MPTH(k) × c2max × Ei(m), then a large enough peak of the normalized correlation square is considered found in the neighborhood of the k-th integer multiple of lag(j); in this case, stop step (iii) a) 4., increment k by 1, and go back to step (iii) a) 1.
   b) If step (iii) a) is completed without stopping prematurely, that is, if there is a large enough interpolated peak of the normalized correlation square within ±100×MPDTH% of every integer multiple of lag(j) that is less than 32, then stop this algorithm and the operation of block 224, and set $cpp = k_p(j)$ as the final output coarse pitch period of block 224.

If Algorithm 3.8.3 completes without finding a qualified output coarse pitch period cpp, block 224 examines the largest local peak of the normalized correlation square around the coarse pitch period of the last frame, found in Algorithm 3.8.2, and makes a final decision on the output coarse pitch period cpp using Algorithm 3.8.4.
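A Python sketch of Algorithm 3.8.3 follows, using 0-based indices and the outputs of Algorithms 3.8.1 and 3.8.2. Step (iii) a) 4. is read here as "the multiple is confirmed only if some later lag lies in (a, b] and also passes the MPTH(k) test"; a lag inside (a, b] that fails the threshold is treated as disqualifying. That reading, and the function names, are assumptions of this sketch.

```python
MPDTH = 0.065


def mpth(k):
    """Threshold array MPTH(k) from the text; 0.30 for every k > 5."""
    return {2: 0.63, 3: 0.48, 4: 0.42, 5: 0.36}.get(k, 0.30)


def check_submultiple(kp, lag, c2i, Ei, im, c2max, Emax):
    """Algorithm 3.8.3 (sketch). Returns the qualifying k_p(j), or None
    if no lag below 16 qualifies."""
    for j in range(len(kp)):
        if lag[j] >= 16:                                  # lags increase with j
            break
        threshold = 0.73 if j != im else 0.4              # step (i)
        if c2i[j] * Emax <= threshold * c2max * Ei[j]:    # step (ii)
            continue
        k, qualified = 2, True                            # step (iii) a)
        while k * lag[j] < 32:
            s = k * lag[j]
            a, b = (1 - MPDTH) * s, (1 + MPDTH) * s
            found = any(a < lag[m] <= b
                        and c2i[m] * Emax > mpth(k) * c2max * Ei[m]
                        for m in range(j + 1, len(kp)))
            if not found:
                qualified = False
                break
            k += 1
        if qualified:                                     # step (iii) b)
            return kp[j]
    return None
```

A return value of None corresponds to "Algorithm 3.8.3 completes without finding a qualified cpp", after which Algorithm 3.8.4 decides.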
Algorithm 3.8.4 performs this final decision. Again, variables calculated in Algorithms 3.8.1 and 3.8.2 above carry their final values over to Algorithm 3.8.4 below. In the following, the parameters are SMDTH = 0.095 and LPTH1 = 0.79.

Algorithm 3.8.4 Final decision of the output coarse pitch period:
(i) If im = −1, that is, if there is no large enough local peak of the normalized correlation square around the coarse pitch period of the last frame, use the cpp calculated at the end of Algorithm 3.8.1 as the final output coarse pitch period of block 224, and exit this algorithm.
(ii) If im = jmax, that is, if the largest local peak of the normalized correlation square around the coarse pitch period of the last frame is also the global maximum of all interpolated peaks of the normalized correlation square within this frame, use the cpp calculated at the end of Algorithm 3.8.1 as the final output coarse pitch period of block 224, and exit this algorithm.
(iii) If im < jmax, do the following indented part:
   If c2m × Emax > 0.43 × c2max × Em, do the following indented part of step (iii):
      a) If lag(im) > MAXPPD/2, set the block 224 output $cpp = k_p(im)$ and exit this algorithm.
      b) Otherwise, for k = 2, 3, 4, 5, do the following indented part:
         1. s = lag(jmax)/k
         2. a = (1 − SMDTH) s
         3. b = (1 + SMDTH) s
         4. If lag(im) > a and lag(im) < b, set the block 224 output $cpp = k_p(im)$ and exit this algorithm.
(iv) If im > jmax, do the following indented part:
   If c2m × Emax > LPTH1 × c2max × Em, set the block 224 output $cpp = k_p(im)$ and exit this algorithm.
(v) If execution reaches this point, none of the steps above has selected a final output coarse pitch period. In this case, accept the cpp calculated at the end of Algorithm 3.8.1 as the final output coarse pitch period of block 224.

Block 225 takes cpp as its input and performs a second-stage pitch period search in the undecimated signal domain to get a refined pitch period pp.
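Algorithm 3.8.4 can be sketched in Python as below. It assumes 0-based indices consistently for im and jmax (so the comparisons im < jmax and im > jmax keep their meaning) and takes c2m and Em as the final values from Algorithm 3.8.2; the function name is hypothetical.

```python
SMDTH, LPTH1, MAXPPD = 0.095, 0.79, 34


def final_cpp(kp, lag, jmax, c2max, Emax, im, c2m, Em):
    """Algorithm 3.8.4 (sketch). Returns the output coarse pitch period."""
    cpp = kp[jmax]                                  # candidate from 3.8.1
    if im == -1 or im == jmax:                      # steps (i) and (ii)
        return cpp
    if im < jmax:                                   # step (iii)
        if c2m * Emax > 0.43 * c2max * Em:
            if lag[im] > MAXPPD / 2:
                return kp[im]
            for k in (2, 3, 4, 5):                  # is lag(im) near lag(jmax)/k?
                s = lag[jmax] / k
                if (1 - SMDTH) * s < lag[im] < (1 + SMDTH) * s:
                    return kp[im]
    elif c2m * Emax > LPTH1 * c2max * Em:           # step (iv): im > jmax
        return kp[im]
    return cpp                                      # step (v)
```

The asymmetry between the thresholds (0.43 for a shorter lag that divides the global peak, 0.79 for a longer lag) reflects the text's bias toward staying near the previous frame's pitch without accepting pitch doubling too easily.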
Block 225 first converts the coarse pitch period cpp to the undecimated signal domain by multiplying it by the decimation factor D = 4. It then determines a search range for the refined pitch period around the value cpp × D. The lower bound of the search range is lb = max(MINPP, cpp × D − D + 1), where MINPP = 10 samples is the minimum pitch period. The upper bound of the search range is ub = min(MAXPP, cpp × D + D − 1), where MAXPP = 136 samples is the maximum pitch period.

Block 225 maintains a signal buffer with a total of MAXPP + 1 + FRSZ samples, where FRSZ = 40 samples is the frame size. The last FRSZ samples of this buffer are populated with the open-loop short-term prediction residual signal d(n) of the current frame. The first MAXPP + 1 samples are populated with the MAXPP + 1 samples of the quantized version of d(n), denoted dq(n), immediately preceding the current frame. For convenience in later equations, the symbol dq(n) denotes the entire buffer of MAXPP + 1 + FRSZ samples, even though the last FRSZ samples are really d(n) samples. Again, let the index range n = 1 to n = FRSZ denote the samples in the current frame.

After the lower bound lb and upper bound ub of the pitch period search range are determined, block 225 calculates the following correlation and energy terms in the undecimated dq(n) signal domain for time lags k within the search range [lb, ub]:

$$\tilde{c}(k) = \sum_{n=1}^{FRSZ} dq(n)\, dq(n-k),$$
$$\tilde{E}(k) = \sum_{n=1}^{FRSZ} dq(n-k)^2.$$

The time lag k ∈ [lb, ub] that maximizes the ratio $\tilde{c}^2(k)/\tilde{E}(k)$ is chosen as the final refined pitch period. That is,

$$pp = \arg\max_{k \in [lb,\, ub]} \frac{\tilde{c}^2(k)}{\tilde{E}(k)}.$$

Once the refined pitch period pp is determined, it is encoded into the corresponding output pitch period index PPI, calculated as PPI = pp − 10. Possible values of PPI are all integers from 0 to 126. Therefore, the refined pitch period pp is encoded into 7 bits without any distortion.
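A minimal Python sketch of block 225 follows. It assumes the MAXPP + 1 + FRSZ buffer is a list in which buf[n + MAXPP] holds dq(n) for n = −MAXPP … FRSZ, scores a zero-energy lag as 0, and lets the first maximum win on ties; those choices and the function name are assumptions.

```python
MINPP, MAXPP, FRSZ, D = 10, 136, 40, 4


def refine_pitch(buf, cpp):
    """Block 225 (sketch): search k in [lb, ub] around cpp * D for the
    largest c~^2(k) / E~(k). Returns (pp, PPI)."""
    off = MAXPP
    lb = max(MINPP, cpp * D - D + 1)
    ub = min(MAXPP, cpp * D + D - 1)
    best_k, best_r = lb, -1.0
    for k in range(lb, ub + 1):
        c = sum(buf[n + off] * buf[n - k + off] for n in range(1, FRSZ + 1))
        e = sum(buf[n - k + off] ** 2 for n in range(1, FRSZ + 1))
        r = c * c / e if e > 0 else 0.0   # zero-energy lags score 0
        if r > best_r:                    # first maximum wins on ties
            best_k, best_r = k, r
    return best_k, best_k - 10
```

With MAXPP = 136, the oldest sample the search touches is dq(1 − 136) = dq(−135); the extra buffer sample dq(−136) is needed later by the 3-tap pitch predictor, which reaches dq(n − pp − 1).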
The value PPI = 127 is reserved for signaling purposes and is therefore not used by the codec. Block 225 also calculates ppt1, the optimal tap weight for a single-tap pitch predictor, as follows:

$$ppt1 = \frac{\tilde{c}(pp)}{\tilde{E}(pp)}.$$

In the degenerate case where $\tilde{E}(pp) = 0$, ppt1 is set to zero. Block 227 calculates the long-term noise feedback filter coefficient λ as follows:

$$\lambda = \begin{cases} 0.5, & ppt1 \ge 1 \\ 0.5 \times ppt1, & 0 < ppt1 < 1 \\ 0, & ppt1 \le 0 \end{cases}$$

3.7 Long-Term Predictor Parameter Quantization

Pitch predictor taps quantizer block 226 quantizes the three pitch predictor taps to 5 bits using vector quantization. The pitch predictor has the transfer function

$$P_l(z) = \sum_{i=1}^{3} b_i\, z^{-pp+2-i},$$

where pp is the pitch period calculated in Section 3.6. Rather than minimizing the mean-squared error of the three taps $b_1$, $b_2$, and $b_3$ as in a conventional VQ codebook search, block 226 finds the set of candidate pitch predictor taps in the VQ codebook that minimizes the pitch prediction residual energy in the current frame. Using the same dq(n) buffer and time index convention as in block 225, and denoting the set of three taps corresponding to the j-th codevector, $\mathbf{b}_j = [b_{j1}\ b_{j2}\ b_{j3}]^T$, as $\{b_{j1}, b_{j2}, b_{j3}\}$, the pitch prediction residual energy can be expressed as

$$E_j = \sum_{n=1}^{FRSZ} \left[ dq(n) - \sum_{i=1}^{3} b_{ji}\, dq(n - pp + 2 - i) \right]^2.$$

The codevector is selected from a 3-dimensional codebook of 32 codevectors, $\{\mathbf{b}_0, \mathbf{b}_1, \ldots, \mathbf{b}_{31}\}$, listed in Appendix 4; the codevector that minimizes the pitch prediction residual energy is selected. The index of the selected codevector is given by

$$PPTI = j^* = \arg\min_{j \in \{0, 1, \ldots, 31\}} \{E_j\},$$

and the corresponding set of three quantized pitch predictor taps, denoted ppt = {b1, b2, b3} in Figure 7, is given by

$$[\,b_1\ \ b_2\ \ b_3\,]^T = \mathbf{b}_{j^*}.$$

This completes the description of block 220, long-term predictive analysis and quantization.

3.8 Excitation Gain Quantization

There is one residual gain for each frame.
The unquantized residual gain is based on the pitch prediction residual of the frame and is quantized in an open-loop fashion in the base-2 logarithmic domain. The quantization of the residual gain is part of the prediction residual quantizer block 230 in Figure 3. Block 230 is further expanded in Figure 8. All the operations in Figure 8 are performed on a frame-by-frame basis.

Block 300 in Figure 8 calculates the pitch prediction residual signal, given by

$$e(n) = dq(n) - \sum_{i=1}^{3} b_i\, dq(n - pp + 2 - i), \qquad n = 1, 2, \ldots, FRSZ,$$

where the same dq(n) buffer and time index convention of block 225 is used. That is, the current frame of dq(n) for n = 1, 2, …, FRSZ is the unquantized open-loop short-term prediction residual signal d(n).

Figure 8 Prediction residual quantizer (block 230)

Block 301 calculates the residual gain in the base-2 logarithmic domain. First, the average power of the pitch prediction residual signal in the current frame, the m-th frame, is calculated as

$$P_e(m) = \frac{1}{FRSZ} \sum_{n=1}^{FRSZ} e^2(n).$$

The logarithmic gain (log-gain) of the current frame is calculated as

$$lg(m) = \begin{cases} \log_2 P_e(m), & P_e(m) > 1 \\ 0, & P_e(m) \le 1 \end{cases}$$

The long-term mean value of the log-gain is calculated off-line and stored in block 302. This log-gain mean value is lgmean = 11.45752. Adder 303 calculates the mean-removed version of the log-gain as mrlg(m) = lg(m) − lgmean. The MA log-gain predictor block 304 is an 8th-order FIR filter with its memory initialized to zero at the very first frame.
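Blocks 300, 301, and 303 can be sketched in Python as below, using the same buffer layout assumed for block 225 (buf[n + MAXPP] holds dq(n)). Because the energy E_j minimized by block 226 (Section 3.7) is exactly the energy of this residual for codevector j, the sketch reuses the residual helper for the tap search. The function names are hypothetical and the Appendix 4 codebook is passed in rather than reproduced.

```python
import math

MAXPP, FRSZ, LGMEAN = 136, 40, 11.45752


def pitch_residual(buf, pp, taps):
    """Block 300: e(n) = dq(n) - sum_i b_i dq(n - pp + 2 - i), n = 1..FRSZ."""
    off = MAXPP
    return [buf[n + off] - sum(taps[i - 1] * buf[n - pp + 2 - i + off]
                               for i in (1, 2, 3))
            for n in range(1, FRSZ + 1)]


def pitch_tap_search(buf, pp, codebook):
    """Block 226: choose the codevector whose residual energy E_j is smallest.
    Returns (PPTI, taps)."""
    energies = [sum(x * x for x in pitch_residual(buf, pp, taps))
                for taps in codebook]
    best = min(range(len(codebook)), key=energies.__getitem__)
    return best, list(codebook[best])


def log_gain(e):
    """Blocks 301 and 303: returns (lg(m), mrlg(m))."""
    pe = sum(x * x for x in e) / len(e)
    lg = math.log2(pe) if pe > 1 else 0.0
    return lg, lg - LGMEAN
```

Since lg(m) is log2 of a power, the linear amplitude gain used later is 2 raised to lg/2.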
The coefficients of the log-gain predictor of block 304, lgp(k), k = 1, 2, 3, …, 8, are fixed, as given below:

lgp(1) = 0.7801514
lgp(2) = 0.7377625
lgp(3) = 0.6150818
lgp(4) = 0.5926208
lgp(5) = 0.4674072
lgp(6) = 0.3635864
lgp(7) = 0.2378540
lgp(8) = 0.1286926

Block 304 calculates its output, the estimated log-gain, as

$$elg(m) = \sum_{k=1}^{GPO} lgp(k)\, lgeq(m-k),$$

where GPO = 8 is the gain predictor order, and lgeq(m − k) is the quantized version of the log-gain prediction error at frame m − k. Adder 305 calculates the log-gain prediction error as lge(m) = mrlg(m) − elg(m).

The scalar quantizer block 306 performs 4-bit scalar quantization of the resulting log-gain prediction error lge(m). The codebook entries of this gain quantizer, along with the corresponding codebook indices, are listed in Appendix 5. The operation of this quantizer is controlled by block 310, whose purpose is to achieve a good trade-off between clear-channel and noisy-channel performance of the excitation gain quantizer. The operation of block 310 is described later. For each temporarily quantized lgeq(m), adders 307 and 308 together calculate the corresponding temporarily quantized log-gain as

lgq(m) = lgeq(m) + elg(m) + lgmean.

Block 309 estimates the signal level based on the final quantized log-gain, which is determined later subject to the constraint imposed by block 310. Let lv(m) denote the estimated signal level output by block 309 at frame m. Since the final value of lgq(m) has not yet been determined at this point, block 310 can only use the estimated signal level of the last frame, lv(m − 1). One way to view this is that block 309 has a one-sample delay unit on its input lgq(m). At frame m, block 310 controls the quantization operation of block 306 based on lv(m − 1), lgq(m − 1), and lgq(m − 2) (see footnote 6). It uses an NG × NGC gain change threshold matrix T(i, j), i = 1, 2, …, NG, j = 1, 2, …, NGC, to limit how high lgq(m) can go.
The parameter values are NG = 18 and NGC = 12. The threshold matrix T(i, j) is given in Appendix 6.

Footnote 6: The initial values of lgq(m − 1) and lgq(m − 2) are 0, i.e., lgq(0) = 0 and lgq(−1) = 0.

Block 310 and block 306 work together to perform the quantization of lge(m) in the following way. First, the row index into the threshold matrix T(i, j) is calculated as

$$i = \left\lceil \frac{lgq(m-1) - lv(m-1) - GLB}{2} \right\rceil,$$

where GLB = −24, and the symbol ⌈·⌉ means "take the next larger integer", i.e., round toward positive infinity. If i > NG, i is clipped to NG. If i < 1, i is clipped to 1. Second, the column index into the threshold matrix T(i, j) is calculated as

$$j = \left\lceil \frac{lgq(m-1) - lgq(m-2) - GCLB}{2} \right\rceil,$$

where GCLB = −8. If j > NGC, j is clipped to NGC. If j < 1, j is clipped to 1. Third, with the row and column indices i and j calculated above, a gain quantization limit is calculated as

GL = lgq(m − 1) + T(i, j) − elg(m) − lgmean.

Fourth, block 306 performs normal scalar quantization of lge(m) to its nearest neighbor in the quantizer codebook. If the resulting quantized value is not greater than GL, it is accepted as the final quantized log-gain prediction error lgeq(m), and the corresponding codebook index is the output gain index $GI_m$. On the other hand, if the quantized value is greater than GL, the next smaller gain quantizer codebook entry is compared with GL. If it is not greater than GL, it is accepted as the final output lgeq(m) of block 306, and the corresponding codebook index is accepted as $GI_m$. However, if it is still greater than GL, block 306 keeps looking for the next smaller quantizer codebook entry (in descending order of codebook entry value) until it finds one that is not greater than GL. In such a search, the first (that is, the largest) entry found to be no greater than GL is chosen as the final output lgeq(m) of block 306, and the corresponding codebook index is accepted as $GI_m$.
On the rare occasion when all the gain quantizer codebook entries are greater than GL, the smallest gain quantizer codebook entry is chosen as the final output lgeq(m) of block 306, and the corresponding codebook index (0 in this case) is chosen as the output $GI_m$. The final gain quantizer codebook index $GI_m$ is passed to the bit multiplexer block 295 of Figure 3. Once the quantized log-gain prediction error lgeq(m) is determined in this way, adders 307 and 308 add elg(m) and lgmean to lgeq(m) to obtain the quantized log-gain lgq(m) as

lgq(m) = lgeq(m) + elg(m) + lgmean.

After this final quantized log-gain lgq(m), subject to the constraint imposed by block 310, is calculated, block 309 uses it to update the estimated signal level lv(m). This value lv(m) is used by block 310 in the next frame (the (m + 1)-th frame).

At frame m, after the final quantized log-gain lgq(m) is calculated, block 309 estimates the signal level using the following algorithm. The parameter values used are α = 4095/4096, β = 511/512, and γ = 255/256. At codec initialization, the related variables are initialized as: lmax(m − 1) = 100, lmin(m − 1) = 100, lmean(m − 1) = 12.5, lv(m − 1) = 17, and x(m − 1) = 17.

Algorithm for updating the estimated long-term average signal level:
(i) If lgq(m) > lmax(m − 1), set lmax(m) = lgq(m); otherwise, set lmax(m) = lmean(m − 1) + α [lmax(m − 1) − lmean(m − 1)].
(ii) If lgq(m) < lmin(m − 1), set lmin(m) = lgq(m); otherwise, set lmin(m) = lmean(m − 1) + α [lmin(m − 1) − lmean(m − 1)].
(iii) Set lmean(m) = β × lmean(m − 1) + (1 − β) [lmax(m) + lmin(m)]/2.
(iv) Set lth = lmean(m) + 0.2 [lmax(m) − lmean(m)].
(v) If lgq(m) > lth, set x(m) = γ × x(m − 1) + (1 − γ) lgq(m) and lv(m) = γ × lv(m − 1) + (1 − γ) x(m); otherwise, set x(m) = x(m − 1) and lv(m) = lv(m − 1).

Block 311 converts the quantized log-gain lgq(m) to the quantized gain gq(m) in the linear domain as follows.
$$gq(m) = 2^{\,lgq(m)/2}$$

Block 312 scales the residual vector quantization codebook (also called the excitation VQ codebook) by multiplying every element of every codevector in the excitation VQ codebook by gq(m). The resulting scaled codebook is then used by block 313 to perform the excitation VQ codebook search, as described in the next section.

3.9 Excitation Vector Quantization

The excitation VQ codebook has a sign-shape structure, with 1 bit for sign and 4 bits for shape. The vector dimension is 4. Thus, there are 16 independent shape codevectors stored in the codebook, but the negated version of each shape codevector (i.e., its mirror image with respect to the origin) is also a valid codevector for excitation VQ. The 16 shape codevectors, along with the corresponding codebook indices, are listed in Appendix 7.

Block 313 in Figure 8 performs the excitation VQ codebook search using the filter structure shown in Figure 9, which is essentially a subset of the encoder shown in Figure 3. The only difference is that the prediction residual quantizer (block 230) in Figure 3 is replaced by block 248 in Figure 9, labeled "scaled VQ codebook". This scaled VQ codebook is calculated as described in Section 3.8.
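The gain path of Section 3.8 that produces this scaled codebook (blocks 304, 306, and 309 to 312) can be sketched in Python as follows. The Appendix 5 codebook and Appendix 6 matrix are not reproduced, so they are parameters; the sketch assumes the gain codebook is sorted in ascending order (index 0 being the smallest entry, consistent with the fallback index 0 in the text), stores T 0-based, and reads the ⌈·⌉ operator as math.ceil. All names are hypothetical.

```python
import math

LGMEAN, GLB, GCLB, NG, NGC = 11.45752, -24, -8, 18, 12
LGP = [0.7801514, 0.7377625, 0.6150818, 0.5926208,
       0.4674072, 0.3635864, 0.2378540, 0.1286926]
INITIAL_LEVEL = {"lmax": 100.0, "lmin": 100.0, "lmean": 12.5, "lv": 17.0, "x": 17.0}


def estimate_log_gain(lgeq_hist):
    """Block 304: elg(m) = sum_k lgp(k) lgeq(m - k); lgeq_hist[0] = lgeq(m - 1)."""
    return sum(p * x for p, x in zip(LGP, lgeq_hist))


def gain_limit(T, lgq1, lgq2, lv1, elg):
    """Block 310: clipped 1-based indices (i, j) and the limit GL, with
    lgq1 = lgq(m-1), lgq2 = lgq(m-2), lv1 = lv(m-1)."""
    i = min(max(math.ceil((lgq1 - lv1 - GLB) / 2), 1), NG)
    j = min(max(math.ceil((lgq1 - lgq2 - GCLB) / 2), 1), NGC)
    return i, j, lgq1 + T[i - 1][j - 1] - elg - LGMEAN


def quantize_lge(lge, codebook, gl):
    """Block 306: nearest entry, then step down until an entry is <= GL
    (index 0 if none is). Returns (GI, lgeq)."""
    idx = min(range(len(codebook)), key=lambda k: abs(codebook[k] - lge))
    while idx > 0 and codebook[idx] > gl:
        idx -= 1
    return idx, codebook[idx]


def update_level(st, lgq):
    """Block 309: one frame of the long-term signal level estimator."""
    alpha, beta, gamma = 4095 / 4096, 511 / 512, 255 / 256
    lm = st["lmean"]
    lmax = lgq if lgq > st["lmax"] else lm + alpha * (st["lmax"] - lm)
    lmin = lgq if lgq < st["lmin"] else lm + alpha * (st["lmin"] - lm)
    lmean = beta * lm + (1 - beta) * (lmax + lmin) / 2
    lth = lmean + 0.2 * (lmax - lmean)
    if lgq > lth:
        x = gamma * st["x"] + (1 - gamma) * lgq
        lv = gamma * st["lv"] + (1 - gamma) * x
    else:
        x, lv = st["x"], st["lv"]
    return {"lmax": lmax, "lmin": lmin, "lmean": lmean, "lv": lv, "x": x}


def scale_codebook(shapes, lgq):
    """Blocks 311 and 312: gq = 2**(lgq / 2) applied to every codevector."""
    gq = 2.0 ** (lgq / 2.0)
    return [[gq * v for v in cv] for cv in shapes]
```

The downward search in quantize_lge only ever lowers the gain, which is how block 310 caps sudden gain jumps that a channel error could otherwise amplify.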
Figure 9 Filter structure used in excitation VQ codebook search

The four filters of blocks 240, 250, 260, and 265 have transfer functions given by

$$P_s(z) = -\sum_{i=1}^{8} \tilde{a}_i z^{-i} \quad \text{(see Section 3.5)},$$

where $\tilde{a}_i$ is the i-th coefficient of the quantized short-term prediction error filter;

$$N_s(z) - 1 = \frac{\sum_{i=1}^{8} \hat{\beta}_i z^{-i}}{\sum_{i=0}^{8} \hat{\alpha}_i z^{-i}} \quad \text{(see Section 3.2)};$$

$$P_l(z) = \sum_{i=1}^{3} b_i\, z^{-pp+2-i},$$

where pp is the pitch period and $b_i$ is the i-th long-term predictor coefficient;

$$N_l(z) - 1 = \lambda\, z^{-pp},$$

where λ is the long-term noise feedback filter coefficient calculated in Section 3.6.

Using the filter structure in Figure 9, block 313 in Figure 8 performs the excitation VQ codebook search one excitation vector at a time. Each excitation vector contains four samples. The excitation gain gq(m) is updated once a frame. Each frame contains 10 excitation vectors. Therefore, for each frame, the same scaled VQ codebook is used in 10 separate VQ codebook searches corresponding to the 10 excitation vectors in that frame.

Let n = 1, 2, 3, 4 denote the sample time indices corresponding to the current four-dimensional excitation vector. Before the excitation VQ codebook search for the current excitation vector starts, the high-pass filtered input s(n), n = 1, 2, 3, 4, has already been calculated (Section 3.1). In addition, before the VQ codebook search starts, the initial filter states (also called "filter memory") of the four filters in Figure 9 (blocks 240, 250, 260, and 265) are known. None of the other signals in Figure 9 has been determined yet for n = 1, 2, 3, 4.

The basic ideas of the excitation VQ codebook search are explained below. Refer to Figure 9. Block 248 stores the N scaled shape codevectors, where N = 16.
Counting the negated version of each scaled shape codevector as well, this is equivalent to having 2N scaled codevectors available for excitation VQ. From these 2N scaled codevectors, block 248 outputs one scaled codevector at a time as uq(n), n = 1, 2, 3, 4. With the initial filter memories in blocks 240, 250, 260, and 265 set to the values left after vector-quantizing the last excitation vector, this uq(n) vector then "drives" the rest of the filter structure until the corresponding quantization error vector q(n), n = 1, 2, 3, 4, is obtained. The energy of this q(n) vector is calculated and stored. This process is repeated for each of the 2N scaled codevectors, with the filter memories reset to their initial values before each repetition. After all 2N codevectors have been tried, the scaled codevector that minimizes the energy of the quantization error vector q(n), n = 1, 2, 3, 4, is selected as the winning scaled codevector and is used as the VQ output vector.

The corresponding output VQ codebook index is a 5-bit index consisting of a sign bit as the most significant bit (MSB), followed by 4 shape bits. If the winning scaled codevector is a negated version of a scaled shape codevector, the sign bit is 1; otherwise, the sign bit is 0. The 4 shape bits are the binary representation of the codebook index of the winning shape codevector, as defined in Appendix 7. There are 10 such excitation codebook indices in a frame, since each frame has 10 excitation vectors. These 10 indices are grouped in an excitation codebook index array, denoted as $CI = \{CI(1), CI(2), \ldots, CI(10)\}$, where CI(k) is the excitation codebook index for the k-th excitation vector in the current frame. This excitation codebook index array CI is passed to the bit multiplexer block 295 in Figure 3.
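The search procedure above can be sketched in Python as follows, under stated assumptions. Each signal's past samples live in a plain list with the most recent sample last, so `x_h[-k]` holds x(n - k); the dictionary keys, the function names, and the argument layout (`alpha` holds $\hat{\alpha}_1, \ldots, \hat{\alpha}_8$ as used in the stnf(n) recursion) are hypothetical. The per-sample loop in `vq_error` follows the equations for ppv(n), dq(n), sp(n), sq(n), stnf(n), v(n), and q(n) that the text gives for this search. Candidates are assumed to be ordered with the N positive shapes first and the N negated shapes after them, so the winning position is directly the 5-bit index when N = 16.

```python
from typing import Dict, List, Sequence, Tuple

History = Dict[str, List[float]]


def initial_state(pp: int) -> History:
    """Zeroed filter memories (hypothetical layout); most recent sample is last."""
    return {
        "dq": [0.0] * (pp + 1),  # dq(n - pp - 1) is the oldest sample needed
        "q": [0.0] * pp,         # q(n - pp) for the long-term noise feedback
        "sq": [0.0] * 8,
        "v": [0.0] * 8,
        "stnf": [0.0] * 8,
    }


def vq_error(uq: Sequence[float], s: Sequence[float], st: History,
             a: Sequence[float], beta: Sequence[float], alpha: Sequence[float],
             b: Sequence[float], pp: int, lam: float) -> float:
    """Drive one 4-sample candidate through the Figure 9 loop and return Eq.

    New samples are appended to the lists in st, so st ends up holding the
    filter memories this candidate leaves behind.
    """
    dq_h, sq_h, v_h, nf_h, q_h = st["dq"], st["sq"], st["v"], st["stnf"], st["q"]
    eq = 0.0
    for n in range(4):
        # Block 260: ppv(n) = sum_{i=1..3} b_i dq(n - pp + 2 - i)
        ppv = sum(b[i - 1] * dq_h[-(pp - 2 + i)] for i in range(1, 4))
        dq = uq[n] + ppv                                      # adder 285
        sp = -sum(a[i - 1] * sq_h[-i] for i in range(1, 9))   # block 240
        sq = dq + sp                                          # adder 245
        # Block 250: sum beta_i [v(n-i) - dq(n-i)] - sum alpha_i stnf(n-i)
        nf = (sum(beta[i - 1] * (v_h[-i] - dq_h[-i]) for i in range(1, 9))
              - sum(alpha[i - 1] * nf_h[-i] for i in range(1, 9)))
        v = s[n] - sp - nf
        q = v - ppv - lam * q_h[-pp] - uq[n]
        eq += q * q
        for hist, val in ((dq_h, dq), (sq_h, sq), (v_h, v), (nf_h, nf), (q_h, q)):
            hist.append(val)
    return eq


def search(cands: Sequence[Sequence[float]], s: Sequence[float],
           state: History, **filt) -> Tuple[int, float]:
    """Try every candidate from the same initial memories; keep the winner's memories."""
    if not cands:
        raise ValueError("empty codebook")
    best_j, best_e, best_state = -1, float("inf"), state
    for j, uq in enumerate(cands):
        trial = {k: list(h) for k, h in state.items()}  # reset memories
        e = vq_error(uq, s, trial, **filt)
        if e < best_e:
            best_j, best_e, best_state = j, e, trial
    state.update(best_state)
    return best_j, best_e


def split_index(ci: int, n_shapes: int = 16) -> Tuple[int, int]:
    """CI -> (sign bit, shape index); with 16 shapes the sign bit is the MSB of 5 bits."""
    return divmod(ci, n_shapes)
```

The growing Python lists keep the sketch short; a practical implementation would more likely use fixed-length buffers sized to the largest pitch period, and would exploit the fact that ppv(n) does not depend on the candidate when pp is at least 5.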
Given a uq(n) vector (taking the value of one of the 2N scaled codevectors), the derivation of the corresponding energy of the q(n) vector is described in more detail below. First, block 260 performs pitch prediction to produce the pitch-predicted vector ppv(n) as

$$ppv(n) = \sum_{i=1}^{3} b_i \, dq(n - pp + 2 - i), \quad n = 1, 2, 3, 4.$$

Adder 285 then updates the dq(n) vector as

$$dq(n) = uq(n) + ppv(n), \quad n = 1, 2, 3, 4.$$

Next, block 240 and adder 245 together calculate the short-term predicted speech vector sp(n) and the quantized speech vector sq(n). For n = 1, 2, 3, 4, calculate

$$sp(n) = -\sum_{i=1}^{8} \tilde{a}_i \, sq(n - i),$$
$$sq(n) = dq(n) + sp(n).$$

Then, block 250 and adders 290, 253, and 255 work together to update the v(n) vector. For n = 1, 2, 3, 4, calculate stnf(n) and v(n) as

$$stnf(n) = \sum_{i=1}^{8} \hat{\beta}_i \left[ v(n - i) - dq(n - i) \right] - \sum_{i=1}^{8} \hat{\alpha}_i \, stnf(n - i),$$
$$v(n) = s(n) - sp(n) - stnf(n).$$

Finally, the corresponding q(n) vector is calculated as

$$q(n) = v(n) - ppv(n) - \lambda \, q(n - pp) - uq(n), \quad n = 1, 2, 3, 4.$$

The energy of the q(n) vector is calculated as

$$E_q = \sum_{n=1}^{4} q^2(n).$$

This calculation from a given uq(n) vector to the corresponding energy term $E_q$ is repeated 2N times for the 2N scaled VQ codevectors. After the winning scaled codevector that minimizes $E_q$ is selected, the filter memories of blocks 240, 250, 260, and 265 are updated using the filter memories that were left after the calculation of $E_q$ for that winning codevector. These updated filter memories become the initial filter memories used for the excitation VQ codebook search for the next excitation vector.

3.10 Bit Multiplexing

The bit multiplexer block 295 in Figure 3 packs the five sets of indices LSPI, PPI, PPTI, GI, and CI into a single bit stream. This bit stream is the output of the BroadVoice16 encoder. It is passed to the communication channel.
Figure 10 shows the BV16 bit stream format in each frame. In Figure 10, the bit stream for the current frame is the shaded area in the middle. The bit stream for the last frame is on the left, while the bit stream for the next frame is on the right. Although the bit streams of different frames may not be sent next to each other in a packet voice system, this illustration is meant to show that time goes from left to right, and that the 30 side information bits consisting of LSPI, PPI, PPTI, and GI go before the excitation codebook indices CI(k), k = 1, 2, ..., 10, when the bit stream is transmitted serially. For each index, the most significant bit (MSB) goes first (on the left), while the least significant bit (LSB) goes last. This completes the detailed description of the BV16 encoder.

Figure 10 Bit stream format (Frame M - 1 (previous frame), Frame M (current frame), Frame M + 1 (next frame); each frame carries 80 bits: 30 side information bits (LSPI: 7 + 7, PPI: 7, PPTI: 5, GI: 4) followed by CI(1), CI(2), ..., CI(10) at 5 bits each)

4 DETAILED DESCRIPTION OF THE BV16 DECODER

This section gives a detailed description of each functional block in the BV16 decoder shown in Figure 4. Blocks or signals that have the same labels as their counterparts in the encoder of Figure 3 have the same meaning as those counterparts.

4.1 Bit De-multiplexing

The bit de-multiplexer block 400 takes one frame of the input bit stream at a time, and de-multiplexes, or separates, the five sets of indices LSPI, PPI, PPTI, GI, and CI from the current frame of the input bit stream. As described in Section 3 above, LSPI contains two indices: a 7-bit first-stage VQ index and a 7-bit second-stage VQ index. PPI is a 7-bit pitch period index. PPTI is a 5-bit pitch predictor tap VQ index. GI is a 4-bit gain index, and CI contains ten 5-bit excitation VQ indices, each with 1 sign bit and 4 shape bits.
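A minimal Python sketch of the multiplexer (block 295) and de-multiplexer (block 400) is shown below. It concatenates the indices MSB-first into one 80-bit frame (10 bytes) using the field widths given above. The order LSPI1 before LSPI2 inside the LSPI field, the field names, and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (name, width in bits); LSPI1 before LSPI2 is an assumed order.
FIELDS: List[Tuple[str, int]] = (
    [("LSPI1", 7), ("LSPI2", 7), ("PPI", 7), ("PPTI", 5), ("GI", 4)]
    + [(f"CI{k}", 5) for k in range(1, 11)]
)
FRAME_BITS = sum(w for _, w in FIELDS)  # 30 side-information bits + 50 excitation bits


def pack_frame(idx: Dict[str, int]) -> bytes:
    """Concatenate all indices, MSB of each index first, into one 80-bit frame."""
    acc = 0
    for name, width in FIELDS:
        val = idx[name]
        if not 0 <= val < (1 << width):
            raise ValueError(f"{name}={val} does not fit in {width} bits")
        acc = (acc << width) | val
    return acc.to_bytes(FRAME_BITS // 8, "big")


def unpack_frame(data: bytes) -> Dict[str, int]:
    """Split one 80-bit frame back into its indices (inverse of pack_frame)."""
    if len(data) * 8 != FRAME_BITS:
        raise ValueError("expected one 80-bit frame")
    acc = int.from_bytes(data, "big")
    remaining = FRAME_BITS
    out: Dict[str, int] = {}
    for name, width in FIELDS:
        remaining -= width
        out[name] = (acc >> remaining) & ((1 << width) - 1)
    return out
```

Because 80 bits is a whole number of bytes, each frame maps onto exactly 10 bytes, and an arbitrary-precision integer keeps the MSB-first ordering explicit without manual bit buffers.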
4.2 Long-Term Predictor Parameter Decoding

The long-term predictor parameter decoder (block 410) decodes the indices PPI and PPTI. The pitch period is decoded from PPI as

$$pp = PPI + 10.$$

Let $\{\mathbf{b}_0, \mathbf{b}_1, \ldots, \mathbf{b}_{31}\}$ be the 3-dimensional, 32-entry codebook used for pitch predictor tap VQ, as listed in Appendix 4. Let $\mathbf{b}_j$ be the j-th codevector in this codebook, where the subscript j is the codebook index listed in the first column of the table in Appendix 4. The three pitch predictor taps $b_1$, $b_2$, and $b_3$ are decoded from PPTI as

$$\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \mathbf{b}_{PPTI}.$$

4.3 Short-Term Predictor Parameter Decoding

The short-term predictor parameter decoding takes place in block 420 of Figure 4. Block 420 receives the set of decoded LSP indices, $LSPI = \{LSPI_1, LSPI_2\}$, from the bit de-multiplexer, block 400 in Figure 4. First, block 420 reconstructs the LSP coefficients, $\{\tilde{l}_i\}$, from the LSP indices, and then it produces the coefficients of the short-term prediction error filter, $\{\tilde{a}_i\}$, from the LSP coefficients according to the conversion procedure specified in Section 3.5.

Figure 11 Short-term predictor parameter decoder (block 420)

Block 420 of Figure 4 is expanded in Figure 11. The reconstruction of the LSP coefficients from the LSP indices is the inverse of the LSP quantization, and many operations have equivalents in Section 3.4 and Figure 6. The first-stage VQ is decoded in block 4204, and the second-stage split VQ is decoded in block 42016.
In block 42016, the received index $LSPI_2$ is decoded into the sign index,

$$I_{sg} = \begin{cases} 0, & LSPI_2 > 63 \\ 1, & LSPI_2 \le 63 \end{cases}$$

and the shape vector index,

$$I_{sh} = \begin{cases} 127 - LSPI_2, & LSPI_2 > 63 \\ LSPI_2, & LSPI_2 \le 63. \end{cases}$$

From the sign and shape indices, the reconstructed output of the second-stage VQ is calculated as

$$\tilde{\mathbf{e}}_{22} = s_{I_{sg}} \, \mathbf{cb}_2^{(I_{sh})}.$$

From the index for the first-stage VQ, block 4204 looks up the quantized first-stage vector from the codebook $CB_1 = \{\mathbf{cb}_1^{(0)}, \mathbf{cb}_1^{(1)}, \ldots, \mathbf{cb}_1^{(127)}\}$,

$$\tilde{\mathbf{e}}_{21} = \mathbf{cb}_1^{(LSPI_1)}.$$

Adder 4205 performs the equivalent operation of adder 21611 in Figure 6. It adds the first-stage and second-stage vectors to obtain a first reconstructed prediction error vector,

$$\tilde{\mathbf{e}}_2^{(1)} = \tilde{\mathbf{e}}_{21} + \tilde{\mathbf{e}}_{22}.$$

Equivalent to block 2163 in Figure 6, block 4206 performs the 8th-order MA prediction of the mean-removed LSP vector according to

$$\hat{e}_{1,i} = \mathbf{p}_{LSP,i}^T \left[ \tilde{e}_{2,i}(1)\ \ \tilde{e}_{2,i}(2)\ \ \cdots\ \ \tilde{e}_{2,i}(8) \right]^T, \quad i = 1, 2, \ldots, 8,$$

where $\tilde{e}_{2,i}(k)$ and $\mathbf{p}_{LSP,i}$ are defined in Section 3.4. Adder 4207, equivalent to adder 21612 in Figure 6, generates the predicted LSP vector by adding the mean LSP vector and the predicted mean-removed LSP vector,

$$\hat{\mathbf{l}} = \bar{\mathbf{l}} + \hat{\mathbf{e}}_1.$$

Subsequently, adder 4208 adds the predicted LSP vector to the first reconstructed prediction error vector to obtain a first intermediate reconstructed LSP vector,

$$\mathbf{l}^{(1)} = \hat{\mathbf{l}} + \tilde{\mathbf{e}}_2^{(1)}.$$

Adder 4209 subtracts the predicted LSP vector from a second intermediate reconstructed LSP vector, $\mathbf{l}^{(2)}$, to calculate a second reconstructed prediction error vector,

$$\tilde{\mathbf{e}}_2^{(2)} = \mathbf{l}^{(2)} - \hat{\mathbf{l}},$$

to be used to update the MA predictor memory in the presence of bit errors.

Block 42010 determines the ordering property of the first three coefficients of the first intermediate reconstructed LSP vector,

$$l_1^{(1)} \ge 0, \quad l_2^{(1)} \ge l_1^{(1)}, \quad l_3^{(1)} \ge l_2^{(1)}.$$

This ordering property was enforced during the encoding operation of the constrained VQ of the second stage, block 21615 of Figure 6.
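The index split for $LSPI_2$, the MA prediction of block 4206, and the ordering check of block 42010 can be sketched in Python as follows. The sign values $s_0$, $s_1$ and the second-stage codebook are defined in Section 3.4, which is not reproduced here, so the sketch takes them as arguments; the function names and the nested-list layout of the predictor coefficients and memory are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_lspi2(lspi2: int) -> Tuple[int, int]:
    """Return (I_sg, I_sh) for a 7-bit second-stage index."""
    if not 0 <= lspi2 <= 127:
        raise ValueError("LSPI2 must be a 7-bit index")
    if lspi2 > 63:
        return 0, 127 - lspi2
    return 1, lspi2


def second_stage_vector(lspi2: int, cb2: Sequence[Sequence[float]],
                        signs: Sequence[float]) -> List[float]:
    """e~22 = s_{I_sg} * cb2^(I_sh); signs = (s_0, s_1) as defined in Section 3.4."""
    isg, ish = split_lspi2(lspi2)
    return [signs[isg] * x for x in cb2[ish]]


def ma_predict(p: Sequence[Sequence[float]],
               mem: Sequence[Sequence[float]]) -> List[float]:
    """e^_1,i = sum_k p[i][k-1] * mem[k-1][i], k = 1..8; mem[k-1] is e~_2 from k frames ago."""
    return [sum(p[i][k] * mem[k][i] for k in range(8)) for i in range(8)]


def ordering_preserved(l1: Sequence[float]) -> bool:
    """Block 42010: l_1 >= 0, l_2 >= l_1, l_3 >= l_2 (Python index 0 is l_1)."""
    return l1[0] >= 0.0 and l1[1] >= l1[0] and l1[2] >= l1[1]
```

Under this sketch, the Transmission-Error-Indicator would be `0 if ordering_preserved(l1) else 1`.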
If the ordering is found to be preserved, the Transmission-Error-Indicator, TEI, is set to 0 to indicate that no bit errors in the LSP bits have been detected. Otherwise, if it is not preserved, the Transmission-Error-Indicator is set to 1 to indicate the likely presence of bit errors in the LSP bits. If the Transmission-Error-Indicator is 0, the switches 42011 and 42012 are in the left position, and they route the first reconstructed prediction error vector $\tilde{\mathbf{e}}_2^{(1)}$ and the first intermediate reconstructed LSP vector $\mathbf{l}^{(1)}$ to the reconstructed prediction error vector $\tilde{\mathbf{e}}_2$ and the intermediate reconstructed LSP vector $\mathbf{l}$, respectively. Otherwise, if the Transmission-Error-Indicator is 1, the switches 42011 and 42012 are in the right position, and they route the second reconstructed prediction error vector $\tilde{\mathbf{e}}_2^{(2)}$ and the second intermediate reconstructed LSP vector $\mathbf{l}^{(2)}$ to the reconstructed prediction error vector $\tilde{\mathbf{e}}_2$ and the intermediate reconstructed LSP vector $\mathbf{l}$, respectively. Hence, the reconstructed prediction error vector and the intermediate reconstructed LSP vector are obtained as

$$\tilde{\mathbf{e}}_2 = \begin{cases} \tilde{\mathbf{e}}_2^{(1)}, & TEI = 0 \\ \tilde{\mathbf{e}}_2^{(2)}, & TEI = 1 \end{cases} \quad \text{and} \quad \mathbf{l} = \begin{cases} \mathbf{l}^{(1)}, & TEI = 0 \\ \mathbf{l}^{(2)}, & TEI = 1, \end{cases}$$

respectively. Block 42013 enforces LSP spacing; it is functionally identical to block 21614 in Figure 6, as specified in Section 3.4. Block 42014 buffers the reconstructed LSP vector for future use in the presence of bit errors. The reconstructed LSP vector of the current frame becomes the second intermediate reconstructed LSP vector of the next frame,

$$\mathbf{l}^{(2)}(k + 1) = \tilde{\mathbf{l}}(k),$$

where the additional parameter k represents the frame index of the current frame. For the very first frame, the second intermediate reconstructed LSP vector is initialized to

$$\mathbf{l}^{(2)} = \left[ 1/9 \ \ 2/9 \ \ \cdots \ \ 8/9 \right]^T.$$

The final step of the short-term predictor parameter decoding is to convert the reconstructed LSP coefficients to linear prediction coefficients.
This operation takes place in block 42015, which is functionally identical to block 217 of Figure 5, described in Section 3.5.

4.4 Excitation Gain Decoding

The excitation gain decoder is shown in Figure 12. It is part of block 430 in Figure 4. It decodes the gain index GI into the corresponding decoded frame excitation gain gq(m) in the linear domain. All operations in Figure 12 are performed on a frame-by-frame basis.

Refer to Figure 12. Let m be the frame index of the current frame, and assume the same convention for the frame index m as in Section 3.8. Block 501 decodes the 4-bit gain index $GI_m$ into the log-gain prediction error lgeq(m) using the codebook in Appendix 5. Switch 502 is normally in the upper position, connecting the output of block 501 to the input of block 503. Then, the MA log-gain predictor (block 503) calculates the estimated log-gain for the current frame as

$$elg(m) = \sum_{k=1}^{GPO} lgp(k) \, lgeq(m - k),$$

where GPO = 8, and lgp(k), k = 1, 2, ..., GPO, are the MA log-gain predictor coefficients given in Section 3.8.

Figure 12 Excitation gain decoder

Block 504 holds the long-term average log-gain value lgmean = 11.45752. Adders 505 and 506 add elg(m) and lgmean, respectively, to lgeq(m), resulting in the temporarily decoded log-gain

$$lgq'(m) = lgeq(m) + elg(m) + lgmean.$$

Block 507 is functionally identical to block 309 in Figure 8, described in Section 3.8. It is important to note that, as in the encoder, the log-gain value passed to block 507 for updating its estimate of the long-term average signal level is the final value of the decoded log-gain lgq(m), i.e.
after the threshold check of block 508 and the potential log-gain extrapolation and substitution of block 509, as described below.

Block 508 calculates the row and column indices i and j into the threshold matrix T(i, j) in the same way as block 310 in Figure 8. Namely, the row index is calculated as

$$i = \frac{lgq(m-1) - lv(m-1) - GLB}{2},$$

where GLB = -24. If i > NG, i is clipped to NG. If i < 1, i is clipped to 1. The column index is calculated as

$$j = \frac{lgq(m-1) - lgq(m-2) - GCLB}{2},$$

where GCLB = -8. If j > NGC, j is clipped to NGC. If j < 1, j is clipped to 1.

Block 508 controls the actions of block 509 and switch 502 as follows. If $GI_m = 0$ or $lgq'(m) \le T(i, j) + lgq(m-1)$, then switch 502 is in the upper position, block 509 determines the final decoded log-gain as $lgq(m) = lgq'(m)$, and the filter memory in the MA log-gain predictor (block 503) is updated by shifting the old memory values by one position and then assigning lgeq(m) to the newest position of the filter memory. If, on the other hand, $GI_m > 0$ and $lgq'(m) > T(i, j) + lgq(m-1)$, then the temporarily decoded log-gain lgq'(m) is discarded, and block 509 determines the final decoded log-gain as $lgq(m) = lgq(m-1)$ (by extrapolating the decoded log-gain of the previous frame); furthermore, switch 502 is moved to the lower position, and adders 511 and 512 subtract lgmean and elg(m), respectively, from lgq(m) to get

$$lgeq'(m) = lgq(m) - lgmean - elg(m),$$

and this lgeq'(m) is used to update the newest position of the filter memory of block 503, after the old memory values are shifted by one position.

Once the final decoded log-gain lgq(m), subject to the constraint imposed by block 509, is determined as described above, it is used by block 507 to update the estimated signal level lv(m). This value lv(m) is then used by block 508 in the next frame (the (m + 1)-th frame).

Block 510 converts the final decoded log-gain lgq(m) to the linear domain as

$$gq(m) = 2^{lgq(m)/2}.$$

4.5 Excitation VQ Decoding and Scaling

The excitation codebook index array CI of each frame contains 10 excitation codebook indices, CI(k), k = 1, ..., 10, each containing 1 sign bit and 4 shape bits. The excitation vectors are decoded vector by vector. Let gq(m) denote the decoded excitation gain in the linear domain for the current frame. In addition, let CI(k) denote the received excitation codebook index of the current excitation vector to be decoded. This index takes a value between 0 and 31. The most significant bit of this index is the sign bit. Therefore, if CI(k) < 16, the sign bit is 0; otherwise, the sign bit is 1. Let $c_j(n)$, n = 1, 2, 3, 4, represent the j-th shape codevector in Appendix 7, with a shape codebook index of j. Furthermore, without loss of generality, let n = 1, 2, 3, 4 correspond to the sample time indices of the current vector. Then, in Figure 4, the decoded and scaled excitation vector uq(n), n = 1, 2, 3, 4, is obtained as

$$uq(n) = \begin{cases} gq(m) \, c_{CI(k)}(n), & \text{if } CI(k) < 16 \\ -gq(m) \, c_{CI(k)-16}(n), & \text{if } CI(k) \ge 16 \end{cases} \quad n = 1, 2, 3, 4.$$

4.6 Long-Term Synthesis Filtering

Let n = 1, 2, ..., FRSZ correspond to the sample time indices of the current frame. In Figure 4, the long-term synthesis filter (block 455, consisting of block 440 and adder 450 in a feedback loop) performs sample-by-sample long-term synthesis filtering as follows:

$$dq(n) = uq(n) + \sum_{i=1}^{3} b_i \, dq(n - pp + 2 - i), \quad n = 1, 2, \ldots, FRSZ.$$

4.7 Short-Term Synthesis Filtering

The short-term synthesis filter (block 475, consisting of block 460 and adder 470 in a feedback loop) performs sample-by-sample short-term synthesis filtering to obtain the output signal as follows:

$$sq(n) = dq(n) - \sum_{i=1}^{8} \tilde{a}_i \, sq(n - i), \quad n = 1, 2, \ldots, FRSZ.$$

4.8 Example Postfilter

This document specifies codec components that need to be clearly specified in order to foster interoperability.
Decoder postfiltering is not a mandatory component of this BV16 Codec Specification, since such postfiltering does not affect bit-stream compatibility or encoder-decoder interoperability. However, an example postfilter is described in this section for reference purposes only. An implementer of BV16 can use other postfilters without affecting interoperability.

The example postfilter is an all-zero single-tap pitch postfilter. The inputs to the pitch postfilter are the pitch period, pp, and the output signal, sq(n), of the short-term synthesis filter [7]. In principle, the postfiltering is given by

$$spf(n) = b_{pf}(1) \, sq(n) + b_{pf}(2) \, sq(n - pp_{pf}), \quad n = 1, 2, \ldots, FRSZ,$$

where spf(n) denotes the postfiltered output signal and $pp_{pf}$ is the pitch period used for the pitch postfilter. First, the decoder pitch period is refined by selecting the lag $pp_{pf}$ corresponding to the highest squared normalized pitch correlation of the output signal within a ±4-sample range of the pitch period pp, i.e., the lag $pp_{pf}$ that maximizes

$$C_{sq}(pp_{pf}) = \frac{\left( \sum_{n=1}^{FRSZ} sq(n) \, sq(n - pp_{pf}) \right)^2}{\sum_{n=1}^{FRSZ} sq(n) \, sq(n) \, \sum_{n=1}^{FRSZ} sq(n - pp_{pf}) \, sq(n - pp_{pf})}, \quad pp_{pf} = pp_{min}, pp_{min} + 1, \ldots, pp_{max},$$

where $pp_{min} = pp - 4$ and $pp_{max} = pp + 4$, with the following constraints: if $pp_{min} < MINPP$, then $pp_{min} = MINPP$ and $pp_{max} = MINPP + 8$; similarly, if $pp_{max} > MAXPP$, then $pp_{max} = MAXPP$ and $pp_{min} = MAXPP - 8$. With the refined lag, the normalized pitch correlation is calculated as

$$C_{pf} = \frac{\sum_{n=1}^{FRSZ} sq(n) \, sq(n - pp_{pf})}{\sqrt{\sum_{n=1}^{FRSZ} sq(n) \, sq(n) \, \sum_{n=1}^{FRSZ} sq(n - pp_{pf}) \, sq(n - pp_{pf})}}.$$

If the numerator is less than zero or the denominator is zero, the normalized pitch correlation is set to zero, $C_{pf} = 0$. Next, a running mean of the normalized pitch correlation is calculated as

$$C_{rm}(m) = 0.75 \, C_{rm}(m-1) + 0.25 \, C_{pf},$$

where $C_{rm}(m)$ is the running mean of the current frame, and $C_{rm}(m-1)$ is the running mean of the previous frame [8].

[7] At the first frame, the history of sq(n) is set to zero.
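The lag refinement, normalized correlation, and running mean above can be sketched in Python as follows. Assumptions: the output history and the current frame are stored in one list `x` whose last FRSZ samples are the current frame (so `x` must hold at least FRSZ + pp_max samples); MINPP and MAXPP are passed in because their values are defined elsewhere in the specification; ties in $C_{sq}$ keep the smallest lag, and a zero denominator gives $C_{sq} = 0$, two choices this section does not specify. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _corr_terms(x: Sequence[float], frsz: int, lag: int) -> Tuple[float, float, float]:
    """Cross term and the two energies used in the C_sq and C_pf formulas."""
    base = len(x) - frsz  # x[base + n] is sq(n + 1) of the current frame
    cross = e0 = e1 = 0.0
    for n in range(frsz):
        cur, past = x[base + n], x[base + n - lag]
        cross += cur * past
        e0 += cur * cur
        e1 += past * past
    return cross, e0, e1


def refine_lag(x: Sequence[float], frsz: int, pp: int, minpp: int, maxpp: int) -> int:
    """Search pp +/- 4 (shifted to stay inside [MINPP, MAXPP]) for the largest C_sq."""
    lo, hi = pp - 4, pp + 4
    if lo < minpp:
        lo, hi = minpp, minpp + 8
    if hi > maxpp:
        lo, hi = maxpp - 8, maxpp
    best_lag, best_c = lo, -1.0
    for lag in range(lo, hi + 1):
        cross, e0, e1 = _corr_terms(x, frsz, lag)
        c = cross * cross / (e0 * e1) if e0 * e1 > 0.0 else 0.0
        if c > best_c:
            best_lag, best_c = lag, c
    return best_lag


def normalized_correlation(x: Sequence[float], frsz: int, lag: int) -> float:
    """C_pf, forced to 0 for a negative numerator or a zero denominator."""
    cross, e0, e1 = _corr_terms(x, frsz, lag)
    if cross < 0.0 or e0 * e1 == 0.0:
        return 0.0
    return cross / math.sqrt(e0 * e1)


def update_running_mean(crm_prev: float, cpf: float) -> float:
    """C_rm(m) = 0.75 C_rm(m-1) + 0.25 C_pf."""
    return 0.75 * crm_prev + 0.25 * cpf
```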
Based on the normalized pitch correlation and its running mean, the initial pitch postfilter tap is calculated as

$$a_{pf} = \begin{cases} 0, & C_{rm}(m) < 0.55 \text{ and } C_{pf} < 0.8 \\ 0.3 \, C_{pf}, & \text{otherwise}. \end{cases}$$

Subsequently, a scaling factor is calculated as

$$g_{pf} = \sqrt{ \frac{\sum_{n=1}^{FRSZ} [sq(n)]^2}{\sum_{n=1}^{FRSZ} [sq(n) + a_{pf} \, sq(n - pp_{pf})]^2} }.$$

It is set to one if either the numerator or the denominator is zero. The two pitch postfilter coefficients of the current (m-th) frame are calculated as

$$b_{pf,m}(1) = g_{pf} \quad \text{and} \quad b_{pf,m}(2) = g_{pf} \, a_{pf}.$$

In practice, for the first $L_{int} = 20$ samples of each frame, the impulse responses of adjacent pitch postfilters are interpolated, while the pitch postfilter of the current frame is used for the remaining samples of the frame:

$$spf(n) = b_{pf}(1, n) \, sq(n) + b_{pf}(2, n) \, sq(n - pp_{pf,m}) + b_{pf}(3, n) \, sq(n - pp_{pf,m-1}), \quad n = 1, 2, \ldots, FRSZ,$$

where $pp_{pf,m}$ and $pp_{pf,m-1}$ are the refined pitch periods of the current and previous frames, respectively, and

$$b_{pf}(1, n) = \begin{cases} \alpha(n) \, b_{pf,m}(1) + [1 - \alpha(n)] \, b_{pf,m-1}(1), & n \le L_{int} \\ b_{pf,m}(1), & n > L_{int} \end{cases}$$
$$b_{pf}(2, n) = \begin{cases} \alpha(n) \, b_{pf,m}(2), & n \le L_{int} \\ b_{pf,m}(2), & n > L_{int} \end{cases}$$
$$b_{pf}(3, n) = \begin{cases} [1 - \alpha(n)] \, b_{pf,m-1}(2), & n \le L_{int} \\ 0, & n > L_{int}. \end{cases}$$

A linear interpolation between adjacent pitch postfilters [9] is used:

$$\alpha(n) = \frac{n}{L_{int} + 1}.$$

[8] For the first frame, the running mean of the previous frame is set to zero, i.e., $C_{rm}(0) = 0$.

4.9 Example Packet Loss Concealment

Similar to decoder postfiltering, packet loss concealment is not a mandatory component of this BV16 Codec Specification, since packet loss concealment does not affect bit-stream compatibility or encoder-decoder interoperability. However, an example packet loss concealment technique is described in this section for reference purposes only. An implementer of BV16 can use other packet loss concealment techniques without affecting interoperability.

The example packet loss concealment technique utilizes the synthesis model of the decoder.
In principle, all side information of the previous frame is repeated, while the excitation of the cascaded long-term and short-term synthesis filters comes from a random source, scaled to a proper level. Hence, with the additional index m denoting the m-th frame, during packet loss:

• The pitch period, pp, is set to the pitch period of the last frame [10]: $pp = pp_{m-1}$.
• The pitch taps, $b_1$, $b_2$, and $b_3$, are set to the pitch taps of the last frame [11]: $b_i = b_{m-1,i}$, i = 1, 2, 3.
• The short-term synthesis filter coefficients, $\tilde{a}_i$, i = 1, ..., 8, are set to those of the last frame [12]: $\tilde{a}_i = \tilde{a}_{m-1,i}$, i = 1, ..., 8.
• A properly scaled random sequence is used as the long-term synthesis filter excitation, uq(n), n = 1, 2, ..., FRSZ.

The speech synthesis of the bad frame (part of a lost packet) then takes place exactly as specified in Sections 4.6 and 4.7, and in Section 4.8 if the example postfilter is included. The random sequence is scaled according to

$$uq(n) = g_{plc} \cdot \sqrt{ \frac{E_{m-1}}{\sum_{n=1}^{FRSZ} [r(n)]^2} } \cdot r(n), \quad n = 1, 2, \ldots, FRSZ,$$

where r(n), n = 1, 2, ..., FRSZ, is a random sequence, $E_{m-1}$ is in principle the energy of the long-term synthesis filter excitation of the previous frame [13], and the scaling factor, $g_{plc}$, is calculated as detailed below.

During good frames, an estimate of periodicity is updated as

$$per_m = 0.5 \, per_{m-1} + 0.5 \, b_s,$$

where $b_s$ is the sum of the three pitch taps clipped at a lower threshold of zero and an upper threshold of one [14], while it is maintained during bad frames:

$$per_m = per_{m-1}.$$

Based on the periodicity, the scaling factor is calculated as

$$g_{plc} = -2 \, per_{m-1} + 1.9,$$

with $g_{plc}$ clipped at a lower threshold of 0.1 and an upper threshold of 0.9.

[9] For the first frame, the parameters of the previous pitch postfilter are set to $pp_{pf,0} = 100$, $b_0(1) = 1$, $b_0(2) = 0$.
[10] If the first frame is lost, a value of 100 is used for the pitch period.
[11] If the first frame is lost, the pitch taps are set to zero.
[12] If the first frame is lost, the short-term filter coefficients are set to zero.
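A Python sketch of the concealment excitation and the periodicity bookkeeping above follows. The specification does not say which random source r(n) to use, so the uniform generator from Python's `random` module is an assumption, as are the function names; `e_prev` stands for $E_{m-1}$ and `per_prev` for $per_{m-1}$.

```python
import math
import random
from typing import List, Sequence


def update_periodicity(per_prev: float, taps: Sequence[float]) -> float:
    """Good frame: per_m = 0.5 per_{m-1} + 0.5 b_s, with b_s = sum(taps) clipped to [0, 1]."""
    bs = min(max(sum(taps), 0.0), 1.0)
    return 0.5 * per_prev + 0.5 * bs


def plc_gain(per_prev: float) -> float:
    """g_plc = -2 per_{m-1} + 1.9, clipped to [0.1, 0.9]."""
    return min(max(-2.0 * per_prev + 1.9, 0.1), 0.9)


def plc_excitation(e_prev: float, per_prev: float, frsz: int,
                   rng: random.Random) -> List[float]:
    """uq(n) = g_plc * sqrt(E_{m-1} / sum r^2) * r(n) for an assumed uniform r(n)."""
    r = [rng.uniform(-1.0, 1.0) for _ in range(frsz)]
    er = sum(v * v for v in r)
    scale = plc_gain(per_prev) * math.sqrt(e_prev / er) if er > 0.0 else 0.0
    return [scale * v for v in r]
```

By construction the concealment excitation carries $g_{plc}^2 E_{m-1}$ of energy, so strongly periodic speech (large $per_{m-1}$) is replaced by a quieter noise-like excitation, since a random source cannot reproduce its pitch structure.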
After synthesis of the signal output of a lost frame, the memories of the predictive quantizers are updated. The memory of the inverse LSP quantizer is updated with

$$\tilde{e}_{2,i} = \tilde{l}_{m-1,i} - \hat{e}_{1,i} - \bar{l}_i, \quad i = 1, 2, \ldots, 8,$$

where $\hat{e}_{1,i}$ is given in Section 4.3, $\bar{l}_i$ in Section 3.4, and $\tilde{l}_{m-1,i}$ denotes the i-th LSP coefficient of the (m-1)-th frame (as decoded according to Section 4.3 for a good frame, or repeated for a bad frame). The memory of the inverse gain quantizer is updated with

$$lgeq(m) = lgq(m) - lgmean - elg(m),$$

where elg(m) is given in Section 4.4, lgmean in Section 3.8, and lgq(m) is calculated as

$$lgq(m) = \begin{cases} \log_2 \dfrac{E_{m-1}}{FRSZ}, & \text{if } \dfrac{E_{m-1}}{FRSZ} > 1 \\ 0, & \text{if } \dfrac{E_{m-1}}{FRSZ} \le 1. \end{cases}$$

The level estimation for a bad frame is updated exactly as for a good frame; see Section 4.4.

At the end of a good frame (after synthesis of the output), the estimate of periodicity is updated as explained above, and the energy of the long-term synthesis filter excitation is updated as

$$E_m = \sum_{n=1}^{FRSZ} [uq(n)]^2.$$

At the end of the processing of a bad frame (after synthesis of the output and update of the predictive quantizers), the energy of the long-term synthesis filter excitation and the long-term synthesis filter coefficients are scaled down when 8 or more consecutive frames are lost:

$$E_m = \begin{cases} E_{m-1}, & N_{clf} < 8 \\ (\beta_{N_{clf}})^2 \, E_{m-1}, & N_{clf} \ge 8 \end{cases}$$
$$b_{m,i} = \begin{cases} b_{m-1,i}, & N_{clf} < 8 \\ \beta_{N_{clf}} \, b_{m-1,i}, & N_{clf} \ge 8 \end{cases} \quad i = 1, 2, 3,$$

where $N_{clf}$ is the number of consecutive lost frames, and the scaling, $\beta_{N_{clf}}$, is given by

$$\beta_{N_{clf}} = \begin{cases} 1 - 0.02 \, (N_{clf} - 7), & 8 \le N_{clf} \le 57 \\ 0, & N_{clf} > 57. \end{cases}$$

This will gradually mute the output signal when consecutive packets are lost for an extended period of time.

[13] The energy is initialized to zero, i.e., $E_0 = 0$.
[14] The estimate of periodicity is initialized to zero, i.e., $per_0 = 0$.
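The attenuation schedule and the log-gain used to update the gain-predictor memory during concealment can be sketched in Python as follows. The function names are hypothetical, and returning 1.0 from `attenuation` below 8 lost frames is a convenience that reproduces the "no scaling" branch of the piecewise rules; the state is passed explicitly, which simplifies how a real decoder would store it.

```python
import math
from typing import List, Sequence, Tuple


def attenuation(nclf: int) -> float:
    """beta_Nclf for Nclf >= 8; 1.0 (no scaling) below 8 consecutive lost frames."""
    if nclf < 8:
        return 1.0
    if nclf <= 57:
        return 1.0 - 0.02 * (nclf - 7)
    return 0.0


def end_of_bad_frame(e_prev: float, taps_prev: Sequence[float],
                     nclf: int) -> Tuple[float, List[float]]:
    """Scale E_m by beta^2 and the pitch taps b_i by beta (identity when Nclf < 8)."""
    beta = attenuation(nclf)
    return beta * beta * e_prev, [beta * b for b in taps_prev]


def concealment_log_gain(e_prev: float, frsz: int) -> float:
    """lgq(m) = log2(E_{m-1} / FRSZ) if that ratio exceeds 1, else 0."""
    ratio = e_prev / frsz
    return math.log2(ratio) if ratio > 1.0 else 0.0
```

Since $\beta$ falls by 0.02 per additional lost frame, the excitation energy and pitch taps reach zero after 57 consecutive lost frames, which is the gradual muting described above.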
APPENDIX 1: GRID FOR LPC TO LSP CONVERSION

Grid point   Grid value
0             0.9999390
1             0.9935608
2             0.9848633
3             0.9725342
4             0.9577942
5             0.9409180
6             0.9215393
7             0.8995972
8             0.8753662
9             0.8487854
10            0.8198242
11            0.7887573
12            0.7558899
13            0.7213440
14            0.6853943
15            0.6481323
16            0.6101379
17            0.5709839
18            0.5300903
19            0.4882507
20            0.4447632
21            0.3993530
22            0.3531189
23            0.3058167
24            0.2585754
25            0.2109680
26            0.1630859
27            0.1148682
28            0.0657349
29            0.0161438
30           -0.0335693
31           -0.0830994
32           -0.1319580
33           -0.1804199
34           -0.2279663
35           -0.2751465
36           -0.3224487
37           -0.3693237
38           -0.4155884
39           -0.4604187
40           -0.5034180
41           -0.5446472
42           -0.5848999
43           -0.6235962
44           -0.6612244
45           -0.6979980
46           -0.7336731
47           -0.7675781
48           -0.7998962
49           -0.8302002
50           -0.8584290
51           -0.8842468
52           -0.9077148
53           -0.9288635
54           -0.9472046
55           -0.9635010
56           -0.9772034
57           -0.9883118
58           -0.9955139
59           -0.9999390

APPENDIX 2: FIRST-STAGE LSP CODEBOOK

Index 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Element 1 -0.0059814 -0.0053177 -0.0009308 0.0031128 -0.0023270 -0.0081329 0.0006104 0.0090561 -0.0044785 0.0042114 -0.0039902 0.0035553 0.0032806 0.0010910 -0.0047226 -0.0000229 -0.0074081 -0.0019760 -0.0044403 -0.0001907 -0.0057526 -0.0022125 -0.0016632 0.0046158 -0.0006714 0.0114136 0.0089951 0.0116730 0.0061035 0.0069580 0.0065002 0.0147324 -0.0083008 -0.0031662 -0.0050354 0.0078888 -0.0038910 -0.0030060 -0.0017471 0.0070190 -0.0084000 -0.0012970 -0.0008011 0.0257950 0.0024414 0.0045166 -0.0064392 -0.0008316 -0.0227127 0.0003204 -0.0022583 0.0022202 -0.0029678 0.0019226 0.0002441 0.0024185 -0.0076904 0.0066910 -0.0050049 0.0057602 0.0025711 Element 2 -0.0075378 -0.0019302 -0.0001831 0.0013046 -0.0014496 -0.0077362 0.0040817 0.0081329 -0.0070572 0.0052719 -0.0037308 0.0042877 0.0013351 0.0050964 -0.0046234 0.0000000 -0.0089569 -0.0027161 -0.0059509 0.0020599 -0.0060425 -0.0046158
0.0000076 0.0114517 -0.0006714 0.0131760 0.0114975 0.0303574 0.0072174 0.0107727 0.0056458 0.0161285 -0.0135345 -0.0056992 -0.0077744 0.0054169 -0.0082321 -0.0051117 -0.0031509 0.0111084 -0.0120621 -0.0023575 0.0027390 0.0224075 0.0011520 0.0086136 -0.0072556 0.0018845 -0.0309753 -0.0007782 -0.0034180 0.0045013 -0.0052338 0.0010529 0.0029984 0.0418625 -0.0126266 0.0088730 -0.0121231 0.0139847 0.0053101 Element 3 -0.0113449 0.0037079 -0.0040741 0.0076218 -0.0036392 0.0091782 -0.0010300 0.0096436 -0.0158615 -0.0061417 -0.0103226 0.0199356 -0.0004501 0.0128632 0.0096664 0.0265427 -0.0175552 -0.0077667 -0.0128784 0.0160294 -0.0029755 -0.0083923 -0.0051346 0.0148926 -0.0119095 0.0045929 -0.0246811 0.0396042 0.0028000 0.0140839 0.0067139 0.0276260 -0.0167313 -0.0011444 -0.0103531 0.0038223 -0.0112686 0.0013962 -0.0094299 0.0142746 -0.0198364 0.0041809 -0.0027847 0.0190277 -0.0020218 0.0284348 0.0081406 0.0423431 -0.0029831 0.0007553 -0.0074692 0.0243607 0.0025406 0.0046844 0.0021286 0.0316925 -0.0251846 0.0157623 -0.0460205 0.0579147 0.0119553 Element 4 -0.0002670 -0.0106049 -0.0110474 -0.0042191 0.0071030 0.0048294 -0.0081787 0.0009613 0.0019913 0.0057449 -0.0064774 0.0078812 0.0149384 0.0091553 0.0042496 0.0128021 -0.0174561 -0.0104675 -0.0197525 0.0045853 -0.0092010 0.0117264 -0.0084305 0.0092087 -0.0186539 -0.0096207 -0.0092545 0.0238495 -0.0075989 0.0036621 0.0007935 0.0238800 -0.0003433 0.0063324 -0.0145035 -0.0016632 0.0100861 0.0250015 -0.0154495 0.0070648 -0.0063629 0.0055084 -0.0278168 0.0123291 0.0018616 0.0160980 0.0079956 0.0217514 -0.0045471 -0.0061646 -0.0160370 0.0083466 0.0110321 0.0322113 0.0054932 0.0256805 -0.0261307 0.0102997 -0.0182266 0.0351944 0.0070419 - 49 - Element 5 -0.0103607 -0.0021820 -0.0238800 -0.0073776 0.0026093 0.0101395 -0.0126343 0.0011063 0.0087204 0.0057068 -0.0049667 0.0031281 0.0076141 0.0088348 -0.0064697 0.0049896 -0.0057831 -0.0090866 -0.0304413 -0.0091476 0.0054550 0.0248260 -0.0128784 0.0188599 0.0112305 0.0138092 -0.0067444 
0.0113144 0.0156174 0.0325394 0.0008087 0.0214386 -0.0090408 -0.0090790 -0.0191193 -0.0109177 -0.0043945 -0.0003738 -0.0188980 -0.0085373 0.0110550 0.0066299 -0.0051651 0.0018692 0.0149918 0.0127563 -0.0225372 0.0008698 -0.0044708 -0.0099792 -0.0401917 -0.0246048 0.0029221 0.0202255 -0.0150223 0.0141296 0.0040588 0.0193558 -0.0260468 -0.0040665 0.0170135 Element 6 -0.0055771 -0.0003815 -0.0042191 -0.0045013 -0.0172119 0.0007172 -0.0218582 -0.0042572 0.0005951 -0.0022202 -0.0043411 -0.0082245 0.0033264 0.0151443 -0.0039902 0.0054398 -0.0148010 -0.0027542 -0.0161133 -0.0058670 -0.0046692 0.0126343 -0.0196915 -0.0058212 -0.0053024 0.0076675 -0.0065155 0.0006714 0.0043716 0.0216980 -0.0099030 0.0131302 -0.0008469 0.0121918 -0.0035934 -0.0039520 -0.0049820 -0.0045395 -0.0264816 -0.0219345 0.0045700 0.0041122 -0.0065536 -0.0124512 0.0050735 0.0124054 -0.0159760 -0.0041199 -0.0003662 0.0272598 -0.0083847 0.0046997 -0.0056763 0.0150070 -0.0383453 0.0077591 0.0132675 0.0230255 -0.0259018 -0.0186386 0.0213165 Element 7 0.0091400 0.0100098 0.0014114 -0.0051651 -0.0009613 0.0030212 -0.0007629 0.0038910 0.0022583 0.0133896 -0.0066986 -0.0142746 -0.0038376 0.0096664 -0.0056915 0.0008698 0.0076141 0.0306244 0.0037613 0.0226593 -0.0137711 0.0082626 -0.0223007 0.0079727 0.0070267 0.0137863 -0.0055161 -0.0080719 -0.0073624 0.0056152 -0.0182724 0.0047607 -0.0017624 0.0022354 -0.0159454 -0.0170212 -0.0151062 0.0120697 -0.0149384 0.0042267 0.0082169 0.0141602 -0.0094833 -0.0261765 -0.0103073 0.0261307 -0.0059891 0.0085602 0.0006409 0.0179977 -0.0189896 -0.0021439 -0.0311356 0.0069733 -0.0137787 0.0154495 0.0196609 0.0201874 -0.0209122 -0.0284729 -0.0242462 Element 8 -0.0032730 0.0037460 -0.0061035 0.0158539 -0.0059662 0.0013885 -0.0092163 -0.0034485 -0.0074539 0.0077362 -0.0186844 -0.0015106 -0.0110245 0.0043411 -0.0162430 -0.0047150 0.0079803 0.0160751 0.0098114 0.0125122 -0.0035477 0.0001907 -0.0168076 0.0046082 -0.0016022 0.0142441 -0.0072098 0.0067749 -0.0141525 0.0061188 
-0.0288086 -0.0047836 0.0161667 0.0048523 0.0042343 -0.0018616 0.0018616 0.0071411 0.0071030 0.0029221 0.0152664 0.0310822 -0.0070724 -0.0093994 0.0105972 0.0190277 -0.0012741 0.0102158 0.0024567 0.0155029 -0.0101929 0.0023041 -0.0081024 0.0021973 -0.0153046 0.0091095 0.0226059 0.0446930 -0.0175323 -0.0171432 -0.0078735 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 0.0176849 0.0018082 0.0040436 -0.0073166 0.0012283 -0.0025101 0.0082932 -0.0067825 -0.0006104 -0.0000534 0.0176697 -0.0064240 -0.0053406 -0.0037766 0.0014191 -0.0061264 -0.0086670 -0.0034256 -0.0020447 -0.0080795 0.0123291 -0.0029373 0.0141983 -0.0061722 -0.0051880 0.0019531 0.0283661 -0.0066528 0.0018539 -0.0135727 0.0099640 -0.0034714 -0.0011978 0.0023346 0.0088501 -0.0091782 -0.0042267 -0.0088577 0.0114822 -0.0047989 -0.0013351 -0.0027390 0.0343170 -0.0057678 -0.0029221 -0.0026627 0.0076294 -0.0038986 -0.0036087 0.0046082 0.0034485 -0.0314102 0.0007401 -0.0015106 0.0099411 -0.0020447 -0.0028152 0.0102158 0.0341263 -0.0082016 0.0022659 -0.0109406 0.0284500 -0.0009766 0.0068207 0.0341110 0.0054245 0.0186539 -0.0097580 0.0010452 -0.0054626 0.0077057 -0.0089111 -0.0049820 0.0012283 0.0149918 -0.0083618 -0.0044250 -0.0038528 0.0062637 -0.0116348 -0.0107346 -0.0035858 0.0006714 -0.0119095 0.0056229 -0.0080795 0.0176315 -0.0097046 -0.0111084 -0.0010757 0.0346222 -0.0088272 0.0035934 -0.0340042 0.0462646 -0.0054245 0.0003204 0.0034943 0.0262756 -0.0182266 -0.0044327 -0.0077744 0.0137863 -0.0082703 -0.0021439 -0.0010986 0.0352859 -0.0092545 -0.0022583 0.0002594 0.0422287 -0.0030670 0.0035782 0.0080490 0.0175858 -0.0573120 0.0014343 -0.0081024 0.0259781 -0.0056534 -0.0029221 0.0152206 0.0759430 -0.0157471 0.0080261 -0.0265427 0.0826721 -0.0013504 0.0242920 0.0360947 0.0223770 0.0682678 -0.0165558 -0.0012741 -0.0107498 0.0032272 -0.0184479 
0.0121994 -0.0020981 0.0140686 -0.0160828 -0.0000076 -0.0154724 0.0232925 -0.0150681 0.0193634 -0.0015869 0.0414658 -0.0216980 -0.0075455 -0.0181351 0.0056763 -0.0205307 -0.0178680 -0.0006256 0.0130768 -0.0246811 -0.0006332 -0.0894012 0.0453796 -0.0114441 0.0082169 -0.0014572 0.0493164 -0.0430679 -0.0011826 -0.0206451 0.0051422 -0.0190659 0.0114136 -0.0103989 0.0218124 -0.0224991 0.0017624 -0.0224304 0.0109787 0.0042572 0.0424576 0.0111084 0.0821762 -0.0361557 0.0100021 -0.0302200 0.0389252 -0.0065460 0.0108337 -0.0010757 0.0464096 -0.0547714 0.0222321 -0.0845337 0.1126251 0.0082321 0.0580826 0.0325394 0.0096283 0.0692520 -0.0164719 -0.0147095 -0.0207672 -0.0084229 0.0017242 0.0176392 -0.0062180 0.0097809 -0.0105820 0.0004807 -0.0229874 0.0124817 0.0248032 0.0210648 -0.0019073 0.0249710 -0.0280533 -0.0211258 -0.0304947 -0.0044098 -0.0133286 0.0235138 -0.0082626 0.0101700 -0.0331345 -0.0124893 -0.0189590 0.0276489 0.0039444 0.0209274 0.0014343 0.0377655 0.0019302 0.0068283 -0.0217667 0.0017090 0.0137558 0.0434952 -0.0135803 0.0093842 -0.0150681 0.0089951 -0.0537415 0.0002823 0.0127869 0.0331345 -0.0065918 0.0272980 -0.0185471 0.0084229 -0.0461807 -0.0061798 -0.0004425 0.0616455 -0.0186920 0.0330963 -0.0575638 0.0229797 -0.0739746 0.0532761 0.0411072 0.0683975 0.0362167 -0.0214233 0.0290146 -0.0054932 0.0082169 -0.0277328 -0.0114975 -0.0064545 0.0069962 -0.0171661 0.0034790 0.0212021 0.0158310 0.0008011 0.0117035 0.0146561 0.0206528 -0.0142975 0.0197296 -0.0105896 -0.0128326 -0.0477219 -0.0164795 0.0058441 0.0195084 -0.0146942 0.0174866 0.0143738 0.0327225 -0.0093231 0.0160370 0.0178375 0.0536499 -0.0054932 0.0499496 0.0016556 0.0034714 -0.0166931 -0.0126801 -0.0056305 0.0037308 -0.0289612 -0.0054703 0.0306778 0.0164185 -0.0119095 0.0060196 0.0384750 0.0332794 -0.0519485 0.0156860 -0.0215912 0.0087585 -0.0730972 -0.0111618 -0.0104218 0.0292282 -0.0433731 0.0261612 0.0062866 0.0417252 -0.0471725 0.0114975 0.0223083 0.0789490 0.0317612 -0.0161209 0.0145493
0.0104904 0.0179520 0.0124207 0.0151215 -0.0217209 0.0066605 -0.0240707 0.0105209 0.0123367 0.0097198 0.0042114 0.0014648 0.0051270 0.0170822 0.0131302 0.0175705 -0.0002365 0.0001068 0.0010223 0.0085678 -0.0068512 0.0178680 -0.0227509 0.0197144 0.0057602 0.0176163 -0.0084381 0.0025406 0.0054550 0.0358963 -0.0127716 0.0279236 0.0016785 0.0279465 -0.0166321 -0.0010223 -0.0116806 -0.0042496 -0.0440826 -0.0086594 0.0213089 0.0172653 -0.0128479 0.0020523 0.0027542 0.0281830 -0.0157547 0.0159531 -0.0093765 0.0477676 -0.0303497 0.0061264 -0.0178070 0.0216827 -0.0584030 0.0209045 0.0029297 0.0456924 -0.0384445 -0.0184174 0.0020828 0.0621414 0.0233765 -0.0263824 0.0086975 0.0003052 0.0043182 0.0075836 0.0005341 -0.0024490 -0.0074310 0.0151367 0.0014572 -0.0018921 0.0251846 -0.0171432 -0.0106812 -0.0047836 0.0147781 -0.0151138 -0.0016098 -0.0009079 0.0061417 0.0070724 0.0159912 -0.0228195 0.0160370 -0.0364304 0.0173874 -0.0002747 0.0233994 -0.0090332 -0.0106049 -0.0171051 0.0197830 -0.0328522 -0.0018158 -0.0005188 0.0020370 -0.0041122 -0.0035172 -0.0150452 -0.0063400 0.0028915 0.0062637 0.0150299 0.0484390 -0.0069809 -0.0027847 -0.0117798 0.0280609 -0.0019455 0.0184174 -0.0097656 0.0178299 -0.0201721 0.0084991 -0.0585556 0.0112686 -0.0330887 0.0158920 0.0097733 0.0649796 -0.0217361 -0.0278168 -0.0291138 0.0445786 0.0178757 -0.0237961 0.0001144 -0.0026093 0.0050583 0.0025177 -0.0019226 -0.0019455 0.0169830 0.0081100 0.0027390 -0.0081329 0.0064545 -0.0225220 -0.0250015 -0.0073013 0.0120239 -0.0197067 0.0003967 0.0161667 0.0149689 0.0156937 0.0168228 0.0043335 0.0151443 0.0097427 0.0145874 -0.0007629 0.0193710 -0.0088577 -0.0192184 -0.0267639 0.0092850 -0.0487366 -0.0138168 -0.0010605 -0.0107651 -0.0111160 -0.0167770 -0.0211258 -0.0076370 0.0004730 0.0050507 0.0065155 0.0271988 -0.0067749 -0.0075226 -0.0054169 0.0207291 -0.0029144 0.0143356 -0.0018997 0.0107193 -0.0060349 -0.0016098 -0.0127106 0.0042725 -0.0182495 0.0113602 0.0060959 0.0428009 -0.0125504 -0.0385132 -0.0363312 
0.0190887
Index 126 to 127
0.0293198 0.0271988 0.0630722 0.0838776 0.0497131 0.1353760 -0.0120468 0.1022873 -0.0440521 0.0741501 -0.0440979 0.0458984 -0.0299225 0.0275192 -0.0291214 -0.0002823

APPENDIX 3: SECOND-STAGE LSP SHAPE CODEBOOK

Index 0 to 61 (each Element row lists the values for these indices in order)
Element 1: -0.00045776 -0.00029755 -0.00118256 -0.00337219 0.00021362 -0.00189972 -0.00256348 -0.00487518 -0.00010681 -0.00064850 -0.00432587 -0.01011658 -0.00463867 -0.00424957 -0.00263214 0.00451660 0.00000763 -0.00207520 -0.00079346 -0.00298309 0.00035858 -0.00503540 -0.00057220 -0.00719452 -0.00290680 -0.00224304 -0.00428772 -0.01528931 -0.00177765 -0.00414276 -0.00677490 -0.00891113 -0.00096130 -0.00326538 -0.00380707 -0.00752258 0.00000763 -0.00630951 -0.00095367 -0.00719452 -0.00801086 -0.00186920 -0.00836945 -0.01739502 -0.00631714 -0.00720215 -0.00552368 0.00701904 0.00125885 -0.00152588 -0.00518036 -0.00656891 0.00264740 -0.00373077 -0.00197601 -0.01738739 -0.00734711 -0.00354767 -0.00668335 -0.03459930 -0.00075531 -0.00887299
Element 2: 0.00002289 -0.00101471 -0.00199127 -0.00188446 0.00082397 -0.00291443 -0.00302124 -0.00572205 -0.00145721 -0.00139618 -0.00485229 0.00389099 -0.00547791 -0.00586700 -0.00723267 -0.01326752 0.00039673 -0.00434113 -0.00209808 -0.00412750 0.00170898 -0.00462341 -0.00221252 -0.01031494 -0.00582886 -0.00450134 -0.00616455 -0.00518799 -0.00195313 -0.00515747 -0.01074219 -0.01766968 -0.00098419 -0.00333405 -0.00453949 -0.00469208 -0.00075531 -0.00655365 -0.00313568 -0.01290894 -0.00989532 -0.00344086 -0.00853729 0.00542450 -0.01346588 -0.01295471 -0.01621246 -0.02751923 0.00099945 -0.00276184 -0.00488281 -0.00149536 0.00337219 -0.00283813 -0.01000214 -0.02180481 -0.00748444 -0.00787354 -0.00952911 0.00503540 -0.00411224 -0.02392578
Element 3: 0.00099182 -0.00086212 -0.00380707 0.00494385 -0.00271606 -0.00379944 -0.00202942
0.00082397 -0.00476074 -0.00781250 -0.00765228 -0.00176239 -0.00512695 -0.00111389 -0.00938416 -0.00163269 0.00560760 0.00380707 0.00601196 0.01414490 0.00259399 0.00434875 -0.00485992 0.00247192 0.00511169 0.00026703 0.00080872 0.00449371 -0.00070953 -0.00126648 -0.01129150 0.00561523 -0.00413513 0.00357056 -0.00596619 0.00773621 -0.00958252 -0.00659943 -0.00857544 0.00675201 -0.00411224 -0.00266266 -0.00543976 -0.00310516 -0.01500702 -0.00642395 -0.01331329 0.00681305 0.00302887 0.01119232 0.00891876 0.02567291 -0.00964355 0.00484467 -0.00423431 0.00868988 0.01610565 -0.00128937 0.00102997 0.00631714 -0.02098846 -0.01149750
Element 4: 0.00270081 0.00087738 -0.00403595 -0.00525665 -0.00733185 -0.00436401 0.00238037 0.00188446 0.00331116 -0.00374603 0.00385284 0.00331116 -0.00231171 0.00374603 0.00535583 -0.00040436 -0.00251770 0.00775909 -0.00038147 0.00212097 -0.01609802 -0.00257874 -0.00782013 -0.00617218 0.00539398 0.00598907 0.00531769 0.00463104 -0.01091003 0.00601959 -0.00234985 0.00578308 0.00173950 -0.00922394 -0.00867462 -0.00899506 -0.00546265 0.00173950 0.00354767 0.00386047 -0.00192261 -0.01030731 0.00074005 -0.00865936 -0.00602722 -0.00480652 0.01739502 0.00177765 -0.00566864 0.00526428 -0.00863647 -0.01044464 -0.03454590 -0.00320435 -0.00650787 -0.01496887 0.00942993 0.00877380 0.00148010 0.00146484 -0.01145172 0.03517151
Element 5: 0.00746155 -0.00106049 -0.00030518 0.00161743 0.00150299 -0.00100708 -0.00732422 -0.00714111 0.01039886 0.00415039 0.00276184 0.00072479 0.00556183 0.00692749 -0.00386810 0.00058746 0.00186157 -0.00274658 -0.00785828 -0.00019073 -0.00000763 0.00315094 -0.00888824 -0.00236511 0.00604248 0.00528717 -0.00407410 0.00074005 0.01485443 0.01933289 -0.00765991 0.00490570 0.00495911 0.00468445 -0.00395966 0.00354004 -0.00563049 -0.00452423 -0.01430511 -0.01235962 0.01290894 0.00197601 0.00202942 -0.00068665 0.00646210 0.01066589 -0.00738525 -0.00151062 -0.00022125 -0.00367737 -0.00762939 -0.00462341 0.00086975 0.00572968
-0.02683258 -0.00506592 0.01790619 -0.00132751 -0.00872040 0.00110626 0.03794098 0.02966309
Element 6: 0.00529480 0.00087738 0.00240326 -0.00501251 0.00543213 -0.00173187 0.00712585 -0.00489044 -0.00167847 -0.00397491 -0.00205231 -0.00367737 0.00563049 -0.00564575 0.00198364 -0.00355530 0.01089478 0.00917053 0.00248718 -0.00061798 0.00162506 0.00467682 0.00464630 0.00193024 0.00421143 0.00506592 -0.00506592 0.00062561 0.00244141 0.00696564 -0.00885773 0.00194550 0.01183319 0.00469971 -0.00065613 -0.01000214 0.01355743 0.00329590 0.00208282 -0.00559235 -0.00526428 -0.00857544 -0.00057983 0.00004578 0.01339722 -0.01364136 0.00836182 -0.00057220 0.02838135 0.01557159 0.00898743 -0.00055695 0.01454163 0.00675201 0.01259613 -0.00038147 0.01776123 0.00061798 0.00305176 -0.00109100 0.02877808 0.00566864
Element 7: -0.00106049 -0.01163483 0.00474548 0.00176239 0.00144958 -0.00433350 0.00381470 -0.00206757 0.00269318 -0.00685883 0.00595856 -0.00161743 -0.00107574 -0.00616455 -0.00283813 -0.00116730 -0.00137329 -0.00515747 -0.00209808 -0.00274658 -0.00133514 0.00090790 -0.00598145 -0.00309753 0.00669861 -0.00660706 0.00903320 0.00030518 0.00580597 -0.00712585 0.00039673 -0.00356293 0.00845337 -0.01351166 0.00699615 0.00416565 0.00136566 -0.00886536 0.00795746 0.00556183 0.00534058 -0.01191711 0.01557922 0.00126648 -0.00647736 -0.01339722 0.00555420 0.00139618 -0.00177765 -0.01921844 0.00974274 -0.00331879 0.00080109 -0.00530243 -0.00359344 -0.00035858 0.01367950 -0.02446747 0.03495026 -0.00064850 0.01161194 -0.02129364
Element 8: -0.00178528 0.00027466 0.00889587 0.00527191 -0.00333405 -0.00628662 0.00191498 -0.00262451 0.00548553 0.00500488 -0.00077057 -0.00171661 0.00090027 -0.00193024 0.00453949 -0.00093842 0.00457001 -0.00215149 0.01184082 0.00065613 0.00100708 -0.01029968 0.00544739 0.00254059 0.00392151 0.01150513 0.00828552 0.00215912 -0.00041199 -0.00523376 0.00651550 -0.00082397 -0.00122833 -0.00124359 -0.00128174 -0.00489044 -0.01033783 -0.01154327 -0.00600433
-0.00572968 0.01023865 0.01605988 -0.00852203 0.00104523 -0.00049591 -0.00752258 0.00202942 0.00026703 0.00128937 -0.00600433 0.00371552 -0.00172424 0.00027466 -0.03117371 0.00352478 0.00020599 0.01322937 0.01464081 0.01872253 0.00020599 0.00054932 -0.02059937
Index 62 to 63
-0.01180267 -0.02636719 -0.03713226 -0.04943848 -0.03850555 0.01272583 -0.00773621 0.01393127 -0.01717377 0.00457001 -0.01065826 0.00045776 -0.00489044 0.00072479 -0.00129700 0.00040436

APPENDIX 4: PITCH PREDICTOR TAP CODEBOOK

Index    Element 1    Element 2    Element 3
    0   -0.0112610   -0.1767275    0.0010985
    1   -0.2130735    0.4948120    0.0315245
    2    0.1685180   -0.5930480    0.2047120
    3    0.1311035    0.2366640    0.1258850
    4    0.2671205    0.3766175    0.0750425
    5    0.2727660    0.8498230   -0.1647035
    6    0.1418760    0.2827150   -0.0781860
    7    0.4192810    0.5174865    0.0445555
    8   -0.0790405    0.2878420   -0.0986025
    9    0.0102235    0.5396730   -0.2112120
   10    0.1182555   -0.3234560   -0.1257935
   11    0.0584410    0.4372560    0.0544435
   12    0.2677000    0.6710510   -0.0020140
   13    0.0510560    0.9555970   -0.0752870
   14    0.2883300    0.4584655   -0.1362610
   15    0.4607545    0.6708375   -0.1777040
   16    0.0267945    0.1656190    0.0300600
   17   -0.1670530    0.6494750    0.4698790
   18    0.0991210   -0.2931520    0.1622010
   19    0.0505675    0.3623655    0.2523805
   20    0.0785520    0.4996340    0.4010620
   21   -0.0176085    0.7084045    0.2586975
   22    0.2232055    0.3234255    0.2569580
   23    0.2735290    0.4712525    0.2355345
   24   -0.0867615    0.2794190    0.1318055
   25   -0.2120360    0.8054810    0.3201295
   26   -0.1777955   -0.3514100    0.0110170
   27   -0.1222230    0.4702455    0.2991335
   28    0.1735230    0.5760195    0.1928100
   29   -0.0948180    0.9347840    0.1232910
   30    0.3175050    0.4074400    0.2554320
   31    0.0957640    0.7896425    0.0609740

APPENDIX 5: GAIN CODEBOOK

Index     Element
    0    -5.38477
    1    -3.68066
    2    -2.76855
    3    -2.09717
    4    -1.47217
    5    -0.33984
    6     0.67285
    7     1.82031
    8    -0.88525
    9     0.16748
   10     1.20313
   11     2.62549
   12     3.80518
   13     5.64551
   14     8.70605
   15    11.85156

APPENDIX 6: GAIN CHANGE THRESHOLD MATRIX

Rows (i = 1 to 18): relative log-gain of previous frame, [dB2]. Columns (j = 1 to 12): log-gain change of previous frame, [dB2]. The matrix is shown in two parts for width.

(a) Columns j = 1 to 6
                         j=1        j=2        j=3        j=4        j=5        j=6
                    -8 to -6   -6 to -4   -4 to -2    -2 to 0     0 to 2     2 to 4
i=1   -24 to -22     0.00000    0.79102    0.55664   14.26563   14.08398    0.00000
i=2   -22 to -20     0.00000   13.85156    1.73047   13.76758   13.92773    0.00000
i=3   -20 to -18    -1.96094    8.91211    7.83594   14.09961   13.77930    0.91016
i=4   -18 to -16    -1.96094    8.66992   13.53125   14.09570   13.95117   12.97461
i=5   -16 to -14    -1.47266    9.29297   13.92578   13.89063   13.87891   13.93750
i=6   -14 to -12     4.60547   12.33398   14.09180   14.14258   14.16016   13.48633
i=7   -12 to -10    10.66016   10.72656   13.83203   13.68359   13.93945   13.77930
i=8   -10 to -8      6.59375   10.19531   13.34375   12.87305   13.36719   13.36328
i=9   -8 to -6       2.64063    9.52539    9.85547   10.35938   10.63086   12.92383
i=10  -6 to -4       6.24805    8.26758    8.78125    9.08594    9.03125   10.34180
i=11  -4 to -2       6.18945    6.71875    7.98438    7.37109    7.50391    7.69922
i=12  -2 to 0        4.40430    5.46484    6.17773    6.04492    6.14063    6.84766
i=13  0 to 2         3.39648    5.41602    5.40039    4.77734    4.59375    4.63477
i=14  2 to 4         0.00000    3.50000    4.60352    3.92188    3.68164    4.21680
i=15  4 to 6         0.00000    1.10156    3.04492    3.18945    2.60156    2.43164
i=16  6 to 8         0.00000   -0.11914   -1.13672    1.41602    1.49609    0.72852
i=17  8 to 10        0.00000    0.00000    0.00000    1.36861    1.18557   -0.36990
i=18  10 to 12       0.00000    0.00000    0.00000    0.52843    0.43190    0.00000

(b) Columns j = 7 to 12
                         j=7        j=8        j=9       j=10       j=11       j=12
                      4 to 6     6 to 8    8 to 10   10 to 12   12 to 14   14 to 16
i=1   -24 to -22     0.00000    0.00000    0.00000    0.00000    0.00000    0.00000
i=2   -22 to -20     0.00000    0.00000    0.00000    0.00000    0.00000    0.00000
i=3   -20 to -18    -2.41406    0.00000    0.00000    0.00000    0.00000    0.00000
i=4   -18 to -16     2.14648    0.00000    0.00000    0.00000    0.00000    0.00000
i=5   -16 to -14    12.20703   -4.99023    0.00000    0.00000    0.00000    0.00000
i=6   -14 to -12    12.39063    2.01172    0.00000    0.00000    0.00000    0.00000
i=7   -12 to -10    13.09570   10.17578   -0.15430   -2.92578    0.00000    0.00000
i=8   -10 to -8     13.12891   12.66797    0.72852    0.30078    4.87109    7.85742
i=9   -8 to -6      12.70508   12.65234    8.96680    1.32422    4.86719    7.81445
i=10  -6 to -4      11.21875   11.07227    8.32617    8.41992    7.70313    7.86133
i=11  -4 to -2       9.09180    8.73633    6.91211    7.68750    7.22266    3.50977
i=12  -2 to 0        5.89063    5.43750    4.67188    5.58008    7.70898    7.46094
i=13  0 to 2         6.43359    3.54102    4.37891    3.70117    6.64844    4.74414
i=14  2 to 4         4.18750    3.32617    3.38867    2.32813    5.15039    1.76563
i=15  4 to 6         2.91016    1.48438    0.43555    0.44336    1.50391    1.75391
i=16  6 to 8         0.60352   -0.35352   -0.98242   -1.15039   -1.99414    0.00000
i=17  8 to 10       -4.01682   -2.21214    0.00000   -1.33077   -3.04360    0.00000
i=18  10 to 12       0.00000   -2.86324    0.00000    0.00000    0.00000    0.00000

APPENDIX 7: EXCITATION VQ SHAPE CODEBOOK

Index   Element 1   Element 2   Element 3   Element 4
    0   -0.514526    0.847412    0.166748    0.120605
    1    0.389648    1.125000   -1.070557    0.048584
    2   -0.263916   -0.053101    0.189209    0.177734
    3    2.927368   -0.262695   -0.092896    0.274292
    4   -0.348755   -0.356812   -0.765747   -0.639038
    5    1.912231    0.890869   -2.045654   -0.802124
    6   -0.180298   -1.221802   -1.728760   -0.965210
    7    1.743286   -1.338379    0.184204   -0.281128
    8   -1.407593    1.109497    1.724487   -0.347900
    9    2.324219    1.637939    0.742188    0.526001
   10   -0.330933   -0.405396    0.890747    1.477661
   11    1.545532   -0.195068    0.148560    0.073486
   12   -0.583740    0.456055    0.253296   -1.269043
   13    0.587769   -0.129028    0.616699   -0.256714
   14   -1.211426   -0.743896   -0.608887   -0.219360
   15    0.196289   -1.870728   -0.309326    1.111694
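The Appendix 7 table lists only shape codevectors. The Python sketch below shows one way such a sign-shape codebook can be applied. It assumes, without confirmation from this appendix, that each excitation index carries one sign bit (`SIGN_BIT`, a hypothetical name) above four shape-index bits, and that the selected codevector is scaled by a linear excitation gain. `decode_vector` and `encode_vector` are illustrative names, and the encoder side is a plain minimum-squared-error search, not the noise-feedback search described in Section 3.9.

```python
from typing import List, Sequence

# Appendix 7 shape codebook: row k holds Elements 1-4 of codevector k (k = 0..15).
SHAPE_CB = [
    (-0.514526, 0.847412, 0.166748, 0.120605),
    (0.389648, 1.125000, -1.070557, 0.048584),
    (-0.263916, -0.053101, 0.189209, 0.177734),
    (2.927368, -0.262695, -0.092896, 0.274292),
    (-0.348755, -0.356812, -0.765747, -0.639038),
    (1.912231, 0.890869, -2.045654, -0.802124),
    (-0.180298, -1.221802, -1.728760, -0.965210),
    (1.743286, -1.338379, 0.184204, -0.281128),
    (-1.407593, 1.109497, 1.724487, -0.347900),
    (2.324219, 1.637939, 0.742188, 0.526001),
    (-0.330933, -0.405396, 0.890747, 1.477661),
    (1.545532, -0.195068, 0.148560, 0.073486),
    (-0.583740, 0.456055, 0.253296, -1.269043),
    (0.587769, -0.129028, 0.616699, -0.256714),
    (-1.211426, -0.743896, -0.608887, -0.219360),
    (0.196289, -1.870728, -0.309326, 1.111694),
]

SIGN_BIT = 0x10  # ASSUMPTION: sign bit sits above the 4 shape-index bits


def decode_vector(index: int, gain: float) -> List[float]:
    """Return gain * (+1 or -1) * shape codevector for a 5-bit sign-shape index."""
    if not 0 <= index < 2 * len(SHAPE_CB):
        raise ValueError("index out of range for 1 sign bit + 4 shape bits")
    sign = -1.0 if index & SIGN_BIT else 1.0
    return [sign * gain * c for c in SHAPE_CB[index & 0x0F]]


def encode_vector(target: Sequence[float], gain: float) -> int:
    """Plain minimum-squared-error search over all 32 signed codevectors.

    Not the noise-feedback search of Section 3.9; illustration only.
    """
    best_index, best_err = 0, float("inf")
    for index in range(2 * len(SHAPE_CB)):
        candidate = decode_vector(index, gain)
        err = sum((t - c) ** 2 for t, c in zip(target, candidate))
        if err < best_err:  # strict '<' keeps the first index on ties
            best_index, best_err = index, err
    return best_index
```

With a separate sign bit, only the 16 shape vectors of Appendix 7 need to be stored, yet 32 signed candidates are available to the search.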
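Appendix 5 is a 16-entry scalar codebook. As a hedged sketch, a nearest-neighbor quantizer over it could look like the Python below. The quantity actually coded with this table, and any further constraints on index selection, are defined in Section 3.8 and are not reproduced here, so `value` is simply a scalar in the table's units. `quantize_scalar` and `dequantize_scalar` are hypothetical helper names.

```python
from typing import List

# Appendix 5 gain codebook, in index order 0..15 (same units as the table).
GAIN_CB: List[float] = [
    -5.38477, -3.68066, -2.76855, -2.09717, -1.47217, -0.33984, 0.67285, 1.82031,
    -0.88525, 0.16748, 1.20313, 2.62549, 3.80518, 5.64551, 8.70605, 11.85156,
]


def quantize_scalar(value: float, codebook: List[float] = GAIN_CB) -> int:
    """Index of the nearest codebook entry (exhaustive search; first index wins ties).

    Entries 0-7 and 8-15 are each increasing, but the table as a whole is not
    sorted by index, so a binary search over raw indices would be wrong.
    """
    return min(range(len(codebook)), key=lambda k: abs(codebook[k] - value))


def dequantize_scalar(index: int, codebook: List[float] = GAIN_CB) -> float:
    """Look up the reconstruction value for a 4-bit index."""
    return codebook[index]
```

For example, `quantize_scalar(5.7)` selects index 13, which reconstructs to 5.64551.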
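Appendix 6 is indexed by two previous-frame quantities binned in steps of 2: the relative log-gain (rows i = 1 to 18, spanning -24 to 12) and the log-gain change (columns j = 1 to 12, spanning -8 to 16). The Python sketch below maps those quantities to a matrix cell. The half-open bin convention (a value exactly on a boundary goes to the upper bin) and the clamping of out-of-range inputs are assumptions, not rules stated in the appendix, and `threshold_cell` and `lookup_threshold` are hypothetical names.

```python
import math
from typing import Sequence, Tuple

# Appendix 6 bin layout (table units, [dB2]):
#   rows i = 1..18: relative log-gain of previous frame, -24 to 12 in steps of 2
#   cols j = 1..12: log-gain change of previous frame,   -8 to 16 in steps of 2
ROW_LO, COL_LO, STEP = -24.0, -8.0, 2.0
N_ROWS, N_COLS = 18, 12


def _bin(x: float, lo: float, n: int) -> int:
    """1-based bin index, treating each bin as half-open [lo + 2k, lo + 2k + 2).

    ASSUMPTION: values outside the tabulated range are clamped to the
    first or last bin.
    """
    k = math.floor((x - lo) / STEP)
    return min(max(k, 0), n - 1) + 1


def threshold_cell(rel_log_gain: float, log_gain_change: float) -> Tuple[int, int]:
    """Return the 1-based (i, j) cell of Appendix 6 for the given inputs."""
    return _bin(rel_log_gain, ROW_LO, N_ROWS), _bin(log_gain_change, COL_LO, N_COLS)


def lookup_threshold(matrix: Sequence[Sequence[float]],
                     rel_log_gain: float, log_gain_change: float) -> float:
    """Read the entry of an 18 x 12 matrix laid out like Appendix 6."""
    i, j = threshold_cell(rel_log_gain, log_gain_change)
    return matrix[i - 1][j - 1]
```

For instance, a relative log-gain of -11 with a log-gain change of -1 falls in cell i=7, j=4, whose Appendix 6 entry is 13.68359. How that entry is compared with the current frame's gain is specified in the body of the standard and is not modeled here.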