ANSI/SCTE 24-23 2007

ENGINEERING COMMITTEE
Data Standards Subcommittee
AMERICAN NATIONAL STANDARD
ANSI/SCTE 24-23 2007
BV32 Speech Codec Specification for
Voice over IP Applications in Cable Telephony
NOTICE
The Society of Cable Telecommunications Engineers (SCTE) Standards are intended to serve the public
interest by providing specifications, test methods and procedures that promote uniformity of product,
interchangeability and ultimately the long term reliability of broadband communications facilities. These
documents shall not in any way preclude any member or non-member of SCTE from manufacturing or
selling products not conforming to such documents, nor shall the existence of such standards preclude their
voluntary use by those other than SCTE members, whether used domestically or internationally.
SCTE assumes no obligations or liability whatsoever to any party who may adopt the Standards. Such
adopting party assumes all risks associated with adoption of these Standards, and accepts full responsibility
for any damage and/or claims arising from the adoption of such Standards.
Attention is called to the possibility that implementation of this standard may require the use of subject
matter covered by patent rights. By publication of this standard, no position is taken with respect to the
existence or validity of any patent rights in connection therewith. SCTE shall not be responsible for
identifying patents for which a license may be required or for conducting inquiries into the legal validity or
scope of those patents that are brought to its attention.
Patent holders who believe that they hold patents which are essential to the implementation of this standard
have been requested to provide information about those patents and any related licensing terms and
conditions. Any such declarations made before or after publication of this document are available on the
SCTE web site at http://www.scte.org.
All Rights Reserved
© Society of Cable Telecommunications Engineers, Inc. 2007
140 Philips Road
Exton, PA 19341
Contents
1 INTRODUCTION.................................................................................................................... 1
2 OVERVIEW OF THE BV32 SPEECH CODEC .................................................................. 1
2.1 Brief Introduction of Two-Stage Noise Feedback Coding (TSNFC) .......................................... 1
2.2 Overview of the BV32 Codec ......................................................................................................... 3
3 DETAILED DESCRIPTION OF THE BV32 ENCODER ................................................... 7
3.1 High-Pass Pre-Filtering .................................................................................................................. 7
3.2 Pre-emphasis Filtering .................................................................................................................... 7
3.3 Short-Term Linear Predictive Analysis ........................................................................................ 7
3.4 Conversion to LSP ........................................................................................................................ 10
3.5 LSP Quantization .......................................................................................................................... 12
3.6 Conversion to Short-Term Predictor Coefficients ..................................................................... 18
3.7 Short-Term Linear Prediction of Input Signal........................................................................... 19
3.8 Long-Term Linear Predictive Analysis (Pitch Extraction) ....................................................... 19
3.9 Long-Term Predictor Parameter Quantization ......................................................................... 27
3.10 Excitation Gain Quantization ...................................................................................................... 28
3.11 Excitation Vector Quantization ................................................................................................... 32
3.12 Bit Multiplexing............................................................................................................................. 35
4 DETAILED DESCRIPTION OF THE BV32 DECODER ................................................. 37
4.1 Bit De-multiplexing ....................................................................................................................... 37
4.2 Long-Term Predictor Parameter Decoding ................................................................................ 37
4.3 Short-Term Predictor Parameter Decoding ............................................................................... 37
4.4 Excitation Gain Decoding ............................................................................................................. 40
4.5 Excitation VQ Decoding and Scaling .......................................................................................... 43
4.6 Long-Term Synthesis Filtering .................................................................................................... 43
4.7 Short-Term Synthesis Filtering ................................................................................................... 43
4.8 De-emphasis Filtering ................................................................................................................... 44
4.9 Example Packet Loss Concealment ............................................................................................. 44
APPENDIX 1: GRID FOR LPC TO LSP CONVERSION ...................................................... 47
APPENDIX 2: FIRST-STAGE LSP CODEBOOK ................................................................... 49
APPENDIX 3: SECOND-STAGE LOWER SPLIT LSP CODEBOOK ................................. 52
APPENDIX 4: SECOND-STAGE UPPER SPLIT LSP CODEBOOK ................................... 53
APPENDIX 5: PITCH PREDICTOR TAB CODEBOOK ....................................................... 54
APPENDIX 6: GAIN CODEBOOK ........................................................................................... 55
APPENDIX 7: GAIN CHANGE THRESHOLD MATRIX ..................................................... 56
APPENDIX 8: EXCITATION VQ SHAPE CODEBOOK ....................................................... 57
Figures
Figure 1 Basic codec structure of Two-Stage Noise Feedback Coding (TSNFC) ................... 2
Figure 2 Block diagram of the BV32 encoder ............................................................................ 4
Figure 3 Block diagram of the BV32 decoder ............................................................................ 5
Figure 4 BV32 short-term linear predictive analysis and quantization (block 10) ................ 8
Figure 5 BV32 LSP quantizer (block 16) ................................................................................. 13
Figure 6 BV32 long-term predictive analysis and quantization (block 20) ........................... 20
Figure 7 Prediction residual quantizer (block 30) ................................................................... 28
Figure 8 Filter structure used in BV32 excitation VQ codebook search ............................... 33
Figure 9 BV32 bit stream format .............................................................................................. 36
Figure 10 BV32 short-term predictor parameter decoder (block 120)................................. 38
Figure 11 Excitation gain decoder ............................................................................................ 41
Tables
Table 1 Bit allocation of the BV32 codec .................................................................................... 6
1 INTRODUCTION
This document contains the description of the BV32 speech codec¹. BV32 compresses 16 kHz
sampled wideband speech to a bit rate of 32 kb/s (kilobits per second) by employing a speech coding
algorithm called Two-Stage Noise Feedback Coding (TSNFC), developed by Broadcom.
The rest of this document is organized as follows. Section 2 gives a high-level overview of TSNFC
and BV32. Sections 3 and 4 give detailed descriptions of the BV32 encoder and decoder,
respectively. The BV32 codec specification given in Sections 3 and 4 contains enough detail to
allow those skilled in the art to implement a bit-stream-compatible and functionally equivalent
BV32 encoder and decoder.
2 OVERVIEW OF THE BV32 SPEECH CODEC
In this section, the general principles of Two-Stage Noise Feedback Coding (TSNFC) are first
introduced. Next, an overview of the BV32 algorithm is given.
2.1 Brief Introduction of Two-Stage Noise Feedback Coding (TSNFC)
In conventional Noise Feedback Coding (NFC), the encoder modifies a prediction residual signal by
adding a noise feedback signal to it. A scalar quantizer quantizes this modified prediction residual
signal. The difference between the quantizer input and output, or the quantization error signal, is
passed through a noise feedback filter. The output signal of this filter is the noise feedback signal
added to the prediction residual. The noise feedback filter is used to control the spectrum of the
coding noise in order to minimize the perceived coding noise. This is achieved by exploiting the
masking properties of the human auditory system.
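The loop described above can be sketched as follows. The predictor, noise feedback filter, and quantizer below are toy first-order stand-ins chosen only to illustrate the signal flow, not the actual BV32 components.

```python
def nfc_encode(signal, predict, noise_feedback, quantize):
    """Conventional single-stage NFC encoder loop.

    predict(history)       -> prediction of the next sample from past outputs
    noise_feedback(errors) -> noise feedback signal from past quantization errors
    quantize(x)            -> quantized value of the modified residual
    """
    recon, errors, codes = [], [], []
    for s in signal:
        p = predict(recon)
        d = s - p                         # prediction residual
        u = d + noise_feedback(errors)    # modified residual (noise feedback added)
        uq = quantize(u)                  # scalar quantization
        errors.append(u - uq)             # quantization error feeds the filter
        recon.append(uq + p)              # local reconstruction, as in the decoder
        codes.append(uq)
    return codes, recon

# Toy components: first-order predictor and feedback, 0.25-step quantizer.
predict = lambda h: 0.5 * h[-1] if h else 0.0
feedback = lambda e: 0.5 * e[-1] if e else 0.0
quantize = lambda x: round(4.0 * x) / 4.0
codes, recon = nfc_encode([0.3, -0.1, 0.8, 0.2], predict, feedback, quantize)
```

Note how the quantization error, not the signal itself, is what circulates through the noise feedback filter; this is what lets the filter shape the coding-noise spectrum.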
Conventional NFC codecs typically only use a short-term noise feedback filter to shape the spectral
envelope of the coding noise, and a scalar quantizer is used universally. In contrast, Broadcom’s
Two-Stage Noise Feedback Coding (TSNFC) system uses a codec structure employing two stages of
noise feedback coding in a nested loop: the first NFC stage performs short-term prediction and
short-term noise spectral shaping (spectral envelope shaping), and the second nested NFC stage
performs long-term prediction and long-term noise spectral shaping (harmonic shaping). Such a
nested two-stage NFC structure is shown in Figure 1 below.
¹ The “BV32 speech codec” specification is based on Broadcom Corporation’s BroadVoice®32 speech codec. Implementation of this
standard may require a license of Broadcom patents; information regarding these patents, and a declaration of licensing intent, may be
found at the SCTE web site.
[Figure 1 block diagram: the input signal s(n) enters the outer NFC loop, which contains the short-term predictor Ps(z) and the short-term noise feedback filter Fs(z); the inner loop around the quantizer contains the long-term predictor Pl(z) and the long-term noise feedback filter Nl(z) − 1; signals shown include d(n), u(n), uq(n), q(n), qs(n), v(n), vq(n), and the output sq(n).]
Figure 1 Basic codec structure of Two-Stage Noise Feedback Coding (TSNFC)
In Figure 1 above, the outer layer (including the two short-term predictors and the short-term noise
feedback filter) follows the structure of the conventional NFC codec. The TSNFC structure in Figure
1 is obtained by replacing the simple scalar quantizer in the conventional (single-stage) NFC
structure by a “predictive quantizer” that employs long-term prediction and long-term noise spectral
shaping. This “predictive quantizer” is represented by the inner feedback loop in Figure 1, including
the long-term predictor and long-term noise feedback filter. This inner feedback loop uses an
alternative but equivalent conventional NFC structure, where Nl(z) represents the filter whose
frequency response is the desired noise shape for long-term noise spectral shaping. In the outer
layer, the short-term noise feedback filter Fs(z) is usually chosen as a bandwidth-expanded version
of the short-term predictor Ps(z). The choice of different NFC structures in the outer and inner
layers is based on complexity considerations. By combining two stages of NFC in a nested loop, the
TSNFC in Figure 1 can reap the benefits of both short-term and long-term prediction and also
achieve short-term and long-term noise spectral shaping at the same time.
It is natural and straightforward to use a scalar quantizer in Figure 1. However, to achieve better
coding efficiency, a vector quantizer is used in BV32. In the Vector Quantization (VQ) codebook
search, the u(n) vector cannot be generated before the VQ codebook search starts: due to the
feedback structure in Figure 1, the elements of u(n) from the second element on depend on the
vector-quantized version of the earlier elements. Therefore, the VQ codebook search is performed by
trying out each of the candidate codevectors in the VQ codebook (i.e. fixing a candidate uq(n)
vector first), then calculating the corresponding u(n) vector and the corresponding VQ
error q(n) = u(n) − uq(n). The VQ codevector that minimizes the energy of q(n) within the current
vector time span is chosen as the winning codevector, and the corresponding codebook index
becomes part of the encoder output bit stream for the current speech frame.
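The search just described can be sketched as follows. The feedback network here is a hypothetical one-tap toy, not the BV32 filter structure of Figure 1; it only illustrates that u(n) must be regenerated for every candidate uq(n).

```python
def vq_search(codebook, regenerate_u):
    """Try every candidate uq; pick the one minimizing the energy of q = u - uq."""
    best_index, best_energy = 0, float("inf")
    for index, uq in enumerate(codebook):
        u = regenerate_u(uq)              # u depends on earlier quantized elements
        energy = sum((a - b) ** 2 for a, b in zip(u, uq))
        if energy < best_energy:
            best_index, best_energy = index, energy
    return best_index

target = [0.9, -0.4, 0.1, 0.3]            # toy 4-dimensional target residual

def regenerate_u(uq):
    """Toy feedback: u(n) = target(n) + 0.5*q(n-1), with q(n) = u(n) - uq(n)."""
    u, q_prev = [], 0.0
    for t, c in zip(target, uq):
        un = t + 0.5 * q_prev
        u.append(un)
        q_prev = un - c
    return u

codebook = [[1.0, -0.5, 0.0, 0.5], list(target), [0.0, 0.0, 0.0, 0.0]]
winner = vq_search(codebook, regenerate_u)   # the exact-match candidate wins
```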
The TSNFC decoder structure is simply a quantizer decoder followed by the two feedback filter
structures involving the long-term predictor and the short-term predictor, respectively, shown on the
right half of Figure 1. Thus, the TSNFC decoder is similar to the decoders of other predictive
coding techniques such as Adaptive Predictive Coding (APC), Multi-Pulse Linear Predictive Coding
(MPLPC), and Code-Excited Linear Prediction (CELP).
2.2 Overview of the BV32 Codec
The BV32 codec is a purely forward-adaptive TSNFC codec. It operates at an input sampling rate of
16 kHz and an encoding bit rate of 32 kb/s, or 2 bits per sample. BV32 uses a frame size of 5
ms, or 80 samples. There is no look-ahead. Therefore, the total algorithmic buffering delay is just
the frame size itself, or 5 ms. The main design goal of BV32 is to make the coding delay and the
codec complexity as low as possible, while maintaining essentially transparent output speech
quality.
Due to the small frame size, the parameters of the short-term predictor (also called the “LPC
predictor”) and the long-term predictor (also called the “pitch predictor”) are both transmitted and
updated once per frame. Each 5 ms frame is divided equally into two 2.5 ms sub-frames (40 samples
each). The gain of the excitation signal is transmitted once every sub-frame. The excitation VQ uses
a vector dimension of 4 samples. Hence, there are 10 excitation vectors in a sub-frame, and 20
vectors in a frame. Figure 2 shows a block diagram of the BV32 encoder. A more detailed description
of each functional block is given in Section 3.
[Figure 2 block diagram: the input signal passes through the high-pass pre-filter (block 3) and the pre-emphasis filter (block 5) to give s(n); block 10 (short-term predictive analysis & quantization) and block 20 (long-term predictive analysis & quantization) supply parameters; the encoding loop contains the short-term predictor (40), short-term noise feedback filter (50), long-term predictor (60), long-term noise feedback filter (65), and prediction residual quantizer (30); the bit multiplexer (95) combines the indices LSPI, PPI, PPTI, GI, and CI into the output bit stream; signals shown include d(n), v(n), u(n), uq(n), dq(n), q(n), qs(n), stnf(n), ltnf(n), ppv(n), and e(n).]
Figure 2 Block diagram of the BV32 encoder
The BV32 encoder first passes the input signal through a fixed pole-zero high-pass pre-filter to
remove possible DC bias or low frequency rumble. The filtered signal is then further passed through
a fixed pole-zero pre-emphasis filter that provides a general high-pass spectral tilt. The resulting
pre-emphasized signal is then used to derive the LPC predictor coefficients.
To keep the complexity low, the BV32 uses a relatively low LPC predictor order of 8, and the LPC
analysis window is only 10 ms (160 samples) long. The LPC analysis window is asymmetric, with
the peak of the window located at the center of the current frame, and the end of the window
coinciding with the last sample of the current frame. Autocorrelation LPC analysis based on
Levinson-Durbin recursion is used to derive the coefficients of the 8th-order LPC predictor. The
derived LPC predictor coefficients are converted to Line-Spectrum Pair (LSP) parameters, which are
then quantized by an inter-frame predictive coding scheme.
The inter-frame prediction of LSP parameters uses an 8th-order moving-average (MA) predictor.
The MA predictor coefficients are fixed. The time span that this MA predictor covers is 8 × 5 ms =
40 ms. The inter-frame LSP prediction residual is quantized by a two-stage vector quantizer. The
first stage employs an 8-dimensional vector quantizer with a 7-bit codebook. The second stage uses
a split vector quantizer with a 3-5 split and 5 bits for each split. That is, the first three elements are
vector quantized to 5 bits, and the remaining 5 elements are vector quantized to another 5 bits.
For long-term prediction, a three-tap pitch predictor with an integer pitch period is used. To keep
the complexity low, the pitch period and the pitch taps are both determined in an open-loop fashion.
The three pitch predictor taps are jointly quantized using a 5-bit vector quantizer. The distortion
measure used in the codebook search is the energy of the open-loop pitch prediction residual. The
32 codevectors in the pitch tap codebook have been “stabilized” to make sure that they will not give
rise to an unstable pitch synthesis filter.
The excitation gain is also determined in an open-loop fashion to keep the complexity low. The
average power of the open-loop pitch prediction residual within the current sub-frame is calculated
and converted to the logarithmic domain. The resulting log-gain is then quantized using
inter-subframe MA predictive coding. The MA predictor order for the log-gain is 16, corresponding
to a time span of 16 × 2.5 ms = 40 ms. Again, the log-gain MA predictor coefficients are fixed. The
log-gain prediction residual is quantized by a 5-bit scalar quantizer.
The 4-dimensional excitation VQ codebook has a simple sign-shape structure, with 1 bit for sign,
and 5 bits for shape. In other words, only 32 four-dimensional codevectors are stored, but the mirror
image of each codevector with respect to the origin is also a codevector.
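A sign-shape codebook of this kind can be searched by trying each stored shape with both signs. The sketch below uses a plain MSE match to a target vector for illustration (the actual BV32 search minimizes the feedback-filtered error described in Section 3.11), and two toy shapes stand in for the 32-entry codebook of Appendix 8.

```python
def sign_shape_search(target, shapes):
    """Return (sign_bit, shape_index) minimizing ||target - sign * shape||^2."""
    best = None
    for index, shape in enumerate(shapes):
        for sign_bit, sign in ((0, 1.0), (1, -1.0)):
            err = sum((t - sign * c) ** 2 for t, c in zip(target, shape))
            if best is None or err < best[0]:
                best = (err, sign_bit, index)
    return best[1], best[2]

# Two toy 4-dimensional shapes; each also represents its mirror image.
shapes = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
sign_bit, shape_index = sign_shape_search([-0.5, -0.5, -0.5, -0.5], shapes)
```

With 32 stored shapes and both signs, the search covers 64 effective codevectors for 1 + 5 = 6 bits per excitation vector.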
In the BV32 decoder, the decoded excitation vectors are scaled by the excitation gain. The scaled
excitation signal passes through a long-term synthesis filter and a short-term synthesis filter, and
finally through a fixed pole-zero de-emphasis filter which is the inverse filter of the pre-emphasis
filter in the encoder. Figure 3 shows the block diagram of the BV32 decoder.
[Figure 3 block diagram: the bit de-multiplexer (100) recovers the indices CI, GI, PPI, PPTI, and LSPI; the prediction residual quantizer decoder (110) outputs uq(n), which passes through the long-term synthesis filter (140, containing the long-term predictor), the short-term synthesis filter (160, containing the short-term predictor), and the de-emphasis filter (180) to give the output signal; blocks 120 and 130 decode the short-term and long-term predictive parameters, respectively; intermediate signals dq(n) and sq(n) are shown.]
Figure 3 Block diagram of the BV32 decoder
Table 1 shows the bit allocation of BV32 in each 5 ms frame. The LSP parameters are encoded into
17 bits per frame, including 7 bits for the first-stage VQ, and 5 + 5 = 10 bits for the second-stage
split VQ. The pitch period and pitch predictor taps are encoded into 8 and 5 bits, respectively. The
two excitation gains in each frame are encoded into 5 + 5 = 10 bits. The 20 excitation vectors are
each encoded with 1 bit for sign and 5 bits for shape, resulting in 120 bits per frame for excitation
VQ. Including the other 40 bits of side information, the grand total is 160 bits per 80-sample frame,
which is 2 bits/sample, or 32 kb/s.
Parameter                 Bits per frame (80 samples)
LSP                       7 + (5 + 5) = 17
Pitch Period              8
3 Pitch Predictor Taps    5
2 Excitation Gains        5 + 5 = 10
20 Excitation Vectors     (1 + 5) × 20 = 120
Total                     160
Table 1 Bit allocation of the BV32 codec
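The frame budget in Table 1 can be checked with a few lines of arithmetic:

```python
# Bits per 5 ms (80-sample) frame, per Table 1.
lsp_bits = 7 + (5 + 5)              # two-stage LSP VQ
pitch_period_bits = 8
pitch_tap_bits = 5                  # 3 taps, jointly vector-quantized
gain_bits = 2 * 5                   # one 5-bit gain per 2.5 ms sub-frame
excitation_bits = 20 * (1 + 5)      # 20 vectors, 1 sign bit + 5 shape bits each

side_info_bits = lsp_bits + pitch_period_bits + pitch_tap_bits + gain_bits
total_bits = side_info_bits + excitation_bits
bit_rate = total_bits / 0.005       # one frame every 5 ms
```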
3 DETAILED DESCRIPTION OF THE BV32 ENCODER
In this section, a detailed description of each functional block of the BV32 encoder in Figure 2 is
given. When necessary, certain functional blocks will be expanded into more detailed block
diagrams. The description given in this section is in sufficient detail to allow those skilled in the art
to implement a mathematically equivalent BV32 encoder.
3.1 High-Pass Pre-Filtering
Refer to Figure 2. The input signal is assumed to be represented by 16-bit linear PCM. Block 3 is a
high-pass pre-filter with fixed coefficients. It is a first-order pole-zero filter with the following
transfer function.
( 255 / 256)(1 − z −1 )
H hpf ( z ) =
1 − (127 / 128) z −1
This high-pass pre-filter filters the input signal to remove undesirable low-frequency components,
and passes the filtered signal to the pre-emphasis filter (block 5).
3.2 Pre-emphasis Filtering
Block 5 is a first-order pole-zero pre-emphasis filter with fixed coefficients. It has the following
transfer function.
H_pe(z) = (1 + 0.5 z^−1) / (1 + 0.75 z^−1)

It filters the high-pass pre-filtered signal (the output signal of block 3) and gives an output signal
denoted as s(n) in Figure 2, where n is the sample index.
3.3 Short-Term Linear Predictive Analysis
The high-pass filtered and pre-emphasized signal s(n) is buffered at block 10, which performs short-term
linear predictive analysis and quantization to obtain the coefficients for the short-term predictor 40 and
the short-term noise feedback filter 50. This block 10 is further expanded in Figure 4.
[Figure 4 block diagram: s(n) enters block 11 (apply window & calculate autocorrelation), which outputs r(i); block 12 (white noise correction & spectral smoothing) outputs r̂(i); block 13 (Levinson-Durbin recursion) outputs âi; block 14 (bandwidth expansion) outputs ai; block 15 (convert to LSP) outputs li; block 16 (LSP quantizer) outputs the quantized LSPs l̃i and the index LSPI; blocks 17 (convert to predictor coefficients) and 18 (bandwidth expansion) produce the coefficient sets ãi and a′i sent to block 40, block 50, and block 21.]
Figure 4 BV32 short-term linear predictive analysis and quantization (block 10)
Refer to Figure 4. The input signal s(n) is buffered in block 11, where a 10 ms asymmetric analysis
window is applied to the buffered s(n) signal array. The “left window” is 7.5 ms long, and the
“right window” is 2.5 ms long. Let LWINSZ be the number of samples in the left window (LWINSZ
= 120 for 16 kHz sampling); then the left window is given by

wl(n) = (1/2) [1 − cos(nπ / (LWINSZ + 1))], n = 1, 2, …, LWINSZ.
Let RWINSZ be the number of samples in the right window. Then, RWINSZ = 40 for 16 kHz
sampling. The right window is given by

wr(n) = cos((n − 1)π / (2 RWINSZ)), n = 1, 2, …, RWINSZ.
The concatenation of wl(n) and wr(n) gives the 10 ms asymmetrical analysis window, with the peak
of the window located at the center of the current frame. When applying this analysis window, the
last sample of the window is lined up with the last sample of the current frame. Therefore, the codec
does not use any look ahead.
More specifically, without loss of generality, let the sampling time index range n = 1, 2, …, FRSZ
correspond to the current frame, where the frame size FRSZ = 80 for BV32. Then, the s(n) signal
buffer stored in block 11 is for n = -79, -78, …, -1, 0, 1, 2, …, 80. The asymmetrical LPC analysis
window function can be expressed as
w(n) = wl(n + 80), for n = −79, −78, …, 40
w(n) = wr(n − 40), for n = 41, 42, …, 80.
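The 160-sample window defined above can be constructed directly from the two halves; a quick sketch:

```python
import math

LWINSZ, RWINSZ = 120, 40          # left/right window lengths at 16 kHz

wl = [0.5 * (1.0 - math.cos(n * math.pi / (LWINSZ + 1)))
      for n in range(1, LWINSZ + 1)]
wr = [math.cos((n - 1) * math.pi / (2 * RWINSZ))
      for n in range(1, RWINSZ + 1)]

# 10 ms asymmetric analysis window; its peak sits at the junction of the two
# halves, which corresponds to the center of the current frame.
window = wl + wr
```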
The windowing operation is performed as follows.
sw(n) = s(n) w(n), n = -79, -78, …, -1, 0, 1, 2, …, 80.
Next, block 11 calculates the autocorrelation coefficients as follows.

r(i) = Σ_{n = −79+i}^{80} sw(n) sw(n − i), i = 0, 1, 2, …, 8.
The calculated autocorrelation coefficients are passed to block 12, which applies a Gaussian window
to the autocorrelation coefficients to perform spectral smoothing. The Gaussian window function is
given by
(
− 2π iσ / f s
gw(i ) = e
)2
, i = 1, 2, …, 8,
2
where f s is the sampling rate of the input signal, expressed in Hz, and σ is 40 Hz.
After multiplying the r(i) array by such a Gaussian window, block 12 then multiplies r(0) by a white
noise correction factor of WNCF = 1 + ε , where ε = 0.0001. In summary, the output of block 12 is
given by
i=0
⎧1.0001 × r(0),
rˆ(i ) = ⎨
i = 1,2,...,8
⎩ gw(i )r(i ),
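Blocks 11 and 12 together amount to a windowed autocorrelation followed by Gaussian lag windowing and white-noise correction. A floating-point sketch, where a toy 160-sample buffer stands in for the actual windowed signal sw(n):

```python
import math

FS, SIGMA, ORDER = 16000.0, 40.0, 8

def smoothed_autocorrelation(sw):
    n = len(sw)                                    # 160 samples in BV32
    r = [sum(sw[k] * sw[k - i] for k in range(i, n)) for i in range(ORDER + 1)]
    # Gaussian lag window (spectral smoothing), sigma = 40 Hz.
    gw = [math.exp(-0.5 * (2.0 * math.pi * i * SIGMA / FS) ** 2)
          for i in range(ORDER + 1)]
    # White noise correction factor WNCF = 1.0001 applied to r(0) only.
    return [1.0001 * r[0]] + [gw[i] * r[i] for i in range(1, ORDER + 1)]

sw = [math.sin(0.3 * k) for k in range(160)]       # toy windowed buffer
r_hat = smoothed_autocorrelation(sw)
```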
Block 13 performs the Levinson-Durbin recursion to convert the autocorrelation coefficients r̂(i) to
the short-term predictor coefficients âi, i = 0, 1, …, 8. If the Levinson-Durbin recursion exits
prematurely before the recursion is completed (for example, because the prediction residual energy E(i)
is less than zero), then the short-term predictor coefficients of the last frame are used in the
current frame. To handle this exception, the âi array needs an initial value, which is set to
â0 = 1 and âi = 0 for i = 1, 2, …, 8. The Levinson-Durbin recursion is performed by the following algorithm.
1. If r̂(0) ≤ 0, use the âi array of the last frame, and exit the Levinson-Durbin recursion.
2. E(0) = r̂(0)
3. k1 = −r̂(1) / r̂(0)
4. â1^(1) = k1
5. E(1) = (1 − k1²) E(0)
6. If E(1) ≤ 0, use the âi array of the last frame, and exit the Levinson-Durbin recursion.
7. For i = 2, 3, 4, …, 8, do the following:

   ki = [−r̂(i) − Σ_{j=1}^{i−1} âj^(i−1) r̂(i − j)] / E(i − 1)

   âi^(i) = ki

   âj^(i) = âj^(i−1) + ki âi−j^(i−1), for j = 1, 2, …, i − 1

   E(i) = (1 − ki²) E(i − 1)

   If E(i) ≤ 0, use the âi array of the last frame, and exit the Levinson-Durbin recursion.
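The recursion above, including the premature-exit fallback to the previous frame's coefficients, can be sketched in floating point as:

```python
def levinson_durbin(r_hat, prev_a, order=8):
    """Convert autocorrelations r_hat(0..order) to predictor coefficients a(0..order).

    Falls back to prev_a (the last frame's coefficients) if r_hat(0) <= 0 or
    any prediction residual energy E(i) becomes non-positive.
    """
    if r_hat[0] <= 0:
        return list(prev_a)
    a = [1.0] + [0.0] * order
    e = r_hat[0]                          # E(0)
    for i in range(1, order + 1):
        acc = r_hat[i] + sum(a[j] * r_hat[i - j] for j in range(1, i))
        k = -acc / e                      # reflection coefficient k_i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        e = (1.0 - k * k) * e             # E(i) = (1 - k_i^2) E(i-1)
        if e <= 0:
            return list(prev_a)
    return a

# AR(1) check: r(i) = 0.5**i yields a single nonzero coefficient a1 = -0.5.
prev = [1.0] + [0.0] * 8
a = levinson_durbin([0.5 ** i for i in range(9)], prev)
```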
If the recursion exits prematurely, the âi array of the last frame is used as the output of block
13. If the recursion is completed successfully (which is normally the case), then the final output of
block 13 is taken as

â0 = 1
âi = âi^(8), for i = 1, 2, …, 8.

Block 14 performs bandwidth expansion as follows.

ai = (0.96852)^i âi, for i = 0, 1, …, 8.
3.4 Conversion to LSP
In Figure 4, block 15 converts the LPC coefficients ai, i = 1, 2, …, 8 of the prediction error filter
given by

A(z) = 1 + Σ_{i=1}^{8} ai z^−i

to a set of 8 Line-Spectrum Pair (LSP) coefficients li, i = 1, 2, …, 8. The LSP coefficients, also
known as the Line Spectrum Frequencies (LSF), are the angular positions, normalized to 1 (i.e. 1.0
corresponds to the Nyquist frequency), of the roots of

Ap(z) = A(z) + z^−9 A(z^−1)

and

Am(z) = A(z) − z^−9 A(z^−1)
on the upper half of the unit circle, z = e^jω, 0 ≤ ω ≤ π, less the trivial roots at z = −1 and z = 1 of
Ap(z) and Am(z), respectively. Due to the symmetry and anti-symmetry of Ap(z) and Am(z),
respectively, the roots of interest can be determined as the roots of
Gp(ω) = Σ_{i=0}^{4} g_{p,i} cos(iω)

and

Gm(ω) = Σ_{i=0}^{4} g_{m,i} cos(iω)

where

g_{p|m,i} = f_{p|m,4}, for i = 0
g_{p|m,i} = 2 f_{p|m,4−i}, for i = 1, …, 4

in which

f_{p,i} = 1.0, for i = 0
f_{p,i} = ai + a_{9−i} − f_{p,i−1}, for i = 1, …, 4

and

f_{m,i} = 1.0, for i = 0
f_{m,i} = ai − a_{9−i} + f_{m,i−1}, for i = 1, …, 4.
The subscript "p|m" means dual versions of the equation exist, with either subscript "p" or subscript
"m". The roots of Ap(z) and Am(z), and therefore the roots of Gp(ω) and Gm(ω), are interlaced,
with the first root belonging to Gp(ω). The evaluation of the functions Gp(ω) and Gm(ω) is
carried out efficiently using Chebyshev polynomial series. With the mapping x = cos(ω),

cos(mω) = Tm(x),

where Tm(x) is the mth-order Chebyshev polynomial, the two functions Gp(ω) and Gm(ω) can be
expressed as

Gp|m(x) = Σ_{i=0}^{4} g_{p|m,i} Ti(x).
Due to the recursive nature of Chebyshev polynomials, the functions can be evaluated as

Gp|m(x) = [b_{p|m,0}(x) − b_{p|m,2}(x) + g_{p|m,0}] / 2,

where b_{p|m,0}(x) and b_{p|m,2}(x) are calculated using the following recurrence

b_{p|m,i}(x) = 2x b_{p|m,i+1}(x) − b_{p|m,i+2}(x) + g_{p|m,i}

with initial conditions b_{p|m,5}(x) = b_{p|m,6}(x) = 0.
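The backward recurrence above is a Clenshaw-style evaluation of the Chebyshev series: expanding b0 shows that (b0 − b2 + g0)/2 = g0 + x·b1 − b2, which is the series value. A quick numerical check against the direct cosine-series definition:

```python
import math

def eval_g(g, x):
    """Evaluate G(x) = sum_{i=0}^{4} g[i] * T_i(x) via the backward recurrence."""
    b = [0.0] * 7                        # b[5] = b[6] = 0 (initial conditions)
    for i in range(4, -1, -1):
        b[i] = 2.0 * x * b[i + 1] - b[i + 2] + g[i]
    return (b[0] - b[2] + g[0]) / 2.0

g = [0.7, -1.2, 0.4, 0.9, -0.3]          # arbitrary 5-term coefficient set
omega = 1.1
direct = sum(g[i] * math.cos(i * omega) for i in range(5))
val = eval_g(g, math.cos(omega))         # matches `direct` to rounding error
```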
The roots of Gp(x) and Gm(x) are determined in an alternating fashion, starting with a root in
Gp(x). Each root of Gp(x) and Gm(x) is located by identifying a sign change of the relevant
function along a grid of 60 points, given in Appendix 1. The estimate of the root is then refined
using 4 bisections followed by a final linear interpolation between the two points surrounding the
root. It should be noted that the roots and grid points are in the cosine domain. Once the 8 roots

xi = cos(ωi), i = 1, 2, …, 8

are determined in the cosine domain, they are converted to the normalized frequency domain
according to

li = cos⁻¹(xi) / π, i = 1, 2, …, 8

in order to obtain the LSP coefficients. In the rare event that fewer than 8 roots are found, the routine
returns the LSP coefficients of the previous frame, li(k − 1), i = 1, 2, …, 8, where the additional
parameter k represents the frame index of the current frame. The LSP coefficients of the previous
frame at the very first frame are initialized to

li(0) = i / 9, i = 1, 2, …, 8.

3.5 LSP Quantization
Block 16 of Figure 4 vector quantizes and encodes the LSP coefficient vector, l = [l1 l2 … l8]^T,
to a total of 17 bits. The output LSP quantizer index array, LSPI = {LSPI1, LSPI2, LSPI3}, is passed
to the bit multiplexer (block 95), while the quantized LSP coefficient vector, l̃ = [l̃1 l̃2 … l̃8]^T,
is passed to block 17.
The LSP quantizer is based on mean-removed inter-frame moving-average (MA) prediction with
two-stage vector quantization (VQ) of the prediction error. The quantizer enables bit-error detection
at the decoder by constraining the codevector selection at the encoder. It should be noted that the
encoder must perform the specified constrained VQ in order to maintain interoperability.
The first-stage VQ is searched using the simple mean-squared error (MSE) distortion criterion, while
both the lower and upper splits of the second-stage split VQ are searched using the weighted
mean-squared error (WMSE) distortion criterion.
[Figure 5 block diagram: the input LSP vector l has the constant mean LSP vector subtracted (adder 162); the 8th-order MA prediction (block 163) is subtracted (adder 164) to give e2; the first-stage 8-dimensional MSE VQ (block 165) gives ẽ21 and index LSPI1; adder 166 forms e22, which is split (block 167) into a lower 3-element and an upper 5-element sub-vector; these are quantized by the constrained 3-dimensional WMSE VQ (block 168, index LSPI2) and the regular 5-dimensional WMSE VQ (block 169, index LSPI3), using the weights w calculated by block 161; the quantized sub-vectors are appended (block 1610) and the quantized LSP vector l̃ is reconstructed through the second-stage VQ (1615), the adders 1612 and 1613, and the LSP spacing block 1614.]
Figure 5 BV32 LSP quantizer (block 16)
Block 16 is further expanded in Figure 5. The first-stage VQ takes place in block 165, and the
second-stage split VQ takes place in block 1615. Except for the LSP quantizer indices
LSPI1, LSPI2, LSPI3 and the split vectors, all signal paths in Figure 5 are for vectors of dimension 8.
Block 161 uses the unquantized LSP coefficient vector to calculate the weights to be used later in
the second-stage WMSE VQs. The weights are determined as
wi = 1 / (l2 − l1), for i = 1
wi = 1 / min(li − li−1, li+1 − li), for 1 < i < 8
wi = 1 / (l8 − l7), for i = 8.
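The weight computation of block 161 is a one-pass nearest-neighbor-distance calculation; a sketch:

```python
def lsp_weights(l):
    """Each weight is the inverse distance to the nearest neighboring LSP."""
    m = len(l)                         # 8 for BV32
    w = [0.0] * m
    w[0] = 1.0 / (l[1] - l[0])
    for i in range(1, m - 1):
        w[i] = 1.0 / min(l[i] - l[i - 1], l[i + 1] - l[i])
    w[m - 1] = 1.0 / (l[m - 1] - l[m - 2])
    return w

# Equally spaced LSPs (the spec's first-frame initialization) give equal weights.
w = lsp_weights([i / 9.0 for i in range(1, 9)])
```

Closely spaced LSP pairs, which correspond to spectral peaks, thus receive large weights and are matched more accurately by the WMSE searches.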
Basically, the i-th weight is the inverse of the distance between the i-th LSP coefficient and its
nearest neighbor LSP coefficient.
Adder 162 subtracts the constant LSP mean vector,

l̄ = [0.0551453 0.1181030 0.2249756 0.3316040 0.4575806 0.5720825 0.7193298 0.8278198]^T,

from the unquantized LSP coefficient vector to get the mean-removed LSP vector,

e1 = l − l̄.
In Figure 5, block 163 performs 8th-order inter-frame MA prediction of the mean-removed LSP
vector e1 based on the ẽ2 vectors in the previous 8 frames, where ẽ2 is the quantized version of the
inter-frame LSP prediction error vector². Let ẽ2,i(k) denote the i-th element of the vector ẽ2 in the
frame that is k frames before the current frame. Let ê1,i be the i-th element of the inter-frame-predicted
mean-removed LSP vector ê1. Then, block 163 calculates the predicted LSP vector
according to

ê1,i = p_LSP,i^T · [ẽ2,i(1) ẽ2,i(2) ẽ2,i(3) ẽ2,i(4) ẽ2,i(5) ẽ2,i(6) ẽ2,i(7) ẽ2,i(8)]^T, i = 1, 2, …, 8,

where p_LSP,i holds the 8 prediction coefficients for the i-th LSP coefficient and is given by

p_LSP,1^T = [0.7401123 0.6939697 0.6031494 0.5333862 0.4295044 0.3234253 0.2177124 0.1162720]
p_LSP,2^T = [0.7939453 0.7693481 0.6712036 0.5919189 0.4750366 0.3556519 0.2369385 0.1181030]
p_LSP,3^T = [0.7534180 0.7318115 0.6326294 0.5588379 0.4530029 0.3394775 0.2307739 0.1201172]
p_LSP,4^T = [0.7188110 0.6765747 0.5792847 0.5169067 0.4223022 0.3202515 0.2235718 0.1181030]
p_LSP,5^T = [0.6431885 0.6023560 0.5112305 0.4573364 0.3764038 0.2803345 0.2060547 0.1090698]
p_LSP,6^T = [0.5687866 0.5837402 0.4616089 0.4351196 0.3502808 0.2602539 0.1951294 0.0994263]
p_LSP,7^T = [0.5292969 0.4835205 0.3890381 0.3581543 0.2882080 0.2261353 0.1708984 0.0941162]
p_LSP,8^T = [0.5134277 0.4365845 0.3521729 0.3118896 0.2514038 0.1951294 0.1443481 0.0841064].
Adder 164 calculates the prediction error vector

e_2 = e_1 − ê_1,

which is the input to the first-stage VQ. In block 165 the 8-dimensional prediction error vector e_2
is vector quantized with the 128-entry, 8-dimensional codebook CB1 = { cb1^(0), cb1^(1), …, cb1^(127) }
listed in Appendix 2. The codevector minimizing the MSE is denoted ẽ_21 and the corresponding
index is denoted LSPI_1:
² At the first frame, the previous, non-existing, quantized inter-frame LSP prediction error vectors are set to zero vectors.
LSPI_1 = arg min_{k ∈ {0, 1, …, 127}} { (e_2 − cb1^(k))^T (e_2 − cb1^(k)) },

ẽ_21 = cb1^(LSPI_1),
where the notation I = arg min_i {D(i)} means that I is the argument i that minimizes the quantity D(i),
i.e. D(I) ≤ D(i) for all i.
Adder 166 subtracts the first-stage codevector from the prediction error vector to form the
quantization error vector of the first stage,
e_22 = e_2 − ẽ_21.

This is the input to the second-stage VQ, which is a two-split VQ. Block 167 splits the quantization
error vector of the first stage into a lower sub-vector e_22,1 with the first 3 elements of e_22,

e_22,1 = [ e22,1 e22,2 e22,3 ]^T,

and an upper sub-vector e_22,2 with the last 5 elements of e_22,

e_22,2 = [ e22,4 e22,5 e22,6 e22,7 e22,8 ]^T.

The two sub-vectors are quantized independently into ẽ_22,1 and ẽ_22,2, respectively.
Block 168 performs a constrained VQ of the 3-dimensional vector e_22,1, using the 32-entry
codebook CB21 = { cb21^(0), cb21^(1), …, cb21^(31) } of Appendix 3. The codevector that minimizes the
WMSE, subject to the constraint that the first 3 elements of the intermediate quantized LSP vector,

l̆ = l̂ + ẽ_2 = l̄ + ê_1 + ẽ_21 + [ ẽ_22,1^T ẽ_22,2^T ]^T,

preserve the ordering property

l̆_1 ≥ 0,  l̆_2 ≥ l̆_1,  l̆_3 ≥ l̆_2,

is selected as ẽ_22,1, and the corresponding index is denoted LSPI_2. In the inequality above, the
symbol l̆_i represents the i-th element of the vector l̆. The constrained WMSE VQ is given by

LSPI_2 = arg min over k ∈ { j | l̆_1^(j) ≥ 0, l̆_2^(j) ≥ l̆_1^(j), l̆_3^(j) ≥ l̆_2^(j), j ∈ {0, 1, …, 31} } of
         { (e_22,1 − cb21^(k))^T W_1 (e_22,1 − cb21^(k)) },

ẽ_22,1 = cb21^(LSPI_2),
where

W_1 = diag(w_1, w_2, w_3),
and l̆_i^(j) is the i-th element of the reconstructed LSP vector l̆ that is generated by using the j-th
codevector in CB21. In the highly unlikely, but possible, event that no codevector satisfies the
ordering property of the intermediate quantized LSP vector, the quantizer selects the codevector
cb21^(1) and returns the index LSPI_2 = 1.
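The constrained search of block 168 can be sketched as follows. This is illustrative only: the function name `search_cb21` and the argument `base3` are mine, where `base3` stands for the first three elements of l̄ + ê_1 + ẽ_21, which do not depend on the candidate codevector.

```python
def search_cb21(e221, base3, w, cb21):
    """e221: 3-dim target vector; base3: first 3 elements of the intermediate
    LSP vector before adding the candidate codevector; w: (w1, w2, w3);
    cb21: 32-entry codebook.  Returns (LSPI_2, codevector)."""
    best_idx, best_d = None, None
    for j, cv in enumerate(cb21):
        # candidate first three elements of the intermediate quantized LSP vector
        l1 = base3[0] + cv[0]
        l2 = base3[1] + cv[1]
        l3 = base3[2] + cv[2]
        if not (l1 >= 0.0 and l2 >= l1 and l3 >= l2):
            continue                      # violates the ordering property
        d = sum(wi * (ei - ci) ** 2 for wi, ei, ci in zip(w, e221, cv))
        if best_d is None or d < best_d:
            best_idx, best_d = j, d
    if best_idx is None:                  # no codevector satisfies the ordering
        return 1, cb21[1]                 # fallback mandated by the spec
    return best_idx, cb21[best_idx]
```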
Block 169 performs an unconstrained WMSE VQ of the 5-dimensional vector e_22,2, using the 32-entry
codebook CB22 = { cb22^(0), cb22^(1), …, cb22^(31) } given in Appendix 4, according to

LSPI_3 = arg min_{k ∈ {0, 1, …, 31}} { (e_22,2 − cb22^(k))^T W_2 (e_22,2 − cb22^(k)) },

ẽ_22,2 = cb22^(LSPI_3),
where

W_2 = diag(w_4, w_5, w_6, w_7, w_8).
The quantization is complete and the remaining operations of block 16 construct the quantized LSP
vector from the codevectors, LSP mean, and MA prediction. Block 1610 concatenates the two
quantized split vectors to obtain
ẽ_22 = [ ẽ_22,1^T ẽ_22,2^T ]^T,

the quantized version of the quantization error vector of the first-stage VQ. Adder 1611 calculates
the quantized prediction error vector by adding the stage 1 and stage 2 quantized vectors,

ẽ_2 = ẽ_21 + ẽ_22.
Adder 1612 adds the mean LSP vector and the predicted mean-removed LSP vector to obtain the
predicted LSP vector,

l̂ = l̄ + ê_1.

Adder 1613 adds the predicted LSP vector and the quantized prediction error vector to get the
intermediate reconstructed LSP vector,

l̆ = l̂ + ẽ_2.
Block 1614 calculates the final quantized LSP coefficients by enforcing a minimum spacing of 100
Hz between adjacent LSP coefficients, as well as an absolute minimum of 12 Hz for the first LSP
coefficient and an absolute maximum of 7982 Hz for the eighth LSP coefficient. The spacing
constraints are given by

l̃_1 ≥ 0.0015
l̃_{i+1} − l̃_i ≥ 0.0125,  i = 1, 2, …, 7
l̃_8 ≤ 0.99775.
The spacing is carried out as follows:

(i) The elements of the intermediate reconstructed LSP vector are sorted such that l̆_1 ≤ l̆_2 ≤ … ≤ l̆_8.
(ii) Set lmax = 0.91025.
(iii) If l̆_1 < 0.0015, set l̃_1 = 0.0015;
      else if l̆_1 > lmax, set l̃_1 = lmax;
      else set l̃_1 = l̆_1.
(iv) For i = 2, 3, …, 8, do the following:
      1. Set lmin = l̃_{i−1} + 0.0125.
      2. Set lmax ← lmax + 0.0125.
      3. If l̆_i < lmin, set l̃_i = lmin;
         else if l̆_i > lmax, set l̃_i = lmax;
         else set l̃_i = l̆_i.
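The sort-and-clamp procedure of block 1614 amounts to a running clamp of each sorted coefficient between a growing lower bound and a growing upper bound. A minimal sketch (the helper name `enforce_lsp_spacing` is mine):

```python
def enforce_lsp_spacing(l_in):
    """Block 1614: sort the intermediate LSP vector, then clamp so that
    l[0] >= 0.0015, adjacent gaps are >= 0.0125, and the running upper
    bound 0.91025 + (i-1)*0.0125 caps each element (0.99775 for l[7])."""
    l = sorted(l_in)                       # step (i)
    out = [0.0] * 8
    lmax = 0.91025                         # step (ii)
    out[0] = min(max(l[0], 0.0015), lmax)  # step (iii)
    for i in range(1, 8):                  # step (iv)
        lmin = out[i - 1] + 0.0125
        lmax += 0.0125
        out[i] = min(max(l[i], lmin), lmax)
    return out
```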
3.6 Conversion to Short-Term Predictor Coefficients

Refer back to Figure 4. In block 17, the quantized set of LSP coefficients { l̃_i }, which is determined
once per frame, is converted to the corresponding set of linear prediction coefficients { ã_i }, the
quantized linear prediction coefficients for the current frame.
With the notation

x_{p,i} = cos(π l̃_{2i−1}),  i = 1, 2, 3, 4
x_{m,i} = cos(π l̃_{2i}),    i = 1, 2, 3, 4

the 4 unique coefficients of each of the two polynomials A_p^Δ(z) = A_p(z)/(1 + z^{−1}) and
A_m^Δ(z) = A_m(z)/(1 − z^{−1}) can be determined using the following recursion:

For i = 1, 2, 3, 4, do the following:
  a^Δ_{p|m,i} = 2 ( a^Δ_{p|m,i−2} − x_{p|m,i} a^Δ_{p|m,i−1} )
  a^Δ_{p|m,j} = a^Δ_{p|m,j} + a^Δ_{p|m,j−2} − 2 x_{p|m,i} a^Δ_{p|m,j−1},  j = i−1, i−2, …, 1

with initial conditions a^Δ_{p|m,0} = 1 and a^Δ_{p|m,−1} = 0. In the recursion above, {a^Δ_{p,i}} and {a^Δ_{m,i}} are the sets
of four unique coefficients of the polynomials ApΔ (z ) and AmΔ (z ) , respectively. Similarly, let the two
sets of coefficients {a p ,i } and {am,i } , each of 4 unique coefficients except for a sign on {am,i } ,
represent the unique coefficients of the polynomials Ap (z ) and Am (z ) , respectively. Then, {a p ,i }
and {am,i } can be obtained from {a Δp ,i } and {amΔ ,i } as
a_{p,i} = a^Δ_{p,i} + a^Δ_{p,i−1},  i = 1, 2, 3, 4
a_{m,i} = a^Δ_{m,i} − a^Δ_{m,i−1},  i = 1, 2, 3, 4.
From A_p(z) and A_m(z), the polynomial of the prediction error filter is obtained as

Ã(z) = ( A_p(z) + A_m(z) ) / 2.
In terms of the unique coefficients of A_p(z) and A_m(z), the coefficients { ã_i } of Ã(z) can be
expressed as

ã_i = 1.0                            for i = 0,
ã_i = 0.5 ( a_{p,i} + a_{m,i} )          for i = 1, 2, 3, 4,
ã_i = 0.5 ( a_{p,9−i} − a_{m,9−i} )      for i = 5, 6, 7, 8,
where the tilde signifies that the coefficients correspond to the quantized LSP coefficients. Note that
Ã(z) = 1 − P_s(z) = 1 + Σ_{i=1}^{8} ã_i z^{−i},

where

P_s(z) = −Σ_{i=1}^{8} ã_i z^{−i}
is the transfer function of the short-term predictor block 40 in Figure 2.
Block 18 performs further bandwidth expansion on the set of predictor coefficients { ã_i } using a
bandwidth expansion factor of γ_1 = 0.75. The resulting bandwidth-expanded set of filter coefficients
is given by

a′_i = γ_1^i ã_i,  for i = 1, 2, …, 8.
This bandwidth-expanded set of filter coefficients { a′_i } is used to update the coefficients of the
short-term noise feedback filter block 50 in Figure 2 and the coefficients of the weighted short-term
synthesis filter block 21 in Figure 6 (to be discussed later). This completes the description of the
short-term predictive analysis and quantization block 10 in Figure 2 and Figure 4.
3.7 Short-Term Linear Prediction of Input Signal
Now refer to Figure 2. The short-term predictor block 40 predicts the input signal sample s(n) based
on a linear combination of the preceding 8 samples. The adder 45 subtracts the resulting predicted
value from s(n) to obtain the short-term prediction residual signal, or the difference signal, d(n). The
combined operation of blocks 40 and 45 is summarized in the following difference equation.
d(n) = s(n) + Σ_{i=1}^{8} ã_i s(n − i)
3.8 Long-Term Linear Predictive Analysis (Pitch Extraction)
In Figure 2, the long-term predictive analysis and quantization block 20 uses the short-term
prediction residual signal d(n) of the current frame and its quantized version dq(n) in the previous
frames to determine the quantized values of the pitch period and the pitch predictor taps. This block
20 is further expanded in Figure 6 below.
Figure 6 BV32 long-term predictive analysis and quantization (block 20)
Now refer to Figure 6. The short-term prediction residual signal d(n) passes through the weighted
short-term synthesis filter block 21, whose output is calculated as

dw(n) = d(n) − Σ_{i=1}^{8} a′_i dw(n − i).
The signal dw(n) is passed through a fixed low-pass filter block 22, which has a −3 dB cut-off
frequency at about 800 Hz. A 4th-order elliptic filter is used for this purpose. The transfer function
of this low-pass filter is

H_lpf(z) = ( 0.0322952 − 0.1028824 z^{−1} + 0.1446838 z^{−2} − 0.1028824 z^{−3} + 0.0322952 z^{−4} )
         / ( 1 − 3.5602306 z^{−1} + 4.8558478 z^{−2} − 2.9988298 z^{−3} + 0.7069277 z^{−4} ).
Block 23 down-samples the low-pass filtered signal to a sampling rate of 2 kHz. This represents an
8:1 decimation for the BV32 codec.
The first-stage pitch search block 24 then uses the decimated 2 kHz sampled signal dwd(n) to find a
“coarse pitch period”, denoted as cpp in Figure 6. The time lag represented by cpp is in terms of
number of samples in the 2 kHz down-sampled signal dwd(n). A pitch analysis window of 15 ms is
used. The end of the pitch analysis window is lined up with the end of the current frame. At a
sampling rate of 2 kHz, 15 ms corresponds to 30 samples. Without loss of generality, let the index
range of n = 1 to n = 30 correspond to the pitch analysis window for dwd(n). Block 24 first
calculates the following values

c(k) = Σ_{n=1}^{30} dwd(n) dwd(n − k),

E(k) = Σ_{n=1}^{30} [dwd(n − k)]²,

c2(k) = c²(k) if c(k) ≥ 0, and c2(k) = −c²(k) if c(k) < 0,
for all integers from k = MINPPD − 1 to k = MAXPPD + 1, where MINPPD and MAXPPD are the
minimum and maximum pitch periods in the decimated domain, respectively. For BV32, MINPPD =
1 sample and MAXPPD = 33 samples. Block 24 then searches through the range of k = MINPPD,
MINPPD + 1, MINPPD + 2, …, MAXPPD to find all local peaks³ of the array { c2(k)/E(k) } for
which c(k) > 0. Let N_p denote the number of such positive local peaks. Let k_p(j), j = 1, 2, …, N_p
be the indices where c2(k_p(j))/E(k_p(j)) is a local peak and c(k_p(j)) > 0, and let
k_p(1) < k_p(2) < … < k_p(N_p). For convenience, the term c2(k)/E(k) will be referred to as the
"normalized correlation square".
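The correlation terms and the positive-local-peak picking above can be sketched directly. The function name and the buffer layout (a flat list whose last 30 samples are the analysis window, preceded by at least MAXPPD + 1 samples of history) are my own conventions, not the standard's:

```python
def coarse_pitch_peaks(dwd, minppd=1, maxppd=33):
    """First part of block 24: compute c(k), E(k) over the 30-sample
    (15 ms at 2 kHz) analysis window and return the lags k in
    [minppd, maxppd] that are positive local peaks of c2(k)/E(k)."""
    N = 30
    base = len(dwd) - N                      # dwd[base + n - 1] is window sample n
    c, E = {}, {}
    for k in range(minppd - 1, maxppd + 2):
        c[k] = sum(dwd[base + n - 1] * dwd[base + n - 1 - k] for n in range(1, N + 1))
        E[k] = sum(dwd[base + n - 1 - k] ** 2 for n in range(1, N + 1))

    def ncs(k):                              # signed normalized correlation square
        if E[k] == 0.0:
            return float("-inf")
        s = c[k] * c[k] / E[k]
        return s if c[k] >= 0 else -s

    return [k for k in range(minppd, maxppd + 1)
            if c[k] > 0 and ncs(k) > ncs(k - 1) and ncs(k) > ncs(k + 1)]
```

For a strongly periodic input, the returned list contains the (decimated-domain) pitch lag and typically its integer multiples, which is exactly why Algorithms 3.8.1-3.8.4 are needed.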
If N_p = 0, the output coarse pitch period is set to cpp = MINPPD, and the processing of block 24 is
terminated. If N_p = 1, the block 24 output is set to cpp = k_p(1), and the processing of block 24 is
terminated.

If there are two or more local peaks (N_p ≥ 2), then block 24 uses Algorithms 3.8.1, 3.8.2, 3.8.3, and
3.8.4 (to be described below), in that order, to determine the output coarse pitch period cpp.
Variables calculated in the earlier algorithms will be carried over and used in the later algorithms.
Block 24 first uses Algorithm 3.8.1 below to identify the largest quadratically interpolated peak
around local peaks of the normalized correlation square c2(k_p)/E(k_p). Quadratic interpolation is
performed for c(k_p), while linear interpolation is performed for E(k_p). Such interpolation is
performed with the time resolution of the sampling rate of the input speech (16 kHz for BV32). In
the algorithm below, D denotes the decimation factor used when decimating dw(n) to dwd(n). Thus,
D = 8 for BV32.

³ A value is characterized as a local peak if both of the adjacent values are smaller.
Algorithm 3.8.1 Find the largest quadratically interpolated peak around c2(k_p)/E(k_p):

(i) Set c2max = −1, Emax = 1, and jmax = 0.
(ii) For j = 1, 2, …, N_p, do the following 12 steps:
  1. Set a = 0.5 [ c(k_p(j) + 1) + c(k_p(j) − 1) ] − c(k_p(j))
  2. Set b = 0.5 [ c(k_p(j) + 1) − c(k_p(j) − 1) ]
  3. Set ji = 0
  4. Set ei = E(k_p(j))
  5. Set c2m = c2(k_p(j))
  6. Set Em = E(k_p(j))
  7. If c2(k_p(j) + 1) E(k_p(j) − 1) > c2(k_p(j) − 1) E(k_p(j) + 1), do the remaining part of step 7:
     Δ = [ E(k_p(j) + 1) − ei ] / D
     For k = 1, 2, …, D/2, do the following indented part of step 7:
       ci = a (k/D)² + b (k/D) + c(k_p(j))
       ei ← ei + Δ
       If (ci)² Em > (c2m) ei, do the next three indented lines:
         ji = k
         c2m = (ci)²
         Em = ei
  8. If c2(k_p(j) + 1) E(k_p(j) − 1) ≤ c2(k_p(j) − 1) E(k_p(j) + 1), do the remaining part of step 8:
     Δ = [ E(k_p(j) − 1) − ei ] / D
     For k = −1, −2, …, −D/2, do the following indented part of step 8:
       ci = a (k/D)² + b (k/D) + c(k_p(j))
       ei ← ei + Δ
       If (ci)² Em > (c2m) ei, do the next three indented lines:
         ji = k
         c2m = (ci)²
         Em = ei
  9. Set lag(j) = k_p(j) + ji/D
  10. Set c2i(j) = c2m
  11. Set Ei(j) = Em
  12. If c2m × Emax > c2max × Em, do the following three indented lines:
      jmax = j
      c2max = c2m
      Emax = Em
(iii) Set the first candidate for the coarse pitch period as cpp = k_p(jmax).
The symbol ← indicates that the parameter on the left-hand side is being updated with the value on
the right-hand side⁴.
To avoid picking a coarse pitch period that is around an integer multiple of the true coarse pitch
period, a search through the time lags corresponding to the local peaks of c 2( k p ) / E ( k p ) is
performed to see if any of such time lags is close enough to the output coarse pitch period of block
24 in the last frame, denoted as cpplast⁵. If a time lag is within 25% of cpplast, it is considered
close enough. For all such time lags within 25% of cpplast, the corresponding quadratically
interpolated peak values of the normalized correlation square c 2( k p ) / E ( k p ) are compared, and the
interpolated time lag corresponding to the maximum normalized correlation square is selected for
further consideration. The following algorithm performs the task described above. The interpolated
arrays c2i( j) and Ei( j) calculated in Algorithm 3.8.1 above are used in this algorithm.
Algorithm 3.8.2 Find the time lag maximizing the interpolated c2(k_p)/E(k_p) among all time lags close
to the output coarse pitch period of the last frame:

(i) Set index im = −1
(ii) Set c2m = −1
(iii) Set Em = 1
(iv) For j = 1, 2, …, N_p, do the following:
  If | k_p(j) − cpplast | ≤ 0.25 × cpplast, do the following:
    If c2i(j) × Em > c2m × Ei(j), do the following three lines:
      im = j
      c2m = c2i(j)
      Em = Ei(j)

Note that if there is no time lag k_p(j) within 25% of cpplast, then the value of the index im will
remain at −1 after Algorithm 3.8.2 is performed. If there are one or more time lags within 25% of
cpplast, the index im corresponds to the largest normalized correlation square among such time lags.
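Algorithm 3.8.2 is short enough to transcribe directly. The sketch below uses 0-based list indexing (the spec counts j from 1, with im = −1 still meaning "no qualifying lag"), and the function name is mine:

```python
def find_peak_near_last_pitch(kp, c2i, Ei, cpplast):
    """Algorithm 3.8.2: among lags kp[j] within 25% of cpplast, return the
    0-based index of the largest interpolated normalized correlation
    square c2i[j]/Ei[j], or -1 if no lag qualifies.  The comparison is
    done with cross-products to avoid divisions."""
    im, c2m, Em = -1, -1.0, 1.0
    for j in range(len(kp)):
        if abs(kp[j] - cpplast) <= 0.25 * cpplast:
            if c2i[j] * Em > c2m * Ei[j]:
                im, c2m, Em = j, c2i[j], Ei[j]
    return im
```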
Next, block 24 determines whether an alternative time lag in the first half of the pitch range should
be chosen as the output coarse pitch period. Basically, block 24 searches through all interpolated
time lags lag( j) that are less than 16, and checks whether any of them has a large enough local peak
of normalized correlation square near every integer multiple of it (including itself) up to 32. If there
⁴ An equal sign is not applicable due to a potential mathematical conflict.
⁵ For the first frame, cpplast is initialized to 12.
are one or more such time lags satisfying this condition, the smallest of such qualified time lags is
chosen as the output coarse pitch period of block 24.
Again, variables calculated in Algorithms 3.8.1 and 3.8.2 above carry their final values over to
Algorithm 3.8.3 below. In the following, the parameter MPDTH is 0.06, and the threshold array
MPTH(k) is given as MPTH(2) = 0.7, MPTH(3) = 0.55, MPTH(4) = 0.48, MPTH(5) = 0.37, and
MPTH(k) = 0.30, for k > 5.
Algorithm 3.8.3 Check whether an alternative time lag in the first half of the range of the coarse
pitch period should be chosen as the output coarse pitch period:
For j = 1, 2, 3, …, N_p, in that order, do the following while lag(j) < 16:
(i) If j ≠ im, set threshold = 0.73; otherwise, set threshold = 0.4.
(ii) If c2i( j) × Emax ≤ threshold × c2max × Ei( j), disqualify this j, skip step (iii) for this j,
increment j by 1 and go back to step (i).
(iii) If c2i( j) × Emax > threshold × c2max × Ei( j), do the following:
a) For k = 2, 3, 4, … , do the following while k × lag( j) < 32:
1. s = k × lag( j)
2. a = (1 – MPDTH) s
3. b = (1 + MPDTH) s
4. Go through m = j+1, j+2, j+3, …, N_p, in that order, and see if any of the
time lags lag(m) is between a and b. If none of them is between a and b,
disqualify this j, stop step (iii), increment j by 1 and go back to step (i). If
there is at least one such m that satisfies a < lag(m) ≤ b and c2i(m) × Emax >
MPTH(k) × c2max × Ei(m), then it is considered that a large enough peak of
the normalized correlation square is found in the neighborhood of the k-th
integer multiple of lag( j); in this case, stop step (iii) a) 4., increment k by 1,
and go back to step (iii) a) 1.
b) If step (iii) a) is completed without stopping prematurely, that is, if there is a large
enough interpolated peak of the normalized correlation square within ±100×MPDTH%
of every integer multiple of lag( j) that is less than 32, then stop this algorithm and
stop the operation of block 24, and set cpp = k_p(j) as the final output coarse pitch
period of block 24.
If Algorithm 3.8.3 above is completed without finding a qualified output coarse pitch period cpp,
then block 24 examines the largest local peak of the normalized correlation square around the coarse
pitch period of the last frame, found in Algorithm 3.8.2 above, and makes a final decision on the
output coarse pitch period cpp using the following algorithm. Algorithm 3.8.4 performs this final
decision. Again, variables calculated in Algorithms 3.8.1 and 3.8.2 above carry their final values
over to Algorithm 3.8.4 below. In the following, the parameters are SMDTH = 0.095 and LPTH1 = 0.78.
Algorithm 3.8.4: Final decision of the output coarse pitch period
(i) If im = -1, that is, if there is no large enough local peak of the normalized correlation square
around the coarse pitch period of the last frame, then use the cpp calculated at the end of
Algorithm 3.8.1 as the final output coarse pitch period of block 24, and exit this algorithm.
(ii) If im = jmax, that is, if the largest local peak of the normalized correlation square around the
coarse pitch period of the last frame is also the global maximum of all interpolated peaks of the
normalized correlation square within this frame, then use the cpp calculated at the end of
Algorithm 3.8.1 as the final output coarse pitch period of block 24, and exit this algorithm.
(iii) If im < jmax, do the following indented part:
If c2m × Emax > 0.43 × c2max × Em, do the following indented part of step (iii):
a) If lag(im) > MAXPPD/2, set block 24 output cpp = k_p(im) and exit this algorithm.
b) Otherwise, for k = 2, 3, 4, 5, do the following indented part:
1. s = lag(jmax) / k
2. a = (1 – SMDTH) s
3. b = (1 + SMDTH) s
4. If lag(im) > a and lag(im) < b, set block 24 output cpp = k_p(im) and exit
this algorithm.
(iv) If im > jmax, do the following indented part:
If c2m × Emax > LPTH1 × c2max × Em, set block 24 output cpp = k_p(im) and exit this
algorithm.
(v) If algorithm execution proceeds to here, none of the steps above have selected a final output
coarse pitch period. In this case, just accept the cpp calculated at the end of Algorithm 3.8.1 as
the final output coarse pitch period of block 24.
Block 25 takes cpp as its input and performs a second-stage pitch period search in the undecimated
signal domain to get a refined pitch period pp. Block 25 first converts the coarse pitch period cpp to
the undecimated signal domain by multiplying it by the decimation factor D, where D = 8 for BV32.
Then, it determines a search range for the refined pitch period around the value cpp × D. The lower
bound of the search range is lb = max(MINPP, cpp × D – D + 1) , where MINPP = 10 samples is the
minimum pitch period. The upper bound of the search range is ub = min(MAXPP, cpp × D + D –
1), where MAXPP is the maximum pitch period, which is 264 samples for BV32.
Block 25 maintains a signal buffer with a total of MAXPP + 1 + FRSZ samples, where FRSZ is the
frame size, which is 80 samples for BV32. The last FRSZ samples of this buffer are populated with
the open-loop short-term prediction residual signal d(n) in the current frame. The first MAXPP + 1
samples are populated with the MAXPP + 1 samples of quantized version of d(n), denoted as dq(n),
immediately preceding the current frame. For convenience of writing equations later, the symbol
dq(n) will be used to denote the entire buffer of MAXPP + 1 + FRSZ samples, even though the last
FRSZ samples are really d(n) samples. Again, let the index range from n = 1 to n = FRSZ denote
the samples in the current frame.
After the lower bound lb and upper bound ub of the pitch period search range are determined, block
25 calculates the following correlation and energy terms in the undecimated dq(n) signal domain for
time lags k within the search range [lb, ub]:

c̃(k) = Σ_{n=1}^{FRSZ} dq(n) dq(n − k)

Ẽ(k) = Σ_{n=1}^{FRSZ} dq(n − k)²

The time lag k ∈ [lb, ub] that maximizes the ratio c̃²(k)/Ẽ(k) is chosen as the final refined pitch
period. That is,

pp = arg max_{k ∈ [lb, ub]} [ c̃²(k) / Ẽ(k) ].
Once the refined pitch period pp is determined, it is encoded into the corresponding output pitch
period index PPI, calculated as
PPI = pp − 10 .
Possible values of PPI are all integers from 0 to 254 for the BV32 codec. Therefore, the refined
pitch period pp is encoded into 8 bits, without any distortion. The value of PPI = 255 is reserved for
signaling purposes and therefore is not used by the BV32 codec.
Block 25 also calculates ppt1, the optimal tap weight for a single-tap pitch predictor, as

ppt1 = c̃(pp) / Ẽ(pp).

In the degenerate case where Ẽ(pp) = 0, ppt1 is set to zero. Block 27 calculates the long-term noise
feedback filter coefficient λ as follows:

λ = 0.5            if ppt1 ≥ 1,
λ = 0.5 × ppt1     if 0 < ppt1 < 1,
λ = 0              if ppt1 ≤ 0.
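Blocks 25 and 27 together can be sketched as one function. This is illustrative only: `refine_pitch` is my own name, and the buffer layout follows the dq(n) convention described above (a list of MAXPP + 1 + FRSZ samples whose last FRSZ entries are the current frame):

```python
def refine_pitch(dq, cpp, D=8, minpp=10, maxpp=264, frsz=80):
    """Second-stage pitch search (block 25) plus the long-term noise
    feedback coefficient (block 27).  Returns (pp, ppt1, lam)."""
    base = len(dq) - frsz                     # dq[base] is frame sample n = 1
    lb = max(minpp, cpp * D - D + 1)
    ub = min(maxpp, cpp * D + D - 1)

    def corr(k):
        c = sum(dq[base + n] * dq[base + n - k] for n in range(frsz))
        E = sum(dq[base + n - k] ** 2 for n in range(frsz))
        return c, E

    best_pp, best_num, best_den = lb, -1.0, 1.0
    for k in range(lb, ub + 1):
        c, E = corr(k)
        # compare c^2/E via cross-products to avoid dividing by zero
        if E > 0.0 and c * c * best_den > best_num * E:
            best_pp, best_num, best_den = k, c * c, E

    c, E = corr(best_pp)
    ppt1 = c / E if E > 0.0 else 0.0          # single-tap predictor weight
    lam = 0.5 if ppt1 >= 1.0 else (0.5 * ppt1 if ppt1 > 0.0 else 0.0)
    return best_pp, ppt1, lam
```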
3.9 Long-Term Predictor Parameter Quantization
Pitch predictor taps quantizer block 26 quantizes the three pitch predictor taps to 5 bits using vector
quantization. The pitch predictor has a transfer function of
P_l(z) = Σ_{i=1}^{3} b_i z^{−pp+2−i},
where pp is the pitch period calculated in Section 3.8.
Rather than minimizing the mean-square error of the three taps b_1, b_2, and b_3 as in a conventional VQ
codebook search, block 26 finds from the VQ codebook the set of candidate pitch predictor taps that
minimizes the pitch prediction residual energy in the current frame. Using the same dq(n) buffer and
time index convention as in block 25, and denoting the set of three taps corresponding to the j-th
codevector b_j = [b_j1 b_j2 b_j3]^T as { b_j1, b_j2, b_j3 }, we can express such pitch prediction residual
energy as

E_j = Σ_{n=1}^{FRSZ} [ dq(n) − Σ_{i=1}^{3} b_ji dq(n − pp + 2 − i) ]².
The codevector is selected from a 3-dimensional codebook of 32 codevectors, { b_0, b_1, …, b_31 }, listed
in Appendix 5. The codevector that minimizes the pitch prediction residual energy is selected. The
index of the selected codevector is given by

PPTI = j* = arg min_{j ∈ {0, 1, …, 31}} { E_j },
and the corresponding set of three quantized pitch predictor taps, denoted as ppt = { b_1, b_2, b_3 } in
Figure 6, is given by

[ b_1 b_2 b_3 ]^T = b_j*.
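The residual-energy codebook search of block 26 can be sketched as follows. This is illustrative: `quantize_pitch_taps` is my own name, and the toy codebook in the test is not the Appendix 5 codebook.

```python
def quantize_pitch_taps(dq, pp, codebook, frsz=80):
    """Block 26: pick the codevector b_j = (b_j1, b_j2, b_j3) minimizing the
    pitch prediction residual energy E_j over the current frame.  dq uses
    the same layout as block 25 (last frsz samples are the frame)."""
    base = len(dq) - frsz                     # dq[base] is frame sample n = 1
    best_j, best_E = 0, None
    for j, b in enumerate(codebook):
        E = 0.0
        for n in range(frsz):
            # taps apply at lags pp-1, pp, pp+1 (i = 1, 2, 3 in the spec)
            pred = sum(b[i] * dq[base + n - pp + 1 - i] for i in range(3))
            E += (dq[base + n] - pred) ** 2
        if best_E is None or E < best_E:
            best_j, best_E = j, E
    return best_j, codebook[best_j]
```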
This completes the description of block 20, long-term predictive analysis and quantization.
3.10 Excitation Gain Quantization
In BV32 coding, there are two 2.5 ms sub-frames in each 5 ms frame, with one residual gain for each
sub-frame. The sub-frame size is therefore SFRSZ = FRSZ/2 = 40 samples. The unquantized
residual gains are based on the pitch prediction residuals of the respective sub-frames and are
quantized open-loop in the base-2 logarithmic domain. The quantization of the residual gain is part
of the prediction residual quantizer block 30 in Figure 2. Block 30 is further expanded in Figure 7.
All the operations in Figure 7 are performed sub-frame by sub-frame.
Figure 7 Prediction residual quantizer (block 30)
Block 300 in Figure 7 calculates the pitch prediction residual signal. For the first sub-frame the
pitch prediction residual signal is given by
e_1(n) = dq(n) − Σ_{i=1}^{3} b_i dq(n − pp + 2 − i),  n = 1, 2, …, SFRSZ,
where the same dq(n) buffer and time index convention of block 25 is used. Hence, the first
sub-frame of dq(n) for n = 1, 2, …, SFRSZ is the unquantized open-loop short-term prediction residual
signal d(n). For the second sub-frame the pitch prediction residual signal is given by
e_2(n) = dq(SFRSZ + n) − Σ_{i=1}^{3} b_i dq(SFRSZ + n − pp + 2 − i),  n = 1, 2, …, SFRSZ.
Again, the second sub-frame of dq(n), n = SFRSZ+1, SFRSZ+2, …, FRSZ, is the unquantized
open-loop short-term prediction residual signal d(n). However, at the time the pitch prediction residual of
the second sub-frame is calculated, the excitation of the first sub-frame is fully quantized and is
located at dq(n), n = 1, 2, …, SFRSZ, which is then no longer the unquantized open-loop short-term
prediction residual signal.
Block 301 calculates the residual gain in the base-2 logarithmic domain. First, the average power of
the pitch prediction residual signal in the m-th sub-frame is calculated as

P_e(m) = (1/SFRSZ) Σ_{n=1}^{SFRSZ} e_m²(n),

where m = 1 and 2 for the first and second sub-frames of the current frame, respectively. Then, to
avoid taking the logarithm of zero or a very small number, the logarithmic gain (log-gain) of the m-th
sub-frame is calculated as

lg(m) = log₂ P_e(m)   if P_e(m) > 1/4,
lg(m) = −2            if P_e(m) ≤ 1/4.
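A minimal sketch of block 301's gain computation (the function name is mine):

```python
import math

def log_gain(e):
    """Block 301: average power of the sub-frame pitch prediction residual,
    floored at 1/4, expressed in the base-2 logarithmic domain."""
    sfrsz = len(e)                            # SFRSZ = 40 samples for BV32
    pe = sum(x * x for x in e) / sfrsz
    return math.log2(pe) if pe > 0.25 else -2.0
```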
The long-term mean value of the log-gain is calculated off-line and stored in block 302. This
log-gain mean value for BV32 is lgmean = 11.82031. The adder 303 calculates the mean-removed
version of the log-gain as mrlg(m) = lg(m) − lgmean. The MA log-gain predictor block 304 is a
16th-order FIR filter with its memory initialized to zero at the very first frame. The coefficients of this
log-gain predictor lgp(k), k = 1, 2, 3, …, 16, are fixed, as given below:
lgp(1) = 0.5913086
lgp(2) = 0.5251160
lgp(3) = 0.5724792
lgp(4) = 0.5977783
lgp(5) = 0.4800720
lgp(6) = 0.4939270
lgp(7) = 0.4729614
lgp(8) = 0.4158936
lgp(9) = 0.3805847
lgp(10) = 0.3395081
lgp(11) = 0.2780151
lgp(12) = 0.2455139
lgp(13) = 0.1916199
lgp(14) = 0.1470032
lgp(15) = 0.1138611
lgp(16) = 0.0664673
Block 304 calculates its output, the estimated log-gain, as

elg(m) = Σ_{k=1}^{GPO} lgp(k) lgeq(m − k),
where GPO = 16 is the gain predictor order for BV32, and lgeq(m - k) is the quantized version of the
log-gain prediction error at sub-frame m − k. Here it is assumed that the sub-frame indices of
different frames form a continuous sequence of integers. For example, if the sub-frame indices in
the current frame are 1 and 2, then the sub-frame indices of the last frame are −1 and 0, and the
sub-frame indices of the frame before that are −3 and −2.
The adder 305 calculates the log-gain prediction error as
lge(m) = mrlg(m) - elg(m).
The scalar quantizer block 306 performs 5-bit scalar quantization of the resulting log-gain prediction
error lge(m). The codebook entries of this gain quantizer, along with the corresponding codebook
indices, are listed in Appendix 6. The operation of this quantizer is controlled by block 310, whose
purpose is to achieve a good trade-off between clear-channel performance and noisy-channel
performance of the excitation gain quantizer. The operation of block 310 will be described later.
For each temporarily quantized lgeq(m), the adders 307 and 308 together calculate the corresponding
temporarily quantized log-gain as
lgq(m) = lgeq(m) + elg(m) + lgmean
Block 309 estimates the signal level based on the final quantized log-gain, to be determined later
subject to the constraint imposed by block 310. Let lv(m) denote the output estimated signal level of
block 309 at sub-frame m. Since the final value of lgq(m) has not been determined yet at this point,
block 310 can only use the estimated signal level at the last sub-frame, namely, lv(m – 1). One way
to think of this situation is that block 309 has a one-sample delay unit for its input lgq(m).
At sub-frame m, block 310 controls the quantization operation of block 306 based on lv(m − 1),
lgq(m − 1), and lgq(m − 2)⁶. It uses an NG × NGC gain change threshold matrix T(i, j), i = 1, 2, …,
NG, j = 1, 2, …, NGC, to limit how high lgq(m) can go. For BV32, the parameter values are NG = 18
and NGC = 11. The threshold matrix T(i, j) is given in Appendix 7.
Block 310 and block 306 work together to perform the quantization of lge(m) in the following way.
First, the row index into the threshold matrix T(i, j) is calculated as

i = ⎡ ( lgq(m − 1) − lv(m − 1) − GLB ) / 2 ⎤,

where GLB = −24, and the symbol ⎡·⎤ means "take the next larger integer", i.e. rounding to the
nearest integer toward infinity. If i > NG, i is clipped to NG. If i < 1, i is clipped to 1.

Second, the column index into the threshold matrix T(i, j) is calculated as

j = ⎡ ( lgq(m − 1) − lgq(m − 2) − GCLB ) / 2 ⎤,

where GCLB = −8. If j > NGC, j is clipped to NGC. If j < 1, j is clipped to 1.
Third, with the row and column indices i and j calculated above, a gain quantization limit is
calculated as
GL = lgq(m − 1) + T(i, j) − elg(m) − lgmean.
Fourth, block 306 performs normal scalar quantization of lge(m) into its nearest neighbor in the
quantizer codebook. If the resulting quantized value is not greater than GL, this quantized value is
accepted as the final quantized log-gain prediction error lgeq(m), and the corresponding codebook
index is the output gain index GI m . On the other hand, if the quantized value is greater than GL, the
next smaller gain quantizer codebook entry is compared with GL. If it is not greater than GL, it is
accepted as the final output lgeq(m) of block 306, and the corresponding codebook index is accepted
as GI m . However, if it is still greater the GL, then block 306 keeps looking for the next smaller
quantizer codebook entry (in descending order of codebook entry value), until it finds one that is not
greater than GL. In such a search, the first one (that is, the largest one) that it finds to be no greater
than GL is chosen as the final output lgeq(m) of block 306, and the corresponding codebook index is
accepted as GI m . In the rare occasion when all the gain quantizer codebook entries are greater than
GL, then the smallest gain quantizer codebook entry is chosen as the final output lgeq(m) of block
306, and the corresponding codebook index (0 in this case) is chosen as the output GI m . The final
gain quantizer codebook index GI m is passed to the bit multiplexer block 95 of Figure 2.
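The index calculations and the descending codebook search above can be sketched as follows. This is a minimal illustrative sketch, not the reference implementation: the codebook contents and the NG, NGC values used in the test are placeholders, and the gain codebook is assumed sorted in ascending order of entry value, so that index 0 holds the smallest entry, matching the fallback case described in the text.

```python
import math

def threshold_indices(lgq_prev, lgq_prev2, lv_prev, ng, ngc):
    """Row/column indices into T(i, j), with GLB = -24 and GCLB = -8,
    clipped to the ranges [1, NG] and [1, NGC]."""
    i = min(max(math.ceil((lgq_prev - lv_prev + 24) / 2), 1), ng)
    j = min(max(math.ceil((lgq_prev - lgq_prev2 + 8) / 2), 1), ngc)
    return i, j

def quantize_log_gain_error(lge, codebook, gl):
    """Constrained scalar quantization of lge(m) (block 306).
    `codebook` must be sorted in ascending order of entry value."""
    # Step 1: normal nearest-neighbor scalar quantization.
    gi = min(range(len(codebook)), key=lambda k: abs(codebook[k] - lge))
    # Step 2: walk down to smaller entries until one is <= GL; if every
    # entry exceeds GL this stops at index 0, the smallest entry.
    while gi > 0 and codebook[gi] > gl:
        gi -= 1
    return codebook[gi], gi
```

The descending walk works because the entries are sorted: the first entry found that is not greater than GL is necessarily the largest admissible one.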
6 The initial value of lgq(m − 1) and lgq(m − 2) is −2, i.e., lgq(0) = −2 and lgq(−1) = −2.
Once the quantized log-gain prediction error lgeq(m) is determined in this way, adders 307 and 308
add elg(m) and lgmean to lgeq(m) to obtain the quantized log-gain lgq(m) as
lgq(m) = lgeq(m) + elg(m) + lgmean
After this final quantized log-gain lgq(m), subject to the constraint imposed by block 310, is
calculated, it is used by block 309 to update the estimated signal level lv(m). This value lv(m) is used
by block 310 in the next sub-frame (the (m + 1)-th sub-frame).
At sub-frame m, after the final quantized log-gain lgq(m) is calculated, block 309 estimates the
signal level using the following algorithm. The parameter values used are α = 8191/8192, β =
1023/1024, and γ = 511/512. At codec initialization, the related variables are initialized as:
lmax(m − 1) = −100, lmin(m − 1) = 100, lmean(m − 1) = 8, lv(m − 1) = 13.5, and x(m − 1) = 13.5.
Algorithm for updating estimated long-term average signal level:
(i) If lgq(m) > lmax(m − 1), set lmax(m) = lgq(m);
otherwise, set lmax(m) = lmean(m − 1) + α [lmax(m − 1) − lmean(m − 1)].
(ii) If lgq(m) < lmin(m − 1), set lmin(m) = lgq(m);
otherwise, set lmin(m) = lmean(m − 1) + α [lmin(m − 1) − lmean(m − 1)].
(iii) Set lmean(m) = β × lmean(m − 1) + (1 − β) [lmax(m) + lmin(m)]/2 .
(iv) Set lth = lmean(m) + 0.2 [lmax(m) − lmean(m)] .
(v) If lgq(m) > lth, set x(m) = γ × x(m − 1) + (1 − γ) lgq(m), and set lv(m) = γ × lv(m − 1) + (1 − γ) x(m);
otherwise, set x(m) = x(m − 1) and lv(m) = lv(m − 1).
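The level-update recursion above can be sketched as follows. This is a minimal sketch of one update step; the state dictionary is a convenience of this example, not part of the specification.

```python
def update_level(lgq, st):
    """One sub-frame update of the long-term signal level estimator
    (block 309).  `st` holds lmax, lmin, lmean, lv, x from the previous
    sub-frame; a new state dictionary is returned."""
    a, b, g = 8191 / 8192, 1023 / 1024, 511 / 512
    # (i)/(ii): track the running maximum and minimum of the log-gain.
    lmax = lgq if lgq > st['lmax'] else st['lmean'] + a * (st['lmax'] - st['lmean'])
    lmin = lgq if lgq < st['lmin'] else st['lmean'] + a * (st['lmin'] - st['lmean'])
    # (iii): mean drifts toward the midpoint of the max/min envelope.
    lmean = b * st['lmean'] + (1 - b) * (lmax + lmin) / 2
    # (iv)/(v): only log-gains above the threshold update the level.
    lth = lmean + 0.2 * (lmax - lmean)
    if lgq > lth:
        x = g * st['x'] + (1 - g) * lgq
        lv = g * st['lv'] + (1 - g) * x
    else:
        x, lv = st['x'], st['lv']
    return {'lmax': lmax, 'lmin': lmin, 'lmean': lmean, 'lv': lv, 'x': x}
```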
Block 311 converts the quantized log-gain lgq(m) to the quantized gain gq(m) in the linear domain
as follows.
gq(m) = 2^( lgq(m) / 2 ) .
Block 312 scales the residual vector quantization (also called excitation VQ) codebook by simply
multiplying every element of every codevector in the excitation VQ codebook by gq(m). The
resulting scaled codebook is then used by block 313 to perform Excitation VQ codebook search, as
described in the next section.
3.11 Excitation Vector Quantization
The excitation VQ codebook of BV32 has a sign-shape structure, with 1 bit for sign and 5 bits for
shape. The vector dimension is 4. Thus, there are 32 independent shape codevectors stored in the
codebook, but the negated version of each shape codevector (i.e., the mirror image with respect to
the origin) is also a valid codevector for excitation VQ. The 32 shape codevectors, along with the
corresponding codebook indices, are listed in Appendix 8.
Block 313 in Figure 7 performs the excitation VQ codebook search using the filter structure shown
in Figure 8, which is essentially a subset of the BV32 encoder shown in Figure 2. The only
difference is that the prediction residual quantizer (block 30) in Figure 2 is replaced by block 48 in
Figure 8, which is labeled as “scaled VQ codebook”. This scaled VQ codebook is calculated in
Section 3.10 above.
[Figure 8 shows the filter structure: the scaled VQ codebook (block 48) produces uq(n); the long-term predictor Pl(z) (block 60) produces ppv(n); the long-term noise feedback filter Nl(z) − 1 (block 65) and the short-term noise feedback filter Fs(z) (block 50) feed back the quantization error; adders 55, 70, 75, 80, 85, and 90 combine the signals d(n), v(n), u(n), uq(n), dq(n), ltfv(n), ltnf(n), stnf(n), qs(n), and q(n).]
Figure 8 Filter structure used in BV32 excitation VQ codebook search
The three filters of blocks 50, 60, and 65 have transfer functions given by
Fs(z) = − Σ_{i=1}^{8} ai′ z^(−i) ,
where ai′ = (0.75)^i ãi , and ãi is the i-th coefficient of the short-term prediction error filter;
Pl(z) = Σ_{i=1}^{3} bi z^(−pp + 2 − i) ,
where pp is the pitch period, and bi is the i-th long-term predictor coefficient;
Nl(z) − 1 = λ z^(−pp) ,
where λ is the long-term noise feedback filter coefficient calculated in Section 3.8.
Using the filter structure in Figure 8, block 313 in Figure 7 performs excitation VQ codebook search
one excitation vector at a time. Each excitation vector contains four samples. The excitation gain
gq(m) is updated once a sub-frame. Each sub-frame contains 10 excitation vectors. Therefore, for
each sub-frame, the same scaled VQ codebook is used in 10 separate VQ codebook searches
corresponding to the 10 excitation vectors in that sub-frame.
Let n = 1, 2, 3, 4 denote the sample time indices corresponding to the current four–dimensional
excitation vector. Before the excitation VQ codebook search for the current excitation vector starts,
the open-loop short-term prediction residual d(n), n = 1, 2, 3, 4 has been calculated in Section 3.7.
In addition, before the VQ codebook search starts, the initial filter states (also called “filter
memory”) of the three filters in Figure 8 (blocks 50, 60, and 65) are also known. All the other
signals in Figure 8 are not determined yet for n = 1, 2, 3, 4.
The basic ideas of the excitation VQ codebook search are explained below. Refer to Figure 8. Block
48 stores the N scaled shape codevectors, where N = 32. Counting also the negated version of each
scaled shape codevector, it is equivalent to having 2N scaled codevectors available for excitation
VQ. From these 2N scaled codevectors, block 48 puts out one scaled codevector at a time as uq(n), n
= 1, 2, 3, 4. With the initial filter memories in blocks 50, 60, and 65 set to what were left after
vector-quantizing the last excitation vector, this uq(n) vector then “drives” the rest of the filter
structure until the corresponding quantization error vector q(n), n = 1, 2, 3, 4 is obtained. The
energy of this q(n) vector is calculated and stored. This process is repeated for each of the 2N scaled
codevectors, with the filter memories reset to their initial values before the process is repeated each
time. After all 2N codevectors have been tried, we have calculated 2N corresponding quantization
error energy values. The scaled codevector that minimizes the energy of the quantization error
vector q(n), n = 1, 2, 3, 4 is the winning scaled codevector and is used as the VQ output vector. The
corresponding output VQ codebook index is a 6-bit index consisting of a sign bit as the most
significant bit (MSB), followed by 5 shape bits. If the winning scaled codevector is a negated
version of a scaled shape codevector, then the sign bit is 1, otherwise, the sign bit is 0. The 5 shape
bits are simply the binary representation of the codebook index of the winning shape codevector, as
defined in Appendix 8. Note that there are 20 such excitation codebook indices in a frame, since
each frame has 20 excitation vectors. These 20 indices are grouped in an excitation codebook index
array, denoted as CI = {CI(1), CI(2), …, CI(20)}, where CI(k) is the excitation codebook index for
the k-th excitation vector in the current frame. This excitation codebook index array CI is passed to
the bit multiplexer block 95.
Given a uq(n) vector (taking the value of one of the 2N scaled codevectors), the way to derive the
corresponding energy of the q(n) vector is now described in more detail below. First, block 60
performs pitch prediction to produce the pitch-predicted vector ppv(n) as
ppv(n) = Σ_{i=1}^{3} bi dq(n − pp + 2 − i) , n = 1, 2, 3, 4
Adder 85 then updates the dq(n) vector as
dq(n) = uq(n) + ppv(n) , n = 1, 2, 3, 4
Next, block 50 and adders 90 and 55 work together to update the v(n) vector as
v(n) = d(n) − Σ_{i=1}^{8} ai′ [v(n − i) − dq(n − i)] , n = 1, 2, 3, 4.
Finally, the corresponding q(n) vector is calculated as
q( n ) = v ( n ) − ppv( n ) − λ q( n − pp ) − uq( n ) , n = 1, 2, 3, 4.
The energy of the q(n) vector is calculated as
Eq = Σ_{n=1}^{4} q²(n) .
Such calculation from a given uq(n) vector to the corresponding energy term Eq is repeated 2N
times for the 2N scaled VQ codevectors. After the winning scaled codevector that minimizes the Eq
term is selected, the filter memories of blocks 50, 60, and 65 are updated by using the filter
memories that were left after the calculation of the Eq term for that particular winning codevector
was done. Such updated filter memories become the initial filter memories used for the excitation
VQ codebook search for the next excitation vector.
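The per-candidate filtering and the 2N-way search above can be sketched as follows. This is a minimal sketch under illustrative assumptions: the tiny codebook, the memory lengths, and all parameter values in the test are placeholders, not the codec's actual tables, and lists of past samples (newest last) stand in for the filter memories.

```python
def candidate_error_energy(d, uq, mem_dq, mem_v, mem_q, b, a, pp, lam):
    """Run one candidate uq(n), n = 1..4, through the Figure 8 filter
    structure and return the quantization error energy Eq together with
    the updated memories (newest samples last)."""
    dq, v, q = list(mem_dq), list(mem_v), list(mem_q)
    eq = 0.0
    for n in range(4):
        # ppv(n) = sum_{i=1..3} b_i dq(n - pp + 2 - i)
        ppv = sum(b[i] * dq[-(pp - 1 + i)] for i in range(3))
        # v(n) = d(n) - sum_{i=1..8} a'_i [v(n - i) - dq(n - i)]
        vn = d[n] - sum(a[i - 1] * (v[-i] - dq[-i]) for i in range(1, 9))
        # q(n) = v(n) - ppv(n) - lambda * q(n - pp) - uq(n)
        qn = vn - ppv - lam * q[-pp] - uq[n]
        eq += qn * qn
        dq.append(uq[n] + ppv)  # dq(n) = uq(n) + ppv(n)
        v.append(vn)
        q.append(qn)
    return eq, (dq, v, q)

def excitation_vq_search(d, scaled_cb, mems, b, a, pp, lam):
    """Try all 2N scaled codevectors (each shape and its negation) and
    return (energy, index, memories) of the winner; the index packs the
    sign bit as the MSB: sign * N + shape."""
    best = (float('inf'), None, None)
    for sign in (0, 1):
        for j, shape in enumerate(scaled_cb):
            uq = [-c for c in shape] if sign else list(shape)
            eq, new_mems = candidate_error_energy(d, uq, *mems, b, a, pp, lam)
            if eq < best[0]:
                best = (eq, sign * len(scaled_cb) + j, new_mems)
    return best
```

Note how each candidate starts from the same initial memories (the lists are copied), and only the winner's memories are carried to the next vector, matching the reset-and-retry procedure in the text.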
3.12 Bit Multiplexing
The bit multiplexer block 95 in Figure 2 packs the five sets of indices LSPI, PPI, PPTI, GI, and CI
into a single bit stream. This bit stream is the output of the BroadVoice32 encoder. It is passed to
the communication channel.
Figure 9 shows the BV32 bit stream format in each frame. In Figure 9, the bit stream for the current
frame is the shaded area in the middle. The bit stream for the last frame is on the left, while the bit
stream for the next frame is on the right. Although the bit stream of different frames may not be sent
next to each other in a packet voice system, this illustration is meant to show that time goes from left
to right, and the 40 side information bits consisting of LSPI, PPI, PPTI, and GI goes before the
excitation codebook indices CI(k), k =1, 2, …, 20 when the bit stream is transmitted in a serial
manner. Note that for each index, the most significant bit (MSB) goes first (on the left), while the
least significant bit (LSB) goes last.
This completes the detailed description of the BV32 encoder.
[Figure 9 shows three consecutive frames, frame M − 1 (previous), frame M (current, shaded), and frame M + 1 (next), each 160 bits: LSPI1 (7 bits), LSPI2 (5), LSPI3 (5), PPI (8), PPTI (5), GI1 (5), GI2 (5), followed by CI(1), CI(2), CI(3), …, CI(20), 6 bits each.]
Figure 9 BV32 bit stream format
4 DETAILED DESCRIPTION OF THE BV32 DECODER
This section gives a detailed description of each functional block in the BV32 decoder shown in
Figure 3. Those blocks or signals that have the same labels as their counterparts in the encoder of
Figure 2 have the same meaning as those counterparts.
4.1 Bit De-multiplexing
The bit de-multiplexer block 100 takes one frame of input bit stream at a time, and de-multiplexes,
or separates, the five sets of indices LSPI, PPI, PPTI, GI, and CI from the current frame of input bit
stream. As described in Section 3 above, LSPI contains three indices: a 7-bit first-stage VQ index, a
5-bit second-stage lower VQ index, and a 5-bit second-stage upper VQ index. PPI is an 8-bit pitch
period index. PPTI is a 5-bit pitch predictor tap VQ index. GI contains two 5-bit gain indices, and
CI contains twenty 6-bit excitation VQ indices, each with 1 sign bit and 5 shape bits.
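The de-multiplexing of one frame can be sketched as follows, using the field widths listed above in the order given by the bit stream format of Figure 9 (MSB first within each field). The string representation of bits is a convenience of this example only.

```python
def demux_frame(bits):
    """Sketch of bit de-multiplexing (block 100) for one 160-bit frame.
    `bits` is a string of '0'/'1' characters, transmitted order."""
    widths = [('LSPI1', 7), ('LSPI2', 5), ('LSPI3', 5), ('PPI', 8),
              ('PPTI', 5), ('GI1', 5), ('GI2', 5)] + \
             [(f'CI{k}', 6) for k in range(1, 21)]
    assert len(bits) == 160  # 40 side-information bits + 20 x 6 excitation bits
    out, pos = {}, 0
    for name, w in widths:
        out[name] = int(bits[pos:pos + w], 2)  # MSB-first field
        pos += w
    return out
```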
4.2 Long-Term Predictor Parameter Decoding
The long-term predictor parameter decoder (block 110) decodes the indices PPI and PPTI. The
pitch period is decoded from PPI as
pp = PPI + 10
Let { b0, b1, …, b31 } be the 3-dimensional, 32-entry codebook used for pitch predictor tap VQ, as
listed in Appendix 5. Let bj be the j-th codevector in this codebook, where the subscript j is the
codebook index listed in the first column of the table in Appendix 5. The three pitch predictor taps
b1, b2, and b3 are decoded from PPTI as

[ b1  b2  b3 ]^T = b_PPTI .
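The two decoding steps above amount to an offset and a table lookup; a minimal sketch (the tap codebook passed in stands in for the 32-entry table of Appendix 5):

```python
def decode_pitch(ppi, ppti, tap_codebook):
    """Decode PPI and PPTI (block 110): the pitch period is the 8-bit
    index plus 10, and the three taps come straight from the codebook."""
    pp = ppi + 10                    # pitch period, pp = PPI + 10
    b1, b2, b3 = tap_codebook[ppti]  # codevector selected by PPTI
    return pp, (b1, b2, b3)
```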
4.3 Short-Term Predictor Parameter Decoding
The short-term predictor parameter decoding takes place in block 120 of Figure 3. Block 120
receives the set of decoded LSP indices, LSPI = {LSPI1, LSPI2, LSPI3}, from the bit de-multiplexer,
block 100 in Figure 3. First, block 120 reconstructs the LSP coefficients, {l̃i}, from the LSP indices,
and then it produces the coefficients of the short-term prediction error filter, {ãi}, from the LSP
coefficients according to the conversion procedure specified in Section 3.6.
[Figure 10 shows block 120: the first-stage VQ index LSPI1 drives a regular 8-dimensional inverse subquantizer (block 1204); the second-stage indices LSPI2 and LSPI3 drive regular 5-dimensional and 3-dimensional inverse subquantizers (blocks 1201, 1202) whose sub-vectors are appended (block 1203); an 8th-order MA prediction (block 1206) and the mean LSP vector feed adders 1205 and 1207 through 1209; block 12010 checks the ordering property of the lower 3 LSP coefficients and sets TEI, which controls switches 12011 and 12012; block 12013 enforces LSP spacing, block 12014 buffers the reconstructed LSP vector, and block 12015 performs LSP-to-LP conversion to produce the LP coefficients {ãi}.]
Figure 10 BV32 short-term predictor parameter decoder (block 120)
Block 120 of Figure 3 is expanded in Figure 10. The reconstruction of the LSP coefficients from the
LSP indices is the inverse of the LSP quantization, and many operations have equivalents in Section
3.5 and Figure 5. The first-stage VQ is decoded in block 1204, and the second-stage split VQ is
decoded in block 12016.
Based on the index for the second-stage upper split VQ, block 1201 looks up the quantized upper
split vector from the codebook CB22 = { cb22^(0), cb22^(1), …, cb22^(31) }, producing

ẽ22,2 = cb22^(LSPI3) .
Similarly, based on the index for the second-stage lower split VQ, block 1202 looks up the quantized
lower split vector from the codebook CB21 = { cb21^(0), cb21^(1), …, cb21^(31) }, producing

ẽ22,1 = cb21^(LSPI2) .
Block 1203 performs the identical operation of block 1610 in Figure 5 and appends the two second-stage
sub-vectors to form the second-stage output vector,

ẽ22 = [ ẽ22,1^T  ẽ22,2^T ]^T .
From the index for the first-stage VQ, block 1204 looks up the quantized first-stage vector from the
codebook CB1 = { cb1^(0), cb1^(1), …, cb1^(127) },

ẽ21 = cb1^(LSPI1) .
Adder 1205 performs the equivalent operation of adder 1611 in Figure 5. It adds the first-stage and
second-stage vectors to obtain a first reconstructed prediction error vector,

ẽ2^(1) = ẽ21 + ẽ22 .
Equivalent to block 163 in Figure 5, block 1206 performs the 8th-order MA prediction of the mean-removed
LSP vector according to

ê1,i = p_LSP,i^T [ ẽ2,i(1)  ẽ2,i(2)  ẽ2,i(3)  ẽ2,i(4)  ẽ2,i(5)  ẽ2,i(6)  ẽ2,i(7)  ẽ2,i(8) ]^T , i = 1, 2, …, 8 ,
where ẽ2,i(k) and p_LSP,i are defined in Section 3.5. Adder 1207, equivalent to adder 1612 in Figure
5, generates the predicted LSP vector by adding the mean LSP vector and the mean-removed
predicted LSP vector,

l̂ = l̄ + ê1 .
Subsequently, adder 1208 adds the predicted LSP vector to the first reconstructed prediction error
vector to obtain a first intermediate reconstructed LSP vector,

l̆^(1) = l̂ + ẽ2^(1) .
Adder 1209 subtracts the predicted LSP vector from a second intermediate reconstructed LSP vector l̆^(2)
to calculate a second reconstructed prediction error vector

ẽ2^(2) = l̆^(2) − l̂ ,
to be used to update the MA predictor memory in the presence of bit-errors. Block 12010
checks the ordering property of the first 3 first intermediate reconstructed LSP coefficients,

l̆1^(1) ≥ 0 ,
l̆2^(1) ≥ l̆1^(1) ,
l̆3^(1) ≥ l̆2^(1) .

This ordering property was enforced during the encoding operation of the constrained 3-dimensional
VQ of the lower split vector, block 168 of Figure 5. If the ordering is found to be preserved, the
Transmission-Error-Indicator, TEI, is set to 0 to indicate that no bit-errors in the LSP bits have
been detected. Otherwise, if it is not preserved, the Transmission-Error-Indicator is set to 1 to
indicate the likely presence of bit-errors in the LSP bits.
If the Transmission-Error-Indicator is 0, the switches 12011 and 12012 are in the left position, and
they route the first reconstructed prediction error vector ẽ2^(1) and the first intermediate reconstructed
LSP vector l̆^(1) to the reconstructed prediction error vector ẽ2 and the intermediate reconstructed
LSP vector l̆, respectively. Otherwise, if the Transmission-Error-Indicator is 1, the switches 12011
and 12012 are in the right position, and they route the second reconstructed prediction error vector
ẽ2^(2) and the second intermediate reconstructed LSP vector l̆^(2) to the reconstructed prediction error
vector ẽ2 and the intermediate reconstructed LSP vector l̆, respectively. Hence, the reconstructed
prediction error vector and the intermediate reconstructed LSP vector are obtained as

ẽ2 = ẽ2^(1) , if TEI = 0
ẽ2 = ẽ2^(2) , if TEI = 1

and

l̆ = l̆^(1) , if TEI = 0
l̆ = l̆^(2) , if TEI = 1 ,
respectively. Block 12013 enforces LSP spacing; it is functionally identical to block 1614 in Figure
5, as specified in Section 3.5. Block 12014 buffers the reconstructed LSP vector for future use in the
presence of bit-errors. The reconstructed LSP vector of the current frame becomes the second
intermediate reconstructed LSP vector of the next frame,
l̆^(2)(k + 1) = l̃(k) ,

where the additional parameter k here represents the frame index of the current frame. For the very
first frame the second intermediate reconstructed LSP vector is initialized to

l̆^(2) = [ 1/9  2/9  …  8/9 ]^T .
The final step of the short-term predictor parameter decoding is to convert the reconstructed LSP
coefficients to linear prediction coefficients. This operation takes place in block 12015, which is
functionally identical to block 17 of Figure 4, described in Section 3.6.
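The TEI check and switch logic described above can be sketched as follows. This is a minimal sketch on 8-dimensional LSP vectors represented as plain lists; the function name and argument layout are conveniences of this example.

```python
def lsp_tei_select(l_first, e2_first, l_second_prev, l_hat):
    """Sketch of blocks 12010-12012: check the ordering of the lower 3
    coefficients of the first intermediate reconstructed LSP vector; on
    failure fall back to the buffered second intermediate vector.
    Returns (TEI, reconstructed prediction error vector, LSP vector)."""
    # Ordering property: 0 <= l1 <= l2 <= l3 (block 12010).
    ordered = 0 <= l_first[0] <= l_first[1] <= l_first[2]
    if ordered:
        return 0, list(e2_first), list(l_first)      # TEI = 0: no error detected
    # TEI = 1: likely bit-errors; use the second intermediate vector and
    # the corresponding prediction error vector (adder 1209).
    e2_second = [a - b for a, b in zip(l_second_prev, l_hat)]
    return 1, e2_second, list(l_second_prev)
```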
4.4 Excitation Gain Decoding
The excitation gain decoder is shown in Figure 11. It is part of block 130 in Figure 3. It decodes the
two gain indices in GI into the two corresponding decoded sub-frame excitation gains gq(m), m = 1,
2 in the linear domain. All operations in Figure 11 are performed sub-frame by sub-frame.
[Figure 11 shows the excitation gain decoder: the gain prediction error decoder (block 501) maps GI to lgeq(m); adders 505 and 506 add elg(m) from the MA log-gain predictor (block 503) and lgmean from the log-gain mean value (block 504) to form lgq′(m); block 507 estimates the signal level lv(m − 1), block 508 compares with the threshold, and block 509 determines the final decoded log-gain lgq(m), which block 510 converts to the linear gain gq(m); switch 502 and adders 511 and 512 form lgeq′(m) for the predictor memory.]
Figure 11 Excitation gain decoder
Refer to Figure 11. Let m be the sub-frame index of the current sub-frame, and assume the same
convention for the sub-frame index m as in Section 3.10. Block 501 decodes the 5-bit gain index
GI m into the log-gain prediction error lgeq(m) using the codebook in Appendix 6. Switch 502 is
normally in the upper position, connecting the output of block 501 to the input of block 503. Then,
the MA log-gain predictor (block 503) calculates the estimated log-gain for the current sub-frame as
elg(m) = Σ_{k=1}^{GPO} lgp(k) lgeq(m − k) ,
where GPO = 16, and lgp(k), k = 1, 2, …, GPO are the MA log-gain predictor coefficients given in
Section 3.10.
Block 504 holds the long-term average log-gain value lgmean = 11.82031. Adders 505 and 506
add elg(m) and lgmean, respectively, to lgeq(m), resulting in the temporarily decoded log-gain

lgq′(m) = lgeq(m) + elg(m) + lgmean .
Block 507 is functionally identical to block 309 in Figure 7, described in Section 3.10. It is
important to note that, as in the encoder, the log-gain value passed to block 507 for
updating its estimate of the long-term average signal level is the final value of the decoded log-gain
lgq(m), i.e., after the threshold check of block 508 and the potential log-gain extrapolation and
substitution of block 509, respectively, as described below.
Block 508 calculates the row and column indices i and j into the threshold matrix T(i, j) in the same
way as block 310 in Figure 7. Namely, the row index is calculated as
i = ⎡( lgq(m − 1) − lv(m − 1) − GLB ) / 2⎤ ,
where GLB = –24. If i > NG, i is clipped to NG. If i < 1, i is clipped to 1. The column index is
calculated as
j = ⎡( lgq(m − 1) − lgq(m − 2) − GCLB ) / 2⎤ ,
where GCLB = –8. If j > NGC, j is clipped to NGC. If j < 1, j is clipped to 1.
Block 508 controls the actions of block 509 and switch 502 in the following way. If GI m = 0 or
lgq′(m) ≤ T(i, j) + lgq(m – 1), then switch 502 is in the upper position, block 509 determines the
final decoded log-gain as
lgq( m) = lgq′( m) ,
and the filter memory in the MA log-gain predictor (block 503) is updated by shifting the old
memory values by one position, and then assigning lgeq(m) to the newest position of the filter
memory.
If, on the other hand, GI m > 0 and lgq′(m) > T(i, j) + lgq(m – 1), then the temporarily decoded log-gain lgq′(m) is discarded, block 509 determines the final decoded log-gain as
lgq( m) = lgq( m − 1)
(by extrapolating the decoded log-gain of the last sub-frame); furthermore, switch 502 is moved to
the lower position, adders 511 and 512 subtract lgmean and elg(m), respectively, from lgq(m) to get
lgeq′( m) = lgq( m) − lgmean − elg ( m) ,
and this lgeq′(m) is used to update the newest position of the filter memory of block 503, after the
old memory values are shifted by one position.
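The decoder-side acceptance test and extrapolation above can be sketched as follows. This is a minimal sketch: the codebook, threshold matrix, and NG, NGC values in the test are placeholders, and the MA memory shift is reduced to returning the value that would be pushed into the newest position.

```python
import math

def decode_log_gain(gi, lgeq_cb, elg, lgmean, lgq_hist, lv_prev, T, ng, ngc):
    """Sketch of blocks 501, 508, and 509.  `lgq_hist` holds
    (lgq(m-1), lgq(m-2)); `T` is the NG x NGC threshold matrix, indexed
    here 0-based.  Returns (lgq(m), value for the MA predictor memory)."""
    lgeq = lgeq_cb[gi]
    lgq_t = lgeq + elg + lgmean          # temporarily decoded log-gain lgq'(m)
    # Row/column indices as in the encoder: GLB = -24, GCLB = -8, clipped.
    i = min(max(math.ceil((lgq_hist[0] - lv_prev + 24) / 2), 1), ng)
    j = min(max(math.ceil((lgq_hist[0] - lgq_hist[1] + 8) / 2), 1), ngc)
    if gi == 0 or lgq_t <= T[i - 1][j - 1] + lgq_hist[0]:
        return lgq_t, lgeq               # accept; memory gets lgeq(m)
    lgq = lgq_hist[0]                    # extrapolate the last log-gain
    return lgq, lgq - lgmean - elg       # memory gets lgeq'(m)
```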
Once the final decoded log-gain lgq(m), subject to the constraint imposed by block 509, is determined
as described above, it is used by block 507 to update the estimated signal level lv(m). This value
lv(m) is then used by block 508 in the next sub-frame (the (m + 1)-th sub-frame).
Block 510 converts the final decoded log-gain lgq(m) to the linear domain as

gq(m) = 2^( lgq(m) / 2 ) .

4.5 Excitation VQ Decoding and Scaling
The excitation codebook index array CI of each frame contains 20 excitation codebook indices,
CI(k), k = 1, …, 20, each containing 1 sign bit and 5 shape bits. The excitation vectors are decoded
vector-by-vector, and then sub-frame-by-sub-frame, since the excitation gain is updated once a
sub-frame.
Suppose the current excitation vector that needs to be decoded is in the m-th sub-frame and has a
corresponding excitation codebook index of CI(k). This index assumes a value between 0 and 63.
The most significant bit of this index is the sign bit. Therefore, if CI(k) < 32, the sign bit is 0;
otherwise, the sign bit is 1. Let c j (n ), n = 1, 2, 3, 4 represent the j-th shape codevector in Appendix
8, with a shape codebook index of j. Furthermore, without loss of generality, let n = 1, 2, 3, 4
correspond to the sample time indices of the current vector. Then, in Figure 3, the decoded and
scaled excitation vector, or uq(n), n = 1, 2, 3, 4, is obtained as
uq(n) = gq(m) c_CI(k)(n) , n = 1, 2, 3, 4, if CI(k) < 32
uq(n) = − gq(m) c_(CI(k) − 32)(n) , n = 1, 2, 3, 4, if CI(k) ≥ 32

4.6 Long-Term Synthesis Filtering
Let n = 1, 2, …, FRSZ correspond to the sample time indices of the current frame. In Figure 3, the
long-term synthesis filter (block 155, consisting of block 140 and adder 150 in a feedback loop)
performs sample-by-sample long-term synthesis filtering as follows.
dq(n) = uq(n) + Σ_{i=1}^{3} bi dq(n − pp + 2 − i) , n = 1, 2, …, FRSZ.
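The excitation decoding of Section 4.5 and the long-term synthesis recursion above can be sketched together as follows. This is a minimal sketch: the shape table passed in stands in for the 32-codevector codebook of Appendix 8, and a list of past samples (newest last) stands in for the filter memory.

```python
def decode_excitation(ci, gq, shapes):
    """Decoded and scaled 4-sample excitation vector: the sign bit is
    the MSB of the 6-bit index CI(k) (Section 4.5)."""
    if ci < 32:
        return [gq * c for c in shapes[ci]]
    return [-gq * c for c in shapes[ci - 32]]

def long_term_synthesis(uq, dq_hist, b, pp):
    """Sample-by-sample long-term synthesis filtering (block 155):
    dq(n) = uq(n) + sum_{i=1..3} b_i dq(n - pp + 2 - i)."""
    dq = list(dq_hist)  # past dq samples, newest last
    for n in range(len(uq)):
        dq.append(uq[n] + sum(b[i] * dq[-(pp - 1 + i)] for i in range(3)))
    return dq[len(dq_hist):]
```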
4.7 Short-Term Synthesis Filtering
The short-term synthesis filter (block 175, consisting of block 160 and adder 170 in a feedback loop)
performs sample-by-sample short-term synthesis filtering as follows.
sq(n) = dq(n) − Σ_{i=1}^{8} ãi sq(n − i) , n = 1, 2, …, FRSZ.
4.8 De-emphasis Filtering
The de-emphasis filter (block 180) is a first-order pole-zero filter with fixed coefficients. It is
exactly the inverse filter of the pre-emphasis filter Hpe(z) described in Section 3.2. This de-emphasis
filter has the following transfer function:

Hde(z) = ( 1 + 0.75 z^(−1) ) / ( 1 + 0.5 z^(−1) ) .
Block 180 filters the short-term synthesis filter output signal sq(n) to produce the output signal of
the entire decoder in Figure 3.
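The short-term synthesis recursion of Section 4.7 and the de-emphasis transfer function above can be sketched as follows; the difference equation y(n) = x(n) + 0.75 x(n−1) − 0.5 y(n−1) follows directly from Hde(z). The list-based memories are conveniences of this example.

```python
def short_term_synthesis(dq, sq_hist, a):
    """sq(n) = dq(n) - sum_{i=1..8} a_i sq(n - i)   (block 175)."""
    sq = list(sq_hist)  # past sq samples, newest last
    for n in range(len(dq)):
        sq.append(dq[n] - sum(a[i - 1] * sq[-i] for i in range(1, 9)))
    return sq[len(sq_hist):]

def de_emphasis(sq, x1=0.0, y1=0.0):
    """H_de(z) = (1 + 0.75 z^-1) / (1 + 0.5 z^-1)   (block 180)."""
    out = []
    for x in sq:
        y = x + 0.75 * x1 - 0.5 * y1
        out.append(y)
        x1, y1 = x, y
    return out
```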
This completes the detailed description of the BV32 decoder.
4.9 Example Packet Loss Concealment
The packet loss concealment is not a mandatory component of this BV32 Codec Specification, since
packet loss concealment does not affect bit-stream compatibility or encoder-decoder interoperability.
However, an example packet loss concealment technique is described in this section for
reference purposes only. An implementer of BV32 can utilize other packet loss concealment
techniques without affecting interoperability.
The example packet loss concealment technique utilizes the synthesis model of the decoder. In
principle, all side information of the previous frame is repeated while the excitation of the cascaded
long-term and short-term synthesis filters is from a random source, scaled to a proper level. Hence,
with the additional index m denoting the m-th frame, during packet-loss:
• The pitch period, pp, is set to the pitch period of the last frame 7: pp = pp_(m−1).
• The pitch taps, b1, b2, and b3, are set to the pitch taps of the last frame 8: bi = b_(m−1),i , i = 1, 2, 3.
• The short-term synthesis filter coefficients, ãi, i = 1, …, 8, are set to those of the last frame 9: ãi = ã_(m−1),i , i = 1, …, 8.
• A properly scaled random sequence is used as long-term synthesis filter excitation, uq(n), n = 1, 2, …, FRSZ.

7 If the first frame is lost, a value of 100 is used for the pitch period.
8 If the first frame is lost, the pitch taps are set to zero.
9 If the first frame is lost, the short-term filter coefficients are set to zero.
The speech synthesis of the bad frame (part of lost packet) now takes place exactly as specified in
Sections 4.6, 4.7, and 4.8.
The random sequence is scaled according to
uq(n) = g_plc · sqrt( E_(m−1) / Σ_{n=1}^{FRSZ} [r(n)]² ) · r(n) , n = 1, 2, …, FRSZ,
where r(n), n = 1, 2, … FRSZ, is a random sequence, Em-1 is in principle the energy of the long-term
synthesis filter excitation of the previous frame 10, and the scaling factor, gplc, is calculated as
detailed below.
During good frames an estimate of periodicity is updated as

per_m = 0.5 per_(m−1) + 0.5 bs ,

where bs is the sum of the three pitch taps clipped at a lower threshold of zero and an upper
threshold of one 11, while it is maintained during bad frames: per_m = per_(m−1). Based on the
periodicity the scaling factor is calculated as

g_plc = −2 per_(m−1) + 1.9
with gplc clipped at a lower threshold of 0.1 and an upper threshold of 0.9.
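The concealment excitation generation can be sketched as follows. This is an illustrative sketch only: the random generator and its seed are conveniences of this example, and the square-root normalization is used so that the resulting excitation energy equals g_plc² · E_(m−1), matching the energy bookkeeping described below.

```python
import math
import random

def plc_excitation(e_prev, frsz, per_prev, seed=0):
    """Scaled random excitation for a lost frame.  `e_prev` is the
    excitation energy of the previous frame, `per_prev` the periodicity
    estimate; returns (uq, g_plc)."""
    # Scaling factor from periodicity, clipped to [0.1, 0.9].
    g = min(max(-2.0 * per_prev + 1.9, 0.1), 0.9)
    rng = random.Random(seed)
    r = [rng.uniform(-1.0, 1.0) for _ in range(frsz)]
    # Normalize the random sequence so its energy matches E_{m-1}.
    norm = math.sqrt(e_prev / sum(x * x for x in r))
    return [g * norm * x for x in r], g
```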
After synthesis of the signal output of a lost frame, memories of predictive quantizers are updated.
The memory of the inverse LSP quantizer is updated with

ẽ2,i = l̃_(m−1),i − ê1,i − l̄_i , i = 1, 2, …, 8,

where ê1,i is given in Section 4.3, l̄_i in Section 3.5, and l̃_(m−1),i denotes the i-th LSP coefficient of the
(m−1)-th frame (as decoded according to Section 4.3 for a good frame, or repeated for a bad frame).
The memory of the inverse gain quantizer is updated with
lgeq(m) = lgq(m) − lgmean − elg (m) ,
where elg (m) is given in Section 4.4, lgmean in Section 3.10, and lgq(m) is calculated as
10 The energy is initialized to zero, i.e., E0 = 0.
11 The estimate of periodicity is initialized to zero, i.e., per0 = 0.
lgq(m) = log2( E_(m−1) / FRSZ ) , if E_(m−1) / FRSZ > 1/4
lgq(m) = −2 , if E_(m−1) / FRSZ ≤ 1/4 .
The level estimation for a bad frame is updated exactly as for a good frame, see Section 4.4.
At the end of a good frame (after synthesis of the output), the estimate of periodicity is updated as
explained above, and the energy of the long-term synthesis filter excitation is updated as
E_m = Σ_{n=1}^{FRSZ} [uq(n)]² .
At the end of the processing of a bad frame (after synthesis of the output and update of predictive
quantizers), the energy of the long-term synthesis filter excitation and the long-term synthesis filter
coefficients are scaled down when 8 or more consecutive frames are lost:
E_m = E_(m−1) , if Nclf < 8
E_m = (β_Nclf)² E_(m−1) , if Nclf ≥ 8

b_m,i = b_(m−1),i , i = 1, 2, 3, if Nclf < 8
b_m,i = β_Nclf b_(m−1),i , i = 1, 2, 3, if Nclf ≥ 8
where Nclf is the number of consecutive lost frames, and the scaling, β Nclf , is given by
β_Nclf = 1 − 0.02 (Nclf − 7) , if 8 ≤ Nclf ≤ 57
β_Nclf = 0 , if Nclf > 57 .
This will gradually mute the output signal when consecutive packets are lost for an extended period
of time.
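The attenuation schedule above can be sketched as follows; a minimal sketch, with the function names chosen for this example only.

```python
def mute_scaling(nclf):
    """beta_Nclf: 1.0 below 8 consecutive lost frames (no scaling),
    a linear ramp down for 8 <= Nclf <= 57, and 0 beyond."""
    if nclf < 8:
        return 1.0
    if nclf <= 57:
        return 1.0 - 0.02 * (nclf - 7)
    return 0.0

def update_on_loss(e_prev, b_prev, nclf):
    """Energy and pitch-tap attenuation at the end of a bad frame:
    the energy is scaled by beta^2, the taps by beta."""
    beta = mute_scaling(nclf)
    if nclf < 8:
        return e_prev, list(b_prev)
    return beta * beta * e_prev, [beta * b for b in b_prev]
```

At Nclf = 57 the ramp reaches 1 − 0.02 × 50 = 0, so the output is fully muted from that point on.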
APPENDIX 1: GRID FOR LPC TO LSP CONVERSION
Grid point    Grid value
0     0.9999390
1     0.9935608
2     0.9848633
3     0.9725342
4     0.9577942
5     0.9409180
6     0.9215393
7     0.8995972
8     0.8753662
9     0.8487854
10    0.8198242
11    0.7887573
12    0.7558899
13    0.7213440
14    0.6853943
15    0.6481323
16    0.6101379
17    0.5709839
18    0.5300903
19    0.4882507
20    0.4447632
21    0.3993530
22    0.3531189
23    0.3058167
24    0.2585754
25    0.2109680
26    0.1630859
27    0.1148682
28    0.0657349
29    0.0161438
30    -0.0335693
31    -0.0830994
32    -0.1319580
33    -0.1804199
34    -0.2279663
35    -0.2751465
36    -0.3224487
37    -0.3693237
38    -0.4155884
39    -0.4604187
40    -0.5034180
41    -0.5446472
42    -0.5848999
43    -0.6235962
44    -0.6612244
45    -0.6979980
46    -0.7336731
47    -0.7675781
48    -0.7998962
49    -0.8302002
50    -0.8584290
51    -0.8842468
52    -0.9077148
53    -0.9288635
54    -0.9472046
55    -0.9635010
56    -0.9772034
57    -0.9883118
58    -0.9955139
59    -0.9999390
APPENDIX 2: FIRST-STAGE LSP CODEBOOK
Index  Element 1    Element 2    Element 3    Element 4    Element 5    Element 6    Element 7    Element 8
0    -0.00384521  -0.00849915  -0.01591492  -0.00360107  -0.00013733   0.00610352   0.01640320  -0.00166321
1    -0.00511169  -0.01313782  -0.01698303  -0.00103760  -0.01216125  -0.00427246  -0.00271606   0.00846863
2    -0.00367737  -0.00166321   0.00045776  -0.00309753   0.01814270  -0.00053406   0.00256348  -0.00833130
3    -0.00312805  -0.00488281   0.00282288  -0.00173950   0.00004578  -0.00094604  -0.01976013   0.00306702
4    -0.00250244  -0.00323486   0.00154114   0.00422668  -0.00964355  -0.01895142   0.01704407   0.00219727
5    -0.00090027  -0.00347900  -0.00909424  -0.00746155  -0.00656128  -0.02726746  -0.00769043  -0.00224304
6     0.00399780   0.01086426   0.00677490   0.00090027   0.00244141  -0.00988770   0.00549316  -0.00628662
7    -0.00151062  -0.00581360  -0.00186157  -0.00430298  -0.01788330  -0.01603699  -0.03099060  -0.00659180
8    -0.00547791  -0.00958252   0.00094604   0.01203918   0.00695801   0.02105713   0.00720215   0.00140381
9    -0.00393677  -0.00848389  -0.01943970  -0.01473999   0.01364136  -0.00468445  -0.00344849   0.00566101
10   -0.00331116  -0.00723267   0.00175476   0.03128052   0.00772095  -0.00163269   0.00566101  -0.00460815
11   -0.00222778  -0.00709534  -0.00581360   0.01132202  -0.00482178  -0.00050354  -0.01037598  -0.01887512
12   -0.00325012  -0.00445557   0.00651550   0.00497437  -0.01744080   0.01000977   0.01194763  -0.00160217
13   -0.00054932  -0.00219727  -0.00631714  -0.01139832  -0.01916504  -0.00711060   0.00106812  -0.01481628
14   -0.00546265   0.00070190   0.02934265   0.01412964   0.00656128   0.00003052   0.01229858   0.00367737
15   -0.00254822   0.00099182   0.02000427  -0.00164795  -0.01643372  -0.00813293  -0.00671387  -0.01013184
16   -0.00204468   0.00265503  -0.00135803  -0.02322388   0.00332642   0.01715088   0.01350403   0.00199890
17   -0.00289917  -0.00740051  -0.01710510  -0.02655029  -0.01350403   0.00151062  -0.00038147   0.00778198
18   -0.00028992   0.00064087   0.00022888  -0.00819397   0.00061035   0.02536011  -0.00822449  -0.02096558
19   -0.00028992   0.00001526  -0.00805664  -0.02310181  -0.00082397  -0.00106812  -0.02081299  -0.01762390
20   -0.00030518   0.00170898  -0.00651550  -0.01683044   0.00083923  -0.00955200   0.02677917   0.00958252
21    0.00292969   0.00251770  -0.00447083  -0.01782227  -0.02940369  -0.02981567   0.00372314  -0.00421143
22    0.01701355   0.02578735  -0.00593567   0.00595093   0.01370239   0.01223755   0.00622559  -0.00111389
23    0.00061035  -0.00015259   0.00686646   0.00129700  -0.00637817  -0.02079773  -0.05078125  -0.01544189
24   -0.00398254   0.00350952   0.01591492  -0.00076294   0.02429199   0.02890015   0.01559448   0.00701904
25   -0.00177002  -0.00981140  -0.03118896  -0.01042175  -0.00013733   0.00044250  -0.00659180  -0.01545715
26    0.00256348   0.01017761   0.01966858   0.01533508   0.01405334   0.01646423  -0.00257874  -0.01338196
27    0.00088501  -0.00016785  -0.00163269  -0.00199890  -0.00700378  -0.00726318  -0.02569580  -0.03907776
28    0.00035095   0.00717163   0.00427246   0.00279236   0.02046204   0.00689697   0.02848816   0.01043701
29    0.00041199   0.00004578  -0.01815796  -0.03132629  -0.00378418  -0.02220154   0.00140381  -0.00294495
30    0.01571655   0.02601624   0.01066589   0.03164673   0.03356934   0.02770996   0.01812744   0.00709534
31    0.00881958   0.02149963   0.01010132   0.00360107   0.00122070  -0.00657654  -0.01893616  -0.02380371
32   -0.00672913  -0.01612854  -0.02481079  -0.00184631   0.00761414   0.01754761   0.00720215   0.01480103
33   -0.00515747  -0.01365662  -0.01542664  -0.01049805  -0.01742554   0.02040100  -0.00880432  -0.00152588
34   -0.00303650  -0.00975037  -0.02221680   0.01498413   0.02423096   0.00935364  -0.00544739  -0.00675964
35   -0.00221252  -0.00933838  -0.02006531   0.00033569   0.00292969  -0.01268005  -0.02940369  -0.00543213
36   -0.00231934  -0.00257874   0.00263977  -0.00134277  -0.00151062  -0.00566101   0.00665283   0.03112793
37   -0.00123596  -0.00584412  -0.01034546  -0.01982117  -0.02880859  -0.02052307  -0.01663208   0.00572205
38    0.00738525   0.02700806   0.01812744   0.02203369   0.00323486  -0.00514221   0.01075745   0.00660706
39    0.00349426   0.00294495  -0.00387573  -0.01075745  -0.02171326  -0.03224182  -0.02403259  -0.02343750
40   -0.00619507  -0.01358032  -0.01676941   0.01498413   0.02687073   0.02645874   0.01818848   0.01010132
41   -0.00459290  -0.00839233  -0.02026367  -0.02606201   0.02151489   0.02061462  -0.00651550  -0.00538635
42   -0.00405884  -0.00538635   0.00645447   0.03422546   0.03749084   0.02166748   0.00497437  -0.00592041
43   -0.00209045  -0.00204468  -0.00219727   0.00228882   0.02597046   0.00415039  -0.02684021  -0.01873779
44   -0.00489807  -0.00955200  -0.00572205   0.00482178  -0.00778198   0.01531982   0.03317261   0.01727295
45   -0.00341797  -0.00909424  -0.00500488  -0.00860596  -0.04263306  -0.00547791   0.00357056   0.00357056
46   -0.00016785   0.01191711   0.03486633   0.03454590   0.02195740   0.01472473   0.03034973   0.02073669
47   -0.00109863   0.00473022   0.01737976   0.00859070  -0.00253296  -0.03044128  -0.00776672  -0.01104736
48   -0.00527954  -0.00999451  -0.00939941  -0.00805664  -0.00268555   0.04862976   0.01870728   0.00442505
49   -0.00317383  -0.00744629  -0.00877380  -0.02050781  -0.03236389   0.01905823   0.01884460   0.00524902
50    0.00453186   0.01782227   0.00762939  -0.00749207   0.03543091   0.01852417  -0.00367737  -0.01086426
51    0.00018311  -0.00355530  -0.01539612  -0.02656555  -0.00277710  -0.01931763  -0.03083801   0.00360107
52   -0.00143433   0.00292969   0.01277161   0.00936890   0.00128174  -0.00985718   0.04154968   0.02775574
53    0.00213623   0.00561523   0.00642395  -0.00889587  -0.03330994  -0.05546570   0.00897217   0.00265503
54    0.01060486   0.05717468   0.03829956   0.03216553   0.02561951   0.02203369   0.01969910   0.00923157
55    0.00221252   0.00817871   0.01704407  -0.00007629  -0.00616455  -0.04737854  -0.03558350   0.00561523
56   -0.00749207  -0.00627136   0.02369690   0.02711487   0.03462219   0.04241943   0.02859497   0.01635742
57   -0.02087402  -0.04931641   0.00619507   0.00404358   0.01080322   0.00926208   0.00779724   0.00225830
58   -0.00173950   0.01293945   0.04112244   0.03024292   0.03976440   0.03063965   0.00881958  -0.00358582
59   -0.00424194  -0.00158691   0.02459717   0.01078796   0.00611877   0.00105286  -0.02471924  -0.02410889
Index 60 to 124
-0.00451660
-0.00369263
0.01309204
0.00236511
-0.00433350
-0.00448608
-0.00320435
-0.00141907
-0.00105286
-0.00238037
0.01274109
0.00193787
-0.00361633
-0.00132751
-0.00360107
-0.00175476
-0.00105286
0.00044250
0.00265503
0.00163269
-0.00450134
-0.00181580
-0.00350952
0.00189209
-0.00234985
0.00024414
0.05004883
0.00430298
0.00041199
-0.00491333
0.01916504
0.00370789
0.01432800
0.00350952
0.05522156
0.03974915
-0.00840759
-0.00660706
-0.00529480
-0.00257874
-0.00048828
0.00151062
0.01791382
0.00483704
-0.00576782
-0.00526428
-0.00202942
-0.00016785
-0.00460815
0.00132751
-0.00267029
0.00160217
-0.00639343
-0.00352478
-0.00077820
0.00099182
-0.00256348
0.00869751
0.02838135
0.04316711
-0.00311279
-0.02593994
0.00602722
0.00640869
-0.00239563
-0.00415039
-0.01776123
0.03527832
0.01925659
-0.01188660
-0.01176453
0.00030518
-0.00477600
-0.00151062
-0.00828552
0.02935791
-0.00151062
-0.00816345
-0.00799561
-0.00260925
-0.00695801
0.00723267
-0.00314331
0.02203369
0.00399780
-0.00598145
-0.00743103
-0.00422668
-0.00177002
-0.00527954
-0.00541687
0.03166199
0.00444031
0.00929260
-0.02757263
0.03500366
0.00387573
0.03143311
0.00082397
0.04231262
0.03291321
-0.02593994
-0.02162170
-0.00663757
-0.00895691
0.00041199
-0.00248718
0.06657410
0.01110840
-0.01533508
-0.01495361
-0.00065613
0.00070190
-0.00889587
-0.00230408
0.01330566
0.01959229
-0.01054382
-0.01866150
0.01048279
-0.01753235
-0.00033569
0.00711060
0.09507751
0.05152893
0.00325012
-0.07890320
0.03062439
0.01506042
0.00444031
0.00253296
-0.03298950
0.04226685
0.04072571
-0.02235413
-0.02374268
0.00944519
-0.00032043
-0.00180054
-0.00988770
0.00981140
-0.00468445
0.00148010
-0.02526855
0.03428650
0.00030518
0.02352905
-0.00833130
0.05549622
0.01559448
-0.01719666
-0.02114868
-0.00115967
-0.01432800
-0.01350403
-0.01794434
0.02220154
0.00691223
0.00347900
-0.06730652
0.02929688
0.00061035
0.01612854
-0.03111267
0.04219055
0.01431274
-0.04820251
-0.03446960
-0.01538086
-0.00971985
-0.00028992
-0.01004028
0.02952576
0.01173401
-0.01855469
-0.03617859
0.02978516
0.00903320
-0.00535583
-0.00685120
0.07746887
0.05360413
-0.00729370
-0.02894592
0.01577759
-0.02914429
0.00053406
-0.00775146
0.06649780
0.01130676
0.02406311
-0.02648926
0.07060242
0.02645874
0.01907349
0.01228333
-0.01219177
0.04809570
0.02778625
0.01066589
-0.01464844
0.01014709
-0.00436401
-0.00811768
0.00376892
-0.00921631
-0.01261902
0.03401184
-0.03221130
0.04959106
0.02726746
0.00462341
-0.02253723
0.02410889
-0.00083923
-0.02134705
-0.03652954
0.00111389
-0.02612305
-0.01855469
-0.02980042
0.01562500
-0.00653076
0.00259399
-0.02465820
0.03329468
-0.00419617
0.00932312
-0.06707764
0.03793335
0.00024414
0.00361633
-0.01261902
-0.00068665
-0.00666809
-0.00938416
-0.02021790
0.01698303
-0.00868225
0.03782654
-0.05659485
0.07008362
0.03486633
-0.00605774
-0.03175354
0.05206299
0.01110840
0.00303650
-0.03585815
0.02217102
-0.03764343
0.00070190
-0.03376770
0.05419922
-0.02204895
0.04458618
-0.02957153
0.06431580
0.01609802
0.03089905
0.02276611
-0.03230286
0.04991150
0.01647949
0.01145935
-0.01629639
0.03031921
-0.00563049
-0.02941895
-0.02708435
-0.01629639
-0.02470398
0.01333618
0.00328064
0.01815796
-0.00277710
-0.01211548
-0.03590393
0.00866699
-0.03933716
0.02500916
-0.03193665
0.03088379
-0.01161194
-0.00726318
-0.03829956
0.00930786
-0.01719666
0.05375671
-0.00869751
0.02725220
-0.01568604
0.01620483
-0.02024841
0.03443909
0.00086975
0.01782227
-0.02426147
0.05569458
0.00314331
-0.00831604
-0.03338623
0.00154114
-0.03361511
0.04870605
0.00096130
0.05216980
0.03285217
-0.01533508
-0.06939697
0.03462219
-0.02023315
0.02307129
-0.06474304
0.07875061
-0.03930664
-0.00105286
-0.06314087
0.04470825
-0.04406738
0.07960510
-0.01586914
0.04623413
0.00148010
0.03352356
0.02371216
-0.02035522
0.04533386
-0.01173401
-0.00656128
-0.01852417
0.00007629
-0.02128601
-0.01837158
-0.03489685
-0.00587463
-0.03384399
0.01911926
0.00810242
0.00881958
-0.01660156
0.02276611
0.00534058
0.00965881
-0.01277161
0.02310181
-0.00167847
0.04490662
-0.01190186
-0.00196838
-0.04582214
0.00764465
-0.04112244
0.03878784
-0.00566101
0.01902771
-0.02262878
0.02969360
-0.01860046
0.03150940
-0.01142883
0.03044128
0.01382446
0.01844788
-0.00125122
-0.03677368
-0.05041504
-0.00361633
-0.06233215
0.04002380
0.01994324
0.03585815
0.00875854
0.03904724
-0.02249146
0.02912903
-0.03753662
0.07855225
-0.00245667
0.04440308
-0.04081726
0.00088501
-0.08934021
0.03926086
-0.06632996
0.07987976
-0.01681519
0.02545166
-0.01939392
0.05075073
0.05001831
-0.01049805
0.03337097
-0.02360535
0.02409363
0.01446533
-0.00328064
-0.03314209
0.03617859
-0.00431824
0.00247192
-0.04949951
0.02272034
0.00950623
-0.00042725
-0.02694702
0.02523804
-0.01576233
0.00958252
-0.02479553
0.02972412
0.00451660
-0.00390625
-0.04681396
0.04997253
0.01480103
0.00833130
-0.09020996
0.02937317
-0.00590515
-0.00694275
-0.05206299
0.03417969
-0.00958252
0.02209473
-0.03588867
0.02810669
0.01550293
0.00303650
-0.05572510
0.01962280
-0.02108765
0.00166321
-0.03771973
0.02944946
0.01362610
0.01889038
-0.03100586
0.05235291
-0.01206970
0.02680969
-0.01402283
0.04028320
0.02102661
0.00306702
-0.06845093
0.08264160
-0.01795959
0.02592468
-0.08439636
0.04357910
-0.00686646
-0.00128174
-0.05572510
0.07539368
0.02963257
-0.00700378
0.01974487
-0.01696777
0.01565552
0.01126099
0.01599121
0.02626038
0.01126099
0.00047302
0.00064087
-0.00338745
0.01939392
0.01345825
-0.00680542
-0.01084900
0.01177979
-0.01797485
-0.00190735
-0.01690674
0.01644897
0.00935364
-0.01063538
-0.02130127
0.02980042
0.01237488
0.00251770
-0.00898743
0.01449585
0.00354004
-0.01644897
-0.04679871
0.02700806
-0.00173950
0.01277161
-0.01281738
0.02386475
0.01689148
0.00178528
0.00030518
0.03395081
0.00358582
-0.00538635
-0.02009583
0.01617432
0.00981140
0.00332642
-0.03533936
0.02601624
-0.00375366
0.01100159
-0.01716614
0.01892090
0.01512146
-0.00975037
-0.01873779
0.03260803
0.00088501
0.01179504
-0.05546570
0.01593018
-0.00469971
-0.01522827
-0.06500244
0.03486633
Index 125 to 127
-0.00079346
0.00588989
0.02717590
-0.03021240
0.03402710
0.07472229
-0.05854797
0.08795166
0.08680725
-0.07080078
0.09323120
0.03575134
-0.06494141
0.07124329
0.00018311
-0.05015564
0.05776978
-0.03523254
-0.02285767
0.03340149
-0.05368042
-0.00508118
0.01075745
-0.04931641
APPENDIX 3: SECOND-STAGE LOWER SPLIT LSP CODEBOOK
Index
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Element 1
0.00281525
-0.00021553
0.00709152
-0.00341034
-0.00196075
-0.00179482
-0.00576019
-0.00498390
0.00724030
-0.00100517
0.01622772
-0.01317978
0.00139236
0.00160599
0.00048065
0.00121498
0.01221657
0.00564766
0.02144051
-0.01160431
-0.00497437
-0.00357437
-0.01611328
-0.01193810
0.01710129
0.00753784
0.03960609
-0.03484535
-0.00045013
-0.00150681
0.00778198
-0.01263237
Element 2
0.00292778
-0.00037766
-0.00558853
-0.00456047
0.00144005
-0.00482559
0.00680923
-0.01045990
0.00892258
0.00750542
0.00503349
-0.00148201
0.01294518
-0.00276566
0.02153206
-0.01841927
0.00114632
0.00059319
-0.01291847
-0.01168442
-0.00429916
-0.01308441
0.01459503
-0.02121544
0.01618958
0.01832008
0.01548195
0.00230217
0.01565170
-0.01651573
0.04269028
-0.04002953
Element 3
0.00433731
-0.00252151
-0.00040245
0.00535393
0.01340103
-0.00926208
0.00318718
-0.00181580
-0.00010681
-0.01124763
-0.00928497
-0.00485039
0.01284790
-0.02051735
-0.00239372
0.00706482
0.01258469
-0.00907707
-0.00042725
0.01208878
0.02562332
-0.01529694
0.00725365
-0.00399017
0.00624657
-0.02398491
-0.00556374
0.00053406
0.03667641
-0.03601646
0.00644302
0.00638008
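The second-stage split LSP codebook above is searched by minimum-distortion selection over its 32 entries. As an informative illustration only (the normative search may apply spectral weighting; the codebook and target values below are illustrative, not taken from the table), a plain squared-error nearest-neighbor search can be sketched as:

```python
def vq_search(codebook, target):
    """Return (index, error) of the codebook entry nearest to the
    target vector under unweighted squared Euclidean distance."""
    best_idx, best_err = 0, float("inf")
    for idx, row in enumerate(codebook):
        err = sum((r - t) ** 2 for r, t in zip(row, target))
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx, best_err

# Illustrative 3-dimensional entries (not entries of the table above).
codebook = [
    [0.0028, 0.0029, 0.0043],
    [-0.0002, -0.0004, -0.0025],
    [0.0071, -0.0056, -0.0004],
]
index, err = vq_search(codebook, [0.0070, -0.0050, 0.0000])
```

The selected index is what the encoder transmits; the decoder simply reads the same codebook row back.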
APPENDIX 4: SECOND-STAGE UPPER SPLIT LSP CODEBOOK
Index
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Element 1
0.00223160
0.00498199
-0.00000954
0.00362587
-0.01116562
0.00366402
-0.00282288
0.01284599
-0.00849152
-0.00639915
-0.00438499
0.01028252
-0.01620674
-0.00172234
-0.00300217
0.01217651
-0.00471497
0.00894547
-0.00122452
-0.00556946
-0.01784134
-0.00508881
-0.00472450
0.02260780
-0.00877571
0.00730515
0.00929070
0.01049805
-0.02701187
0.00286293
0.00362587
0.02854538
Element 2
-0.00800133
0.00384903
0.00230217
0.01415634
0.00059700
0.00034904
-0.00809288
0.00154495
-0.00714302
0.00654030
0.00685120
0.00627327
0.00895309
0.00682259
-0.00821686
-0.00773621
-0.01052666
-0.00356674
0.00730324
0.02675247
0.00078583
0.00965881
-0.01339912
0.01769447
-0.00870895
0.00027847
-0.00706673
0.01000977
-0.01168251
-0.00534248
-0.02618980
-0.00962830
Element 3
-0.00899124
-0.00713539
0.00827026
0.00111580
-0.01137161
-0.00654984
0.00408554
0.00731087
0.00018120
-0.00492859
-0.00248146
-0.00315285
0.00953102
0.00998497
0.00954819
0.00847435
-0.02195930
-0.00493240
0.01606369
-0.00582695
-0.00429535
0.00708389
0.00592613
0.00827408
-0.01420212
-0.00198555
-0.00564384
-0.02177620
0.01052856
0.02644157
0.00177765
-0.00597000
Element 4
0.00006485
-0.00961494
0.00367355
0.00265884
0.00316811
0.00271797
-0.00595474
0.00330925
0.00532913
-0.00344276
0.01663589
0.00683403
0.00367737
-0.01184273
0.01287270
-0.00031281
-0.01058769
-0.02550888
0.01205063
-0.00326729
-0.01312637
-0.01148987
-0.01262474
-0.00707054
0.01482201
-0.01367950
0.01904678
0.00494194
0.00321388
-0.00658035
0.00383186
-0.00085640
Element 5
0.00058365
-0.00307274
0.00186920
-0.00458145
-0.00823975
-0.01940155
-0.00964355
-0.00998116
0.00732613
0.01243401
0.00031281
0.00990868
-0.00362778
0.00318718
-0.00807762
0.00645638
0.00412560
-0.00962448
0.01569366
0.00189209
-0.00244522
-0.02126884
0.00816154
-0.00349998
0.01783562
0.02097321
0.00018692
0.00013351
0.00094223
-0.00415039
-0.00398064
-0.00148964
APPENDIX 5: PITCH PREDICTOR TAP CODEBOOK
Index
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Element 1
-0.1450195
-0.1288755
-0.0344850
-0.0050965
-0.0058290
0.2434080
-0.0683900
0.0208435
0.0831910
0.1194460
0.0783080
0.1614380
0.1701965
0.3446655
0.1160280
0.2894895
-0.1333315
-0.0087890
-0.0806275
-0.0332030
0.0019225
0.2192995
0.0077210
0.1473085
0.1340940
0.4174500
0.0624085
0.1651305
0.1640320
0.3888245
0.0041200
0.2724305
Element 2
0.2992860
0.6889955
0.2010195
0.5024415
0.2820435
0.4494935
-0.2305605
0.3207705
0.3331300
0.6207885
0.1853335
0.4213255
0.2445375
0.4873655
-0.2495730
0.3384705
0.3481750
0.8484800
0.2053835
0.4768980
0.2650755
0.7223815
0.0712585
0.2614745
0.4643860
0.6805420
0.1940615
0.4480895
0.2704775
0.5142820
-0.2192080
0.3651735
Element 3
0.1412050
0.4095765
0.1164550
0.3855895
0.0539245
0.2871705
0.0762635
0.2186585
0.0210265
0.2395935
0.0917360
0.3347170
0.0487670
0.1487425
0.0732725
0.2445070
-0.0697630
0.1400755
-0.0161745
0.1187745
-0.1307375
0.0311585
0.0126040
0.1786500
-0.1044920
-0.1161805
-0.0299375
0.1498110
-0.1030580
-0.0249940
-0.0823975
0.0523375
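Each entry above supplies the three taps of the pitch predictor, which forms its prediction as a weighted sum of past excitation samples around the pitch lag. As an informative sketch only (the buffer indexing convention, tap ordering, and all values here are illustrative assumptions, not the normative procedure):

```python
def pitch_predict(past_exc, n, lag, taps):
    """Three-tap long-term prediction: weighted sum of the samples at
    offsets lag-1, lag, lag+1 before position n in the excitation buffer."""
    return (taps[0] * past_exc[n - lag + 1]
            + taps[1] * past_exc[n - lag]
            + taps[2] * past_exc[n - lag - 1])

# Hypothetical excitation history and tap set (not table entries).
past_exc = [0.0] * 50 + [1.0, 2.0, 3.0] + [0.0] * 10
taps = (0.1, 0.5, 0.1)
y = pitch_predict(past_exc, n=60, lag=9, taps=taps)
```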
APPENDIX 6: GAIN CODEBOOK
Index
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Element
-4.91895
-3.75049
-3.09082
-2.59961
-2.22656
-1.46240
-0.88037
-0.34717
-1.93408
-1.25635
-0.70117
-0.16650
0.20361
0.82568
1.59863
2.75684
-1.68457
-1.06299
-0.52588
0.01563
0.39941
1.05664
1.91602
3.34326
0.60693
1.31201
2.29736
4.11426
5.20996
6.70410
8.74316
10.92188
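The gain codebook above is a 32-level scalar quantizer for the log-gain value. As an informative sketch, quantization reduces to a nearest-level search; the subset of levels below is taken from the start and middle of the table purely for illustration:

```python
def quantize_gain(levels, value):
    """Return the index of the scalar quantizer level closest to value."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - value))

# A few levels from the table above, for illustration only.
levels = [-4.91895, -3.75049, -3.09082, -2.59961, 0.20361, 0.82568]
idx = quantize_gain(levels, 0.5)
```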
APPENDIX 7: GAIN CHANGE THRESHOLD MATRIX
Relative log-gain of previous frame, [dB2]
-24 to -22
-22 to -20
-20 to -18
-18 to -16
-16 to -14
-14 to -12
-12 to -10
-10 to -8
-8 to –6
-6 to –4
-4 to –2
-2 to 0
0 to 2
2 to 4
4 to 6
6 to 8
8 to 10
10 to 12
i=1
i=2
i=3
i=4
i=5
i=6
i=7
i=8
i=9
i=10
i=11
i=12
i=13
i=14
i=15
i=16
i=17
i=18
-8 to -6
j=1
0.00000
0.00000
0.00000
6.31250
0.00000
-0.36523
5.51172
3.95703
7.37305
7.37305
4.39844
0.58789
0.14453
0.00000
0.00000
0.00000
0.00000
0.00000
-6 to -4
j=2
0.13477
0.64453
0.33594
5.50977
5.04883
6.15625
6.31641
10.51172
8.93945
8.12109
5.94336
5.10938
5.64844
5.54688
0.39258
0.00000
0.00000
0.00000
-4 to -2
j=3
2.26563
4.90039
7.27734
4.83984
5.09180
8.26953
9.66602
8.42969
8.57422
6.66406
5.73047
5.41602
5.05859
5.15625
3.92188
1.15039
0.37695
0.07617
Log-gain change of previous frame, [dB2]
-2 to 0
0 to 2
2 to 4
4 to 6
j=4
j=5
j=6
j=7
2.94336
4.71875
0.00000
0.00000
3.38281
4.58203
5.69336
0.00000
5.82422
11.66211
11.66211
0.00000
6.99023
8.22852
11.49805
1.89844
5.91406
6.92188
7.38086
4.13867
5.40430
5.88477
11.53906
5.31836
7.58594
10.63281
12.03906
8.79297
7.62891
11.45703
11.95898
10.85352
6.85742
9.67773
11.54492
10.98242
5.87891
7.59766
10.67969
10.42578
5.10742
5.69531
8.31641
10.05273
4.55273
4.32813
5.75586
7.42383
4.06836
3.51758
4.07617
4.56055
3.37891
2.90430
2.74805
2.82422
2.67383
2.66602
2.40039
4.65039
2.56641
3.98438
3.61133
4.66797
4.30664
7.07031
0.81641
2.86914
1.46875
3.49219
3.16992
-0.84180
6 to 8
j=8
0.00000
0.00000
0.00000
0.00000
0.00000
-4.97070
3.06836
2.83008
10.43359
9.46875
8.23047
6.63867
4.99219
3.37500
3.29883
0.58398
1.19336
3.81250
8 to 10
j=9
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
1.50000
2.53320
6.85938
7.11328
6.81055
5.51953
4.02930
2.16016
-0.26563
0.69922
-0.50781
10 to 12
j=10
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
5.05859
3.06445
3.04102
4.14258
4.82227
4.49805
2.95703
0.09570
-1.23242
0.00000
12 to 14
j=11
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
-1.27930
3.31641
5.19141
3.42188
0.40820
0.00000
0.00000
0.00000
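The matrix above is indexed by the previous frame's relative log-gain (rows, 2 dB bins from -24 to 12, i = 1..18) and the previous frame's log-gain change (columns, 2 dB bins from -8 to 14, j = 1..11). As an informative sketch of the bin mapping implied by the row and column labels (the clamping behavior at the bin edges is an assumption, not the normative rule):

```python
def bin_index(value, lo, hi, step=2.0):
    """Map value into a 1-based 2-dB bin over [lo, hi), clamping
    out-of-range values to the first or last bin."""
    if value < lo:
        value = lo
    if value >= hi:
        value = hi - 1e-9
    return int((value - lo) // step) + 1

# Rows: relative log-gain of previous frame, -24..12 -> i = 1..18
# Cols: log-gain change of previous frame,    -8..14 -> j = 1..11
i = bin_index(-17.0, -24.0, 12.0)   # falls in the "-18 to -16" row
j = bin_index(3.0, -8.0, 14.0)      # falls in the "2 to 4" column
```

The threshold read at (i, j) can then be compared against the current frame's log-gain change to flag implausible gain jumps.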
APPENDIX 8: EXCITATION VQ SHAPE CODEBOOK
Index
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Element 1
-0.537476
1.145142
1.174194
2.946899
-1.704102
-0.889038
-0.756958
1.373169
-0.573364
1.058716
0.432617
2.542358
-2.572266
0.432251
0.701538
0.857056
-1.474365
-0.361694
0.407104
1.670288
-1.860596
-0.845703
0.425415
0.208374
-1.022583
0.502075
0.270630
2.266357
-1.876343
-0.389771
-0.040771
0.448242
Element 2
0.974976
1.222534
1.399414
0.798096
0.098755
-0.337402
-0.061890
-0.413330
-0.463745
-0.566040
0.441895
0.207031
-2.758423
-2.303711
-1.355591
-1.842285
1.636108
0.711914
1.661255
1.159668
0.592285
0.081421
0.641357
0.481567
0.425781
-0.491455
0.005981
-1.128540
-0.895142
-1.818604
-1.141968
-0.755127
Element 3
-0.631104
-1.252441
0.330933
-0.274658
-0.526001
0.784546
0.558960
0.690552
-0.606934
-1.677246
-0.630493
-1.611450
-0.499390
-2.016479
-0.861572
-0.006348
-0.683838
-0.136353
0.566406
1.760254
1.213379
2.197754
1.210205
1.808472
-0.168945
-0.296631
0.257813
-0.399414
-0.012207
1.185791
0.364258
1.767578
Element 4
-0.617920
0.616211
0.823120
-0.027344
-0.395508
0.298462
-0.907227
-0.794067
-0.623535
0.752563
-1.445801
0.313354
-0.020142
0.228638
-0.243042
1.216919
0.362915
1.619873
-0.559937
0.524780
0.719482
1.654785
-1.444580
0.685913
-1.642700
-0.068359
-0.466309
0.438477
0.886841
0.913452
-0.283691
-0.691406
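Reading each excitation codevector as the four Element values at a given index, an informative gain-scaled shape search can be sketched as follows (the gain and target values are illustrative; the normative search criterion may differ):

```python
def shape_search(shapes, gain, target):
    """Pick the shape codevector whose gain-scaled version is closest
    to the 4-sample target under squared error."""
    best_idx, best_err = 0, float("inf")
    for idx, shape in enumerate(shapes):
        err = sum((gain * s - t) ** 2 for s, t in zip(shape, target))
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx

# Codevectors assembled from the Element 1..4 columns at indices 0 and 1.
shapes = [
    [-0.537476, 0.974976, -0.631104, -0.617920],
    [1.145142, 1.222534, -1.252441, 0.616211],
]
idx = shape_search(shapes, gain=2.0, target=[2.3, 2.4, -2.5, 1.2])
```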