OKI MSM66P56 Si/sd voice recognizer, recorder/player, and speech synthesizer Datasheet

MSM6679AL-110
Voice Recognition Processor
FIRST EDITION
ISSUE DATE: Nov. 1998
Oki Semiconductor
Contents
General Description
Features
Functional and I/O Diagrams
Pin Descriptions
Electrical Specifications
    Absolute Maximum Ratings
    Operating Conditions
    DC Characteristics (VDD = 2.7 to 5.5 V, Ta = -30 to 70˚C)
    AC Characteristics
    Timing Diagram
System Configuration Example
Functional Description
    Voice Recognition
        SI Recognition
        SD Recognition
    Name Tag Recording
    Audio Input Interface
    Audio Output Interface
    Memory Interface
    External Voice Synthesis Control
    Serial Interface
MSM6679AL-110 Slave-Mode API
    Command Summary
    Command Descriptions
    Asynchronous Serial Protocol Example
E2F0013-28-Y1
This version: Nov. 1998
Oki Semiconductor
MSM6679AL-110
SI/SD Voice Recognizer, Recorder/Player, and Speech Synthesizer
GENERAL DESCRIPTION
The MSM6679AL-110 Voice Recognition Processor (VRP) is a slave-mode device that performs
five functions: speaker-independent (SI) voice recognition, speaker-dependent (SD) voice
recognition, solid-state sound recording, sound playback, and speech synthesis. The highly
integrated device also provides an on-chip memory controller, Flash memory interface, analog
data conversion, Oki speech synthesizer interface, and pulse width modulation (PWM) sound
output.
For SI recognition, the MSM6679AL-110 contains a vocabulary template in external memory.
Pretrained SI vocabularies eliminate the need for laborious training, as usually required by SD
products. The memory requirements are dependent on the size of the vocabulary. The
MSM6679AL-110 can tolerate background noise, while providing high recognition accuracy. In
its designated operating environment, the device achieves a typical recognition accuracy of
>95% (using an Oki-defined test procedure).
For SD recognition, the MSM6679AL-110 stores SD vocabulary templates, as defined by the user,
in external SRAM. The MSM6679AL-110 can create SD vocabularies of up to 61 words each, with
each word using approximately 50 bytes.
In addition to providing voice recognition capabilities, the MSM6679AL-110 integrates a
solid-state recorder/player, speech synthesis functions, and a tone generator. ADPCM recording/
playback provides high quality sound and efficient memory utilization. The MSM6679AL-110
can respond to spoken commands, verbally or with tones, via an on-chip speech synthesizer and
tone generator. For larger speech-synthesis requirements, the MSM6679AL-110 also provides a
glueless MSM665x control interface for off-chip speech synthesis.
The MSM6679AL-110 can interface to any application or personal computer via a serial interface
through an open, device-independent serial mode API (SMAPI). To accelerate code development,
Oki supplies an evaluation kit, and assembly and C language programs for this product.
The MSM6679AL-110 is a low power version of the MSM6679A-110.
Note:
This device is intended for use in applications other than central office communication
systems and central office switching systems.
FEATURES
• SI recognition
- Up to 20 - 25 words in each vocabulary
- Multiple vocabulary support
• SD recognition
- Up to 61 words in each vocabulary
- Multiple vocabulary support
• Speech synthesis
- Up to 2.3-sec internal and 27.6-sec external
speech synthesis on-chip; sample looping
and concatenation allow even longer
phrases.
- On-chip controller for MSM665x speech
synthesizer
- Standard beep tone outputs
- Pulse code modulation (PCM) and
adaptive differential pulse code
modulation (ADPCM) voice or sound-effect output
• Speech capture and playback
- 28-kbps ADPCM speech compression
• Serial ASCII command interface
• 6944-Hz audio input sample rate for record
and playback
• 10-kHz sample rate for voice recognition
• 200-msec recognition latency
• Flexible memory mapping for EPROM,
FLASH, and SRAM
• 14.3182 MHz operation
• Package: 100-pin TQFP
(TQFP100-P-1414-0.50-K)
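
The 28-kbps ADPCM figure and the 6944-Hz record/playback sample rate listed above are consistent with one another if each sample is stored as a 4-bit code. The sketch below only checks that arithmetic. The 4-bit code width is an assumption inferred from the two numbers; the feature list does not state it.

```python
# Relation between sample rate and ADPCM bit rate.
SAMPLE_RATE_HZ = 6944      # record/playback sample rate from the feature list
BITS_PER_SAMPLE = 4        # ASSUMPTION: 4-bit ADPCM codes (not stated in the datasheet)


def adpcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Return the raw ADPCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample


bit_rate_bps = adpcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
print(f"{bit_rate_bps} bps, about {bit_rate_bps / 1000:.1f} kbps")
```

Under that assumption the raw rate is 27,776 bps, which rounds to the quoted 28 kbps.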
FUNCTIONAL AND I/O DIAGRAMS
[Block diagram. Blocks shown: Analog Input, Recognition and Synthesis Engine, System Controller,
Serial Interface, PWM Output, Vocabulary Memory, Algorithm Memory, External Speech Synthesis
Control, External Memory Control.]
Figure 1. MSM6679AL-110 Block Diagram
[Logic symbol. Signal groups shown:
  A/D Interface: ADC0 ~ ADC7, VREF
  Serial-Mode MSM665x Interface: NAR, BUSY, SI, SD, STROBE, RESOUT
  PWM Output: VOICEOUT1
  Serial Interface: RXD, TXD
  IC Reset and Oscillator Inputs: RES, OSC0, OSC1
  Memory Interface: D0 ~ D7, A0 ~ A15, WRRAM, RDRAM, RAMPAGE0, RAMPAGE1
  Other: SLEEP, PDC]
Figure 2. MSM6679AL-110 Logic Symbol
[Pinout drawing. Pin assignments as labeled in the figure:
  1-2 N/C, 3 RESOUT, 4 STROBE, 5 BUSY, 6 SI, 7 SD, 8-9 N/C, 10 VOICEOUT1, 11 N/C, 12 NAR,
  13 VDD, 14 GND, 15-19 N/C, 20 PDC, 21-25 N/C
  26-29 N/C, 30 RAMPAGE0, 31 RAMPAGE1, 32 RES, 33-34 N/C, 35 VDD, 36-37 N/C, 38 GND,
  39 OSC0, 40 OSC1, 41 VDD, 42-45 N/C, 46 SLEEP, 47-48 N/C, 49 RDRAM, 50 WRRAM
  51-58 D0-D7, 59 GND, 60-75 A0-A15
  76-79 N/C, 80 VDD, 81 VREF, 82-89 ADC0-ADC7, 90 AGND, 91 RXD, 92 TXD, 93 GND, 94-100 N/C]
Figure 3. MSM6679AL-110 100-Pin TQFP Pinout
MSM6679AL-110 Alphabetic Pin List
Name      #     Name      #     Name      #     Name        #
A0        60    ADC0      82    D4        55    RESOUT      3
A1        61    ADC1      83    D5        56    RXD         91
A2        62    ADC2      84    D6        57    SD          7
A3        63    ADC3      85    D7        58    SI          6
A4        64    ADC4      86    GND       14, 38, 59, 93
A5        65    ADC5      87    NAR       12    SLEEP       46
A6        66    ADC6      88    OSC0      39    STROBE      4
A7        67    ADC7      89    OSC1      40    TXD         92
A8        68    AGND      90    PDC       20    VDD         13, 35, 41, 80
A9        69    BUSY      5     RAMPAGE0  30    VOICEOUT1   10
A10       70    D0        51    RAMPAGE1  31    VREF        81
A11       71    D1        52    RDRAM     49    WRRAM       50
A12       72    D2        53    RES       32
A13       73    D3        54
A14       74
A15       75
Figure 4. MSM6679AL-110 100-Pin Package Mechanical Drawing
PIN DESCRIPTIONS
Pin #     Pin Name    Signal Type        Description
1, 2      NC          (do not connect)   Reserved. These pins are reserved for future use and must be
                                         left open.
3         RESOUT      Output             MSM665x Reset. This pin provides a reset signal for an
                                         external speech synthesis engine.
4         STROBE      Output             MSM665x Strobe. This output provides the LOAD signal for an
                                         external speech synthesizer.
5         BUSY        Input              MSM665x Busy. When using an external MSM665x device, this
                                         pin monitors the MSM665x BUSY signal and connects directly
                                         to the MSM665x BUSY signal output.
6         SI          Output             MSM665x Serial Clock. This MSM6679AL-110 output connects to
                                         the MSM665x SI input. The SI pin is the MSM665x serial clock
                                         input pin.
7         SD          Output             MSM665x Serial Data. This MSM6679AL-110 output connects to
                                         the MSM665x SD input. The SD pin is the MSM665x serial data
                                         input pin.
8, 9      NC          (do not connect)   Reserved. These pins are reserved for future use and must be
                                         left open.
10        VOICEOUT1   Output             Voice Out. This pin is the PWM output for speech synthesis,
                                         voice sample playback, and voice prompts. An external
                                         integrator must be used to convert this to an analog signal.
11        NC          (do not connect)   Reserved. This pin is reserved for future use and must be
                                         left open.
12        NAR         Input              MSM665x Next Address Request. This pin signals to the
                                         MSM6679AL-110 that the external speech synthesis engine is
                                         ready for another command.
13        VDD         Digital Power      Power.
14        GND         Digital Ground     Ground.
15-17     NC          Input              Reserved. These pins are reserved for future use and must be
                                         tied to VDD.
18        NC          (do not connect)   Reserved. This pin is reserved for future use and must be
                                         left open.
19        NC          Input              Reserved. This pin is reserved for future use and must be
                                         tied to VDD.
20        PDC         Input              Power down release. Power-down mode is released by either
                                         edge of the PDC signal.
21-29     NC          (do not connect)   Reserved. These pins are reserved for future use and must be
                                         left open.
30        RAMPAGE0    Output             RAM Page Select. These signals support selection of one out
31        RAMPAGE1                       of four RAM pages. Each page is 64 Kbytes in size.
32        RES         Input              MSM6679AL-110 Reset. External logic should assert this
                                         power-on reset signal LOW when power is applied to the
                                         MSM6679AL-110.
33, 34    NC          Input              Reserved. These pins are reserved for future use and must be
                                         tied to VDD.
35        VDD         Digital Power      Power.
36, 37    NC          Input              Reserved. These pins are reserved for future use and must be
                                         tied to VDD.
38        GND         Ground             Ground.
39        OSC0        Input              Oscillator 0/External Clock. When the MSM6679AL-110 uses a
                                         crystal oscillator, this input is the oscillator input pin.
                                         The pin is then connected to one side of a crystal and load
                                         capacitor. When used with an external clock, the external
                                         clock is applied to this input.
40        OSC1        Output             Oscillator 1. When the MSM6679AL-110 uses a crystal
                                         oscillator, this output is the oscillator output pin. The
                                         pin is then connected to one side of a crystal and load
                                         capacitor. When used with an external clock, this output is
                                         left unconnected.
41        VDD         Digital Power      Power.
42-45     NC          (do not connect)   Reserved. These pins are reserved for future use and must be
                                         left open.
46        SLEEP       Output             Sleep. In power-down mode, this pin goes LOW. The SLEEP
                                         signal can be used for external memory control.
47, 48    NC          (do not connect)   Reserved. These pins are reserved for future use and must be
                                         left open.
49        RDRAM       Output             RAM Read. This is a strobe signal for direct connection to
                                         an external RAM's RD input. When asserted LOW, this signal
                                         indicates that the MSM6679AL-110 is ready to read data from
                                         RAM.
50        WRRAM       Output             RAM Write. This is a strobe signal for direct connection to
                                         an external RAM's WR input. When asserted LOW, this signal
                                         indicates that the MSM6679AL-110 is ready to write data to
                                         RAM.
51-58     D0-D7       I/O                Bidirectional Memory Data Bus.
59        GND         Digital Ground     Ground.
60-75     A0-A15      Output             Memory Address Bus.
76-79     NC          (do not connect)   Reserved. These pins are reserved for future use and must be
                                         left open.
80        VDD         Digital Power      Power.
81        VREF        Analog Power/      Analog Power. The MSM6679AL-110's on-chip A/D converter uses
                      Reference Voltage  this analog power when converting an analog signal into
                                         digital samples. It is also used as the analog reference
                                         voltage.
82-89     ADC0-ADC7   Analog Input       Analog Input. These eight inputs are tied together and serve
                                         as the analog input. Signal conditioning, via a bandpass
                                         filter and gain circuit, is required before this input.
90        AGND        Analog Ground      Analog Ground. This pin provides an analog ground point,
                                         allowing independent grounding of the analog and digital
                                         circuitry. Separate grounds reduce the impact of digital
                                         switching noise on analog sampling accuracy.
91        RXD         Input              Serial Port Receive. This is the receive data line for the
                                         serial port.
92        TXD         Output             Serial Port Transmit. This is the transmit data line for the
                                         serial port.
93        GND         Ground             Ground.
94-100    NC          (do not connect)   Reserved. These pins are reserved for future use and must be
                                         left open.
ELECTRICAL SPECIFICATIONS
Absolute Maximum Ratings
Parameter                        Symbol   Conditions               Value               Unit
Digital power supply voltage     VDD      GND = AGND = 0 V         –0.3 to +7.0        V
Input voltage                    VI       GND = AGND = 0 V         –0.3 to VDD +0.3    V
Output voltage                   VO       GND = AGND = 0 V         –0.3 to VDD +0.3    V
Analog power/reference voltage   VREF     GND = AGND = 0 V         –0.3 to VDD +0.3    V
Analog input voltage             VAI      GND = AGND = 0 V         –0.3 to VREF        V
Power dissipation                PD       Ta = 70˚C, per package   650                 mW
                                          Ta = 70˚C, per output    8                   mW
Storage temperature              TSTG     —                        –50 to +150         ˚C

1. Permanent device damage may occur if ABSOLUTE MAXIMUM RATINGS are exceeded.
Functional operation should be restricted to the conditions as detailed elsewhere in this
data sheet. Exposure to absolute maximum rating conditions for extended periods may
affect device reliability.
Operating Conditions
Parameter                        Symbol   Conditions                       Value              Unit
Digital power supply voltage     VDD      fOSC = 14.3182 MHz               2.7 to 5.5         V
Analog power/reference voltage   VREF     —                                VDD –0.3 to VDD    V
Analog input voltage             VAI      —                                AGND to VREF       V
Storage holding voltage          VDDH     fOSC = 0 MHz                     2.0 to 5.5         V
Operating frequency              fOSC     VDD = 2.7 to 5.5 V               14.3182            MHz
Ambient temperature              Ta       —                                –30 to 70          ˚C
Fan-out                          N        MOS load                         20                 —
                                          TTL load, D0 ~ D7, WRRAM,        6                  —
                                          RDRAM and SLEEP
                                          TTL load, all other outputs      1                  —
DC Characteristics (VDD = 2.7 to 5.5 V, Ta = -30 to 70˚C)
Parameter                   Symbol     Condition                               Min          Typ [1]   Max          Unit
High-level input voltage    VIH        Applied to D0-D7                        0.44 × VDD   —         VDD +0.3     V
                                       Applied to all other I/O                0.80 × VDD   —         VDD +0.3     V
Low-level input voltage     VIL        Applied to D0-D7                        –0.3         —         0.16 × VDD   V
                                       Applied to all other I/O                –0.3         —         0.2 × VDD    V
High-level output voltage   VOH        Output current = –400 µA, applied to    VDD –0.4     —         —            V
                                       D0-D7, WRRAM, RDRAM and SLEEP
                                       Output current = –200 µA, for all       VDD –0.4     —         —            V
                                       other I/O
Low-level output voltage    VOL        Output current = 3.2 mA, applied to     —            —         0.5          V
                                       D0-D7, WRRAM, RDRAM and SLEEP
                                       Output current = 1.6 mA, for all        —            —         0.5          V
                                       other I/O
Input leak current          IIH, IIL   VI = VDD/0 V, applied to ADC0-ADC7      —            —         1/–1         µA
Input current               IIH, IIL   VI = VDD/0 V, applied to RES            —            —         1/–250       µA
                                       VI = VDD/0 V, applied to OSC0           —            —         15/–15       µA
High-level output current   IOH        VO = 2.4 V, applied to D0-D7            –2           —         —            mA
                                       VO = 2.4 V, applied to all other I/O    –1           —         —            mA
Low-level output current    IOL        VO = 2.4 V, applied to D0-D7            10           —         —            mA
                                       VO = 2.4 V, applied to all other I/O    5            —         —            mA
Output leakage current      ILO        VO = VDD/0 V                            —            —         ±10          µA
Input capacitance           CI         f = 1 MHz, Ta = 25˚C                    —            5         —            pF
Output capacitance          CO         f = 1 MHz, Ta = 25˚C                    —            7         —            pF
Analog reference power      IREF       During voice input                      —            —         4            mA
supply voltage                         When voice input is halted              —            —         10           µA
Power consumption           IDD        fOSC = 14.3182 MHz, no load             —            —         T.B.D        mA

1. Typical condition is 3 V, 25˚C.
AC Characteristics
External Data Memory Control (VDD = 2.7 ~ 5.5 V, Ta = -30 ~ 70˚C)
Parameter                        Symbol   Condition    Min.    Max.   Unit
Cycle time                       tCYC     —            69.8    —      ns
Clock pulse width (HIGH level)   tfWH     CL = 50 pF   28      —      ns
Clock pulse width (LOW level)    tfWL     CL = 50 pF   28      —      ns
RDRAM pulse width                tRW      CL = 50 pF   190     —      ns
WRRAM pulse width                tWW      CL = 50 pF   190     —      ns
RDRAM pulse delay time           tRD      CL = 50 pF   —       75     ns
WRRAM pulse delay time           tWD      CL = 50 pF   —       75     ns
Address set-up time              tAS      CL = 50 pF   –5.1    —      ns
Address hold time                tAH      CL = 50 pF   29      41     ns
Read data set-up time            tRS      CL = 50 pF   60      —      ns
Read data hold time              tRH      CL = 50 pF   0       —      ns
Read data access time            tACC     CL = 50 pF   —       124    ns
Write data set-up time           tWS      CL = 50 pF   169     —      ns
Write data hold time             tWH      CL = 50 pF   29      41     ns
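
The AC limits above can be used for a first-pass check of a candidate external memory. The sketch below derives the cycle time from the oscillator frequency and compares three memory parameters against the table. The function name, the three memory parameters, and the reading of tACC, tWW and tWS as the budget the memory must fit inside are assumptions made for illustration; a real design review should work from the timing diagram and the memory vendor's data sheet.

```python
# First-pass memory timing check against the AC characteristics table.
F_OSC_HZ = 14_318_200      # 14.3182 MHz operating frequency
T_ACC_MAX_NS = 124.0       # read data access time (Max.)
T_WW_MIN_NS = 190.0        # WRRAM pulse width (Min.)
T_WS_MIN_NS = 169.0        # write data set-up time (Min.)


def cycle_time_ns(f_osc_hz: float) -> float:
    """Clock period in nanoseconds for a given oscillator frequency."""
    return 1e9 / f_osc_hz


def memory_fits_budget(access_ns: float, write_pulse_ns: float,
                       data_setup_ns: float) -> bool:
    """Hypothetical helper: True if the memory's requirements fit the table.

    access_ns      -- memory's address-to-data access time
    write_pulse_ns -- minimum write pulse width the memory requires
    data_setup_ns  -- data set-up time the memory requires before write end
    """
    return (access_ns <= T_ACC_MAX_NS
            and write_pulse_ns <= T_WW_MIN_NS
            and data_setup_ns <= T_WS_MIN_NS)


print(f"tCYC = {cycle_time_ns(F_OSC_HZ):.1f} ns")
print(memory_fits_budget(access_ns=100, write_pulse_ns=70, data_setup_ns=40))
print(memory_fits_budget(access_ns=150, write_pulse_ns=70, data_setup_ns=40))
```

The computed period agrees with the 69.8 ns minimum cycle time in the table. The example memory figures (100 ns and 150 ns parts) are made up.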
Timing Diagram
[Timing diagram. Read cycle: CLK, RDRAM, A0 - A15 (RAP0 - 15) and D0 - D7 (DIN0 - 7), showing
tCYC, tfWH, tfWL, tRD, tRW, tAS, tAH, tACC, tRS and tRH. Write cycle: WRRAM, A0 - A15 (RAP0 - 15)
and D0 - D7 (DOUT0 - 7), showing tWD, tWW, tAS, tAH, tWS and tWH.]

CLK        : Clock pulse
WRRAM      : RAM write strobe signal
RDRAM      : RAM read strobe signal
A0 - A15   : Memory address bus
RAP0 - 15  : RAM address
DIN0 - 7   : Read data
DOUT0 - 7  : Write data
Figure 5. RAM Read/Write Timing
SYSTEM CONFIGURATION EXAMPLE
[Schematic. Blocks shown: the MSM6679AL-110 with a 14.3182 MHz crystal on OSC0/OSC1; external
FLASH (A0-A16, D0-D7, WR, RD, CS) and SRAM (A0-A14, D0-D7, WR, RD, CS) on the memory bus; an
MSM66P54 speech synthesizer with a 4 MHz oscillator, connected through NAR, BUSY, SI, SD, STROBE
and RESOUT; an analog circuit with microphone input and speaker output; and a host MCU interface.]
Figure 6. MSM6679AL-110 System Configuration Example
FUNCTIONAL DESCRIPTION
Voice Recognition
The MSM6679AL-110 performs both SI and SD recognition. SI vocabularies are embedded in the
MSM6679AL-110. For SD recognition, each recognized phrase must be enrolled in the
MSM6679AL-110’s vocabulary by creating a composite template from multiple recordings of the
same phrase. The composite template is then stored in SRAM or FLASH memory. During both
SI and SD recognition, the MSM6679AL-110 performs the following steps:
1. After external band-pass filtering, the MSM6679AL-110 converts the analog signal to PCM
samples.
2. The MSM6679AL-110 extracts significant features from the sample data by frequency and
time-domain analysis.
3. The MSM6679AL-110 compares the analyzed input with the reference data for each signal,
weighing the significance of similarities according to control software parameters. A score
(expressed as distance) is generated for each phrase.
4. The vocabulary phrase that achieves the highest score (or lowest distance) is judged to match
the input phrase, assuming that the score exceeds a predetermined threshold.
5. Via a special command, the MSM6679AL-110 can also return the scores of the input against
all defined vocabulary phrases for SI or SD recognition. This feature allows external host
software to select the next best match, if the closest match is not contextually logical.
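
Steps 3 to 5 above can be sketched from the host's point of view. The code below assumes the host has already obtained one distance per vocabulary phrase (how those values are requested and encoded is defined by the slave-mode API, not here). The function names, the list-of-distances representation, and the threshold value are illustrative assumptions, and list positions are 0-based rather than the phrase indexes used in the vocabulary tables.

```python
# Host-side selection of the best (or next-best) match from per-phrase distances.
from typing import FrozenSet, List, Optional, Sequence


def rank_by_distance(distances: Sequence[float]) -> List[int]:
    """Return phrase positions ordered from lowest (best) to highest distance."""
    return sorted(range(len(distances)), key=lambda i: distances[i])


def best_match(distances: Sequence[float], threshold: float,
               excluded: FrozenSet[int] = frozenset()) -> Optional[int]:
    """Return the closest phrase not in `excluded`, or None if it is too far.

    A phrase is accepted only when its distance does not exceed `threshold`.
    """
    for position in rank_by_distance(distances):
        if position in excluded:
            continue
        # The first remaining candidate is the closest one; every later
        # candidate is at least as far away, so stop here either way.
        return position if distances[position] <= threshold else None
    return None


# Made-up distances for a four-phrase vocabulary.
scores = [42.0, 7.5, 12.0, 30.0]
first = best_match(scores, threshold=15.0)
# If the first choice is not contextually logical, ask for the next best.
second = best_match(scores, threshold=15.0, excluded=frozenset({first}))
print(first, second)
```

Passing the rejected position back in `excluded` is one simple way for host software to fall through to the next-best match described in step 5.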
SI Recognition
Oki supplies the MSM6679AL-110 with predefined SI vocabularies which Oki builds from
hundreds of utterances by a wide variety of speakers. SI vocabularies are limited to 25 words or
fewer, which allows the MSM6679AL-110 to achieve a net accuracy of >95%, even in noisy
conditions.
SI vocabularies are grouped into sub-vocabularies of ≤15 words, to maintain the highest
accuracy. Similar words in any one sub-vocabulary can cause substitution errors.
Oki Semiconductor’s standard cellular vocabulary is intended for an automotive environment
with a far-talk microphone. This vocabulary may work adequately in other conditions, such as
an office or outside, but recognition performance may be degraded.
MSM6679AL-110 Cellular SI Recognition Vocabulary
Sub-Vocabulary 1      Sub-Vocabulary 2                      Sub-Vocabulary 3
Phrase      Index     Phrase    Index    Phrase    Index    Phrase    Index
Store       1         One       1        Eight     8        Yes       1
Dial        2         Two       2        Nine      9        No        2
Delete      3         Three     3        Zero      Ah       Cancel    3
Directory   4         Four      4        Oh        Bh       —         —
—           —         Five      5        Stop      Ch       —         —
—           —         Six       6        Clear     Dh       —         —
—           —         Seven     7        —         —        —         —
MSM6679AL-110 Control Vocabulary
Sub-Vocabulary 1        Sub-Vocabulary 2
Phrase        Index     Phrase      Index
A/C           1         Low         1
Fan           2         Medium      2
Temperature   3         High        3
Timer         4         Increase    4
Service       5         Decrease    5
Help          6         Set         6
Select        7         Reset       7
—             —         Cancel      8
—             —         Clear       9
—             —         Recall      A
—             —         On          B
—             —         Help        C
MSM6679AL-110 Direction Vocabulary
Sub-Vocabulary 1
Phrase     Index
Up         1
Down       2
Left       3
Right      4
Forward    5
Reverse    6
Faster     7
Slower     8
Start      9
Stop       A
Cancel     B
MSM6679AL-110 Browse Vocabulary
Sub-Vocabulary 1                                         Sub-Vocabulary 2
Phrase   Index    Phrase     Index    Phrase   Index     Phrase   Index    Phrase   Index
Up       1        Next       5        Home     9         Set      1        On       5
Down     2        Previous   6        —        —         Reset    2        Play     6
Left     3        Select     7        —        —         Start    3        Lock     7
Right    4        Cancel     8        —        —         Stop     4        Cancel   8
MSM6679AL-110 Japanese Navigation Vocabulary
Sub-Vocabulary 1      Sub-Vocabulary 2    Sub-Vocabulary 3      Sub-Vocabulary 4
Phrase      Index     Phrase    Index     Phrase      Index     Phrase    Index
Genzaichi   1         Ue        1         Hyoujun     1         Hai       1
Jiaku       2         Shita     2         Kakudai     2         Iie       2
Kaisya      3         Hidari    3         Shukushou   3         Ofu       3
Houi        4         Migi      4         Zentai      4         —         —
Sentaku     5         —         —         Kaiten      5         —         —
Yuudou      6         —         —         Kyori       6         —         —
Nabi        7         —         —         Hosei       7         —         —
—           —         —         —         Teisei      8         —         —
MSM6679AL-110 Japanese Cellular Vocabulary
Sub-Vocabulary 1     Sub-Vocabulary 2
Phrase     Index     Phrase   Index    Phrase    Index
On         1         Ichi     1        Kyuu      9
Ofu        2         Ni       2        Zero      A
Daiyaru    3         San      3        Sharp     B
Tansyuku   4         Yon      4        Star      C
Denwacho   5         Go       5        Kakunin   D
Kakunin    6         Roku     6        Touroku   E
Nabi       7         Nana     7        Rei       F
—          —         Hachi    8        —         —
MSM6679AL-110 German Cellular Vocabulary
Sub-Vocabulary 1      Sub-Vocabulary 2                      Sub-Vocabulary 3
Phrase      Index     Phrase    Index    Phrase    Index    Phrase    Index
Speichern   1         Eins      1        Neun      9        Ja        1
Wählen      2         Zwei      2        Null      A        Nein      2
Löschen     3         Drei      3        Notruf    B        Löschen   3
Name        4         Vier      4        Wählen    C        —         —
—           —         Fünf      5        Löschen   D        —         —
—           —         Sechs     6        Raute     E        —         —
—           —         Sieben    7        Stern     F        —         —
—           —         Acht      8        —         —        —         —
SI vocabulary generation starts with collecting reference utterances from ≥400 speakers with:
• An equal mixture of males and females
• Accents from all regions of the country of intended use
• ~15% non-native speakers.
The samples should be generated from a randomly-ordered list, with each word spoken twice
and with a dummy word at the beginning and end. There must be >2 sec between each sample
for accurate data processing. To provide the audio fidelity required for high-quality recognition
training, a DAT recorder, together with the microphone that will be used in the final application,
is required. To ensure data integrity, data is submitted to Oki after collecting samples from the
first 20 speakers for initial screening. If acceptable, then the remaining collection may proceed.
If substitution errors are possible, collection of spare words during initial collection is
recommended. For example, alternate words to “Stop” and “Top” could be “Halt” and “First.”
Collections should contain a wide variety of the background sound conditions that will exist
during actual usage. For example, if the collection is for use in an automobile, conditions such
as vehicle speed, road conditions, various window opening positions, heater or AC blower
speeds and radio volumes should be varied during the collection. The signal-to-noise ratio
should be maintained at ≥20 dB.
To achieve high accuracy rates, phrase selection, data collection, background initialization
strategy, and control software need careful consideration. There are no published standards for
recognition accuracy.
Oki defines accuracy by:
Accuracy = 100% - ERATE
ERATE = ESUB + 1/2 EREJ
with the following definitions:
Parameters for Recognition Accuracy
Name                      Symbol   Condition
Substitution Error        ESUB     Most critical type of error, e.g., say "Five", recognize "Nine"
Rejection Error           EREJ     Word not recognized; opportunity for operator to repeat
Gap Error                 EGAP     Word spoken before recognizer is ready
Time-Out Error            ETME     Word length is too long
Spurious Response Error   ESPU     Sound or invalid word classified as a valid word
                                   (e.g., handset dropped or wrong word spoken)
A typical target accuracy of 97% is achieved with a 3% ERATE, composed of a 1.5% ESUB rate and
a 3% EREJ rate.
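
The accuracy definition above is simple enough to check directly. The sketch below implements Accuracy = 100% - ERATE with ERATE = ESUB + 1/2 EREJ and reproduces the 97% example. The function names are illustrative, and all rates are expressed in percent.

```python
# Oki's accuracy definition: substitution errors count fully, rejections count half.
def error_rate(e_sub_pct: float, e_rej_pct: float) -> float:
    """ERATE = ESUB + 1/2 EREJ, all values in percent."""
    return e_sub_pct + 0.5 * e_rej_pct


def accuracy(e_sub_pct: float, e_rej_pct: float) -> float:
    """Accuracy = 100% - ERATE, in percent."""
    return 100.0 - error_rate(e_sub_pct, e_rej_pct)


print(error_rate(1.5, 3.0))   # 3.0
print(accuracy(1.5, 3.0))     # 97.0
```

Rejections are weighted at one half because the operator gets a chance to repeat the word, whereas a substitution produces a wrong action.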
SD Recognition
In SD recognition mode, the MSM6679AL-110 can be trained to recognize up to 61 words. The
MSM6679AL-110 can support multiple speakers by switching vocabularies, but only one
speaker’s vocabulary should be active at one time.
The end user enrolls a phrase in the MSM6679AL-110’s vocabulary by recording the phrase three
times or more. The host Micro Controller Unit (MCU) controls the number of times each phrase
is enrolled. Generally, higher recognition accuracy is achieved with each additional enrollment.
The word set is made more robust by pronouncing each phrase slightly differently during initial
enrollment.
In addition to enrollment training, adaptive template updating can drive the accuracy towards
100%. The host MCU updates templates by first asking the speaker to confirm a recognized
phrase with a “yes” or “no” response, and subsequently updating the template for corresponding
words. The use of name tags (see next paragraph) facilitates this process.
Name Tag Recording
To facilitate SD recognition, the MSM6679AL-110 supports recording and playback of name tags.
Name tags are used to confirm correct responses in SD recognition. For example, in a phone
dialer application, the user associates a “name” (which is recorded into memory) with a phone
number. The MSM6679AL-110 then plays back the name tag so that the user can verify that the
recognized phrase is the correct one.
The VRP stores name tags in memory using an ADPCM compression algorithm at 28 kbps.
The length of a name tag is controlled with a command from the user's host MCU
program. The maximum number of name tags possible is 61, but the actual number is dependent
upon record time and memory available. See the section on memory interface for more detail.
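
The trade-off between record time, available memory, and the 61-tag limit can be estimated as follows. This is a rough sketch: it uses the nominal 28 kbps rate, ignores any per-tag overhead or block alignment the device may impose, and the memory size and tag durations in the example are made-up inputs.

```python
# Estimate how many name tags fit in a given amount of name-tag memory.
import math

ADPCM_BIT_RATE_BPS = 28_000   # nominal ADPCM rate from the text
MAX_NAME_TAGS = 61            # upper limit stated in the text


def bytes_per_tag(duration_s: float) -> int:
    """Storage needed for one name tag of the given length (no overhead assumed)."""
    return math.ceil(duration_s * ADPCM_BIT_RATE_BPS / 8)


def max_name_tags(memory_bytes: int, duration_s: float) -> int:
    """Number of equal-length tags that fit, capped at the 61-tag limit."""
    return min(MAX_NAME_TAGS, memory_bytes // bytes_per_tag(duration_s))


# Example: 96 Kbytes set aside for name tag data (hypothetical figure).
name_tag_memory = 96 * 1024
print(bytes_per_tag(1.5), max_name_tags(name_tag_memory, 1.5))
print(bytes_per_tag(0.25), max_name_tags(name_tag_memory, 0.25))
```

With 1.5-second tags the memory is the limiting factor; with very short tags the 61-tag ceiling applies instead.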
Audio Input Interface
A critical item for high-accuracy speech recognition is correct design of the audio input circuit.
A circuit with appropriate gain and frequency responses must be placed between the microphone
and MSM6679AL-110’s A/D input. Oki recommends input gain and a band pass filter with the
following characteristics:
• Four pole Chebyshev high-pass filter, 3 dB point at 225 Hz
• Dual-pole low-pass filter, 3 dB point at 4250 Hz
• Midband gain of 46 dB at 1000 Hz
The above gain and filter characteristics are obtained by using a rail-to-rail quad CMOS op-amp
and one-half supply rail splitter to bias the input signal at 1/2 VDD nominal.
The MSM6679AL-110 uses multiple analog inputs to improve sampling quality. An on-chip
analog-to-digital (A/D) conversion unit transforms the analog signal to a digital data stream.
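
To get a feel for what a 46 dB midband gain means for the microphone signal, the sketch below converts the gain to a linear voltage ratio and estimates the largest input amplitude that stays within the supply rails when the signal is biased at 1/2 VDD. It assumes an ideal rail-to-rail stage and a usable swing of exactly half the supply on either side of the bias point; the 3 V supply in the example is only an illustration.

```python
# Convert the recommended midband gain to a voltage ratio and estimate input headroom.
MIDBAND_GAIN_DB = 46.0


def db_to_voltage_ratio(gain_db: float) -> float:
    """Voltage gain ratio for a gain given in decibels (20*log10 convention)."""
    return 10 ** (gain_db / 20.0)


def max_input_peak_v(vdd: float, gain_db: float) -> float:
    """Largest input peak amplitude before the output reaches a rail.

    ASSUMPTION: output biased at VDD/2 with an ideal swing of VDD/2 each way.
    """
    return (vdd / 2.0) / db_to_voltage_ratio(gain_db)


ratio = db_to_voltage_ratio(MIDBAND_GAIN_DB)
peak_mv = max_input_peak_v(3.0, MIDBAND_GAIN_DB) * 1000.0
print(f"gain ratio about {ratio:.1f}, max input peak about {peak_mv:.2f} mV at VDD = 3 V")
```

A gain of roughly 200 means only a few millivolts of microphone signal are needed to use the full converter range, which is why microphone choice and layout matter so much here.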
Audio Output Interface
The MSM6679AL-110 also provides the VOICEOUT1 PWM output. The MSM6679AL-110 uses
ADPCM to generate voice or sound-effect output. ADPCM represents an improvement over
conventional PCM techniques in that it adaptively changes the quantizer step (scale factor) to suit
the waveform being encoded. The result is more efficient memory usage with little loss of quality.
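
The idea of an adaptive quantizer step can be illustrated with a toy codec. The sketch below is NOT the MSM6679AL-110's algorithm: the step table, the index adjustments, the 4-bit code layout, and the 12-bit sample range are all invented for illustration. It only shows the general technique, in which each code carries a quantized difference from the predicted sample and also grows or shrinks the step used for the next sample.

```python
# Toy adaptive-step differential codec (illustrative only, not Oki's ADPCM).

# Hypothetical step-size table: 16 geometrically spaced steps.
STEPS = [int(round(16 * 1.4 ** i)) for i in range(16)]
# Small magnitudes shrink the step, large magnitudes grow it.
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]
SAMPLE_MIN, SAMPLE_MAX = -2048, 2047   # assumed 12-bit sample range


def _clamp(value, low, high):
    return max(low, min(high, value))


def _delta(code, step):
    """Decode a 4-bit code (sign bit plus 3-bit magnitude) into a signed delta."""
    magnitude = code & 0x7
    delta = (2 * magnitude + 1) * step // 8   # midpoint of the quantizer bin
    return -delta if code & 0x8 else delta


def encode(samples):
    """Encode integer samples into 4-bit codes."""
    predicted, index, codes = 0, 0, []
    for sample in samples:
        step = STEPS[index]
        diff = sample - predicted
        magnitude = min(7, abs(diff) * 4 // step)
        code = magnitude | (0x8 if diff < 0 else 0)
        # Track exactly what the decoder will reconstruct.
        predicted = _clamp(predicted + _delta(code, step), SAMPLE_MIN, SAMPLE_MAX)
        index = _clamp(index + INDEX_ADJUST[magnitude], 0, len(STEPS) - 1)
        codes.append(code)
    return codes


def decode(codes):
    """Rebuild samples from 4-bit codes using the same step adaptation."""
    predicted, index, samples = 0, 0, []
    for code in codes:
        step = STEPS[index]
        predicted = _clamp(predicted + _delta(code, step), SAMPLE_MIN, SAMPLE_MAX)
        index = _clamp(index + INDEX_ADJUST[code & 0x7], 0, len(STEPS) - 1)
        samples.append(predicted)
    return samples


codes = encode([0, 300, 600, 500, 100, -400])
print(codes)
print(decode(codes))
```

Because the encoder updates its prediction with the same reconstruction rule the decoder uses, the two stay in step without any side information, which is where the memory saving over plain PCM comes from.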
Careful selection of the components for internal and external output filters and amplifiers is
recommended, because an incorrect choice would impair the original quality. The same care
applies to board layout:
• Careful separation of analog and digital lines
• Grounding of analog lines at both ends
• Adequate separation from high-speed digital circuits, to avoid distortion of the analog signal
Memory Interface
The memory control section manages RAM and/or ROM devices in two 64-Kbyte memory
spaces, in conjunction with internal memory for voice templates and working memory. Some
versions work with no external memory, some have some external RAM, some use only external
EPROM, and some use external memory in conjunction with both internal ROM and RAM. The
MSM6679AL-110 requires a minimum of 32 Kbytes SRAM and 16 Kbytes ROM.
The following table shows vocabulary sizes and playback facilities for various configurations.
Typical Configurations
                 Recognition       MSM6679AL-110          MSM665x     MSM6679AL-110   MSM6679AL-110   Memory Size (bytes)
                 Vocabulary        Controller Sound       Playback    Speech          Speech
                 (Words)           Playback (sec) [1]     Interface   Record          Playback
Application      SI       SD       Internal   External                                                EPROM   Flash   SRAM
Telephone        25       61[2]    2.3        9.2         OK          —               OK              64K     —       32K
Dialer           50       61[2]    2.3        —           OK          —               OK              64K     —       32K
                 25       61       2.3        27.6        OK          OK              OK              —       128K    32K
                 50       61       2.3        18.4        OK          OK              OK              —       128K    32K
                 75       61       2.3        —           OK          OK              OK              —       128K    32K
                 100      61       2.3        —           OK          OK              OK              —       128K    32K
Computer         61[3]    61       2.3        36.8        OK          —               OK              —       —       64-384K
Peripheral
Minimum          12       61[2]    1.15                   OK          —               —               16K     —       32K
Configuration

1. Phrase chaining features usually permit much longer overall playback durations; not
   including external speech synthesizer.
2. SD recognition vocabularies are volatile in these configurations.
3. Per download. Vocabulary swapping by host permits unlimited vocabulary size.
The MSM6679AL-110 supports 32 Kbytes of RAM, and up to 64 Kbytes of ROM (EPROM or
Flash) per bank in separate memory spaces.
For accessing the ROM and RAM address spaces, the MSM6679AL-110 provides separate
Write RAM (WRRAM) and Read RAM (RDRAM) signals. The RDRAM signal connects directly
to the Output Enable (OE) control signal inputs on the RAM and ROM. The WRRAM
signal connects directly to the Write Enable (WE) control signal input on the RAM.
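[Memory map drawing. Labels recoverable from the figure:
  SRAM:  00000 Reserved; 04A00 Default Working SD Templates; 05480 Working Name Tag Pointer
         Table; 05700 Alternate SD Templates; 08000 (end of SRAM)
  FLASH: 00000 Reserved; SI First (F509*); 07300 SD First; 07D80 NTP First; 08000 to 1FFFF
         Name Tag Data; SI Last (F501*); 1F300 SD Last; 1FD80 NTP Last; 1FFFF (end of FLASH)
  Name Tag Block Address: 000 at 08000; 100 at 10000; 200 at 18000; 2F6, 2FB and 2FF at the
         top of the FLASH range]
*Denotes commands to select blocks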
Figure 7. MSM6679AL-110 External Memory Map
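
The name tag block addresses that can be read from Figure 7 (000 at 08000, 100 at 10000, 200 at 18000, and 2FB at 1FD80) are consistent with 128-byte blocks counted from address 08000. The sketch below encodes that relationship. The block size and base address are inferences from those labeled points only, not a formula stated in the data sheet, so treat the helper names and constants as assumptions.

```python
# Inferred mapping between name tag block numbers and FLASH addresses.
NAME_TAG_BASE = 0x08000    # ASSUMPTION: block 000 starts here (from Figure 7 labels)
BLOCK_SIZE = 0x80          # ASSUMPTION: 128-byte blocks (inferred, not documented)
LAST_BLOCK = 0x2FF         # highest block number labeled in the figure


def block_to_address(block: int) -> int:
    """Return the first FLASH address of a name tag block."""
    if not 0 <= block <= LAST_BLOCK:
        raise ValueError(f"block {block:#x} outside 0x000..{LAST_BLOCK:#x}")
    return NAME_TAG_BASE + block * BLOCK_SIZE


def address_to_block(address: int) -> int:
    """Return the name tag block that contains a FLASH address."""
    if not NAME_TAG_BASE <= address <= block_to_address(LAST_BLOCK) + BLOCK_SIZE - 1:
        raise ValueError(f"address {address:#07x} outside the name tag range")
    return (address - NAME_TAG_BASE) // BLOCK_SIZE


for block in (0x000, 0x100, 0x200, 0x2FB, 0x2FF):
    print(f"block {block:03X} -> {block_to_address(block):05X}")
```

Under this reading, block 2FF covers 1FF80 to 1FFFF, which matches the top of the FLASH range shown in the figure.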
External Voice Synthesis Control
The MSM6679AL-110 is capable of interfacing to the MSM665x family of Oki ROM, OTP, or
external EPROM speech synthesizers, allowing for up to 260 seconds of high-quality voice and
sound effects. The following table indicates the speech capabilities of the MSM665x family.
MSM665x Family Characteristics

Type          Data ROM       Maximum Speech Duration[2]
              Capacity[1]    fSAM = 4.0 kHz  fSAM = 6.4 kHz  fSAM = 8.0 kHz  fSAM = 16.0 kHz  fSAM = 32.0 kHz
MSM6650       64 Mbits[3]    >1 hour         >40 minutes     >30 minutes     >15 minutes      >8 minutes
MSM6652       288 Kbit       16.9 sec        10.5 sec        8.4 sec         4.2 sec          2.1 sec
MSM6653       544 Kbit       31.2 sec        19.5 sec        15.6 sec        7.8 sec          3.9 sec
MSM66P54[4]   1 Mbit         63.8 sec        39.9 sec        31.9 sec        15.9 sec         7.9 sec
MSM6654       1 Mbit         63.8 sec        39.9 sec        31.9 sec        15.9 sec         7.9 sec
MSM6655       1.5 Mbit       96.5 sec        60.3 sec        48.2 sec        24.1 sec         12.0 sec
MSM66P56[5]   2 Mbit         129.1 sec       80.7 sec        64.5 sec        32.2 sec         16.1 sec
MSM6656       2 Mbit         129.1 sec       80.7 sec        64.5 sec        32.2 sec         16.1 sec
MSM6658       4 Mbit         258 sec         161.4 sec       129.1 sec       64.5 sec         32.2 sec
1. Actual ROM area in the MSM6652, MSM6653, MSM6654, MSM6655, MSM6656, MSM6658,
   MSM66P54, and MSM66P56 is smaller by 22 Kbits.
2. Longer speech patterns can be created by chaining and repeating existing speech samples.
3. Via external ROM only (no on-chip ROM available).
4. One-Time-Programmable (OTP) version of MSM6654. See the MSM66P54 data sheet for
   more information.
5. One-Time-Programmable (OTP) version of MSM6656. See the MSM66P56 data sheet for
   more information.
The MSM665x interface consists of the following signals:
• BUSY - Asserted LOW during MSM665x device playback. The MSM6679AL-110 F50Bh and
F10100xxh commands select this signal for MSM665x command polling.
• NAR - Next Address Request status signal. By default, the MSM6679AL-110 uses this signal
to poll commands to the MSM665x. The F51Bh, F480h, and F440h commands select NAR for
polling.
• SI - Serial Input Clock.
• SD - Serial Data Out.
• STROBE - Initiates speech synthesis.
• RESOUT - Initializes device when asserted LOW. The MSM6679AL-110 F480h command
generates this signal.
Serial Interface
The MSM6679AL-110 supplies a serial interface suitable for connection to an RS-232C serial port
buffer or equivalent. The serial interface uses one MSM6679AL-110 input (RXD) and one
MSM6679AL-110 output (TXD). The interface operates at 9600 Baud with:
• 8 data bits
• 1 start bit
• 1 stop bit
• No parity
• No handshake
A host processor sends serial ASCII commands to the MSM6679AL-110 and receives serial ASCII
responses based on voice input responses.
MSM6679AL-110 SLAVE-MODE API
This section describes the slave-mode Applications Protocol Interface (API) between a host MCU
and the MSM6679AL-110. The slave-mode API offers the following features:
• Direct slave-mode control of voice recognition, sound recording and playback, and sound
synthesis
• Serial port interfaces
• Simple procedures for downloading and uploading data
• ASCII format
• Comprehensive return codes and error reporting
The host MCU selects the active speech recognition vocabulary, speech responses, and controls
all actions required to implement an interactive voice response system. The MSM6679AL-110
performs speech recognition, based on the vocabulary selected by the host, and returns digital
codes representing the most probable match of the current utterance to an individual utterance
in the selected vocabulary. The MSM6679AL-110 can also respond with “name tags.” Name tags
can be fixed words, phrases or sound effects, or can be words, phrases or sound effects that have
been interactively recorded by the user.
The API supports the serial interface. The MSM6679AL-110 returns each response using the same
interface through which the most recent message was received. The user can thus connect and
use both interfaces.
For all messages, the serial interface represents each 8-bit value with two hexadecimal digits
coded in ASCII. When downloading and uploading data, the MSM6679AL-110 uses a stream of
8-bit binary values.
The serial-mode interface uses a 9600-baud UART with 1 start bit, 8 data bits, and 1 stop bit. There
is no parity or handshaking. Serial-interface messages are of variable length, but consist of an
even number of bytes. The serial interface echoes all received ASCII characters immediately back
to the host MCU.
Messages are of variable length. All messages consist of an even number of bytes. Opcodes
consist of exactly four bytes, with values between F000h and FEFEh. Operand bytes may take
values from 0000h to FFFFh. The MSM6679AL-110 issues a return code for many of the host
commands. The return code generally consists of the same opcode, followed by data indicating
success or failure of the operation.
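The message framing described above can be sketched in a few lines of host-side code. This is a minimal illustration, not Oki reference code: the function names are hypothetical, the use of uppercase hex digits is an assumption, and the even-length check reflects one reading of the "even number of bytes" rule (applied here to the 8-bit values before ASCII encoding).

```python
def encode_message(opcode, operands=b""):
    """Encode an opcode and its operand bytes as ASCII hex characters.

    Assumptions (not stated in the data sheet): uppercase hex digits,
    and an even count of 8-bit values per message.
    """
    if not 0xF000 <= opcode <= 0xFEFE:
        raise ValueError("opcode must lie between F000h and FEFEh")
    raw = opcode.to_bytes(2, "big") + bytes(operands)
    if len(raw) % 2:
        raise ValueError("message must contain an even number of bytes")
    # Each 8-bit value becomes two ASCII hexadecimal digits.
    return raw.hex().upper().encode("ascii")


def decode_message(ascii_hex):
    """Convert a received ASCII hex string back into 8-bit values."""
    return bytes.fromhex(ascii_hex.decode("ascii"))


# Example: Set SP/SI origin (F102h) to the default address 8000h.
frame = encode_message(0xF102, bytes([0x80, 0x00]))
```

Because the serial interface echoes received characters, a real host would also have to discard or verify the echo before reading the response; that step is omitted here.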
Opcodes are organized into the following categories:
• Purge
• Set parameter
• Initialize
• Recognize
• Speak
• Request
• Record
• SD recognition control
The following tables summarize available opcodes and provide detailed descriptions of the
opcode functions.
Command Summary
Function       Opcode (Hex)   Description                                                     Default (Hex)
Purge          F000           Clear MSM6679AL-110 input stack                                 —
Set parameter  F102 xxxx      Set SP/SI origin to xxxx.                                       8000
               F103 xxxx      Set SD origin.                                                  4A00
               F104 xxxx      Set triggering origin.                                          F100
               F12x           Set SD SP table to table x.                                     F123
               F130 xxxx      Select triggering table.                                        0101, 0202...
Initialize     F2xx mod 80    Initialize background estimation.                               Disabled.
               F2xx mod 40    Wait for F3h command after each response.                       Disabled.
               F2xx mod 20    Beep after each triggered utterance                             Disabled.
               F2xx mod 10    Reserved                                                        Disabled.
               F2xx mod 8     Set speech response level to default.                           Enabled.
               F2xx mod 4     Send acknowledge after each speech output response.             Enabled.
               F2xx mod 2     Only detect triggers.                                           Disabled.
               F2xx mod 1     Initialize SD parameter table and name tags.                    Load from first FLASH.
Recognize      F300           Stop listening (recognition).                                   —
               F301 to F33F   Start SI recognition.                                           —
               F340           Start SD recognition.                                           —
               F341           Sort SD recognition distances, return index to utterance        —
                              with least distance.
               F342           Update SD enrollment.                                           —
               F343           Request recognition parameter upload to host.                   —
               F344           Sort SD recognition distances, return index and distance to     —
                              utterance with least distance
               F351           Sort SD recognition distances, return all distances.            —
               F361           Sort SD recognition distances, return minimum and               —
                              maximum energy values.
               F371           Sort SD recognition distances, return all energy values and     —
                              distances.
Speak          F401 to F43D   Play back name tag from external memory.                        —
               F441 to F47C   Play back sound from internal memory.                           —
               F47E           Play 50-ms beep.                                                —
               F47F           Pause for 0.2 sec.                                              —
               F480           Initialize MSM665x IC, set MSM665x busy mode OFF, select        —
                              FLASH SI recognition.
               F481 - F4FF    Play back one of 127 phrases in external MSM665x device.        —
               F50B           Set MSM665x busy mode ON.                                       OFF
               F51B           Set 6654 NAR mode                                               ON
               FE03 to FEFE   Set output volume (03h = minimum, FEh = maximum).               FE80h
Request        F500           Status request.                                                 —
               F501           Select last FLASH bank for SI recognition.                      F509
               F510           Select download RAM bank for speaker independent/signal         F509
                              processing (SI/SP) template area.
               F520           Set MSM6679AL-110 power down mode.                              —
               F502....       Download/upload.                                                —
               F504           Retrieve MSM6679AL-110 firmware revision.                       414C
               F505           Initialize background (BG) noise level.                         —
               F506           Retrieve vocabulary and trigger table revision number.          3039
               F507           Save SD templates from download RAM to first FLASH.             —
               F517           Save SDR templates in last FLASH. (4A00-547B→F300-FD7F)         —
               F508           Recall SD templates from first FLASH to download RAM.           —
               F518           Get SDR Templates from last FLASH (F300-FD7B→4A00-547B)         —
               F509           Select first FLASH bank for SI recognition.                     F509
Record         F101 00xx      Set name tag length, set MSM665x busy mode ON.                  0051
               F105           Set name tag record origin                                      0000
               F106           Set name tag record end                                         01FF
               F50A           Clear name tag table in SRAM (5480 - 56FF).                     —
               F50C           Recall last saved name tag table.                               —
               F51C           Recall name tag pointers from last FLASH                        —
                              (FD80-FFFF→5480-56FF)
               F50D           Save name tag table from SRAM to FLASH.                         —
               F51D           Save name tag pointers in last FLASH (5480-56FF→FD80-FFFF)      —
               F50E           Set record volume high.                                         F50F
               F50F           Set record volume normal (default).                             F50F
               FA01 ~ FA3D    Record name tag 01h - 3Dh.                                      —
SD             F6xx           Set SD pointer to segment xxh.                                  —
Recognition    F9xx           Search for SD utterance xxh.                                    —
Control        FB00           Enroll SD utterance selected by search command (F9xx).          —
               FC00           Erase utterance from SD vocabulary.                             —
               F521           Clear SDR table (4A00 - 547B)                                   —
Response Summary
Command / Operands                        Description

Result after Parameter Set
  F101h 00 tm                             Record time = tm*14 msec.
  F102h AdH AdL                           High and low bytes of SP/SI origin address.
  F103h AdH AdL                           High and low bytes of SD origin address.
  F104h AdH AdL                           High and low bytes of triggering origin address.
  F12Xh                                   SP table Xh selected.

Initialization Acknowledgment
  F280h                                   Invalid message received.
  F240h                                   Sample data over-run. [1]
  F220h                                   32-Kbyte block boundary violation error.
  F210h                                   Unclassified download/upload error.
  F208h                                   Divide-by-zero error.
  F204h                                   Select/jump error.
  F202h                                   Invalid SP header or table.
  F201h                                   Reserved.

Speech Ack
  F400h                                   Speech acknowledgment. [2]

Status [3]
  F500h                                   MSM6679AL-110 ready.
  F501h                                   Operation complete.
  F520h                                   Operations complete; MSM6679AL-110 disabled (vocabulary 0).
  F540h                                   MSM6679AL-110 waiting for start command.
  F560h                                   MSM6679AL-110 waiting for end trigger.
  F580h                                   MSM6679AL-110 processing recognition.
  F5A0h                                   Download/upload in progress. [4]
  F5C0h                                   Download/upload complete.
  F5F0h                                   Speak output in progress.

SI Recognition Result [5]
  F600h                                   Aborting SI listen mode.
  F6Utt                                   Utt = utterance ID.
  F6 Utt Dst1H Dst1L...DstNH DstNL        Utterance ID, high/low byte of distance to utterance 1...utterance N.
  F6 Utt EminH EminL EmaxH EmaxL          Utterance ID, high/low byte of min. and max. energy value.
  F6 Utt Dst1H Dst1L...DstNH DstNL        Utterance ID, high/low byte of distance to utterance 1...utterance N,
     EminH EminL EmaxH EmaxL              high/low byte of minimum energy value, high/low byte of
                                          maximum energy value.
  F63Ah                                   Trigger detection code (see init command).
  F63Bh                                   Rejection: utterance too loud.
  F63Ch                                   Rejection: utterance too long.
  F63Dh                                   Rejection: utterance begins too soon.
  F63Eh                                   Rejection: bad signal/noise ratio.
  F63Fh                                   Rejection: reason uncertain.

SD Recognition Result
  F700h                                   Aborting SD Listen mode. After SD utterance search: not found.
  F73Eh                                   Rejection.
  F73Fh                                   Sort completed. After SD utterance search: empty.
  F740h                                   Rejection: MSM6679AL-110 SD memory full/empty. After SD
                                          utterance search: in use.
  F341h: F7Utt                            Utt = Utterance ID triggered.
  F344h: F7Utt DstH DstL                  Utterance ID, high/low byte of distance.
  F351h: F7Utt Dst1H Dst1L...             Utterance ID, high/low byte of distance to utterance 1...
         DstNH DstNL                      utterance N.
  F361h: F7Utt EminH EminL                Utterance ID, high/low byte of minimum energy value,
         EmaxH EmaxL                      maximum energy value.
  F371h: F7Utt Dst1H Dst1L...             Utterance ID, high and low byte of distance to utterance 1...
         DstNH DstNL                      distance to utterance N, high and low byte of minimum energy
         EminH EminL EmaxH EmaxL          value, maximum energy value.

Vector Upload
  F743h 0000h                             Upload failure.
  F743h NH NL V1H V1L...VNH VNL           High/low bytes of length of vector, V, high/low byte of first V...Nth V.

Trap Error Codes
  F801h                                   Reserved.
  F802h                                   Invalid SP header or table.
  F804h                                   Select/jump error.
  F808h                                   Divide-by-zero error.
  F810h                                   Unclassified download/upload error.
  F820h                                   Memory full; 32-Kbyte block boundary violation error.
  F840h                                   Sample data over-run. [1]
  F880h                                   Invalid message received.

Record Response
  FA00                                    Record complete.
1. Sample data overrun is issued when real-time SP in Listen mode cannot keep up with
   incoming samples, i.e., if the A/D signal input routine overwrites a sample data buffer
   before it is fully processed.
2. This acknowledge is sent only if Init command 1111 0010 xxxx x1xx (F2 xxxx x1xx) is set
   to enable acknowledgments.
3. These messages are sent in response to a request command (F5XYh) from the host.
4. Upload/download in progress, acknowledging load request immediately before data
   transfer. In response to an N-byte download request, the MSM6679AL-110 then receives
   N bytes (if N is even) or N+1 bytes (if N is odd) of data from the host. If N is odd and
   N+1 bytes are received, only N bytes are written to MSM6679AL-110 memory. In response
   to an upload request, the MSM6679AL-110 then sends N bytes (if N is even) or N+1 bytes
   (if N is odd) of data to the host.
5. If an utterance was recognized, XYh is the utterance identity or class number, and
   additional parameters may be appended, if requested in the SI Recog (F3XYh with X=0...3)
   command. Otherwise, XYh indicates various results as detailed.
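A host typically needs to sort incoming two-byte responses into "recognized utterance", "rejection", and "error" cases. The sketch below shows one way to do that for the SI recognition (F6XYh) and trap (F8XYh) codes listed in the Response Summary. The function and table names are hypothetical, and treating the trap code as a bit mask (so that several error bits could be reported at once) is an assumption; the data sheet lists only single-bit values.

```python
# Trap error bits, taken from the F8XYh rows of the Response Summary.
TRAP_BITS = {
    0x01: "Reserved",
    0x02: "Invalid SP header or table",
    0x04: "Select/jump error",
    0x08: "Divide-by-zero error",
    0x10: "Unclassified download/upload error",
    0x20: "Memory full; 32-Kbyte block boundary violation error",
    0x40: "Sample data over-run",
    0x80: "Invalid message received",
}

# F6XYh values that do not identify an utterance.
SI_SPECIAL = {
    0x00: "Aborting SI listen mode",
    0x3A: "Trigger detection code",
    0x3B: "Rejection: utterance too loud",
    0x3C: "Rejection: utterance too long",
    0x3D: "Rejection: utterance begins too soon",
    0x3E: "Rejection: bad signal/noise ratio",
    0x3F: "Rejection: reason uncertain",
}


def describe_response(code):
    """Return a readable description of a 16-bit response code."""
    high, low = code >> 8, code & 0xFF
    if high == 0xF8:
        # Assumption: more than one trap bit may be set at a time.
        names = [text for bit, text in TRAP_BITS.items() if low & bit]
        return "Trap: " + ", ".join(names)
    if high == 0xF6:
        if low in SI_SPECIAL:
            return "SI: " + SI_SPECIAL[low]
        return "SI: utterance %02Xh recognized" % low
    return "Unhandled response %04Xh" % code
```

Any distance or energy operands that follow an F6 Utt response would have to be read separately, depending on which recognition opcode was issued.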
Command Descriptions
Purge
Operand
F000
Description
Purge MSM6679AL-110 Input Stack. This command clears
the MSM6679AL-110 input stack of commands that are
waiting to be executed. Commands already in progress, such
as a pending MSM6654 poll action, are not affected. It does
not affect the MSM6679AL-110 output stack.
Return Values
None
Set Parameter
Operand
F102h XXYYh
Description
Set SP/SI Recognition Origin. Prior to SD or SI recognition,
address pointers must be set to point at the SP or SI
recognition parameter tables. This command sets the starting
address of SP and SI recognition parameter tables.
This address is the location of the first word of a header that
contains pointers to one or more individual SP/SI tables.
XXYYh = high (XXh) and low (YYh) bytes of requested
address. The MSM6679AL-110 uses and returns an even
address outside the MSM6679AL-110 work space that is as
near as possible to the requested address.
Leave this parameter at its default value unless you are using
an Oki custom SI vocabulary and are instructed to alter SP/SI
recognition origin.
Default SP/SI origin: 8000h
Return Values [1]
F102h XXYYh = High (XXh) and
low (YYh) bytes of resultant
address.
If a valid header is not found at
the resultant address, the
MSM6679AL-110 immediately
sends response code:
F802h = Invalid SP/SI header.
F103h XXYYh
Set SD Recognition Origin [2]. This command sets the SD
origin address at the starting address of the current SD
recognition parameter table. This command may be used to
select among multiple RAM-resident SD vocabulary tables.
XXYYh = high (XXh) and low (YYh) bytes of requested
address. The MSM6679AL-110 uses and returns an even
address outside the MSM6679AL-110 work space that is as
near as possible to the requested address.
Leave this parameter at its default value unless you are using
an Oki custom vocabulary and are instructed to alter SD
recognition origin.
The table length is 0A7Ch bytes.
Default SD origin: 4A00h
Return Values [1]
F103h XXYYh = high (XXh) and low (YYh) bytes of resultant
address.
F104h XXYYh
Set Triggering Origin. This command sets the starting
address of triggering parameter tables.
This address is the location of the first word of a section of
data memory containing one or more contiguous triggering
parameter tables.
XXYYh = high (XXh) and low (YYh) bytes of requested
address. The MSM6679AL-110 uses and returns an even
address outside the MSM6679AL-110 work space that is as
near as possible to the requested address.
Leave this parameter at its default value unless you are using
an Oki custom SI vocabulary and are instructed to alter
triggering origin.
Default triggering origin: F100h.
Return Values [1]
F104h XXYYh = high (XXh) and low (YYh) bytes of resultant
address.
Set Parameter (Continued)
Operand
F12Yh
Description
Set SD Recognition SP table. This command sets the SP
parameter table number to be used in processing speech
input during SD Recognition. The MSM6679AL-110 selects
SP table number Z, where Z is the nearest valid value to Y. By
default, the MSM6679AL-110 selects SP table 3 until this
command is issued. This command selects SP parameters
only, and does not select among multiple RAM-resident SD
vocabulary tables, which can be independently selected by the
Set SD Origin command (F103h).
After setting the table number and returning the resultant
value, the MSM6679AL-110 checks the validity of the SP
header. If the header is invalid, an error message is returned.
Set this value to (NSI +1), where NSI is the number of SI
subvocabularies.
Default SP table: 3.
Return Values [1]
F12Z = SP table Z selected.
If the SP header is invalid, a second message follows:
F802h = Invalid SP header.
Operand
F130h VN TN
Description
Select Triggering Table. This command selects triggering
table TN for use with SP table VN. Valid values for VN and TN
are between 01h and 0Fh.
Leave this parameter at its default value unless you are using
an Oki custom SI vocabulary and are instructed to alter the
triggering table.
Default = 0101, 0202, 0303...
Return Values [1]
F130h f(VN) f(TN) = Triggering table selected.
1. Return value is actual parameter value, which may not equal the set parameter value.
2. See also F6XY.
Initialize
After power-on, the MSM6679AL-110's mode corresponds to that after issuing an F20C command.
This mode may NOT be the optimum condition for most situations, so the user is advised to carefully understand
the desired condition and develop a suitable command for the application at hand.
In addition, ensure that unwanted bits do not get set or reset when attempting to set individual conditions. The
conditions selected are based on the XXh values associated with the last F2 command issued.

F2xx Bit Values: 1xxx xxxx
Power-On/Reset Value: Cleared
Action: Background Noise Initialization. When set to 1, the MSM6679AL-110 starts a 500-ms background
noise initialization. When set to 0, the MSM6679AL-110 does not perform background noise initialization.
The MSM6679AL-110 requires this command prior to recognition for noise vector subtraction during the
utterance sampling period. Use the background initialization command whenever there is a change in the
background noise level. For example, sample the noise signature in a vehicle at rest and moving at 35 MPH
with its windows rolled down. The quality of a phone line connection can also vary from call to call.
The host MCU must implement a strategy as to when to issue a background initialization command. In a
vehicle, the host MCU could monitor the vehicle speed, fan speed, radio volume, etc. Alternatively, the host
MCU could issue this command each time a new recognition session starts or a new line connection is
established. However, the 0.5-sec sample period could degrade system responsiveness if used too
frequently. A zero in this bit location during the F2XXh command will not cause an initialization. The F505h
command causes the same initialization sequence.
Return Value: F501 = Background initialization complete. F2XY = Initialization acknowledge. [1]

F2xx Bit Values: x1xx xxxx
Power-On/Reset Value: Cleared
Action: Wait for Recognition Command/Auto Restart SI Recognition. When set to 1, the MSM6679AL-110
waits for a recognition command after each response. When set to 0, the MSM6679AL-110 auto-restarts SI
recognition after each response.
This bit should be set to 1 when an action is to be taken immediately after an utterance. Auto-restart
recognition is the desired mode during digit string recognition, automated tape testing of digits, or in
demonstrations where continuous recognition is desired.
Return Value: F2XY = Initialization acknowledge. [1]

F2xx Bit Values: xx1x xxxx
Power-On/Reset Value: Cleared
Action: Beep After Each Voice Trigger. When set to 1, the MSM6679AL-110 beeps after each voice trigger.
When set to 0, the MSM6679AL-110 does not beep after each voice trigger. These beeps do not cause a
F400h message to be issued to the host MCU.
When set to 1, the MSM6679AL-110 beep can help a user avoid speaking before the MSM6679AL-110 is
ready. This mode is normally used with a digits vocabulary to pace the user and confirm each utterance
reception.
Instead of using beeps, an external MSM665x speech synthesizer can repeat digits as they are recognized.
However, some users find the number repetition annoying. Therefore, firmware could repeat digits during
initial usage and switch to beep mode later. Typically, performance improves with time as users learn to
speak with the correct enunciation and volumes. The MSM6679AL-110 in this case trains the user. Note that
the host MCU can also make the MSM6679AL-110 beep with the F47Eh command.
Return Value: F2XY = Initialization acknowledge. [1]

F2xx Bit Values: xxxx 1xxx
Power-On/Reset Value: Set
Action: Set Output Volume. When set to 1, VOICEOUT1 sound output level is set to half of full volume (80h).
When set to 0, voice output level is unaffected.
MSM6679AL-110 sound output volume can also be set at any level on a continuous scale from 00h to FEh
(low to high) with the FEXXh command. The MSM665x speech synthesizer has four discrete sound output
volumes, corresponding to 0h - 20h, 21h - 40h, 41h - 80h, and 81h - FEh.
Return Value: F2XY = Initialization acknowledge. [1]

F2xx Bit Values: xxxx x1xx
Power-On/Reset Value: Set
Action: Send Response Code After Sound Output. When set to 1, the MSM6679AL-110 issues an
acknowledge response (F400h) when sound output is completed. When set to 0, the MSM6679AL-110
does not issue an acknowledge response when speech response is completed. Automatic beeps after voice
triggers do not cause an F400h message to be issued.
Return Value: F2XY = Initialization acknowledge. [1]

F2xx Bit Values: xxxx xx1x
Power-On/Reset Value: Cleared
Action: Trigger Detection Only. When set to 1, the MSM6679AL-110 does not sort SI vocabularies for the
best match, instead returning F63Ah code when an utterance has been detected. When set to 0, normal
recognition is performed.
When this bit is set to 1, the host MCU can use the F343h command to upload the recognition parameter
vector, so that the host can perform independent processing.
Return Value: F2XY = Initialization acknowledge. [1]

F2xx Bit Values: xxxx xxx1
Power-On/Reset Value: Cleared
Action: Clear SD Recognition and Name Tag RAM. When set to 1, the MSM6679AL-110 initializes the SD
parameter table. When set to 0, existing SD parameters are preserved.
After this bit is set to 1, all SD training and name tag pointers are erased. Use this command to start
training for a new user. If the old name tags are to be retained, the F50Ch command can recall old name
tags from FLASH.
To set up for a blank SD and name tag table at the next power-on, issue the command sequence F201h
F507h.
Return Value: F2XY = Initialization acknowledge. [1]

1. See the Response Summary table earlier in this section for a complete description of the
   XY codes in initialization acknowledgment messages.
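Because every F2xx command rewrites all of the mode bits at once, it can help to build the opcode from named flags rather than from a hand-written hex literal. The following is a minimal sketch; the flag names are hypothetical labels for the bit positions described above, and the reserved bit (10h) is deliberately left out.

```python
from enum import IntFlag


class InitFlags(IntFlag):
    """Hypothetical names for the F2xx mode bits described above."""
    BACKGROUND_INIT = 0x80         # start 500-ms background noise initialization
    WAIT_FOR_COMMAND = 0x40        # wait for a recognition command after each response
    BEEP_AFTER_TRIGGER = 0x20      # beep after each voice trigger
    DEFAULT_VOLUME = 0x08          # set output level to half of full volume (80h)
    ACK_AFTER_SOUND = 0x04         # send F400h when sound output completes
    TRIGGER_ONLY = 0x02            # return F63Ah instead of sorting SI vocabularies
    CLEAR_SD_AND_NAME_TAGS = 0x01  # initialize SD parameter table and name tags


# Power-on state is equivalent to issuing F20C.
POWER_ON = InitFlags.DEFAULT_VOLUME | InitFlags.ACK_AFTER_SOUND


def init_opcode(flags):
    """Combine the selected flags into a single F2xx opcode."""
    return 0xF200 | int(flags)


# Keep the power-on settings, but also wait for a command after each response.
opcode = init_opcode(POWER_ON | InitFlags.WAIT_FOR_COMMAND)
```

Starting from POWER_ON and adding or removing individual flags is one way to avoid unintentionally clearing the volume or acknowledge bits.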
Recognize
Opcode: F300h
Action: Stop Listening. This command causes the MSM6679AL-110 to exit SI or SD Listen mode,
whichever was active.
Return Value:
  None           MSM6679AL-110 was not in Listen mode.
  F600h          Aborting SI Listen mode.
  F700h          Aborting SD Listen mode.

Opcode: F301h ~ F33Fh
Action: Start SI Listen Mode. For all the following opcodes, the MSM6679AL-110 performs SI
recognition on incoming utterances, using SI vocabulary Y. The vocabulary Y is identified by
one of 15 sets, thus Y = 1h ~ Fh.
Return Value:
  F600h          Aborting SI Listen mode.
  F63Ah          Trigger detection code (see Initialization command).
  F63Bh~F63Fh    Rejection.
  F802h          Invalid signal processing table.
  F840h          Sample data overrun.

  Opcode: F30Yh
  Action: Return recognized phrase using vocabulary number Y.
  Return Value: F6h Utt
    Utterance ID in vocabulary Y.

  Opcode: F31Yh
  Action: Return recognized phrase and distance table for vocab Y.
  Return Value: F6h Utt Dst1H Dst1L... DstNH DstNL
    Utterance ID in vocabulary Y, high and low byte of distance to utterance 1...
    distance to utterance N.

  Opcode: F32Yh
  Action: Return recognized phrase and energy value for vocab Y.
  Return Value: F6h Utt EminH EminL EmaxH EmaxL
    Utterance ID in vocabulary Y, high and low byte of minimum and maximum energy value.

  Opcode: F33Yh
  Action: Return recognized phrase, distance table, and energy value for vocab Y.
  Return Value: F6h Utt Dst1H Dst1L... DstNH DstNL EminH EminL EmaxH EmaxL
    Utterance ID, high and low byte of distance to utterance 1...distance to
    utterance N, high and low byte of minimum and maximum energy value.

Opcode: F340h
Action: Start SD Listen Mode. When an utterance is captured, it is analyzed and converted to a
"recognition parameter vector." The host may then command the MSM6679AL-110 to use this
vector in various ways (e.g., Sort, Update, or Recognition Vector Upload).
Return Value:
  F740           Triggered.
  F700           Abort SD Listen mode.
  F73E           Rejection.
  F73F           Memory empty.
  F802           Invalid SP table.
  F840           Sample data overrun.

Opcode: F341h, F344h, F351h, F361h, F371h
Action: SD Recognition Sort. These commands sort the distances between the recognition
parameter vector and the reference vectors for the utterances in the current SD vocabulary.
Return Value:
  F73Fh          Abnormal response: Memory empty.

  Opcode: F341h
  Action: Return recognized phrase for vocab Y. This command can be issued several times to
  yield first, second, third best, etc.
  Return Value: F7h Utt
    Utt = Utterance ID.

  Opcode: F344h
  Action: Return recognized phrase and distance for the current vocabulary.
  Return Value: F7h Utt DstH DstL
    Utt = index of recognized phrase, DstH DstL = high/low bytes of distance from
    nearest phrase.

  Opcode: F351h
  Action: Return recognized phrase and distance table for vocab Y.
  Return Value: F7h Utt Dst1H Dst1L... DstNH DstNL
    Utterance ID, high and low byte of distance to utt. 1...N.

  Opcode: F361h
  Action: Return recognized phrase and energy value for vocab Y.
  Return Value: F7h Utt EminH EminL EmaxH EmaxL
    Utterance ID, high and low byte of minimum and maximum energy value.

  Opcode: F371h
  Action: Return recognized phrase, distance table, and energy value for vocab Y.
  Return Value: F7h Utt Dst1H Dst1L... DstNH DstNL EminH EminL EmaxH EmaxL
    Utterance ID, high and low byte of distance to utterance 1...distance to
    utterance N, high and low byte of minimum and maximum energy value.

Opcode: F342h
Action: Update SD Recognition Enrollment. This command updates enrollment on utterance
Utt, immediately after a "F7h Utt" response to the Sort SD Distances command (F341h).
Alternatively, the utterance to be updated can be selected by the SD Search command (F9XYh).
This command uses the recognition parameter vector from the most recently captured
utterance, and does not start SD Listen mode. Generally, update should be performed only if
correct utterance identity is confirmed by the user.
Return Value:
  F740h          Update complete.

Opcode: F343h
Action: Recognition Vector Upload. Request recognition parameter vector upload to host.
Return Value:
  F743h NH NL V1H V1L... VNH VNL = Success, where
  NH/NL = high/low bytes of N, N = Length of recognition
  parameter vector V, V1H/V1L = high/low bytes of first
  element of V, VNH/VNL = high/low bytes of Nth element.
  F743h 00 00    Failure.
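The four SI listen variants differ only in the second hex digit of the opcode (0 = phrase only, 1 = plus distance table, 2 = plus energy values, 3 = both), so a host can derive the opcode from two options and a vocabulary number. The sketch below illustrates this, together with a helper that splits a table of high/low byte pairs into 16-bit values. Both function names are hypothetical, and the helper assumes the caller has already stripped the leading F6h/F7h and Utt bytes.

```python
def si_listen_opcode(vocabulary, distances=False, energy=False):
    """Build an F30Yh/F31Yh/F32Yh/F33Yh opcode for SI vocabulary Y."""
    if not 1 <= vocabulary <= 0xF:
        raise ValueError("SI vocabulary Y must be 1h to Fh")
    mode = (1 if distances else 0) | (2 if energy else 0)
    return 0xF300 | (mode << 4) | vocabulary


def parse_word_table(payload):
    """Split high/low byte pairs (e.g. Dst1H Dst1L ... DstNH DstNL) into integers."""
    if len(payload) % 2:
        raise ValueError("expected an even number of bytes")
    return [int.from_bytes(payload[i:i + 2], "big")
            for i in range(0, len(payload), 2)]


# Listen on vocabulary 5 and ask for the distance table as well.
opcode = si_listen_opcode(5, distances=True)
```

The same word-splitting helper applies to the SD sort responses (F351h, F361h, F371h), since they use the same high/low byte layout.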
Speak
Opcode: F401h ~ F43Dh
Action: Speak Phrase from External Memory. This command causes the MSM6679AL-110 to play
back a name tag from external memory. If no sound is defined for a selected index, the
MSM6679AL-110 plays a beep. See the Record commands for information on creating name
tags.
Return Value: F400h. If enabled, this value is returned upon completion of playback.

Opcode: F441h ~ F450h
Action: Speak Phrase from Low Internal Memory. If no sound is defined for a selected index, the
MSM6679AL-110 plays a beep. The default phrases supplied with the MSM6679AL-110 in
the smaller low playback memory area are listed below.
  F441h   Drip.
  F442h   Buzzer.
  F443h   Dial tone.
  F444h   Bonk.
Return Value: F400h. If enabled, this value is returned upon completion of playback.

Opcode: F451h ~ F47Ch
Action: Speak Phrase from High Internal/External Memory. If no sound is defined for a selected
index, the MSM6679AL-110 plays a beep. The default phrases supplied with the
MSM6679AL-110 in the larger upper playback memory area are listed below.
  F451h   "0" simulated DTMF tone.
  F452h   "1" simulated DTMF tone.
  F453h   "2" simulated DTMF tone.
  F454h   "3" simulated DTMF tone.
  F455h   "4" simulated DTMF tone.
  F456h   "5" simulated DTMF tone.
  F457h   "6" simulated DTMF tone.
  F458h   "7" simulated DTMF tone.
  F459h   "8" simulated DTMF tone.
  F45Ah   "9" simulated DTMF tone.
  F45Bh   "*" simulated DTMF tone.
  F45Ch   "#" simulated DTMF tone.
Return Value: F400h. If enabled, this value is returned upon completion of playback.

Opcode: F47D
Action: Reserved. This command is reserved for future use.
Return Value: —

Opcode: F47Eh
Action: Beep. This causes the MSM6679AL-110 to beep for 50 ms.
Return Value: F400h. If enabled, this value is returned upon completion of playback.

Opcode: F47Fh
Action: Pause. This command can be issued while the MSM6679AL-110 is performing sound
output and is then put in the MSM6679AL-110 command stack for subsequent processing.
When this command is executed, sound output pauses for 0.2 sec. The pause command is
useful for word spacing.
Return Value: F400h. If enabled, this value is returned upon completion of playback.

Opcode: F480h
Action: Set MSM6654 Mode. This command causes the MSM6679AL-110 to initialize the external
MSM665x device, also clearing the device from BUSY mode.
Return Value: None.

Opcode: F481h - F4FFh
Action: Playback Sound from MSM665x Device. This command causes the MSM6679AL-110 to
issue a speak command to the MSM665x slave device. The value is passed on to the MSM665x
device as 01h - 7Fh. The actual phrase is determined by the vocabulary programmed into the
MSM665x device. Up to 127 external phrases are supported.
Return Value: F400h. If enabled, this value is returned upon completion of playback.
If NAR is set, the F400h response is sent when the MSM665x device is ready for another
command. If busy mode is selected, the F400h response is returned when the sound is finished.

Opcode: F50Bh
Action: Set MSM665x Busy Mode ON.
Return Value: None.

Opcode: F51Bh
Action: Set 6654 NAR mode. This command, which is the complement of the F50B command,
sets up the handshaking to the attached 6654 speech synthesizer to use the NAR. This setup
uses the 6654's double buffer feature to eliminate any gap between two consecutive phrases.
Return Value: None.

Opcode: FEXYh
Action: Set Output Level. This command sets the speech output level to one of 255 values as
follows:
  FE03h   Set minimum output level.
  FE80h   Set output level half way (default).
  FEFEh   Set maximum output level.
Return Value: None.
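Several Speak opcodes are simple offsets from a base value, which makes them easy to compute on the host. The sketch below covers three cases: external MSM665x phrases (F481h to F4FFh), the output level command (FE03h to FEFEh), and the default DTMF tones. All function names are hypothetical. The DTMF mapping assumes the tones occupy F451h through F45Ch in the order 0 to 9, "*", "#", and clamping out-of-range volume requests is a design choice of this sketch rather than documented chip behavior.

```python
DTMF_KEYS = "0123456789*#"  # assumed order of the default tones F451h..F45Ch


def external_phrase_opcode(phrase):
    """Opcode that plays phrase 1..127 from the external MSM665x device."""
    if not 1 <= phrase <= 0x7F:
        raise ValueError("external phrase number must be 01h to 7Fh")
    return 0xF480 + phrase


def output_level_opcode(level):
    """Opcode that sets the output level, clamped to the 03h..FEh range."""
    level = max(0x03, min(0xFE, level))
    return 0xFE00 | level


def dtmf_opcode(key):
    """Opcode that plays the simulated DTMF tone for one keypad character."""
    if len(key) != 1 or key not in DTMF_KEYS:
        raise ValueError("key must be one of 0-9, * or #")
    return 0xF451 + DTMF_KEYS.index(key)


# Opcodes that would "dial" 4-1-1 using the default tone set.
dial_sequence = [dtmf_opcode(k) for k in "411"]
```

If acknowledgments are enabled, the host would wait for F400h after each playback opcode before sending the next one.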
Request
Opcode: F500h
Action: Status Request. This command causes the MSM6679AL-110 to return a 2-byte value
indicating its current status.
Return Value:
  F500h   MSM6679AL-110 ready.
  F520h   MSM6679AL-110 disabled.
  F540h   MSM6679AL-110 waiting for start.
  F560h   MSM6679AL-110 waiting for end.
  F580h   MSM6679AL-110 processing.
  F5A0h   Download/upload in progress.
  F5C0h   Download/upload complete.
  F5E0h   Select/jump complete.

Opcode: F501h
Action: Select last FLASH bank for SI recognition.

Opcode: F510h
Action: Select download RAM bank for SI/SP template area. This command enables the
download RAM bank in the upper 32 K of data memory for SI recognition.
Return Value: No return value

Opcode: F520h
Action: Select buffer RAM bank for SI/SP. This command enables the buffer RAM bank in the
upper 32 K of data memory for SI recognition.
Return Value: No return value
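A common use of the Status Request is to poll until the device reports ready before sending the next command. The sketch below shows that loop in isolation. The `exchange` argument is a hypothetical callable supplied by the host application: it is assumed to send one opcode over the serial link and return the 16-bit response, so no real serial API is implied. The poll limit is likewise an arbitrary illustrative value.

```python
STATUS_TEXT = {
    0xF500: "ready",
    0xF520: "disabled",
    0xF540: "waiting for start",
    0xF560: "waiting for end",
    0xF580: "processing",
    0xF5A0: "download/upload in progress",
    0xF5C0: "download/upload complete",
    0xF5E0: "select/jump complete",
}


def wait_until_ready(exchange, max_polls=10):
    """Poll with F500h until the device answers F500h (ready).

    `exchange` is a hypothetical host-supplied function: opcode in,
    16-bit response out. Returns True if ready was seen in time.
    """
    for _ in range(max_polls):
        if exchange(0xF500) == 0xF500:
            return True
    return False


# Simulated device that is busy twice and then ready.
replies = iter([0xF580, 0xF580, 0xF500])
ready = wait_until_ready(lambda opcode: next(replies))
```

A real host would normally add a delay between polls and treat an unexpected F8XYh reply as an error rather than as "still busy".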
Request (Continued)

Opcode: F502h
Action: Download/Upload.
Full syntax: F5 02 00 Ctl AdH AdL NH NL [Dt1... DtN [Dt(N+1)]]
  Ctl(7) = 0 for download, Ctl(7) = 1 for upload.
  Ctl(6) = 0 for data RAM, Ctl(6) = 1 for program RAM/ROM.
  If Ctl(6) = 0, then Ctl(1-0) = Seg: data segment selection.
  If Ctl(6) = 1 and Ctl(1-0) = x0, then external program segment 0 is used.
  If Ctl(6) = 1 and Ctl(1-0) = x1, then external program segment 1 is used.
  AdH AdL = high, low bytes of starting address.
  NH NL = high, low bytes of N.
  N = number of bytes to be downloaded or uploaded (maximum 07FFCh).
  Dt1... DtN = download data. Note (here and in the upload response) that data are 8-bit binary values, even if using the serial interface.
  Dt(N+1): if N is odd, an extra byte is appended to the data so that the total number of bytes in the message remains even.
This command requests data transfer to/from data or external program memory. The control parameter (Ctl) controls the direction of the transfer (i.e., download vs. upload) and specifies which of six 64-Kbyte memory segments (i.e., four data segments and two external program segments) is to be accessed. This command does not work with internal program memory. It is not possible to download to external program memory while running in external program memory.
The address and length parameters (AdH AdL NH NL) specify the starting address and length of the transfer in bytes. Since the MSM6679AL-110 can only perform download/upload transfers within one 32-Kbyte block in one Download/Upload command, the address and length parameters must not specify a transfer that violates a 32-Kbyte address boundary. If this restriction is violated, the download/upload request will be denied.
Return Value: Immediately after receiving parameter NL, the MSM6679AL-110 responds with a message to indicate acceptance or denial of the transfer request. Acceptance is indicated by F5A0h. Denial is indicated by F8XYh. At the end of an accepted transfer, the MSM6679AL-110 responds with a message to confirm or deny valid completion of the transfer. Valid completion is indicated by F5C0h.
  F880h  Invalid message received.
  F840h  Sample data over-run.
  F820h  32-Kbyte block boundary violation error.
  F810h  Unclassified download/upload error.
  F808h  Divide-by-zero error.
  F804h  Select/jump error.
  F802h  Invalid SP header or table.
  F801h  Reserved.
  FAXYh FBXYh  Most and least significant byte of address where error occurred.

Opcode: F504h
Action: Retrieve MSM6679AL-110 Firmware Revision Number.
Return Value: XXXX  Four-digit ASCII number.
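
The Download/Upload syntax above can be sketched on the host side as follows. This is only an illustration under stated assumptions, not vendor code: the function names are hypothetical, the pad byte appended for odd N is assumed to be 00h (its value is not specified here), and the F8XYh denial byte is assumed to be a bitwise OR of the listed single-bit codes. Only the Ctl bits described above (7, 6, 1-0) are set.

```python
# Hypothetical host-side helpers for the F502h Download/Upload command.
MAX_N = 0x7FFC      # maximum transfer length given in the table
BLOCK = 0x8000      # transfers must stay inside one 32-Kbyte block

DENIAL_FLAGS = {
    0x80: "Invalid message received",
    0x40: "Sample data over-run",
    0x20: "32-Kbyte block boundary violation error",
    0x10: "Unclassified download/upload error",
    0x08: "Divide-by-zero error",
    0x04: "Select/jump error",
    0x02: "Invalid SP header or table",
    0x01: "Reserved",
}


def build_transfer_request(upload, program_memory, segment, address, count, data=b""):
    """Return the bytes F5 02 00 Ctl AdH AdL NH NL [Dt1..DtN [pad]]."""
    if not 0 < count <= MAX_N:
        raise ValueError("N must be between 1 and 07FFCh")
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    max_segment = 1 if program_memory else 3   # two program, four data segments
    if not 0 <= segment <= max_segment:
        raise ValueError("segment out of range")
    # Reject transfers that would cross a 32-Kbyte address boundary.
    if address // BLOCK != (address + count - 1) // BLOCK:
        raise ValueError("transfer crosses a 32-Kbyte boundary")
    ctl = (0x80 if upload else 0) | (0x40 if program_memory else 0) | segment
    header = bytes([0xF5, 0x02, 0x00, ctl,
                    address >> 8, address & 0xFF,
                    count >> 8, count & 0xFF])
    if upload:
        return header                      # the chip supplies the data
    if len(data) != count:
        raise ValueError("data length must equal N")
    pad = b"\x00" if count % 2 else b""    # assumption: pad value is 00h
    return header + bytes(data) + pad


def decode_denial(word):
    """List the meanings of the bits set in an F8XYh denial word."""
    if word >> 8 != 0xF8:
        raise ValueError("not an F8XYh word")
    return [text for bit, text in DENIAL_FLAGS.items() if word & bit]
```

Checking the boundary on the host avoids a round trip that the chip would deny anyway; the 00A0h control word used for the upload in the serial protocol example sets a bit that is not described in this table, so the sketch does not attempt to reproduce it.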
Request (Continued)

Opcode: F505h
Action: Initialize in Background. Background noise initialization is performed for 500 ms.
The MSM6679AL-110 requires this command prior to recognition for noise vector subtraction during the utterance sampling period. Use the background initialization command whenever there is a change in the background noise level. For example, sample the noise signature in a vehicle at rest and again when it is moving at 35 MPH with its windows rolled down. The quality of a phone line connection can also vary from call to call.
The host MCU must implement a strategy for deciding when to issue a background initialization command. In a vehicle, the host MCU could monitor the vehicle speed, fan speed, radio volume, etc. Alternatively, the host MCU could issue this command each time a new recognition session starts or a new line connection is established. However, the 0.5-sec sample period could degrade system responsiveness if used too frequently.
The F2XXh command can also be used to perform background noise initialization. A zero in this bit location during the F2XXh command will not cause an initialization.
Return Value: F501h  Initialization is complete.

Opcode: F506h
Action: Retrieve Vocabulary and Trigger Table Revision Number.
Return Value: XXXX  Four-digit ASCII number.

Opcode: F507h
Action: Save SDR templates in last FLASH. Save the download RAM bank SD template area. Saves 2684 bytes from the address set by the F103 command to the address range F300-FD7F in the last FLASH. The default is (4A00-547B→F300-FD7F).
Return Value: F501h  Save is complete.

Opcode: F508h
Action: Get SDR templates from last FLASH. Get the download RAM bank SD template area. Copies 2684 bytes to the address set by the F103 command from the address range F300-FD7B in the last FLASH. The default is (F300-FD7B→4A00-547B).
Return Value: No return value.

Opcode: F509h
Action: Select Default SI Vocabulary. (First FLASH)
Return Value: -
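
The address ranges quoted for F507h and F508h can be checked with a few lines of arithmetic. The sketch below is a worked check only; the helper names are hypothetical and the sizes come straight from the table. It shows that 2684 bytes starting at 4A00h end at 547Bh and that 2684 bytes starting at F300h end at FD7Bh, while the F300-FD7F range listed under F507h spans four more bytes. The sketch reports the spans without deciding which figure is intended.

```python
# Worked check of the SDR template address ranges (helper names are hypothetical).
SDR_TABLE_BYTES = 2684  # size of the SD template area given in the table


def last_address(first, length):
    """Address of the final byte of a block that starts at `first`."""
    return first + length - 1


def span_bytes(first, last):
    """Number of bytes in the inclusive range first..last."""
    return last - first + 1


working_end = last_address(0x4A00, SDR_TABLE_BYTES)   # default working table
flash_end = last_address(0xF300, SDR_TABLE_BYTES)     # copy in the last FLASH

print(f"working table: 4A00-{working_end:04X}")
print(f"flash copy:    F300-{flash_end:04X}")
print("bytes in F300-FD7F:", span_bytes(0xF300, 0xFD7F))
```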
Record

Opcode: F101h 00XXh
Action: Set Name Tag Length, Set MSM665x Busy Mode ON. Name tag record length is set by XXh, with XXh defining record length in 14-ms intervals. The maximum record length of FFh yields a recording interval of 3.57 sec. The default value is 1.2 sec.
Return Value: F101h 00XXh  Operation complete.

Opcode: F105 XXXX
Action: Set Name Tag Record Origin. This command sets the beginning address for recording name tags. XXXX = 128-byte blocks from 0000 to 02FF. The reset default is 0000. This is only effective before an F50A command since new recordings start after the end of the previous recording. The F50A command uses this number to calculate the first address.
Return Value: F105 BAAA, where B is the bank number (0, 1, 2), and AAA is the bank address/16 (800 - FF8).

Opcode: F106 XXXX
Action: Set Name Tag Record End. This command sets the ending address for recording name tags. XXXX = 128-byte blocks from 0000 to 02FF. The reset default is 01FF.
Return Value: F106 BAAA, where B is the bank number (0, 1, 2), and AAA is the bank address/16 (800 - FF8).

Opcode: F50Ah
Action: Clear Name Tag Table.
Return Value: F501h  Name tag table cleared.

Opcode: F50Ch
Action: Recall name tag pointers from first FLASH. Copy the first FLASH name tag pointers (FD80-FFFF) to the working name tag pointer table. The default is (FD80-FFFF→5480-56FF).
Return Value: F501h  Saved name tag table recalled.

Opcode: F51Ch
Action: Recall name tag pointers from last FLASH. Copy the last FLASH name tag pointers (FD80-FFFF) to the working name tag pointer table. The default is (FD80-FFFF→5480-56FF).
Return Value: F501h  Name tag pointers recalled.

Opcode: F50Dh
Action: Save name tag pointers in first FLASH. Save the working name tag pointer table to the first FLASH name tag pointers. The default is (5480-56FD→FD80-FFFD).
Return Value: F501h  Name tag table saved.

Opcode: F51Dh
Action: Save name tag pointers in last FLASH. Save the working name tag pointer table to the last FLASH name tag pointers. The default is (5480-56FD→FD80-FFFD).
Return Value: F501h  Name tag pointers saved.

Opcode: F50Eh
Action: Set Record Volume HIGH.
Return Value: -

Opcode: F50Fh
Action: Set Record Volume to Normal. This is the default setting.
Return Value: -

Opcode: FA00h
Action: Reserved. This command is reserved for future use.
Return Value: -

Opcode: FA01h ~ FA3Dh
Action: Record Name Tag.
Return Value:
  FA00h  Completed.
  F280h  Memory full.
Record (Continued)

Opcode: FA3Dh ~ FAFFh
Action: Reserved. These commands are reserved for future use.
Return Value: -
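
The 14-ms granularity of the Set Name Tag Length command (F101h 00XXh) lends itself to a small conversion helper. The following is a minimal sketch with hypothetical function names; rounding to the nearest interval and rejecting 00h are assumptions, since the table does not say how the host should round or what a zero length does.

```python
# Hypothetical helpers for the F101h 00XXh name tag length parameter.
TICK_MS = 14          # record length unit given in the table
MAX_CODE = 0xFF       # FFh is the maximum record length


def name_tag_length_code(seconds):
    """Convert a record length in seconds to the XXh parameter."""
    ticks = round(seconds * 1000 / TICK_MS)   # assumption: round to nearest
    if not 1 <= ticks <= MAX_CODE:            # assumption: 00h is not used
        raise ValueError("record length out of range")
    return ticks


def record_seconds(code):
    """Convert an XXh parameter back to seconds."""
    if not 0 <= code <= MAX_CODE:
        raise ValueError("code must fit in one byte")
    return code * TICK_MS / 1000


def set_length_words(seconds):
    """Return the two 16-bit words of the command, e.g. (F101h, 0047h)."""
    return (0xF101, name_tag_length_code(seconds))
```

For a requested length of 1 second this yields the words F101h 0047h, and the FFh maximum converts back to the 3.57 sec quoted in the table.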
SD Recognition Control
Opcode
Action
Return Value
Recognition performance is largely a function of how well the enrollment data represents subsequent tokens of the
enrolled utterances, and performance generally improves steadily with each additional enrollment pass. For most
applications, three initial enrollment passes are recommended. Subsequent reference updating can be performed
with the SD Recognize Update command (F342).
Opcode: F521h
Action: Clear SDR table. This command initializes a blank SD template table. The 2684-byte area from the address set by the F103 command (the working SDR table) is set to zeros. The SDR tables in the FLASH banks are not affected. The default is (4A00 - 547B).
Return Value: F501h  SDR table is cleared.

Opcode: F6XYh
Action: Set SD Segment Pointer. This command sets the SD segment pointer to XY00h, i.e., sets the starting address of the current SD recognition parameter table to XY00h. Issuing this command is equivalent to issuing the Set SD Origin command, F103h XY00h. (For further details of operation, please refer to the description of that command.)
Return Value: No return value.

Opcode: F9XYh
Action: Search for SD Utterance XY. This is the first step in adding an utterance to the vocabulary, or in replacing an existing one. The SD vocabulary memory is searched for utterance number XYh. If it is not found and if sufficient SD memory exists, the MSM6679AL-110 prepares to add utterance number XYh to the vocabulary.
Return Value:
  F740h  Utterance number found.
  F700h  Utterance number not found.
  F73Fh  Memory full.

Opcode: FB00h
Action: Enroll SD Utterance. This command starts MSM6679AL-110 SD Listen mode, then uses the next captured utterance to start or update training of the reference data for SD utterance number XY specified in the most recent Search command (F9XYh). The user must be prompted to say the utterance prior to issuing this command.
If the utterance was previously enrolled, a training update is performed; if not, the reference data is initialized. Each utterance in the SD vocabulary must be enrolled at least once before it can be recognized.
Return Value:
  F740h  Operation complete.
  F700h  Aborting SD Listen mode.
  F73Eh  Improper level, must repeat.
  F802h  Invalid signal processing table.
  F840h  Sample data overrun.

Opcode: FC00h
Action: Erase utterance from SD vocabulary. This command erases the reference parameters for utterance number XYh from the SD vocabulary, where XYh is the utterance number retained from the previous Search command (F9XYh).
Return Value: F740h  Operation complete.
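
The Search and Enroll commands above combine into a short enrollment loop on the host. The sketch below is one possible arrangement, not a prescribed procedure: `exchange` is a hypothetical callable that sends one 16-bit command word and returns the chip's response word (echo handling is left to it), `prompt` is a hypothetical callback that asks the user to speak, and the retry limit for F73Eh is an arbitrary choice. Three passes are used by default, following the recommendation at the top of this table.

```python
# Hypothetical SD enrollment loop built on the F9XYh and FB00h commands.
FOUND, NOT_FOUND, MEMORY_FULL = 0xF740, 0xF700, 0xF73F
ENROLL_OK, ENROLL_REPEAT = 0xF740, 0xF73E


def enroll_utterance(exchange, number, passes=3, max_retries=2,
                     prompt=lambda number: None):
    """Search for utterance `number`, then run `passes` enrollment passes.

    Returns True if the utterance already existed (training was updated)
    and False if it was newly initialized.
    """
    if not 0 <= number <= 0xFF:
        raise ValueError("utterance number must fit in one byte")
    status = exchange(0xF900 | number)          # F9XYh: Search for SD Utterance
    if status == MEMORY_FULL:
        raise MemoryError("SD vocabulary memory is full")
    if status not in (FOUND, NOT_FOUND):
        raise RuntimeError(f"unexpected search response {status:04X}")

    done = 0
    retries = 0
    while done < passes:
        prompt(number)                          # user must be prompted first
        reply = exchange(0xFB00)                # FB00h: Enroll SD Utterance
        if reply == ENROLL_OK:
            done += 1
            retries = 0
        elif reply == ENROLL_REPEAT and retries < max_retries:
            retries += 1                        # improper level: ask again
        else:
            raise RuntimeError(f"enrollment failed with {reply:04X}")
    return status == FOUND
```

Passing the transport in as a callable keeps the loop independent of whether the host uses the serial or the parallel interface.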
Asynchronous Serial Protocol Example
All messages to the MSM6679AL-110 (except downloads and uploads) are echoed, but replies
from the MSM6679AL-110 to the host are not echoed by the host. This arrangement facilitates
manual communication with the MSM6679AL-110 using standard terminals. The following
table illustrates the range of MSM6679AL-110 functions.
Comment: Initialize MSM6679AL-110
  Action: Host initializes MSM6679AL-110.
    Host command: F258
    MSM6679AL-110 response: F258
  Action: MSM6679AL-110 acknowledges.
    MSM6679AL-110 response: F200

Comment: Load trigger tables at 5000h.
  Action: Host requests download to data segment 0, starting at location 5000h, of 256 bytes (0100h).
    Host command: F502 0000 5000 0100
    MSM6679AL-110 response: F502 0000 5000 0100
  Action: MSM6679AL-110 accepts request.
    MSM6679AL-110 response: F5A0
  Action: Host sends 256 bytes (~0.25 sec at 9600 baud).
    Host command: ...
  Action: MSM6679AL-110 indicates download complete.
    MSM6679AL-110 response: F5C0

Comment: Set new triggering origin.
  Action: Host requests Set triggering origin to 5000h.
    Host command: F104 5000
    MSM6679AL-110 response: F104 5000
  Action: MSM6679AL-110 sets triggering origin and sends confirming response.
    MSM6679AL-110 response: F104 5000

Comment: Download new SD vocabulary.
  Action: Host requests download to data segment 0, starting at location 6000h, of 4 Kbytes (1000h).
    Host command: F502 0000 6000 1000
    MSM6679AL-110 response: F502 0000 6000 1000
  Action: MSM6679AL-110 accepts request.
    MSM6679AL-110 response: F5A0
  Action: Host sends 4 Kbytes (~4.3 sec at 9600 baud).
    Host command: ...
  Action: MSM6679AL-110 indicates download complete.
    MSM6679AL-110 response: F5C0
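
Comment: Set new SD tables.
  Action: Host requests Set SD origin to 6000h.
    Host command: F103 6000
    MSM6679AL-110 response: F103 6000
  Action: MSM6679AL-110 sets SD origin and responds.
    MSM6679AL-110 response: F103 6000

Comment: Download first 4 K of SI vocabulary.
  Action: Host requests download to data segment 0, starting at location 7000h, of 4 Kbytes (1000h).
    Host command: F502 0000 7000 1000
    MSM6679AL-110 response: F502 0000 7000 1000
  Action: MSM6679AL-110 accepts request.
    MSM6679AL-110 response: F5A0
  Action: Host sends 4 Kbytes.
    Host command: ...
  Action: MSM6679AL-110 indicates download complete.
    MSM6679AL-110 response: F5C0

Comment: Download last 32 K of SI vocabulary.
  Action: Host requests download to data segment 0, starting at location 8000h, of 32 Kbytes (7FFCh).
    Host command: F502 0000 8000 7FFC
    MSM6679AL-110 response: F502 0000 8000 7FFC
  Action: MSM6679AL-110 accepts request.
    MSM6679AL-110 response: F5A0
  Action: Host sends 32 Kbytes.
    Host command: ...
  Action: MSM6679AL-110 indicates download complete.
    MSM6679AL-110 response: F5C0

Comment: Set new SP/SI tables.
  Action: Host requests Set SP/SI origin = 7000h.
    Host command: F102 7000
    MSM6679AL-110 response: F102 7000
  Action: MSM6679AL-110 sets SP/SI origin and responds.
    MSM6679AL-110 response: F102 7000

Comment: Upload data for diagnostics.
  Action: Host requests upload from data segment 0, starting at location 300h, of 45 bytes (2Dh).
    Host command: F502 00A0 0300 002D
    MSM6679AL-110 response: F502 00A0 0300 002D
  Action: MSM6679AL-110 accepts request, signals in progress.
    MSM6679AL-110 response: F5A0
  Action: MSM6679AL-110 sends 46 bytes.
    MSM6679AL-110 response: ...
  Action: MSM6679AL-110 indicates upload complete.
    MSM6679AL-110 response: F5C0

Comment: Set up MSM6679AL-110 for SI recognition.
  Action: Host requests set SP table 3.
    Host command: F123
    MSM6679AL-110 response: F123
  Action: MSM6679AL-110 selects SP table 3 and confirms.
    MSM6679AL-110 response: F123
  Action: Host initializes MSM6679AL-110.
    Host command: F258
    MSM6679AL-110 response: F258
  Action: MSM6679AL-110 acknowledges.
    MSM6679AL-110 response: F200

Comment: SI recognition.
  Action: Host starts SI recognition, vocabulary 1.
    Host command: F301
    MSM6679AL-110 response: F301
  Voice input: "Dial"
  Action: MSM6679AL-110 recognizes utterance 3.
    MSM6679AL-110 response: F603
  Action: Host starts SI recognition, vocabulary 2.
    Host command: F302
    MSM6679AL-110 response: F302
  Voice input: "Two"
  Action: MSM6679AL-110 recognizes utterance 2.
    MSM6679AL-110 response: F602
  Action: Host starts SI recognition, vocabulary 2.
    Host command: F302
    MSM6679AL-110 response: F302
  Voice input: "Three"
  Action: MSM6679AL-110 recognizes utterance 3.
    MSM6679AL-110 response: F603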
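
Comment: SI recognition.
  Action: Host starts SI recognition, vocabulary 1.
    Host command: F301
    MSM6679AL-110 response: F301
  Voice input: "Store"
  Action: MSM6679AL-110 recognizes utterance 1.
    MSM6679AL-110 response: F601

Comment: SD enrollment.
  Action: Get ready to train SD utterance 1.
    Host command: F901
    MSM6679AL-110 response: F901
  Action: Memory is empty and ready to train.
    MSM6679AL-110 response: F700
  Action: Pass 1; host sends SD enroll command.
    Host command: FB00
    MSM6679AL-110 response: FB00
  Voice input: "John Smith"
  Action: SD utterance 1 initialized.
    MSM6679AL-110 response: F740
  Action: Pass 2; host sends SD enroll command.
    Host command: FB00
    MSM6679AL-110 response: FB00
  Voice input: "John Smith"
  Action: SD utterance 1 updated.
    MSM6679AL-110 response: F740
  Action: Pass 3; host sends SD enroll command.
    Host command: FB00
    MSM6679AL-110 response: FB00
  Voice input: "John Smith"
  Action: SD utterance 1 updated.
    MSM6679AL-110 response: F740

Comment: SI recognition of control words.
  Action: Host starts SI recognition, vocabulary 1.
    Host command: F301
    MSM6679AL-110 response: F301
  Voice input: "Dial"
  Action: MSM6679AL-110 recognizes utterance 3.
    MSM6679AL-110 response: F603
  Action: Host starts SI recognition, vocabulary 2.
    Host command: F302
    MSM6679AL-110 response: F302
  Voice input: "Five"
  Action: MSM6679AL-110 recognizes utterance 5.
    MSM6679AL-110 response: F605
  Action: Host starts SI recognition, vocabulary 2.
    Host command: F302
    MSM6679AL-110 response: F302
  Voice input: "Six"
  Action: MSM6679AL-110 recognizes utterance 6.
    MSM6679AL-110 response: F606
  Action: Host starts SI recognition, vocabulary 1.
    Host command: F301
    MSM6679AL-110 response: F301
  Voice input: "Store"
  Action: MSM6679AL-110 recognizes utterance 1.
    MSM6679AL-110 response: F601

Comment: SD enrollment.
  Action: Host prepares MSM6679AL-110 to train SD utterance 2.
    Host command: F902
    MSM6679AL-110 response: F902
  Action: Memory is empty and ready to train.
    MSM6679AL-110 response: F700
  Action: Pass 1; host sends SD enroll command.
    Host command: FB00
    MSM6679AL-110 response: FB00
  Voice input: "Bill Jones"
  Action: SD utterance 2 initialized.
    MSM6679AL-110 response: F740
  Action: Pass 2; host sends SD enroll command.
    Host command: FB00
    MSM6679AL-110 response: FB00
  Voice input: "Bill Jones"
  Action: MSM6679AL-110 updates SD utterance 2.
    MSM6679AL-110 response: F740
  Action: Pass 3; host sends SD enroll command.
    Host command: FB00
    MSM6679AL-110 response: FB00
  Voice input: "Bill Jones"
  Action: MSM6679AL-110 signals operation completed.
    MSM6679AL-110 response: F740

Comment: SI recognition of control word.
  Action: Host starts SI recognition, vocabulary 1.
    Host command: F301
    MSM6679AL-110 response: F301
  Voice input: "Directory"
  Action: MSM6679AL-110 recognizes utterance 3.
    MSM6679AL-110 response: F603

Comment: SD recognition.
  Action: Host starts SD recognition.
    Host command: F340
    MSM6679AL-110 response: F340
  Voice input: "John Smith"
  Action: MSM6679AL-110 signals trigger OK.
    MSM6679AL-110 response: F740
  Action: Host sends SD sort command.
    Host command: F341
    MSM6679AL-110 response: F341
  Action: MSM6679AL-110 recognizes utterance 1.
    MSM6679AL-110 response: F701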
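
Comment: Name tag recording.
  Action: Host initiates MSM665x port.
    Host command: F480
    MSM6679AL-110 response: F480
  Action: Host sets recording length to 1 sec.
    Host command: F101 0047
    MSM6679AL-110 response: F101 0047
  Action: MSM6679AL-110 signals operation complete.
    MSM6679AL-110 response: F101 0047
  Action: Host clears name tag table.
    Host command: F50A
    MSM6679AL-110 response: F50A
  Action: MSM6679AL-110 signals operation complete.
    MSM6679AL-110 response: F501
  Action: Host sets record gain to max. level.
    Host command: F50E
    MSM6679AL-110 response: F50E
  Action: Start recording tag one.
    Host command: FA01
    MSM6679AL-110 response: FA01
  Voice input: "Jane Doe"
  Action: MSM6679AL-110 signals name tag recording complete.
    MSM6679AL-110 response: FA00
  Action: Save name tags to FLASH.
    Host command: F50D
    MSM6679AL-110 response: F50D
  Action: Name tags saved.
    MSM6679AL-110 response: F501

Comment: Name tag playback.
  Action: Host sets volume to max. level.
    Host command: FEFF
    MSM6679AL-110 response: FEFF
  Action: Host commands play back name tag 1.
    Host command: F401
    MSM6679AL-110 response: F401
  Voice output: "Jane Doe"
  Action: MSM6679AL-110 signals playback OK.
    MSM6679AL-110 response: F400

Comment: Sound playback.
  Action: Host sets output volume to mid point.
    Host command: FE80
    MSM6679AL-110 response: FE80
  Action: Play MSM6679AL-110 internal sound 1.
    Host command: F442
    MSM6679AL-110 response: F442
  Voice output: "bzzzz"
  Action: Play back sound from MSM6654.
    Host command: F49F
    MSM6679AL-110 response: F49F
  Voice output: "Completed"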
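
The echo-then-response pattern that runs through the example above can be wrapped in a small host helper. This is a sketch under stated assumptions rather than a description of the serial framing: `send_word` and `read_word` are hypothetical callables that move one 16-bit word at a time, the echo is assumed to be readable once the whole command has been sent, and `ScriptedDevice` is only a stand-in that replays rows from the table for demonstration.

```python
from collections import deque


def exchange(send_word, read_word, command, reply_words):
    """Send a command, verify the echoed words, and return the reply words."""
    command = list(command)
    for word in command:
        send_word(word)
    echo = [read_word() for _ in command]
    if echo != command:
        raise RuntimeError("echo does not match the command sent")
    return [read_word() for _ in range(reply_words)]


class ScriptedDevice:
    """Stand-in for the chip: echoes each word, then queues a canned reply."""

    def __init__(self, replies):
        self.replies = dict(replies)   # command tuple -> reply tuple
        self.received = []
        self.outgoing = deque()

    def send_word(self, word):
        self.received.append(word)
        self.outgoing.append(word)                 # echo the word back
        key = tuple(self.received)
        if key in self.replies:                    # full command recognized
            self.outgoing.extend(self.replies[key])
            self.received.clear()

    def read_word(self):
        return self.outgoing.popleft()


# Replay two rows of the example: initialization and Set triggering origin.
device = ScriptedDevice({
    (0xF258,): (0xF200,),
    (0xF104, 0x5000): (0xF104, 0x5000),
})
init_reply = exchange(device.send_word, device.read_word, [0xF258], 1)
origin_reply = exchange(device.send_word, device.read_word, [0xF104, 0x5000], 2)
```

Download data and upload responses are not echoed, so a helper of this kind would apply only to the command words, not to the data phase of an F502h transfer.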
E2Y0001-28-41
NOTICE
1. The information contained herein can change without notice owing to product and/or technical improvements. Before using the product, please make sure that the information being referred to is up-to-date.
2. The outline of action and examples for application circuits described herein have been chosen as an explanation for the standard action and performance of the product. When planning to use the product, please ensure that the external conditions are reflected in the actual circuit, assembly, and program designs.
3. When designing your product, please use our product below the specified maximum ratings and within the specified operating ranges including, but not limited to, operating voltage, power dissipation, and operating temperature.
4. Oki assumes no responsibility or liability whatsoever for any failure or unusual or unexpected operation resulting from misuse, neglect, improper installation, repair, alteration or accident, improper handling, or unusual physical or electrical stress including, but not limited to, exposure to parameters beyond the specified maximum ratings or operation outside the specified operating range.
5. Neither indemnity against nor license of a third party's industrial and intellectual property right, etc. is granted by us in connection with the use of the product and/or the information and drawings contained herein. No responsibility is assumed by us for any infringement of a third party's right which may result from the use thereof.
6. The products listed in this document are intended for use in general electronics equipment for commercial applications (e.g., office automation, communication equipment, measurement equipment, consumer electronics, etc.). These products are not authorized for use in any system or application that requires special or enhanced quality and reliability characteristics nor in any system or application where the failure of such system or application may result in the loss or damage of property, or death or injury to humans. Such applications include, but are not limited to, traffic and automotive equipment, safety devices, aerospace equipment, nuclear power control, medical equipment, and life-support systems.
7. Certain products in this document may need government approval before they can be exported to particular countries. The purchaser assumes the responsibility of determining the legality of export of these products and will take appropriate and necessary steps at their own expense for these.
8. No part of the contents contained herein may be reprinted or reproduced without our prior permission.
Copyright 1998 Oki Electric Industry Co., Ltd.
Printed in Japan