MSM6679AL-110 Voice Recognition Processor
Oki Semiconductor
FIRST EDITION
ISSUE DATE: Nov. 1998
E2F0013-28-Y1, version: Nov. 1998

Contents

General Description
Features
Functional and I/O Diagrams
Pin Descriptions
Electrical Specifications
    Absolute Maximum Ratings
    Operating Conditions
    DC Characteristics (VDD = 2.7 to 5.5 V, Ta = -30 to 70˚C)
    AC Characteristics
    Timing Diagram
System Configuration Example
Functional Description
    Voice Recognition
        SI Recognition
        SD Recognition
    Name Tag Recording
    Audio Input Interface
    Audio Output Interface
    Memory Interface
    External Voice Synthesis Control
    Serial Interface
MSM6679AL-110 Slave-Mode API
    Command Summary
    Command Descriptions
    Asynchronous Serial Protocol Example

MSM6679AL-110
SI/SD Voice Recognizer, Recorder/Player, and Speech Synthesizer

GENERAL DESCRIPTION

The MSM6679AL-110 Voice Recognition Processor (VRP) is a slave-mode device that performs five functions: speaker-independent (SI) voice recognition, speaker-dependent (SD) voice recognition, solid-state sound recording, sound playback, and speech synthesis.
The highly integrated device also provides an on-chip memory controller, Flash memory interface, analog data conversion, Oki speech synthesizer interface, and pulse width modulation (PWM) sound output.

For SI recognition, the MSM6679AL-110 uses a vocabulary template held in external memory. Pretrained SI vocabularies eliminate the laborious training that SD products usually require. The memory requirements depend on the size of the vocabulary. The MSM6679AL-110 tolerates background noise while providing high recognition accuracy. In its designated operating environment, the device achieves a typical recognition accuracy of >95% (using an Oki-defined test procedure).

For SD recognition, the MSM6679AL-110 stores SD vocabulary templates, as defined by the user, in external SRAM. The MSM6679AL-110 can create SD vocabularies of up to 61 words each, with each word using approximately 50 bytes.

In addition to providing voice recognition capabilities, the MSM6679AL-110 integrates a solid-state recorder/player, speech synthesis functions, and a tone generator. ADPCM recording/playback provides high-quality sound and efficient memory utilization. The MSM6679AL-110 can respond to spoken commands, verbally or with tones, via an on-chip speech synthesizer and tone generator. For larger speech-synthesis requirements, the MSM6679AL-110 also provides a glueless MSM665x control interface for off-chip speech synthesis.

The MSM6679AL-110 can interface to any application or personal computer via a serial interface through an open, device-independent serial mode API (SMAPI). To accelerate code development, Oki supplies an evaluation kit, and assembly and C language programs for this product.

The MSM6679AL-110 is a low-power version of the MSM6679A-110.

Note: This device is intended for use in applications other than central office communication systems and central office switching systems.
FEATURES

• SI recognition
  - Up to 20 - 25 words in each vocabulary
  - Multiple vocabulary support
• SD recognition
  - Up to 61 words in each vocabulary
  - Multiple vocabulary support
• Speech synthesis
  - Up to 2.3-sec internal and 27.6-sec external speech synthesis on-chip; sample looping and concatenation allow even longer phrases
  - On-chip controller for MSM665x speech synthesizer
  - Standard beep tone outputs
  - Pulse code modulation (PCM) and adaptive differential pulse code modulation (ADPCM) voice or sound-effect output
• Speech capture and playback
  - 28-kbps ADPCM speech compression
• Serial ASCII command interface
• 6944-Hz audio input sample rate for record and playback
• 10-kHz sample rate for voice recognition
• 200-msec recognition latency
• Flexible memory mapping for EPROM, FLASH, and SRAM
• 14.3182 MHz operation
• Package: 100-pin TQFP (TQFP100-P-1414-0.50-K)

FUNCTIONAL AND I/O DIAGRAMS

[Figure 1. MSM6679AL-110 Block Diagram. Blocks shown: Analog Input, Recognition and Synthesis Engine, System Controller, Serial Interface, PWM Output, Vocabulary Memory, Algorithm Memory, External Speech Synthesis Control, External Memory Control.]

[Figure 2. MSM6679AL-110 Logic Symbol. Signal groups shown: A/D Interface (ADC0 ~ ADC7, VREF); Serial-Mode MSM665x Interface (NAR, BUSY, SI, SD, STROBE, RESOUT); PWM Output (VOICEOUT1); Serial Interface (RXD, TXD); IC Reset and Oscillator Inputs (RES, OSC0, OSC1); Memory Interface (D0 ~ D7, A0 ~ A15, WRRAM, RDRAM, RAMPAGE0, RAMPAGE1); SLEEP; PDC.]

[Figure 3. MSM6679AL-110 100-Pin TQFP Pinout. Pin assignments are given in the Alphabetic Pin List and in Pin Descriptions.]

MSM6679AL-110 Alphabetic Pin List

Name       #      Name       #      Name       #      Name        #
A0         60     A14        74     D3         54     RESOUT      3
A1         61     A15        75     D4         55     RXD         91
A2         62     ADC0       82     D5         56     SD          7
A3         63     ADC1       83     D6         57     SI          6
A4         64     ADC2       84     D7         58     SLEEP       46
A5         65     ADC3       85     GND        14, 38, 59, 93
A6         66     ADC4       86     NAR        12     STROBE      4
A7         67     ADC5       87     OSC0       39     TXD         92
A8         68     ADC6       88     OSC1       40     VDD         13, 35, 41, 80
A9         69     ADC7       89     PDC        20     VOICEOUT1   10
A10        70     AGND       90     RAMPAGE0   30     VREF        81
A11        71     BUSY       5      RAMPAGE1   31     WRRAM       50
A12        72     D0         51     RDRAM      49
A13        73     D1         52     RES        32
                  D2         53

[Figure 4. MSM6679AL-110 100-Pin Package Mechanical Drawing.]

PIN DESCRIPTIONS

Pin #     Pin Name     Signal Type          Description
1-2       NC           (do not connect)     Reserved. These pins are reserved for future use and must be left open.
3         RESOUT       Output               MSM665x Reset. This pin provides a reset signal for an external speech synthesis engine.
4         STROBE       Output               MSM665x Strobe. This output provides the LOAD signal for an external speech synthesizer.
5         BUSY         Input                MSM665x Busy. When using an external MSM665x device, this pin monitors the MSM665x BUSY signal and connects directly to the MSM665x BUSY signal output.
6         SI           Output               MSM665x Serial Clock. This MSM6679AL-110 output connects to the MSM665x SI input. The SI pin is the MSM665x serial clock input pin.
7         SD           Output               MSM665x Serial Data. This MSM6679AL-110 output connects to the MSM665x SD input. The SD pin is the MSM665x serial data input pin.
8-9       NC           (do not connect)     Reserved. These pins are reserved for future use and must be left open.
10        VOICEOUT1    Output               Voice Out. This pin is the PWM output for speech synthesis, voice sample playback, and voice prompts. An external integrator must be used to convert this to an analog signal.
11        NC           (do not connect)     Reserved. This pin is reserved for future use and must be left open.
12        NAR          Input                MSM665x Next Address Request. This pin signals to the MSM6679AL-110 that the external speech synthesis engine is ready for another command.
13        VDD          Digital Power        Power.
14        GND          Digital Ground       Ground.
15-17     NC           Input                Reserved. These pins are reserved for future use and must be tied to VDD.
18        NC           (do not connect)     Reserved. This pin is reserved for future use and must be left open.
19        NC           Input                Reserved. This pin is reserved for future use and must be tied to VDD.
20        PDC          Input                Power-down release. Power-down mode is released on either edge of the PDC signal.
21-29     NC           (do not connect)     Reserved. These pins are reserved for future use and must be left open.
30        RAMPAGE0     Output               RAM Page Select. These signals support selection of one out of four RAM pages. Each page is 64 Kbytes in size.
31        RAMPAGE1     Output               RAM Page Select (see pin 30).
32        RES          Input                MSM6679AL-110 Reset. External logic should assert this power-on reset signal LOW when power is applied to the MSM6679AL-110.
33-34     NC           Input                Reserved. These pins are reserved for future use and must be tied to VDD.
35        VDD          Digital Power        Power.
36-37     NC           Input                Reserved. These pins are reserved for future use and must be tied to VDD.
38        GND          Ground               Ground.
39        OSC0         Input                Oscillator 0/External Clock. When the MSM6679AL-110 uses a crystal oscillator, this input is the oscillator input pin. The pin is then connected to one side of a crystal and load capacitor. When used with an external clock, the external clock is applied to this input.
40        OSC1         Output               Oscillator 1. When the MSM6679AL-110 uses a crystal oscillator, this output is the oscillator output pin. The pin is then connected to one side of a crystal and load capacitor. When used with an external clock, this output is left unconnected.
41        VDD          Digital Power        Power.
42-45     NC           (do not connect)     Reserved. These pins are reserved for future use and must be left open.
46        SLEEP        Output               Sleep. In power-down mode, this pin goes LOW. The SLEEP signal can be used for external memory control.
47-48     NC           (do not connect)     Reserved. These pins are reserved for future use and must be left open.
49        RDRAM        Output               RAM Read. This is a strobe signal for direct connection to an external RAM's RD input. When asserted LOW, this signal indicates that the MSM6679AL-110 is ready to read data from RAM.
50        WRRAM        Output               RAM Write. This is a strobe signal for direct connection to an external RAM's WR input. When asserted LOW, this signal indicates that the MSM6679AL-110 is ready to write data to RAM.
51-58     D0-D7        Bidirectional I/O    Memory Data Bus.
59        GND          Digital Ground       Ground.
60-75     A0-A15       Output               Memory Address Bus.
76-79     NC           (do not connect)     Reserved. These pins are reserved for future use and must be left open.
80        VDD          Digital Power        Power.
81        VREF         Analog Power /       Analog Power. The MSM6679AL-110's on-chip A/D converter uses this analog power when converting an analog signal into digital samples. This pin is also used as the analog reference voltage.
                       Reference Voltage
82-89     ADC0-ADC7    Analog Input         Analog Input. These eight inputs are tied together and serve as the analog input. Signal conditioning, via a bandpass filter and gain circuit, is required before this input.
90        AGND         Analog Ground        Analog Ground. This pin provides an analog ground point, allowing independent grounding of the analog and digital circuitry. Separate grounds reduce the impact of digital switching noise on analog sampling accuracy.
91        RXD          Input                Serial Port Receive. This is the receive data line for the serial port.
92        TXD          Output               Serial Port Transmit. This is the transmit data line for the serial port.
93        GND          Ground               Ground.
94-100    NC           (do not connect)     Reserved. These pins are reserved for future use and must be left open.

ELECTRICAL SPECIFICATIONS

Absolute Maximum Ratings [1]

Parameter                         Symbol    Conditions               Value                 Unit
Digital power supply voltage      VDD       GND = AGND = 0 V         -0.3 to +7.0          V
Input voltage                     VI        GND = AGND = 0 V         -0.3 to VDD + 0.3     V
Output voltage                    VO        GND = AGND = 0 V         -0.3 to VDD + 0.3     V
Analog power/reference voltage    VREF      GND = AGND = 0 V         -0.3 to VDD + 0.3     V
Analog input voltage              VAI       GND = AGND = 0 V         -0.3 to VREF          V
Power dissipation                 PD        Ta = 70˚C, per package   650                   mW
                                            Ta = 70˚C, per output    8                     mW
Storage temperature               TSTG      -                        -50 to +150           ˚C

1. Permanent device damage may occur if ABSOLUTE MAXIMUM RATINGS are exceeded. Functional operation should be restricted to the conditions as detailed elsewhere in this data sheet. Exposure to absolute maximum rating conditions for extended periods may affect device reliability.
Operating Conditions

Parameter                         Symbol    Conditions                                      Value               Unit
Digital power supply voltage      VDD       fOSC = 14.3182 MHz                              2.7 to 5.5          V
Analog power/reference voltage    VREF      -                                               VDD - 0.3 to VDD    V
Analog input voltage              VAI       -                                               AGND to VREF        V
Storage holding voltage           VDDH      fOSC = 0 MHz                                    2.0 to 5.5          V
Operating frequency               fOSC      VDD = 2.7 to 5.5 V                              14.3182             MHz
Ambient temperature               Ta        -                                               -30 to 70           ˚C
Fan-out                           N         MOS load                                        20                  -
                                            TTL load, D0 ~ D7, WRRAM, RDRAM and SLEEP       6                   -
                                            TTL load, all other outputs                     1                   -

DC Characteristics (VDD = 2.7 to 5.5 V, Ta = -30 to 70˚C)

Parameter                                Symbol      Condition                                                          Min           Typ [1]   Max            Unit
High-level input voltage                 VIH         Applied to D0-D7                                                   0.44 × VDD    -         VDD + 0.3      V
                                                     Applied to all other I/O                                           0.80 × VDD    -         VDD + 0.3      V
Low-level input voltage                  VIL         Applied to D0-D7                                                   -0.3          -         0.16 × VDD     V
                                                     Applied to all other I/O                                           -0.3          -         0.2 × VDD      V
High-level output voltage                VOH         Output current = -400 µA, applied to D0-D7, WRRAM, RDRAM, SLEEP    VDD - 0.4     -         -              V
                                                     Output current = -200 µA, for all other I/O                        VDD - 0.4     -         -              V
Low-level output voltage                 VOL         Output current = 3.2 mA, applied to D0-D7, WRRAM, RDRAM, SLEEP     -             -         0.5            V
                                                     Output current = 1.6 mA, for all other I/O                         -             -         0.5            V
Input leak current                       IIH, IIL    VI = VDD/0 V, applied to ADC0-ADC7                                 -             -         1/-1           µA
Input current                            IIH, IIL    VI = VDD/0 V, applied to RES                                       -             -         1/-250         µA
                                                     VI = VDD/0 V, applied to OSC0                                      -             -         15/-15         µA
High-level output current                IOH         VO = 2.4 V, applied to D0-D7                                       -2            -         -              mA
                                                     VO = 2.4 V, applied to all other I/O                               -1            -         -              mA
Low-level output current                 IOL         VO = 2.4 V, applied to D0-D7                                       10            -         -              mA
                                                     VO = 2.4 V, applied to all other I/O                               5             -         -              mA
Output leakage current                   ILO         VO = VDD/0 V                                                       -             -         ±10            µA
Input capacitance                        CI          f = 1 MHz, Ta = 25˚C                                               -             5         -              pF
Output capacitance                       CO          f = 1 MHz, Ta = 25˚C                                               -             7         -              pF
Analog reference power supply current    IREF        During voice input                                                 -             -         4              mA
                                                     When voice input is halted                                         -             -         10             µA
Power consumption                        IDD         fOSC = 14.3182 MHz, no load                                        -             -         T.B.D          mA

1. Typical condition is 3 V, 25˚C.
AC Characteristics

External Data Memory Control (VDD = 2.7 ~ 5.5 V, Ta = -30 ~ 70˚C)

Parameter                         Symbol    Condition     Min.     Max.    Unit
Cycle time                        tCYC      -             69.8     -       ns
Clock pulse width (HIGH level)    tfWH      CL = 50 pF    28       -       ns
Clock pulse width (LOW level)     tfWL      CL = 50 pF    28       -       ns
RDRAM pulse width                 tRW       CL = 50 pF    190      -       ns
WRRAM pulse width                 tWW       CL = 50 pF    190      -       ns
RDRAM pulse delay time            tRD       CL = 50 pF    -        75      ns
WRRAM pulse delay time            tWD       CL = 50 pF    -        75      ns
Address set-up time               tAS       CL = 50 pF    -5.1     -       ns
Address hold time                 tAH       CL = 50 pF    29       41      ns
Read data set-up time             tRS       CL = 50 pF    60       -       ns
Read data hold time               tRH       CL = 50 pF    0        -       ns
Read data access time             tACC      CL = 50 pF    -        124     ns
Write data set-up time            tWS       CL = 50 pF    169      -       ns
Write data hold time              tWH       CL = 50 pF    29       41      ns

Timing Diagram

[Figure 5. RAM Read/Write Timing. The read cycle shows CLK (tCYC, tfWH, tfWL), RDRAM (tRD, tRW), A0 - A15 carrying RAP0 - 15 (tAS, tAH), and D0 - D7 carrying DIN0 - 7 (tACC, tRS, tRH). The write cycle shows WRRAM (tWD, tWW), A0 - A15 carrying RAP0 - 15 (tAS, tAH), and D0 - D7 carrying DOUT0 - 7 (tWS, tWH).]

Signal names used in Figure 5:

CLK          Clock pulse
WRRAM        RAM write strobe signal
RDRAM        RAM read strobe signal
A0 - A15     Memory address bus
RAP0 - 15    RAM address
DIN0 - 7     Read data
DOUT0 - 7    Write data

SYSTEM CONFIGURATION EXAMPLE

[Figure 6. MSM6679AL-110 System Configuration Example. Components shown: MSM6679AL-110 with a 14.3182 MHz crystal on OSC0/OSC1; FLASH and SRAM on the address and data buses (WR, RD, CS, RAMPAGE0, RAMPAGE1); an MSM66P54 speech synthesizer with a 4 MHz clock, connected through NAR, BUSY, SI, SD, STROBE, and RESOUT; an analog circuit between the microphone and ADC0 ~ ADC7; a speaker; and a host MCU interface on TXD, RXD, and PDC.]

FUNCTIONAL DESCRIPTION

Voice Recognition

The MSM6679AL-110 performs both SI and SD recognition.
SI vocabularies are embedded in the MSM6679AL-110. For SD recognition, each recognized phrase must be enrolled in the MSM6679AL-110’s vocabulary by creating a composite template from multiple recordings of the same phrase. The composite template is then stored in SRAM or FLASH memory.

During both SI and SD recognition, the MSM6679AL-110 performs the following steps:

1. After external band-pass filtering, the MSM6679AL-110 converts the analog signal to PCM samples.
2. The MSM6679AL-110 extracts significant features from the sample data by frequency and time-domain analysis.
3. The MSM6679AL-110 compares the analyzed input with the reference data for each signal, weighing the significance of similarities according to control software parameters. A score (expressed as distance) is generated for each phrase.
4. The vocabulary phrase that achieves the highest score (or lowest distance) is judged to match the input phrase, assuming that the score exceeds a predetermined threshold.
5. Via a special command, the MSM6679AL-110 can also return the scores of the input against all defined vocabulary phrases for SI or SD recognition. This feature allows external host software to select the next best match if the closest match is not contextually logical.

SI Recognition

Oki supplies the MSM6679AL-110 with predefined SI vocabularies, which Oki builds from hundreds of utterances by a wide variety of speakers. SI vocabularies are limited to 25 words or fewer, which allows the MSM6679AL-110 to achieve a net accuracy of >95%, even in noisy conditions. SI vocabularies are grouped into sub-vocabularies of ≤15 words to maintain the highest accuracy. Similar words in any one sub-vocabulary can cause substitution errors.

Oki Semiconductor’s standard cellular vocabulary is intended for an automotive environment with a far-talk microphone. This vocabulary may work adequately in other conditions, such as an office or outside, but recognition performance may be degraded.
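The decision logic in steps 4 and 5 can be sketched on the host side as follows. This is a minimal illustration only, not the device's internal algorithm: the function names, the example distance values, and the threshold are all hypothetical, and the sketch assumes the host has already obtained a distance for every phrase in the active vocabulary (lower distance meaning a closer match).

```python
from typing import Dict, List, Optional, Set, Tuple


def rank_candidates(distances: Dict[str, int],
                    max_distance: int) -> List[Tuple[str, int]]:
    """Return (phrase, distance) pairs within the threshold, closest first.

    `max_distance` is a hypothetical acceptance threshold chosen by the host.
    """
    accepted = [(p, d) for p, d in distances.items() if d <= max_distance]
    return sorted(accepted, key=lambda pair: pair[1])


def pick_phrase(distances: Dict[str, int],
                max_distance: int,
                allowed: Optional[Set[str]] = None) -> Optional[str]:
    """Pick the closest phrase that also makes sense in the current context.

    `allowed` lists the phrases that are contextually logical; None accepts
    any phrase. Returns None when nothing passes (a rejection).
    """
    for phrase, _ in rank_candidates(distances, max_distance):
        if allowed is None or phrase in allowed:
            return phrase
    return None


if __name__ == "__main__":
    # Hypothetical distances for the three-word Yes/No/Cancel sub-vocabulary.
    scores = {"Yes": 120, "No": 45, "Cancel": 60}
    print(pick_phrase(scores, max_distance=100))
    print(pick_phrase(scores, max_distance=100, allowed={"Yes", "Cancel"}))
```

Falling back to the next-best candidate, rather than rejecting outright, is the behavior step 5 describes; whether a rejection or a fallback is preferable depends on how costly a substitution error is in the application.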
In the vocabulary listings below, each phrase is followed by its index in parentheses.

MSM6679AL-110 Cellular SI Recognition Vocabulary

Sub-Vocabulary 1: Store (1), Dial (2), Delete (3), Directory (4)
Sub-Vocabulary 2: One (1), Two (2), Three (3), Four (4), Five (5), Six (6), Seven (7), Eight (8), Nine (9), Zero (Ah), Oh (Bh), Stop (Ch), Clear (Dh)
Sub-Vocabulary 3: Yes (1), No (2), Cancel (3)

MSM6679AL-110 Control Vocabulary

Sub-Vocabulary 1: A/C (1), Fan (2), Temperature (3), Timer (4), Service (5), Help (6), Select (7)
Sub-Vocabulary 2: Low (1), Medium (2), High (3), Increase (4), Decrease (5), Set (6), Reset (7), Cancel (8), Clear (9), Recall (A), On (B), Help (C)

MSM6679AL-110 Direction Vocabulary

Sub-Vocabulary 1: Up (1), Down (2), Left (3), Right (4), Forward (5), Reverse (6), Faster (7), Slower (8), Start (9), Stop (A), Cancel (B)

MSM6679AL-110 Browse Vocabulary

Sub-Vocabulary 1: Up (1), Down (2), Left (3), Right (4), Next (5), Previous (6), Select (7), Cancel (8), Home (9)
Sub-Vocabulary 2: Set (1), Reset (2), Start (3), Stop (4), On (5), Play (6), Lock (7), Cancel (8)

MSM6679AL-110 Japanese Navigation Vocabulary

Sub-Vocabulary 1: Genzaichi (1), Jiaku (2), Kaisya (3), Houi (4), Sentaku (5), Yuudou (6), Nabi (7)
Sub-Vocabulary 2: Ue (1), Shita (2), Hidari (3), Migi (4)
Sub-Vocabulary 3: Hyoujun (1), Kakudai (2), Shukushou (3), Zentai (4), Kaiten (5), Kyori (6), Hosei (7), Teisei (8)
Sub-Vocabulary 4: Hai (1), Iie (2), Ofu (3)

MSM6679AL-110 Japanese Cellular Vocabulary

Sub-Vocabulary 1: On (1), Ofu (2), Daiyaru (3), Tansyuku (4), Denwacho (5), Kakunin (6), Nabi (7)
Sub-Vocabulary 2: Ichi (1), Ni (2), San (3), Yon (4), Go (5), Roku (6), Nana (7), Hachi (8), Kyuu (9), Zero (A), Sharp (B), Star (C), Kakunin (D), Touroku (E), Rei (F)

MSM6679AL-110 German Cellular Vocabulary

Sub-Vocabulary 1: Speichern (1), Wählen (2), Löschen (3), Name (4)
Sub-Vocabulary 2: Eins (1), Zwei (2), Drei (3), Vier (4), Fünf (5), Sechs (6), Sieben (7), Acht (8), Neun (9), Null (A), Notruf (B), Wählen (C), Löschen (D), Raute (E), Stern (F)
Sub-Vocabulary 3: Ja (1), Nein (2), Löschen (3)

SI vocabulary generation starts with collecting reference utterances from ≥400 speakers with:

• An equal mixture of males and females
• Accents from all regions of the country of intended use
• ~15% non-native speakers

The samples should be generated from a randomly ordered list, with each word spoken twice and with a dummy word at the beginning and end. There must be >2 sec between each sample for accurate data processing. To provide the audio fidelity required for high-quality recognition training, a DAT recorder, together with the microphone that will be used in the final application, is required.

To ensure data integrity, data is submitted to Oki for initial screening after samples have been collected from the first 20 speakers. If the data is acceptable, the remaining collection may proceed. If substitution errors are possible, collection of spare words during initial collection is recommended. For example, alternate words to “Stop” and “Top” could be “Halt” and “First.”

Collections should contain a wide variety of the background sound conditions that will exist during actual usage. For example, if the collection is for use in an automobile, conditions such as vehicle speed, road conditions, window opening positions, heater or A/C blower speeds, and radio volumes should be varied during the collection. The signal-to-noise ratio should be maintained at ≥20 dB.

To achieve high accuracy rates, phrase selection, data collection, background initialization strategy, and control software need careful consideration. There are no published standards for recognition accuracy.
Oki defines accuracy by:

    Accuracy = 100% - ERATE
    ERATE = ESUB + 1/2 EREJ

with the following definitions:

Parameters for Recognition Accuracy

Name                       Symbol    Condition
Substitution Error         ESUB      Most critical type of error, e.g., say "Five", recognize "Nine"
Rejection Error            EREJ      Word not recognized; opportunity for operator to repeat
Gap Error                  EGAP      Word spoken before recognizer ready
Time-Out Error             ETME      Word length is too long
Spurious Response Error    ESPU      Sound or invalid word classified as a valid word (e.g., dropping the handset or speaking the wrong word)

A typical target accuracy of 97% is achieved with a 3% ERATE, composed of a 1.5% ESUB rate and a 3% EREJ rate (1.5% + 1/2 × 3% = 3%).

SD Recognition

In SD recognition mode, the MSM6679AL-110 can be trained to recognize up to 61 words. The MSM6679AL-110 can support multiple speakers by switching vocabularies, but only one speaker’s vocabulary should be active at one time.

The end user enrolls a phrase in the MSM6679AL-110’s vocabulary by recording the phrase three times or more. The host Micro Controller Unit (MCU) controls the number of times each phrase is enrolled. Generally, higher recognition accuracy is achieved with each additional enrollment. The word set is made more robust by pronouncing each phrase slightly differently during initial enrollment.

In addition to enrollment training, adaptive template updating can drive the accuracy towards 100%. The host MCU updates templates by first asking the speaker to confirm a recognized phrase with a “yes” or “no” response, and subsequently updating the template for the corresponding words. The use of name tags (see next paragraph) facilitates this process.

Name Tag Recording

To facilitate SD recognition, the MSM6679AL-110 supports recording and playback of name tags. Name tags are used to confirm correct responses in SD recognition. For example, in a phone dialer application, the user associates a “name” (which is recorded into memory) with a phone number.
The MSM6679AL-110 then plays back the name tag so that the user can verify that the recognized phrase is the correct one.

The VRP stores name tags in memory using an ADPCM compression algorithm at 28 kbps. The length of a name tag is controlled with a command from the user's host MCU program. The maximum number of name tags possible is 61, but the actual number depends on the record time and the memory available. See the section on memory interface for more detail.

Audio Input Interface

A critical item for high-accuracy speech recognition is correct design of the audio input circuit. A circuit with appropriate gain and frequency response must be placed between the microphone and the MSM6679AL-110’s A/D input. Oki recommends input gain and a band-pass filter with the following characteristics:

• Four-pole Chebyshev high-pass filter, 3 dB point at 225 Hz
• Dual-pole low-pass filter, 3 dB point at 4250 Hz
• Midband gain of 46 dB at 1000 Hz

The above gain and filter characteristics are obtained by using a rail-to-rail quad CMOS op-amp and a one-half supply rail splitter to bias the input signal at 1/2 VDD nominal.

The MSM6679AL-110 uses multiple analog inputs to improve sampling quality. An on-chip analog-to-digital (A/D) conversion unit transforms the analog signal to a digital data stream.

Audio Output Interface

The MSM6679AL-110 also provides the VOICEOUT1 PWM output. The MSM6679AL-110 uses ADPCM to generate voice or sound-effect output. ADPCM represents an improvement over conventional PCM techniques in that it adaptively changes the quantizer step (scale factor) to suit the waveform being encoded. The result is more efficient memory usage with no loss of quality.

Careful selection of the components for internal and external output filters and amplifiers is recommended. An incorrect choice would impair the original quality.
This consideration equally includes:

• Careful separation of analog and digital lines
• Grounding of analog lines at both ends
• Adequate separation from high-speed digital circuits to avoid distortion

Memory Interface

The memory control section manages RAM and/or ROM devices in two 64-Kbyte memory spaces, in conjunction with internal memory for voice templates and working memory. Some versions work with no external memory, some have some external RAM, some use only external EPROM, and some use external memory in conjunction with both internal ROM and RAM. The MSM6679AL-110 requires a minimum of 32 Kbytes SRAM and 16 Kbytes ROM. The following table shows vocabulary sizes and playback facilities for various configurations.

Typical Configurations

Column groups: Recognition Vocabulary (words): SI, SD. Sound Playback (sec) [1]: Internal, External. MSM665x Playback Interface. MSM6679AL-110 Speech Record. MSM6679AL-110 Speech Playback. Memory Size (bytes): EPROM, Flash, SRAM.

Application               SI        SD        Internal    External    MSM665x      Speech    Speech      EPROM    Flash    SRAM
                                                                      Interface    Record    Playback
Application Controller    25        61 [2]    2.3         9.2         OK           -         OK          64K      -        32K
                          50        61 [2]    2.3         -           OK           -         OK          64K      -        32K
Telephone Dialer          25        61        2.3         27.6        OK           OK        OK          -        128K     32K
                          50        61        2.3         18.4        OK           OK        OK          -        128K     32K
                          75        61        2.3         -           OK           OK        OK          -        128K     32K
                          100       61        2.3         -           OK           OK        OK          -        128K     32K
Computer Peripheral       61 [3]    61        2.3         36.8        OK           -         OK          -        -        64-384K
Minimum Configuration     12        61 [2]    1.15                    OK           -         -           16K      -        32K

1. Phrase chaining features usually permit much longer overall playback durations; not including external speech synthesizer.
2. SD recognition vocabularies are volatile in these configurations.
3. Per download. Vocabulary swapping by host permits unlimited vocabulary size.

The MSM6679AL-110 supports 32 Kbytes of RAM, and up to 64 Kbytes of ROM (EPROM or Flash) per bank in separate memory spaces. For accessing the ROM and RAM address spaces, the MSM6679AL-110 provides the separate Write RAM (WRRAM) and Read RAM (RDRAM) signals.
The RDRAM signal connects directly to the Output Enable (OE) control signal inputs on the RAM and ROM. The WRRAM signal connects directly to the Write Enable (WE) control signal input on the RAM.

[Figure 7. MSM6679AL-110 External Memory Map. SRAM regions by starting address: 00000 Reserved; 04A00 Default Working SD Templates; 05480 Working Name Tag Pointer Table; 05700 Alternate SD Templates; the SRAM map ends at 08000. FLASH labels: SI First (F509*), SD First (07300), NTP First (07D80), Name Tag Data (from 08000), SI Last (F501*) (18000), SD Last (1F300), NTP Last (1FD80), end of map at 1FFFF. A Name Tag Block Address scale runs alongside the FLASH map with the values 000, 100, 200, 2F6, 2FB, and 2FF. *Denotes commands to select blocks.]

External Voice Synthesis Control

The MSM6679AL-110 is capable of interfacing to the MSM665x family of Oki ROM, OTP, or external EPROM speech synthesizers, allowing for up to 260 seconds of high-quality voice and sound effects. The following table indicates the speech capabilities of the MSM665x family.

MSM665x Family Characteristics

Type            Data ROM         Maximum Speech Duration [2]
                Capacity [1]     fSAM = 4.0 kHz    fSAM = 6.4 kHz    fSAM = 8.0 kHz    fSAM = 16.0 kHz    fSAM = 32.0 kHz
MSM6650         64 Mbits [3]     >1 hour           >40 minutes       >30 minutes       >15 minutes        >8 minutes
MSM6652         288 Kbit         16.9 sec          10.5 sec          8.4 sec           4.2 sec            2.1 sec
MSM6653         544 Kbit         31.2 sec          19.5 sec          15.6 sec          7.8 sec            3.9 sec
MSM66P54 [4]    1 Mbit           63.8 sec          39.9 sec          31.9 sec          15.9 sec           7.9 sec
MSM6654         1 Mbit           63.8 sec          39.9 sec          31.9 sec          15.9 sec           7.9 sec
MSM6655         1.5 Mbit         96.5 sec          60.3 sec          48.2 sec          24.1 sec           12.0 sec
MSM66P56 [5]    2 Mbit           129.1 sec         80.7 sec          64.5 sec          32.2 sec           16.1 sec
MSM6656         2 Mbit           129.1 sec         80.7 sec          64.5 sec          32.2 sec           16.1 sec
MSM6658         4 Mbit           258 sec           161.4 sec         129.1 sec         64.5 sec           32.2 sec

1. Actual ROM area in MSM6652, MSM6653, MSM6654, MSM6655, MSM6656, MSM6658, MSM66P54, and MSM66P56 is smaller by 22 Kbits.
2. Longer speech patterns can be created by chaining and repeating existing speech samples.
3. Via external ROM only (no on-chip ROM available).
4. One-Time-Programmable (OTP) version of MSM6654. See the MSM66P54 data sheet for more information.
5. One-Time-Programmable (OTP) version of MSM6656. See the MSM66P56 data sheet for more information.

The MSM665x interface consists of the following signals:

• BUSY - Asserted LOW during MSM665x device playback. The MSM6679AL-110 F50Bh and F10100xxh commands select this signal for MSM665x command polling.
• NAR - Next Address Request status signal. By default, the MSM6679AL-110 uses this signal to poll commands to the MSM665x. The F51Bh, F480h, and F440h commands select NAR for polling.
• SI - Serial Input Clock.
• SD - Serial Data Out.
• STROBE - Initiates speech synthesis.
• RESOUT - Initializes device when asserted LOW. The MSM6679AL-110 F480h command generates this signal.

Serial Interface

The MSM6679AL-110 supplies a serial interface suitable for connection to an RS-232C serial port buffer or equivalent. The serial interface uses one MSM6679AL-110 input (RXD) and one MSM6679AL-110 output (TXD). The interface operates at 9600 baud with:

• 8 data bits
• 1 start bit
• 1 stop bit
• No parity
• No handshake

A host processor sends serial ASCII commands to the MSM6679AL-110 and receives serial ASCII responses based on voice input responses.

MSM6679AL-110 SLAVE-MODE API

This section describes the slave-mode Applications Protocol Interface (API) between a host MCU and the MSM6679AL-110. The slave-mode API offers the following features:

• Direct slave-mode control of voice recognition, sound recording and playback, and sound synthesis
• Serial port interfaces
• Simple procedures for downloading and uploading data
• ASCII format
• Comprehensive return codes and error reporting

The host MCU selects the active speech recognition vocabulary and speech responses, and controls all actions required to implement an interactive voice response system.
The MSM6679AL-110 performs speech recognition, based on the vocabulary selected by the host, and returns digital codes representing the most probable match of the current utterance to an individual utterance in the selected vocabulary. The MSM6679AL-110 can also respond with "name tags." Name tags can be fixed words, phrases or sound effects, or can be words, phrases or sound effects that have been interactively recorded by the user.

The API supports the serial interface. The MSM6679AL-110 returns each response using the same interface through which the most recent message was received. The user can thus connect and use both interfaces.

For all messages, the serial interface represents each 8-bit value with two hexadecimal digits coded in ASCII. When downloading and uploading data, the MSM6679AL-110 uses a stream of 8-bit binary values. The serial-mode interface uses a 9600-baud UART with 1 start bit, 8 data bits, and 1 stop bit. There is no parity or handshaking. The serial interface echoes all received ASCII characters immediately back to the host MCU.

Messages are of variable length, but always consist of an even number of bytes. Opcodes consist of exactly four bytes (four ASCII hexadecimal digits), with values between F000h and FEFEh. Operand bytes may take values from 0000h to FFFFh.

The MSM6679AL-110 issues a return code for many of the host commands. The return code generally consists of the same opcode, followed by data indicating success or failure of the operation.

Opcodes are organized into the following categories:
• Purge
• Set parameter
• Initialize
• Recognize
• Speak
• Request
• Record
• SD recognition control

The following tables summarize available opcodes and provide detailed descriptions of the opcode functions.
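The message framing described above can be sketched from the host side as follows. This is a minimal illustration, not Oki reference code: the function names `encode_message` and `decode_words` are hypothetical, uppercase hexadecimal digits are an assumption (the text does not state a letter case), and rejecting an odd number of operand values is one reading of the "even number of bytes" rule. The echo of received characters is not modeled.

```python
"""Sketch of the ASCII-hex framing used for slave-mode API messages."""

# Opcode range stated in the text above.
OPCODE_MIN = 0xF000
OPCODE_MAX = 0xFEFE


def encode_message(opcode: int, operands: bytes = b"") -> bytes:
    """Return the ASCII characters a host would transmit for one message.

    Each 8-bit value becomes two hexadecimal digits coded in ASCII.
    Assumptions: uppercase digits, and an even count of operand values.
    """
    if not OPCODE_MIN <= opcode <= OPCODE_MAX:
        raise ValueError(f"opcode {opcode:04X}h is outside F000h..FEFEh")
    if len(operands) % 2:
        raise ValueError("operands must be an even number of 8-bit values")
    raw = opcode.to_bytes(2, "big") + bytes(operands)
    return raw.hex().upper().encode("ascii")


def decode_words(ascii_hex: bytes) -> list[int]:
    """Split an ASCII-hex message into 16-bit words, high byte first."""
    raw = bytes.fromhex(ascii_hex.decode("ascii"))
    if len(raw) % 2:
        raise ValueError("message must hold an even number of 8-bit values")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


if __name__ == "__main__":
    # F102h with operand 8000h: set the SP/SI origin to its default address.
    print(encode_message(0xF102, bytes([0x80, 0x00])))
    print([f"{word:04X}h" for word in decode_words(b"F1028000")])
```

Download and upload payloads are the exception: they travel as raw 8-bit binary values, so a helper like this would apply only to the command and response words around them.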
Command Summary

Function       Opcode (Hex)     Default (Hex)          Description
Purge          F000             -                      Clear MSM6679AL-110 input stack.
Set parameter  F102 xxxx        8000                   Set SP/SI origin to xxxx.
               F103 xxxx        4A00                   Set SD origin.
               F104 xxxx        F100                   Set triggering origin.
               F12x             F123                   Set SD SP table to table x.
               F130 xxxx        0101, 0202...          Select triggering table.
Initialize     F2xx mod 80      Disabled               Initialize background estimation.
               F2xx mod 40      Disabled               Wait for F3h command after each response.
               F2xx mod 20      Disabled               Beep after each triggered utterance.
               F2xx mod 10      Disabled               Reserved.
               F2xx mod 8       Enabled                Set speech response level to default.
               F2xx mod 4       Enabled                Send acknowledge after each speech output response.
               F2xx mod 2       Disabled               Only detect triggers.
               F2xx mod 1       Load from first FLASH  Initialize SD parameter table and name tags.
Recognize      F300             -                      Stop listening (recognition).
               F301 to F33F     -                      Start SI recognition.
               F340             -                      Start SD recognition.
               F341             -                      Sort SD recognition distances, return index to utterance with least distance.
               F342             -                      Update SD enrollment.
               F343             -                      Request recognition parameter upload to host.
               F344             -                      Sort SD recognition distances, return index and distance to utterance with least distance.
               F351             -                      Sort SD recognition distances, return all distances.
               F361             -                      Sort SD recognition distances, return minimum and maximum energy values.
               F371             -                      Sort SD recognition distances, return all energy values and distances.
Speak          F401 to F43D     -                      Play back name tag from external memory.
               F441 to F47C     -                      Play back sound from internal memory.
               F47E             -                      Play 50-ms beep.
               F47F             -                      Pause for 0.2 sec.
               F480             -                      Initialize MSM665x IC, set MSM665x busy mode OFF, select FLASH SI recognition.
               F481 to F4FF     -                      Play back one of 127 phrases in external MSM665x device.
               F50B             OFF                    Set MSM665x busy mode ON.
               F51B             ON                     Set 6654 NAR mode.
               FE03 to FEFE     FE80h                  Set output volume (03h = minimum, FEh = maximum).
Request        F500             -                      Status request.
               F501             F509                   Select last FLASH bank for SI recognition.
               F510             F509                   Select download RAM bank for speaker independent/signal processing (SI/SP) template area.
               F520             -                      Set MSM6679AL-110 power down mode.
               F502....         -                      Download/upload.
               F504             414C                   Retrieve MSM6679AL-110 firmware revision.
               F505             -                      Initialize background (BG) noise level.
               F506             3039                   Retrieve vocabulary and trigger table revision number.
               F507             -                      Save SD templates from download RAM to first FLASH.
               F517             -                      Save SDR templates in last FLASH (4A00-547B→F300-FD7F).
               F508             -                      Recall SD templates from first FLASH to download RAM.
               F518             -                      Get SDR templates from last FLASH (F300-FD7B→4A00-547B).
               F509             F509                   Select first FLASH bank for SI recognition.
Record         F101 00xx        0051                   Set name tag length, set MSM665x busy mode ON.
               F105             0000                   Set name tag record origin.
               F106             01FF                   Set name tag record end.
               F50A             -                      Clear name tag table in SRAM (5480 - 56FF).
               F50C             -                      Recall last saved name tag table.
               F51C             -                      Recall name tag pointers from last FLASH (FD80-FFFF→5480-56FF).
               F50D             -                      Save name tag table from SRAM to FLASH.
               F51D             -                      Save name tag pointers in last FLASH (5480-56FF→FD80-FFFF).
               F50E             F50F                   Set record volume high.
               F50F             F50F                   Set record volume normal (default).
               FA01 to FA3D     -                      Record name tag 01h - 3Dh.
SD Recognition F6xx             -                      Set SD pointer to segment xxh.
Control        F9xx             -                      Search for SD utterance xxh.
               FB00             -                      Enroll SD utterance selected by search command (F9xx).
               FC00             -                      Erase utterance from SD vocabulary.
               F521             -                      Clear SDR table (4A00 - 547B).

Response Summary

Command                        Operands                      Description
Result after Parameter Set     F101h 00 tm                   Record time = tm*14 msec.
                               F102h AdH AdL                 High and low bytes of SP/SI origin address.
                               F103h AdH AdL                 High and low bytes of SD origin address.
                               F104h AdH AdL                 High and low bytes of triggering origin address.
                               F12Xh                         SP table Xh selected.
Initialization Acknowledgment  F280h                         Invalid message received.
                               F240h                         Sample data over-run. [1]
                               F220h                         32-Kbyte block boundary violation error.
                               F210h                         Unclassified download/upload error.
                               F208h                         Divide-by-zero error.
                               F204h                         Select/jump error.
                               F202h                         Invalid SP header or table.
                               F201h                         Reserved.
Speech Ack                     F400h                         Speech acknowledgment. [2]
Status [3]                     F500h                         MSM6679AL-110 ready.
                               F501h                         Operation complete.
                               F520h                         Operations complete; MSM6679AL-110 disabled (vocabulary 0).
                               F540h                         MSM6679AL-110 waiting for start command.
                               F560h                         MSM6679AL-110 waiting for end trigger.
                               F580h                         MSM6679AL-110 processing recognition.
                               F5A0h                         Download/upload in progress. [4]
                               F5C0h                         Download/upload complete.
                               F5F0h                         Speak output in progress.
SI Recognition Result [5]      F600h                         Aborting SI listen mode.
                               F6Utt                         Utt = utterance ID.
                               F6 Utt Dst1H Dst1L...         Utterance ID, high/low byte of distance to utterance 1...utterance N.
                                 DstNH DstNL
                               F6 Utt EminH EminL            Utterance ID, high/low byte of min. and max. energy value.
                                 EmaxH EmaxL
                               F6 Utt Dst1H Dst1L...         Utterance ID, high/low byte of distance to utterance 1...utterance N,
                                 DstNH DstNL EminH EminL     high/low byte of minimum energy value, high/low byte of maximum
                                 EmaxH EmaxL                 energy value.
                               F63Ah                         Trigger detection code (see init command).
                               F63Bh                         Rejection: utterance too loud.
                               F63Ch                         Rejection: utterance too long.
                               F63Dh                         Rejection: utterance begins too soon.
                               F63Eh                         Rejection: bad signal/noise ratio.
                               F63Fh                         Rejection: reason uncertain.
SD Recognition Result          F700h, F73Eh, F73Fh, F740h    Aborting SD Listen mode. After SD utterance search: not found.
                                                             Rejection. Sort completed. After SD utterance search: empty.
                                                             Rejection: MSM6679AL-110 SD memory full/empty.
                                                             After SD utterance search: in use.
                               F341h: F7Utt                  Utt = Utterance ID triggered.
                               F344h: F7Utt DstH DstL        Utterance ID, high/low byte of distance.
                               F351h: F7Utt Dst1H Dst1L...   Utterance ID, high/low byte of distance to utterance 1...utterance N.
                                 DstNH DstNL
                               F361h: F7Utt EminH EminL      Utterance ID, high/low byte of minimum energy value, maximum
                                 EmaxH EmaxL                 energy value.
                               F371h: F7Utt Dst1H Dst1L...   Utterance ID, high and low byte of distance to utterance 1...distance
                                 DstNH DstNL EminH EminL     to utterance N, high and low byte of minimum energy value, maximum
                                 EmaxH EmaxL                 energy value.
Vector Upload                  F743h 0000h                   Upload failure.
                               F743h NH NL V1H V1L...        High/low bytes of length of vector V, high/low byte of first V...Nth V.
                                 VNH VNL
Trap Error Codes               F801h                         Reserved.
                               F802h                         Invalid SP header or table.
                               F804h                         Select/jump error.
                               F808h                         Divide-by-zero error.
                               F810h                         Unclassified download/upload error.
                               F820h                         Memory full; 32-Kbyte block boundary violation error.
                               F840h                         Sample data over-run. [1]
                               F880h                         Invalid message received.
Record Response                FA00                          Record complete.

1. Sample data overrun is issued when real-time SP in Listen mode cannot keep up with incoming samples, i.e., if the A/D signal input routine overwrites a sample data buffer before it is fully processed.
2. This acknowledge is sent only if Init command 1111 0010 xxxx x1xx (F2 xxxx x1xx) is set to enable acknowledgments.
3. These messages are sent in response to a request command (F5XYh) from the host.
4. Upload/download in progress, acknowledging load request immediately before data transfer. If in response to an N-byte download request, the MSM6679AL-110 then receives N bytes (if N is even, or N+1 if N is odd) of data from the host. If N is odd and N+1 bytes are received, only N bytes are written to MSM6679AL-110 memory. If in response to an upload, the MSM6679AL-110 then sends N bytes (if N is even, or N+1 if N is odd) of data to the host.
5. If an utterance was recognized, XYh is the utterance identity or class number, and additional parameters may be appended, if requested in the SI Recog (F3XYh with X=0...3) command. Otherwise, XYh indicates various results as detailed.
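As an illustration of how a host might interpret the first word of a response, the sketch below covers a few of the codes from the Response Summary (trap errors, SI recognition results, and the speech and record acknowledgments) plus the even-length rule from footnote 4. The names `describe_response` and `transfer_length` are hypothetical. Treating the F8xxh low byte as a set of flags that could be combined is an assumption: the table lists only single-bit codes.

```python
"""Sketch: interpreting selected MSM6679AL-110 response words on the host."""

# Low-byte flags of the F8xxh trap error codes, as listed in the table.
TRAP_ERRORS = {
    0x01: "reserved",
    0x02: "invalid SP header or table",
    0x04: "select/jump error",
    0x08: "divide-by-zero error",
    0x10: "unclassified download/upload error",
    0x20: "memory full; 32-Kbyte block boundary violation",
    0x40: "sample data over-run",
    0x80: "invalid message received",
}

# Low bytes of the F6xxh SI rejection codes.
SI_REJECTIONS = {
    0x3B: "utterance too loud",
    0x3C: "utterance too long",
    0x3D: "utterance begins too soon",
    0x3E: "bad signal/noise ratio",
    0x3F: "reason uncertain",
}


def describe_response(word: int) -> str:
    """Return a short description of the first 16-bit word of a response."""
    high, low = word >> 8, word & 0xFF
    if high == 0xF8:
        # Assumption: more than one flag may be set at once.
        names = [name for bit, name in TRAP_ERRORS.items() if low & bit]
        return "trap error: " + ", ".join(names or ["unknown"])
    if high == 0xF6:
        if low == 0x00:
            return "aborting SI listen mode"
        if low == 0x3A:
            return "trigger detected"
        if low in SI_REJECTIONS:
            return "rejection: " + SI_REJECTIONS[low]
        return f"SI utterance {low:02X}h"
    if word == 0xF400:
        return "speech acknowledgment"
    if word == 0xFA00:
        return "record complete"
    return f"unclassified response {word:04X}h"


def transfer_length(n: int) -> int:
    """Bytes actually sent for an N-byte download/upload (N+1 when N is odd)."""
    return n + (n & 1)


if __name__ == "__main__":
    for code in (0xF802, 0xF63C, 0xF605, 0xF400):
        print(f"{code:04X}h -> {describe_response(code)}")
    print(transfer_length(5))
```

Status (F5xxh) and SD result (F7xxh) words are left unclassified here; a fuller decoder would also consume the distance and energy words that can follow an F6xxh or F7xxh result.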
Command Descriptions

Purge

Opcode: F000
Description: Purge MSM6679AL-110 Input Stack. This command clears the MSM6679AL-110 input stack of commands that are waiting to be executed. Commands already in progress, such as a pending MSM6654 poll action, are not affected. It does not affect the MSM6679AL-110 output stack.
Return Values: None

Set Parameter

Opcode: F102h XXYYh
Description: Set SP/SI Recognition Origin. Prior to SD or SI recognition, address pointers must be set to point at the SP or SI recognition parameter tables. This command sets the starting address of SP and SI recognition parameter tables. This address is the location of the first word of a header that contains pointers to one or more individual SP/SI tables. XXYYh = high (XXh) and low (YYh) bytes of requested address. The MSM6679AL-110 uses and returns an even address outside the MSM6679AL-110 work space that is as near as possible to the requested address. Leave this parameter at its default value unless you are using an Oki custom SI vocabulary and are instructed to alter SP/SI recognition origin. Default SP/SI origin: 8000h
Return Values [1]: F102h XXYYh = high (XXh) and low (YYh) bytes of resultant address. If a valid header is not found at the resultant address, the MSM6679AL-110 immediately sends response code F802h = Invalid SP/SI header.

Opcode: F103h XXYYh
Description: Set SD Recognition Origin [2]. This command sets the SD origin address at the starting address of the current SD recognition parameter table. This command may be used to select among multiple RAM-resident SD vocabulary tables. XXYYh = high (XXh) and low (YYh) bytes of requested address. The MSM6679AL-110 uses and returns an even address outside the MSM6679AL-110 work space that is as near as possible to the requested address. Leave this parameter at its default value unless you are using an Oki custom vocabulary and are instructed to alter SD recognition origin. The table length is 0A7Ch bytes. Default SD origin: 4A00h
Return Values [1]: F103h XXYYh = high (XXh) and low (YYh) bytes of resultant address.

Opcode: F104h XXYYh
Description: Set Triggering Origin. This command sets the starting address of triggering parameter tables. This address is the location of the first word of a section of data memory containing one or more contiguous triggering parameter tables. XXYYh = high (XXh) and low (YYh) bytes of requested address. The MSM6679AL-110 uses and returns an even address outside the MSM6679AL-110 work space that is as near as possible to the requested address. Leave this parameter at its default value unless you are using an Oki custom SI vocabulary and are instructed to alter triggering origin. Default triggering origin: F100h.
Return Values [1]: F104h XXYYh = high (XXh) and low (YYh) bytes of resultant address.

Opcode: F12Yh
Description: Set SD Recognition SP table. This command sets the SP parameter table number to be used in processing speech input during SD Recognition. The MSM6679AL-110 selects SP table number Z, where Z is the nearest valid value to Y. By default, the MSM6679AL-110 selects SP table 3 until this command is issued. This command selects SP parameters only, and does not select among multiple RAM-resident SD vocabulary tables, which can be independently selected by the Set SD Origin command (F103h). After setting the table number and returning the resultant value, the MSM6679AL-110 checks the validity of the SP header. If the header is invalid, an error message is returned. Set this value to (NSI +1), where NSI is the number of SI subvocabularies. Default SP table: 3.
Return Values [1]: F12Z = SP table Z selected. If the SP header is invalid, a second message follows: F802h = Invalid SP header.

Opcode: F130h VN TN
Description: Select Triggering Table. This command selects triggering table TN for use with SP table VN. Valid values for VN and TN are between 01h and 0Fh. Leave this parameter at its default value unless you are using an Oki custom SI vocabulary and are instructed to alter the triggering table.
Return Values [1]: F130h f(VN) f(TN) = Triggering table selected. Default = 0101, 0202, 0303...

1. Return value is actual parameter value which may not equal the set parameter value.
2. See also F6XY.

Initialize

After power-on, the MSM6679AL-110's mode corresponds to that after issuing a F20C command. This mode may NOT be the optimum condition for most situations, so the user is advised to carefully understand the desired condition and develop a suitable command for the application at hand. In addition, ensure that unwanted bits do not get set or reset when attempting to set individual conditions. The conditions selected are based on the XXh values associated with the last F2 command issued.

F2xx Bit Values: 1xxx xxxx
Power-On/Reset Value: Cleared
Action: Background Noise Initialization. When set to 1, the MSM6679AL-110 starts a 500-ms background noise initialization. When set to 0, the MSM6679AL-110 does not perform background noise initialization. The MSM6679AL-110 requires this command prior to recognition for noise vector subtraction during the utterance sampling period. Use the background initialization command whenever there is a change in the background noise level. For example, sample the noise signature in a vehicle at rest and moving at 35 MPH with its windows rolled down. The quality of a phone line connection can also vary from call to call. The host MCU must implement a strategy as to when to issue a background initialization command. In a vehicle, the host MCU could monitor the vehicle speed, fan speed, radio volume, etc.
Alternatively, the host MCU could issue this command each time a new recognition session starts or a new line connection is established. However, the 0.5-sec sample period could degrade system responsiveness if used too frequently. A zero in this bit location during the F2XXh command will not cause an initialization. The F505h command causes the same initialization sequence.
Return Value: F501 = Background initialization complete. F2XY = Initialization acknowledge. [1]

F2xx Bit Values: x1xx xxxx
Power-On/Reset Value: Cleared
Action: Wait for Recognition Command/Auto Restart SI Recognition. When set to 1, the MSM6679AL-110 waits for a recognition command after each response. When set to 0, the MSM6679AL-110 auto-restarts SI recognition after each response. This bit should be set to 1 when an action is to be taken immediately after an utterance. Auto-restart recognition is the desired mode during digit string recognition, automated tape testing of digits, or in demonstrations where continuous recognition is desired.
Return Value: F2XY = Initialization acknowledge. [1]

F2xx Bit Values: xx1x xxxx
Power-On/Reset Value: Cleared
Action: Beep After Each Voice Trigger. When set to 1, the MSM6679AL-110 beeps after each voice trigger. When set to 0, the MSM6679AL-110 does not beep after each voice trigger. These beeps do not cause a F400h message to be issued to the host MCU. When set to 1, the MSM6679AL-110 beep can help a user avoid speaking before the MSM6679AL-110 is ready. This mode is normally used with a digits vocabulary to pace the user and confirm each utterance reception. Instead of using beeps, an external MSM665x speech synthesizer can repeat digits as they are recognized. However, some users find the number repetition annoying. Therefore, firmware could repeat digits during initial usage and switch to beep mode later. Typically, performance improves with time as users learn to speak with the correct enunciation and volume. The MSM6679AL-110 in this case trains the user. Note that the host MCU can also make the MSM6679AL-110 beep with the F47Eh command.
Return Value: F2XY = Initialization acknowledge. [1]

F2xx Bit Values: xxxx 1xxx
Power-On/Reset Value: Set
Action: Set Output Volume. When set to 1, VOICEOUT1 sound output level is set to half of full volume (80h). When set to 0, voice output level is unaffected. MSM6679AL-110 sound output volume can also be set at any level on a continuous scale from 00h to FEh (low to high) with the FEXXh command. The MSM665x speech synthesizer has four discrete sound output volumes, corresponding to 0h - 20h, 21h - 40h, 41h - 80h, and 81h - FEh.
Return Value: F2XY = Initialization acknowledge. [1]

F2xx Bit Values: xxxx x1xx
Power-On/Reset Value: Set
Action: Send Response Code After Sound Output. When set to 1, the MSM6679AL-110 issues an acknowledge response (F400h) when sound output is completed. When set to 0, the MSM6679AL-110 does not issue an acknowledge response when speech response is completed. Automatic beeps after voice triggers do not cause an F400h command to be issued.
Return Value: F2XY = Initialization acknowledge. [1]

F2xx Bit Values: xxxx xx1x
Power-On/Reset Value: Cleared
Action: Trigger Detection Only. When set to 1, the MSM6679AL-110 does not sort SI vocabularies for the best match, instead returning F63Ah code when an utterance has been detected. When set to 0, normal recognition is performed. When this bit is set to 1, the host MCU can use the F343h command to upload the recognition parameter vector, so that the host can perform independent processing.
Return Value: F2XY = Initialization acknowledge. [1]

F2xx Bit Values: xxxx xxx1
Power-On/Reset Value: Cleared
Action: Clear SD Recognition and Name Tag RAM. When set to 1, the MSM6679AL-110 initializes the SD parameter table. When set to 0, existing SD parameters are preserved. After this bit is set to 1, all SD training and name tag pointers are erased. Use this command to start training for a new user. If the old name tags are to be retained, the F50Ch command can recall old name tags from FLASH.
To set up for a blank SD and name tag table at the next power-on, issue the command sequence F201h F507h.
Return Value: F2XY = Initialization acknowledge. [1]

1. See the Response Summary table earlier in this section for a complete description of the XY codes in initialization acknowledgment messages.

Recognize

Opcode: F300h
Action: Stop Listening. This command causes the MSM6679AL-110 to exit SI or SD Listen mode, whichever was active.
Return Value:
  None     MSM6679AL-110 was not in Listen mode.
  F600h    Aborting SI Listen mode.
  F700h    Aborting SD Listen mode.

Opcode: F301h ~ F33Fh
Action: Start SI Listen Mode. For all the following opcodes, the MSM6679AL-110 performs SI recognition on incoming utterances, using SI vocabulary Y. The vocabulary Y is identified by one of 15 sets, thus Y = 1h ~ Fh.
Return Value:
  F600h          Aborting SI Listen mode.
  F63Ah          Trigger detection code (see Initialization command).
  F63Bh~F63Fh    Rejection.
  F802h          Invalid signal processing table.
  F840h          Sample data overrun.

  Opcode: F30Yh
  Action: Return recognized phrase using vocabulary number Y.
  Return Value: F6h Utt = Utterance ID in vocabulary Y.

  Opcode: F31Yh
  Action: Return recognized phrase and distance table for vocab Y.
  Return Value: F6h Utt Dst1H Dst1L... DstNH DstNL = Utterance ID in vocabulary Y, high and low byte of distance to utterance 1...distance to utterance N.

  Opcode: F32Yh
  Action: Return recognized phrase and energy value for vocab Y.
  Return Value: F6h Utt EminH EminL EmaxH EmaxL = Utterance ID in vocabulary Y, high and low byte of minimum and maximum energy value.

  Opcode: F33Yh
  Action: Return recognized phrase, distance table, and energy value for vocab Y.
  Return Value: F6h Utt Dst1H Dst1L... DstNH DstNL EminH EminL EmaxH EmaxL = Utterance ID, high and low byte of distance to utterance 1...distance to utterance N, high and low byte of minimum and maximum energy value.

Opcode: F340h
Action: Start SD Listen Mode. When an utterance is captured, it is analyzed and converted to a "recognition parameter vector." The host may then command the MSM6679AL-110 to use this vector in various ways (e.g., Sort, Update, or Recognition Vector Upload).
Return Value:
  F740    Triggered.
  F700    Abort SD Listen mode.
  F73E    Rejection.
  F73F    Memory empty.
  F802    Invalid SP table.
  F840    Sample data overrun.

Opcode: F341h, F344h, F351h, F361h, F371h
Action: SD Recognition Sort. These commands sort the distances between the recognition parameter vector and the reference vectors for the utterances in the current SD vocabulary.
Return Value: F73Fh = Abnormal response: Memory empty.

  Opcode: F341h
  Action: Return recognized phrase for vocab Y. This command can be issued several times to yield first, second, third best, etc.
  Return Value: F7h Utt, where Utt = Utterance ID.

  Opcode: F344h
  Action: Return recognized phrase and distance for the current vocabulary.
  Return Value: F7h Utt DstH DstL, where Utt = index of recognized phrase, DstH DstL = high/low bytes of distance from nearest phrase.

  Opcode: F351h
  Action: Return recognized phrase and distance table for vocab Y.
  Return Value: F7h Utt Dst1H Dst1L... DstNH DstNL = Utterance ID, high and low byte of distance to utt. 1...N.

  Opcode: F361h
  Action: Return recognized phrase and energy value for vocab Y.
  Return Value: F7h Utt EminH EminL EmaxH EmaxL = Utterance ID, high and low byte of minimum and maximum energy value.

  Opcode: F371h
  Action: Return recognized phrase, distance table, and energy value for vocab Y.
  Return Value: F7h Utt Dst1H Dst1L... DstNH DstNL EminH EminL EmaxH EmaxL = Utterance ID, high and low byte of distance to utterance 1...distance to utterance N, high and low byte of minimum and maximum energy value.

Opcode: F342h
Action: Update SD Recognition Enrollment. This command updates enrollment on utterance Utt, immediately after a "F7h Utt" response to the Sort SD Distances command (F341h). Alternatively, the utterance to be updated can be selected by the SD Search command (F9XYh). This command uses the recognition parameter vector from the most recently captured utterance, and does not start SD Listen mode. Generally, update should be performed only if correct utterance identity is confirmed by the user.
Return Value: F740h = Update complete.

Opcode: F343h
Action: Recognition Vector Upload. Request recognition parameter vector upload to host.
Return Value:
  F743h NH NL V1H V1L... VNH VNL = Success, where NH/NL = high/low bytes of N, N = length of recognition parameter vector V, V1H/V1L = high/low bytes of first element of V, VNH/VNL = high/low bytes of Nth element.
  F743h 00 00 = Failure.

Speak

Opcode: F401h ~ F43Dh
Action: Speak Phrase from External Memory. This command causes the MSM6679AL-110 to play back a name tag from external memory. If no sound is defined for a selected index, the MSM6679AL-110 plays a beep. See the Record commands for information on creating name tags.
Return Value: F400h. If enabled, this value is returned upon completion of playback.

Opcode: F441h ~ F450h
Action: Speak Phrase from Low Internal Memory. If no sound is defined for a selected index, the MSM6679AL-110 plays a beep. The default phrases supplied with the MSM6679AL-110 in the smaller low playback memory area are listed below.
  F441h    Drip.
  F442h    Buzzer.
  F443h    Dial tone.
  F444h    Bonk.
Return Value: F400h. If enabled, this value is returned upon completion of playback.

Opcode: F451h ~ F47Ch
Action: Speak Phrase from High Internal/External Memory. If no sound is defined for a selected index, the MSM6679AL-110 plays a beep. The default phrases supplied with the MSM6679AL-110 in the larger upper playback memory area are listed below.
  F451h    "0" simulated DTMF tone.
  F452h    "1" simulated DTMF tone.
  F453h    "2" simulated DTMF tone.
  F454h    "3" simulated DTMF tone.
  F455h    "4" simulated DTMF tone.
  F456h    "5" simulated DTMF tone.
  F457h    "6" simulated DTMF tone.
  F458h    "7" simulated DTMF tone.
  F459h    "8" simulated DTMF tone.
  F45Ah    "9" simulated DTMF tone.
  F45Bh    "*" simulated DTMF tone.
  F45Ch    "#" simulated DTMF tone.
Return Value: F400h. If enabled, this value is returned upon completion of playback.

Opcode: F47D
Action: Reserved.
This command is reserved for future use. — — F47Eh Beep. This causes the MSM6679AL-110 to beep for 50 ms. F400h If enabled, this value is returned upon completion of playback. F47Fh Pause. This command can be issued while the MSM6679AL-110 is performing sound output and is then put in the MSM6679AL-110 command stack for subsequent processing. F400h When this command is executed, sound output pauses for 0.2 sec. The pause command is useful for word spacing. If enabled, this value is returned upon completion of playback. F480h Set MSM6654 Mode. This command causes the MSM6679AL-110 to initialize None. the external MSM665x device, also clearing the device from BUSY mode. Playback Sound from MSM665x Device. This command causes the MSM6679AL-110 to issue a speak command to the MSM665x slave device. F481h The value is passed on the MSM665x device as F400h F4FFh 01h - 07Fh. The actual phrase is determined by the vocabulary programmed into the MSM665x device. Up to 127 external phrases are supported. F50Bh 34 Set MSM665x Busy Mode ON. None. If enabled, this value is returned upon completion of playback. If NAR is set, the F400h command is sent when the MSM665x device is ready for an-other command. If busy mode is selected, the F400 command is returened when the sound is finished. ¡ Semiconductor MSM6679AL-110 Voice Recognition Processor Speak (Continued) Opcode F51Bh Action Return Value Set 6654 NAR mode. This command, which is the complement of the F50B command, sets up the handshaking to the attached 6654 speech None. synthe-sizer to use the NAR. This setup uses the 6654's double buffer feature to eliminate any gap between two consecutive phases. Set Output Level. This command sets the speech output level to one of 255 values as follows: FEXYh FE03 Set minimum output level. FE80h Set output level half way (default). FEFEh Set maximum output level. None. Request Opcode F500h Action Status Request. 
This command causes the MSM6679AL-110 to return a 2-byte value indicating its current status. Return Value F500h MSM6679AL-110 ready. F520h MSM6679AL-110 disabled. F540h MSM6679AL-110 waiting for start. F560h MSM6679AL-110 waiting for end. F580h MSM6679AL-110 processing. F5A0h Download/upload in progress. F5C0h Download/upload complete. F5E0h Select/jump complete. F501h Select last FLASH bank for SI recognition. F510h Select download RAM bank for SI/SP template area. This command enables the download RAM bank in the upper 32 K of data memory for SI recognition. No return value F520h Select buffer RAM bank for SI/SP. This command enables the buffer RAM bank in the upper 32 K of data memory for SI recognition. No return value 35 MSM6679AL-110 Voice Recognition Processor ¡ Semiconductor Request (Continued) Opcode Action Download/Upload. Full syntax: F5 02 00 Ctl AdH AdL NH NL [Dt1... DtN [Dt(N+1)]] Full syntax: F5 02 00 Ctl AdH AdL NH NL [Dt1... DtN [Dt(N+1)]] Ctl(7) = 0 for download, Ctl(7) = 1 for upload Ctl(6) = 0 for data RAM, Ctl(6) = 1 for program RAM/ROM If Ctl(6)=0 then Ctl(1-0) = Seg: Data segment selection If Ctl(6)=1 and Ctl(1-0) = x0, then external program segment 0 is used. If Ctl(6)=1 and Ctl(1-0) = x1, then external program segment 1 is used. F502h F504h 36 AdH AdL = high, low bytes of starting address. NH NL = high, low bytes of N N = Number of bytes to be downloaded or uploaded (maximum 07FFCh) Dt1... DtN = Download data. Note (here and in upload response) that data are 8-bit binary values, even if using the serial interface. Dt(N+1). If N is odd, an extra byte is appended to the data so that the total number of bytes in the message remains even. This command requests data transfer to/from data or external program memory.The control parameter (Ctl) controls the direction of the transfer (i.e., download vs. upload) and specifies which of six 64-Kbyte memory segments (i.e., four data segments and two external program segments) is to be accessed. 
This command does not work with internal program memory. It is not possible to download to external program memory while running in external program memory. The address and length parameters (AdH AdL NH NL) specify the starting address and length of the transfer in bytes. Since the MSM6679AL-110 can only perform download /upload transfers within one 32-Kbyte block in one Download /Upload command, the address and length parameters must not specify a transfer that violates a 32-Kbyte address boundary. If this restriction is violated, the download/upload request will be denied. Retrieve MSM6679AL-110 Firmware Revision Number. Return Value Immediately after receiving parameter NL, the MSM6679AL-110 responds with a message to indicate acceptance or denial of the transfer request. Acceptance is indicated by F5A0h. Denial is indicated by a F8XYh. At the end of an accepted transfer, the MSM6679AL-110 re-sponds with a message to confirm or deny valid completion of the transfer. Valid completion is indicated by F5C0h. F880h Invalid message received. F840h Sample data over-run. F820h 32-Kbyte block boundary violation error. F810h Unclassified download/upload error. F808h Divide-by-zero error. F804h Select/jump error. F802h Invalid SP header or table. F801h Reserved. FAXYh FBXYh Most and least significant byte of ad-dress where error occurred. XXXX Four-digit ASCII number. ¡ Semiconductor MSM6679AL-110 Voice Recognition Processor Request (Continued) Opcode Action Return Value F505h Initialize in Background. Background noise initialization is performed for 500 ms. The MSM6679AL-110 requires this command prior to recognition for noise vector subtraction during the utterance sampling period. Use the background initialization command whenever there is a change in the background noise level. For example, sample the noise signature in a vehicle at rest and moving at 35 MPH with its windows rolled down. The quality of a phone line connection can also vary from call to call. 
    The host MCU must implement a strategy for deciding when to issue a background initialization command. In a vehicle, the host MCU could monitor the vehicle speed, fan speed, radio volume, etc. Alternatively, the host MCU could issue this command each time a new recognition session starts or a new line connection is established. However, the 0.5-sec sample period could degrade system responsiveness if the command is issued too frequently. The F2XXh command can also be used to perform background noise initialization. A zero in this bit location during the F2XXh command will not cause an initialization.
    Return value: F501h  Initialization is complete.

F506h  Retrieve Vocabulary and Trigger Table Revision Number.
    Return value: XXXX, a four-digit ASCII number.

F507h  Save SDR templates in last FLASH.
    Save the download RAM bank SD template area. Saves 2684 bytes from the address set by the F103 command to the address range F300-FD7F in the last FLASH. The default is (4A00-547B→F300-FD7F).
    Return value: F501h  Save is complete.

F508h  Get SDR templates from last FLASH.
    Get the download RAM bank SD template area. Copies 2684 bytes to the address set by the F103 command from the address range F300-FD7B in the last FLASH. The default is (F300-FD7B→4A00-547B).
    Return value: none.

F509h  Select Default SI Vocabulary. (First FLASH)
    Return value: none listed.

Record

F101h 00XXh  Set Name Tag Length, Set MSM665x Busy Mode ON.
    Name tag record length is set by XXh, with XXh defining record length in 14-ms intervals. The maximum record length of FFh yields a recording interval of 3.57 sec. The default value is 1.2 sec.
    Return value: F101h 00XXh  Operation complete.

F105 XXXX  Set Name Tag Record Origin.
    This command sets the beginning address for recording name tags. XXXX = 128-byte blocks from 0000 to 02FF. The reset default is 0000. This is only effective before an F50A command, since new recordings start after the end of the previous recording. The F50A command uses this number to calculate the first address.
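The 14-ms interval arithmetic behind the F101h 00XXh name tag length can be sketched as follows. The function names are hypothetical, and rounding to the nearest interval and rejecting a length of zero are assumptions rather than documented behavior.

```python
INTERVAL_MS = 14  # one name tag length unit, per the F101h description


def name_tag_length_command(seconds: float) -> tuple:
    """Return the two 16-bit words (F101h, 00XXh) for a record length in seconds."""
    units = round(seconds * 1000 / INTERVAL_MS)  # nearest interval (assumption)
    if not 1 <= units <= 0xFF:                   # zero is rejected here (assumption)
        raise ValueError("record length must map to 01h..FFh intervals")
    return (0xF101, units)


def record_length_seconds(xx: int) -> float:
    """Convert an XXh parameter back to a record length in seconds."""
    return xx * INTERVAL_MS / 1000
```

A 1-sec request maps to 0047h (71 intervals, about 0.99 sec), which matches the `F101 0047` used for a 1-sec recording in the serial protocol example, and FFh maps back to 3.57 sec.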
    Return value: F105 BAAA, where B is the bank number (0, 1, 2) and AAA is the bank address/16 (800 - FF8).

F106 XXXX  Set Name Tag Record End.
    This command sets the ending address for recording name tags. XXXX = 128-byte blocks from 0000 to 02FF. The reset default is 01FF.
    Return value: F106 BAAA, where B is the bank number (0, 1, 2) and AAA is the bank address/16 (800 - FF8).

F50Ah  Clear Name Tag Table.
    Return value: F501h  Name tag table cleared.

F50Ch  Recall name tag pointers from first FLASH.
    Copy the first FLASH name tag pointers (FD80-FFFF) to the working name tag pointer table. The default is (FD80-FFFF→5480-56FF).
    Return value: F501h  Saved name tag table recalled.

F51Ch  Recall name tag pointers from last FLASH.
    Copy the last FLASH name tag pointers (FD80-FFFF) to the working name tag pointer table. The default is (FD80-FFFF→5480-56FF).
    Return value: F501h  Name tag pointers recalled.

F50Dh  Save name tag pointers in first FLASH.
    Save the working name tag pointer table to the first FLASH name tag pointers. The default is (5480-56FD→FD80-FFFD).
    Return value: F501h  Name tag table saved.

F51Dh  Save name tag pointers in last FLASH.
    Save the working name tag pointer table to the last FLASH name tag pointers. The default is (5480-56FD→FD80-FFFD).
    Return value: F501h  Name tag pointers saved.

F50Eh  Set Record Volume HIGH.
    Return value: none listed.

F50Fh  Set Record Volume to Normal.
    This is the default setting.
    Return value: none listed.

FA00h  Reserved.
    This command is reserved for future use.
    Return value: none listed.

FA01h ~ FA3Dh  Record Name Tag.
    Return values:
    FA00h  Completed.
    F280h  Memory full.

FA3Dh ~ FAFFh  Reserved.
    These commands are reserved for future use.
    Return value: none listed.

SD Recognition Control

Recognition performance is largely a function of how well the enrollment data represents subsequent tokens of the enrolled utterances, and performance generally improves steadily with each additional enrollment pass. For most applications, three initial enrollment passes are recommended.
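The recommended three-pass enrollment, built from the Search (F9XYh) and Enroll (FB00h) commands of this section, might be driven from a host as in the sketch below. This is an illustration under stated assumptions, not a reference implementation: `exchange` is a hypothetical callable that sends one command word and returns the final (non-echo) reply, `prompt` is a hypothetical callback that asks the user to speak, and the retry limit is arbitrary.

```python
def enroll_utterance(exchange, number, prompt=lambda: None, passes=3, max_retries=5):
    """Search for SD utterance `number`, then run `passes` enrollment passes.

    Returns the number of completed passes; raises RuntimeError on failure.
    """
    if not 0 <= number <= 0xFF:
        raise ValueError("utterance number must fit in one byte")

    reply = exchange(0xF900 | number)           # Search for SD Utterance XY
    if reply == 0xF73F:
        raise RuntimeError("SD memory full")
    if reply not in (0xF740, 0xF700):           # found / not found
        raise RuntimeError("unexpected search reply %04Xh" % reply)

    completed = 0
    retries = 0
    while completed < passes:
        prompt()                                # user must be prompted before FB00h
        reply = exchange(0xFB00)                # Enroll SD Utterance
        if reply == 0xF740:                     # operation complete
            completed += 1
        elif reply == 0xF73E and retries < max_retries:
            retries += 1                        # improper level, must repeat
        else:
            raise RuntimeError("enrollment failed with %04Xh" % reply)
    return completed
```

Treating F73Eh as a prompt to repeat the same pass follows the "must repeat" wording of that reply; any other reply (F700h, F802h, F840h) ends the attempt in this sketch.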
Subsequent reference updating can be performed with the SD Recognize Update command (F342).

F521h  Clear SDR table.
    This command initializes a blank SD template table. The 2684-byte area from the address set by the F103 command (the working SDR table) is set to zeros. The SDR tables in the FLASH banks are not affected. The default is (4A00 - 547B).
    Return value: F501h  SDR table is cleared.

F6XYh  Set SD Segment Pointer.
    This command sets the SD segment pointer to XY00h, i.e., sets the starting address of the current SD recognition parameter table to XY00h. Issuing this command is equivalent to issuing the Set SD Origin command, F103h XY00h. (For further details of operation, please refer to the description of that command.)
    Return value: none.

F9XYh  Search for SD Utterance XY.
    This is the first step in adding an utterance to the vocabulary, or in replacing an existing one. The SD vocabulary memory is searched for utterance number XYh. If it is not found and if sufficient SD memory exists, the MSM6679AL-110 prepares to add utterance number XYh to the vocabulary.
    Return values:
    F740h  Utterance number found.
    F700h  Utterance number not found.
    F73Fh  Memory full.

FB00h  Enroll SD Utterance.
    This command starts MSM6679AL-110 SD Listen mode, then uses the next captured utterance to start or update training of the reference data for SD utterance number XY specified in the most recent Search command (F9XYh). The user must be prompted to say the utterance prior to issuing this command. If the utterance was previously enrolled, a training update is performed; if not, the reference data is initialized. Each utterance in the SD vocabulary must be enrolled at least once before it can be recognized.
    Return values:
    F740h  Operation complete.
    F700h  Aborting SD Listen mode.
    F73Eh  Improper level, must repeat.
    F802h  Invalid signal processing table.
    F840h  Sample data overrun.

FC00h  Erase utterance from SD vocabulary.
    This command erases the reference parameters for utterance number XYh from the SD vocabulary, where XYh is the utterance number retained from the previous Search command (F9XYh).
    Return value: F740h  Operation complete.

Asynchronous Serial Protocol Example

All messages to the MSM6679AL-110 (except downloads and uploads) are echoed, but replies from the MSM6679AL-110 to the host are not echoed by the host. This arrangement facilitates manual communication with the MSM6679AL-110 using standard terminals. The following table illustrates the range of MSM6679AL-110 functions.

| Comment | Action | Voice Input | Host Command | MSM6679AL-110 Response |
|---|---|---|---|---|
| Initialize MSM6679AL-110. | Host initializes MSM6679AL-110. | | F258 | F258 |
| | MSM6679AL-110 acknowledges. | | | F200 |
| Load trigger tables at 5000h. | Host requests download to data segment 0, starting at location 5000h, of 256 bytes (0100h). | | F502 0000 5000 0100 | F502 0000 5000 0100 |
| | MSM6679AL-110 accepts request. | | | F5A0 |
| | Host sends 256 bytes (~0.25 sec at 9600 baud). | | ... | |
| | MSM6679AL-110 indicates download complete. | | | F5C0 |
| Set new triggering origin. | Host requests Set triggering origin to 5000h. | | F104 5000 | F104 5000 |
| | MSM6679AL-110 sets triggering origin and sends confirming response. | | | F104 5000 |
| Download new SD vocabulary. | Host requests download to data segment 0, starting at location 6000h, of 4 Kbytes (1000h). | | F502 0000 6000 1000 | F502 0000 6000 1000 |
| | MSM6679AL-110 accepts request. | | | F5A0 |
| | Host sends 4 Kbytes (~4.3 sec at 9600 baud). | | ... | |
| | MSM6679AL-110 indicates download complete. | | | F5C0 |
| Set new SD tables. | Host requests Set SD origin to 6000h. | | F103 6000 | F103 6000 |
| | MSM6679AL-110 sets SD origin and responds. | | | F103 6000 |
| Download first 4 K of SI vocabulary. | Host requests download to data segment 0, starting at location 7000h, of 4 Kbytes (1000h). | | F502 0000 7000 1000 | F502 0000 7000 1000 |
| | MSM6679AL-110 accepts request. | | | F5A0 |
| | Host sends 4 Kbytes. | | ... | |
| | MSM6679AL-110 indicates download complete. | | | F5C0 |
| Download last 32 K of SI vocabulary. | Host requests download to data segment 0, starting at location 8000h, of 32 Kbytes (7FFCh). | | F502 0000 8000 7FFC | F502 0000 8000 7FFC |
| | MSM6679AL-110 accepts request. | | | F5A0 |
| | Host sends 32 Kbytes. | | ... | |
| | MSM6679AL-110 indicates download complete. | | | F5C0 |
| Set new SP/SI tables. | Host requests Set SP/SI origin = 7000h. | | F102 7000 | F102 7000 |
| | MSM6679AL-110 sets SP/SI origin and responds. | | | F102 7000 |
| Upload data for diagnostics. | Host requests upload from data segment 0, starting at location 300h, of 45 bytes (2Dh). | | F502 00A0 0300 002D | F502 00A0 0300 002D |
| | MSM6679AL-110 accepts request, signals in progress. | | | F5A0 |
| | MSM6679AL-110 sends 46 bytes. | | | ... |
| | MSM6679AL-110 indicates upload complete. | | | F5C0 |
| Set up MSM6679AL-110 for SI recognition. | Host requests set SP table 3. | | F123 | F123 |
| | MSM6679AL-110 selects SP table 3 and confirms. | | | F123 |
| | Host initializes MSM6679AL-110. | | F258 | F258 |
| | MSM6679AL-110 acknowledges. | | | F200 |
| SI recognition. | Host starts SI recognition, vocabulary 1. | | F301 | F301 |
| | MSM6679AL-110 recognizes utterance 3. | "Dial" | | F603 |
| | Host starts SI recognition, vocabulary 2. | | F302 | F302 |
| | MSM6679AL-110 recognizes utterance 2. | "Two" | | F602 |
| | Host starts SI recognition, vocabulary 2. | | F302 | F302 |
| | MSM6679AL-110 recognizes utterance 3. | "Three" | | F603 |
| SI recognition. | Host starts SI recognition, vocabulary 1. | | F301 | F301 |
| | MSM6679AL-110 recognizes utterance 1. | "Store" | | F601 |
| SD enrollment. | Get ready to train SD utterance 1. | | F901 | F901 |
| | Memory is empty and ready to train. | | | F700 |
| | Pass 1; host sends SD enroll command. | | FB00 | FB00 |
| | SD utterance 1 initialized. | "John Smith" | | F740 |
| | Pass 2; host sends SD enroll command. | | FB00 | FB00 |
| | SD utterance 1 updated. | "John Smith" | | F740 |
| | Pass 3; host sends SD enroll command. | | FB00 | FB00 |
| | SD utterance 1 updated. | "John Smith" | | F740 |
| SI recognition of control words. | Host starts SI recognition, vocabulary 1. | | F301 | F301 |
| | MSM6679AL-110 recognizes utterance 3. | "Dial" | | F603 |
| | Host starts SI recognition, vocabulary 2. | | F302 | F302 |
| | MSM6679AL-110 recognizes utterance 5. | "Five" | | F605 |
| | Host starts SI recognition, vocabulary 2. | | F302 | F302 |
| | MSM6679AL-110 recognizes utterance 6. | "Six" | | F606 |
| | Host starts SI recognition, vocabulary 1. | | F301 | F301 |
| | MSM6679AL-110 recognizes utterance 1. | "Store" | | F601 |
| SD enrollment. | Host prepares MSM6679AL-110 to train SD utterance 2. | | F902 | F902 |
| | Memory is empty and ready to train. | | | F700 |
| | Pass 1; host sends SD enroll command. | | FB00 | FB00 |
| | SD utterance 2 initialized. | "Bill Jones" | | F740 |
| | Pass 2; host sends SD enroll command. | | FB00 | FB00 |
| | MSM6679AL-110 updates SD utterance 2. | "Bill Jones" | | F740 |
| | Pass 3; host sends SD enroll command. | | FB00 | FB00 |
| | MSM6679AL-110 signals operation completed. | "Bill Jones" | | F740 |
| SI recognition of control word. | Host starts SI recognition, vocabulary 1. | | F301 | F301 |
| | MSM6679AL-110 recognizes utterance 3. | "Directory" | | F603 |
| SD recognition. | Host starts SD recognition. | | F340 | F340 |
| | MSM6679AL-110 signals trigger OK. | "John Smith" | | F740 |
| | Host sends SD sort command. | | F341 | F341 |
| | MSM6679AL-110 recognizes utterance 1. | | | F701 |
| Name tag recording. | Host initiates MSM665x port. | | F480 | F480 |
| | Host sets recording length to 1 sec. | | F101 0047 | F101 0047 |
| | MSM6679AL-110 signals operation complete. | | | F101 0047 |
| | Host clears name tag table. | | F50A | F50A |
| | MSM6679AL-110 signals operation complete. | | | F501 |
| | Host sets record gain to max. level. | | F50E | F50E |
| | Start recording tag one. | | FA01 | FA01 |
| | MSM6679AL-110 signals name tag recording complete. | "Jane Doe" | | FA00 |
| | Save name tags to FLASH. | | F50D | F50D |
| | Name tags saved. | | | F501 |
| Name tag playback. | Host sets volume to max. level. | | FEFF | FEFF |
| | Host commands playback of name tag 1. | | F401 | F401 |
| | MSM6679AL-110 signals playback OK. | "Jane Doe" | | F400 |
| Sound playback. | Host sets output volume to mid point. | | FE80 | FE80 |
| | Play MSM6679AL-110 internal sound 1. | "bzzzz" | F442 | F442 |
| | Play back sound from MSM6654. | "Completed" | F49F | F49F |

E2Y0001-28-41

NOTICE

1. The information contained herein can change without notice owing to product and/or technical improvements. Before using the product, please make sure that the information being referred to is up-to-date.

2. The outline of action and examples for application circuits described herein have been chosen as an explanation for the standard action and performance of the product. When planning to use the product, please ensure that the external conditions are reflected in the actual circuit, assembly, and program designs.

3. When designing your product, please use our product below the specified maximum ratings and within the specified operating ranges including, but not limited to, operating voltage, power dissipation, and operating temperature.

4. Oki assumes no responsibility or liability whatsoever for any failure or unusual or unexpected operation resulting from misuse, neglect, improper installation, repair, alteration or accident, improper handling, or unusual physical or electrical stress including, but not limited to, exposure to parameters beyond the specified maximum ratings or operation outside the specified operating range.

5. Neither indemnity against nor license of a third party's industrial and intellectual property right, etc. is granted by us in connection with the use of the product and/or the information and drawings contained herein. No responsibility is assumed by us for any infringement of a third party's right which may result from the use thereof.

6. The products listed in this document are intended for use in general electronics equipment for commercial applications (e.g., office automation, communication equipment, measurement equipment, consumer electronics, etc.).
These products are not authorized for use in any system or application that requires special or enhanced quality and reliability characteristics nor in any system or application where the failure of such system or application may result in the loss or damage of property, or death or injury to humans. Such applications include, but are not limited to, traffic and automotive equipment, safety devices, aerospace equipment, nuclear power control, medical equipment, and life-support systems.

7. Certain products in this document may need government approval before they can be exported to particular countries. The purchaser assumes the responsibility of determining the legality of export of these products and will take appropriate and necessary steps at their own expense for these.

8. No part of the contents contained herein may be reprinted or reproduced without our prior permission.

Copyright 1998 Oki Electric Industry Co., Ltd.

Printed in Japan