RC SYSTEMS
RC8650 VOICE SYNTHESIZER
DoubleTalk RC8650
CMOS, 3.3 Volt / 5 Volt
Voice Synthesizer Chipset
GENERAL DESCRIPTION
The RC8650 is a highly versatile voice and sound synthesizer that integrates a text-to-speech (TTS) processor, real-time and prerecorded audio playback, musical and sinusoidal tone generators, a telephone dialer, and an A/D converter into an easy-to-use chipset. Using a standard serial or 8-bit bus interface, virtually any ASCII text can be streamed to the RC8650 for automatic conversion into speech by the TTS processor. The audio playback modes augment the TTS processor for applications requiring very high voice quality and a relatively small, fixed vocabulary, or applications requiring special sounds or sound effects. The audio output is delivered in both analog and digital PCM audio formats, which can be used to drive a speaker or a digital audio stream.
The RC8650’s integrated TTS processor incorporates RC Systems’ DoubleTalk™ TTS technology, which is based on a patented voice concatenation technique using real human voice samples. The DoubleTalk TTS processor also gives the user unprecedented real-time control of the speech signal, including pitch, volume, tone, speed, expression, and articulation.
Up to 3.5 MB of nonvolatile memory is included in the RC8650 for the storage and on-demand playback of up to 15 minutes of prerecorded messages and sound effects. A programmable “greeting” message can be stored that is automatically played whenever the RC8650 is powered up, allowing a custom message to be played or the RC8650’s default settings to be reconfigured. A user-programmable dictionary allows the pronunciation of virtually any character string to be redefined, and can even trigger the playback of tones, prerecorded messages, and sounds based on specific input patterns. All of these features can be programmed and updated via a standard serial port, even in the field after the RC8650 has been integrated into the end product.
The RC8650 comprises two surface-mounted devices. Both operate from a +3.3 V or +5 V supply and consume very little power. Most applications require only the addition of a low-pass filter/audio power amplifier to implement a fully functional system.
RC8650 FUNCTIONAL BLOCK DIAGRAM
DoubleTalk RC8650 User’s Manual Rev 2G
Revised 06/16/03
© 1999-2003 RC Systems, Incorporated
FEATURES
• Integrated text-to-speech processor:
– High voice quality, unlimited vocabulary
– Converts any ASCII text into speech automatically
– Capable of very high reading rates
– Add/modify messages by simply editing a text file
– On-the-fly control of speed, pitch, volume, etc.
• Playback of sound files:
– Real-time PCM and ADPCM
– Prerecorded on chip, up to 15 minutes*
• Tone generation:
– Three voice musical
– Dual sinusoidal
– DTMF (Touch-Tone) dialer
• On-chip A/D converter:
– Four channels, 8-bit resolution
– One-shot, continuous, single sweep, and continuous sweep modes of operation
– Software and hardware triggering
– Support for external op amp
• Analog and digital audio outputs
• Stop, pause, and resume controls
• Serial and bus interfaces
• In-circuit, field programmable
• User programmable greeting and default settings
• Flexible user exception dictionary:
– Change the pronunciation of any input string based on spelling and context
– Convert encrypted data into meaningful messages
– Trigger tone generation, recorded message playback, voice parameter changes
• 2 KB input buffer for virtually no-overhead operation
• Available in 3.3 V and 5 V versions
• Low power (typ @ 3.3 V):
– 8.8 mA active
– 700 µA idle
– 2 µA standby
APPLICATIONS
• Robotics
• Talking OCR systems
• ATM machines
• Talking pagers and PDAs
• GPS navigation systems
• Vending and ticketing machines
• Remote diagnostic reporting
• Dial-up information systems
• Handheld barcode readers
• Electronic test and measurement
• Security systems
• Aids for the orally or visually disabled
• Meeting federal ADA requirements
RC8650 Product Summary
* Based on 8 kHz sampling rate with ADPCM encoding
TYPICAL APPLICATION CIRCUIT
SECTION 1: SPECIFICATIONS
PINOUTS
Figure 1.1. Pin Assignments
PIN DESCRIPTIONS
Table 1.1. Pin Descriptions
FUNCTIONAL DESCRIPTION
The RC8650 chipset includes a number of features that make it ideally suited for any design requiring voice output. The RC8650’s major features are described below.
Text-to-Speech Synthesizer
The RC8650 provides text-to-speech conversion with its integrated DoubleTalk™ text-to-speech synthesizer. Any English text written to the RC8650 is automatically converted into speech. Commands can be embedded in the input stream to dynamically control the voice, even at the phoneme level (phonemes are the basic sound units of speech).
A greeting message can be stored in the RC8650 that is automatically spoken immediately after the RC8650 is reset. Almost any of the commands recognized by the RC8650 may be included as part of the greeting message, which can be used to set up custom default settings and/or play a prerecorded message or tone sequence. An integrated nonvolatile memory area is also provided for storing a custom pronunciation dictionary, allowing the pronunciation of any character string to be redefined.
Musical Tone Generator
An integrated, three-voice musical tone generator is capable of generating up to three tones simultaneously over a four-octave range. Sounds ranging from simple tones to attention-getting effects can be easily created.
Touch-Tone Generator
The RC8650 includes an integrated DTMF (Touch-Tone) generator. This is useful in telephony applications where standard DTMF tones are used to signal a remote receiver or modem, or to access the public switched telephone network.
Sinusoidal Tone Generator
A precision, dual sinusoidal tone generator can synthesize the tones often used in signaling applications. The tone frequencies can be independently set, allowing signals such as call-progress tones to be generated.
Recorded Audio Playback
Up to 15 minutes of prerecorded messages and sound effects can be stored in the RC8650 for on-demand playback. Recordings are stored in on-chip nonvolatile memory, providing zero-power message storage. Additionally, the RC8650 can play 8-bit PCM and ADPCM audio in real time, such as speech and/or sound effects stored in an external memory or file system.
Analog-to-Digital Converter
The four-channel, 8-bit A/D converter can be used to monitor battery cell voltages, temperature, and other analog quantities. The ADC can be programmed on the fly to convert any single channel, or to scan up to four channels repetitively.
Versatile I/O
All data is sent to the RC8650 through its built-in serial and/or parallel ports. For maximum flexibility, including in-field product update capability, use of the serial port is recommended whenever possible.
The RC8650’s audio output is available in both analog and digital formats. The analog output should be used in applications where no further processing of the audio signal is required, such as driving a speaker or headphones (the output still needs to be filtered and amplified, however). The digital output is for applications that require further processing of the audio signal, such as digital mixing or creating sound files for later playback.
RECOMMENDED CONNECTIONS
Power/Ground
Power and ground connections are made to multiple pins of the RC8650 and RC46xx chips. Every VCC pin must be connected to power, and every VSS pin must be connected to ground. To minimize noise, the analog and digital circuits in the RC8650 use separate power busses. These busses are brought out to separate pins and should be tied together as close to the supply as possible.
Make sure adequate decoupling is placed on the AVREF pin, as noise present on this pin will also appear on the AO output pins and affect A/D converter accuracy. In systems where the power supply is very quiet, AVREF can be connected directly to VCC. Designs incorporating a switching power supply, or supplies carrying heavy loads, may require filtering at the AVREF pin; a 150 Ω resistor in series with VCC, in combination with a 100 µF capacitor to ground, should suffice.
Connect any unused input pins to an appropriate signal level (see Table 1.1). Leave any unused output pins and all NC pins unconnected.
Chip Interconnects
Pins IC0 through IC32 and PIO0 through PIO7 must be connected between the RC8650 and RC46xx chips. IC30, IC31, and IC32 must have 47 kΩ – 100 kΩ pullup resistors to VCC.
Clock Generator
The RC8650 has an internal oscillator and clock generator that can be controlled by an external 7.3728 MHz crystal, ceramic resonator, or external 7.3728 MHz clock source. If an external clock is used, connect it to the XIN pin and leave XOUT unconnected. See Figure 1.2 for recommended clock connections.
Figure 1.2. Clock Connections
INTERFACING THE RC8650
The RC8650 contains both asynchronous serial and 8-bit bus interfaces. All text, commands, tone generator data, real-time audio data, etc., are transmitted to the RC8650 via one of these ports. For maximum flexibility, use of the serial port is recommended whenever possible. Not all RC8650 functions are supported through the bus interface. In particular, index markers, operating system updates, chipset identification, current operating settings, and A/D conversion are only supported through the serial interface.
Serial Interface
The serial port operates with 8 data bits (LSB first), 1 or more stop bits, no parity, and any standard baud rate between 300 and 115200 bps.
The CTS# pin should be used to control the flow of serial data to the RC8650. It is not necessary to check CTS# before transmitting every byte, however. All data is routed through a high-speed 16-byte buffer within the RC8650 before being stored in the primary buffer. CTS# may be checked every eight bytes with no risk of data loss.
A typical RS-232C interface is shown in Figure 1.3. Note that the MAX232A transceiver is not required if the host system’s serial port operates at logic levels compatible with the RC8650 (0/+5 V or 0/+3.3 V). The RC8650’s serial port may be connected directly to the host system in this case.
Figure 1.3. RS-232C Interface
Baud rate selection
The serial port’s baud rate can be programmed using any of three methods: pin strapping, auto-detect, and by command. Pin strapping sets the baud rate according to the logic levels present on the BRS0–BRS2 pins, as shown in Table 1.2. Auto-detect enables the serial port to automatically detect the baud rate of the incoming data. The baud rate command (described in Section 2) allows the baud rate to be changed at any time, effectively overriding the first two methods. Pin strapping cannot be used to program baud rates higher than 19200; to do this, auto-detection or the baud rate command must be used.
Table 1.2. Baud Rate Options
The automatic baud rate detection mechanism is enabled when the BRS0–BRS2 pins are all at a High logic level and the BRD pin is connected to the RXD pin. The baud rate is determined by the shortest High or Low period detected in the input stream. This period is assumed to be the bit period of the incoming data. In addition to the baud rates listed in Table 1.2, auto-detect mode also supports 38400, 57600, and 115200 baud rates.
In order for the RC8650 to determine the incoming baud rate, there must be at least one isolated “1” or “0” in the input character. The CR character, 0Dh, is recommended for locking the baud rate. The character is not otherwise processed by the RC8650; it is discarded.
If the measured bit period is determined to be a valid baud rate, the RC8650 acknowledges lock acquisition by transmitting the ASCII character “l” (6Ch) on the TXD pin. The baud rate will remain locked unless changed with the baud rate command, or the RC8650 is reset.
Figure 1.4. Baud Rate Detection Timing
Note: The measurement cycle ends when there have been no High-to-Low or Low-to-High transitions on the BRD pin for 75 ms or longer. Consequently, the RC8650 will ignore any data sent to it for a period of 75 ms after the “lock-on” character has been received. The CTS# pin is driven High during this time, and the acknowledgment character is not transmitted until the RC8650 is actually ready to accept data. See Figure 1.4.
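The "isolated bit" requirement for baud rate auto-detection can be illustrated with a small host-side model. The sketch below is not the RC8650's internal algorithm. It only models an 8N1 frame and shows why a character such as CR (0Dh) exposes a one-bit-wide pulse while a Null does not. The function names, the rate list, and the 5% tolerance are assumptions made for illustration.

```python
# Hypothetical model of shortest-pulse baud detection (illustration only).

# Assumed candidate rates; the text states 300 to 115200 bps is supported.
STANDARD_RATES = (300, 600, 1200, 2400, 4800, 9600,
                  19200, 38400, 57600, 115200)


def frame_levels(byte):
    """Line levels for one 8N1 frame: start bit, 8 data bits LSB first, stop bit."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [1]


def shortest_bounded_run(levels):
    """Shortest run (in bit periods) that has a transition on both sides.

    The line idles High before and after the frame, so the leading idle
    and the trailing stop/idle run are unbounded and are ignored.
    """
    padded = [1] + levels + [1]
    runs, start = [], 0
    for i in range(1, len(padded)):
        if padded[i] != padded[start]:
            runs.append(i - start)
            start = i
    bounded = runs[1:]  # drop the leading idle run
    return min(bounded) if bounded else None


def estimate_baud(shortest_pulse_s, tolerance=0.05):
    """Map a measured pulse width to the nearest assumed standard rate, or None."""
    measured = 1.0 / shortest_pulse_s
    best = min(STANDARD_RATES, key=lambda r: abs(r - measured))
    return best if abs(best - measured) <= tolerance * best else None
```

Under this model, a CR frame contains a run exactly one bit period long, so a 9600 bps sender is measured correctly. A Null frame's shortest bounded run is nine bit periods, which would be misread as a much lower rate or rejected as invalid.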
Status messages
Real-time status information is provided via the TXD pin. Status is transmitted as one-byte messages, shown in Table 1.3. Each message correlates to a status flag in the Status Register, shown in Table 1.4. The specific character used, and whether it will be transmitted, are functions of the VC and STM bits of the Protocol Options Register. (The Protocol Options Register is described in Section 2.) For information about how to obtain reading-progress status, see the Index Marker command description.
Table 1.3. Status Messages
Bus/Printer Interface
The RC8650’s bus interface allows the RC8650 to be connected to a microprocessor or microcontroller in the same manner as a static RAM or I/O device, as shown in Figure 1.6. The microprocessor controls all transactions with the RC8650 over the system data bus using the RD and WR# signals. RD controls the reading of the RC8650’s Status Register; WR# controls the transfer of data into the RC8650. The Status Register bits and their definitions are shown in Table 1.4.
A registered bus transceiver is required for communication between the RC8650 and microprocessor; two 74HCT374s placed back to back may be substituted for the 74HCT652 shown in the figure. Prior to each write operation to the RC8650, the host processor should verify that the RC8650 is ready by testing the RDY status flag.
The RC8650 can also be interfaced to a PC’s printer port as shown in Figure 1.6. A 74HCT374 can be used in place of the 74HCT652, since bidirectional communication is not necessary. Handshaking is performed automatically via the BUSY pin.
Because the RC8650 can take up to 15 µs to accept data written to it (AC Characteristics, tYHWH parameter), software drivers should wait for RDY to drop to 0 after a byte is written in order to avoid overwriting it with the next data byte. Not doing so could result in the loss of data. Waiting for RDY to drop to 0 ensures that RDY will not falsely show that the RC8650 is ready the next time the driver is called.
If a system interrupt can occur while waiting for RDY to become 0, or if RDY cannot otherwise be checked at least once every 8 µs, a software timeout should be enforced to avoid hanging up in the wait loop. The time RDY stays 0 is relatively short (8 µs min.) and can be missed if interrupted. The timeout should be at least 15 µs, which is the maximum time for RDY to drop to 0 after writing a byte of data. In non-time-critical applications, the output routine could simply delay 15 µs or longer before exiting, without checking for RDY = 0 at all.
Figure 1.5 illustrates the recommended method of writing data to the RC8650’s bus interface. This method should be used for writing all types of data, including text, commands, tone generator data, and real-time audio data.
Figure 1.5. Recommended Method of Writing Data Via the Bus Interface
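The write procedure described in this section (wait for RDY, write the byte, then wait for RDY to drop with a 15 µs timeout) can be sketched as a small driver routine. This is a minimal sketch, not RC Systems' reference code. The `RDY_MASK` bit position, the `read_status`/`write_data` accessors, and the `FakeRC8650` test double are all assumptions. A real driver would use the RDY bit defined in Table 1.4 and the platform's own bus access functions, most likely in C or assembly where microsecond timing is practical.

```python
import time

RDY_MASK = 0x10  # hypothetical bit position; use the RDY bit from Table 1.4


def write_byte(read_status, write_data, byte,
               ack_timeout_s=15e-6, clock=time.perf_counter):
    """Write one byte using the RDY handshake described in the text.

    read_status and write_data are caller-supplied bus accessors (assumed).
    """
    # Step 1: wait until the RC8650 reports ready (RDY = 1).
    while not (read_status() & RDY_MASK):
        pass
    # Step 2: transfer the byte.
    write_data(byte)
    # Step 3: wait for RDY to drop to 0, but give up after the timeout so
    # that a missed RDY = 0 pulse cannot hang the driver.
    deadline = clock() + ack_timeout_s
    while read_status() & RDY_MASK:
        if clock() >= deadline:
            break


class FakeRC8650:
    """Stand-in for the real bus, used only to exercise write_byte()."""

    def __init__(self):
        self.received = []
        self._busy_reads = 0

    def read_status(self):
        # Report "not ready" for a couple of reads after each write.
        if self._busy_reads:
            self._busy_reads -= 1
            return 0
        return RDY_MASK

    def write_data(self, byte):
        self.received.append(byte)
        self._busy_reads = 2
```

Passing the bus accessors in as callables keeps the handshake logic separate from the hardware access, so the same routine can be exercised against a test double before it is run against the chipset.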
Table 1.4. Bus Interface Status Register Bit Definitions
Figure 1.6. Bus/Printer Interface
Figure 1.7. Method of Capturing Status Information for Driving External Circuitry
Analog Audio Output
The analog output pins AO0 and AO1 are high impedance (10 kΩ typical) outputs from the RC8650’s internal D/A converters. When using these outputs, the addition of an external low-pass filter is highly recommended. When laying out the printed circuit board, avoid running digital lines near the AO lines in order to minimize induced noise in the audio path. If space permits, run a guard ground next to the AO traces.
The circuit shown in Figure 1.8 is a low-pass filter/power amplifier capable of delivering 1.1 W to an 8 Ω load when operating from a +5 V power supply (power output will be less when operating from +3.3 V). The amplifier’s shutdown pin can be controlled by the TS0 pin to minimize current drain when the RC8650 is inactive.
Figure 1.8. 3 kHz Low-Pass Filter/Power Amplifier
Digital Audio Output
The digital audio pin DAOUT outputs the RC8650’s audio signal as a digital audio stream consisting of 8 data bits per sample. The normalized sampling rate for all text-to-speech modes and the DTMF generator is 84 kbps (10,500 bytes/sec). The sinusoidal generator, prerecorded, and real-time audio playback mode rates are user programmable, so their normalized rates will vary. See the Pin Descriptions and Audio Control Register command description for further details.
ELECTRICAL SPECIFICATIONS
ABSOLUTE MAXIMUM RATINGS*
Supply voltage, VCC and AVCC . . . . . . . . . . . . . . . . . . –0.3 V to +6.5 V
DC input voltage, VI . . . . . . . . . . . . . . . . . . . . . . . –0.3 V to VCC +0.3 V
Operating temperature, TA . . . . . . . . . . . . . . . . . . . . . . . 0 °C to +70 °C
Storage temperature, TS . . . . . . . . . . . . . . . . . . . . . –55 °C to +125 °C
* WARNING: Stresses greater than those listed under “Absolute Maximum Ratings” may cause permanent damage to the device. This is a stress rating only; operation of the device at any condition above those indicated in the operational sections of these specifications is not implied. Exposure to absolute maximum rating conditions for extended periods may affect device reliability.
Figure 1.9. Test Circuit
DC CHARACTERISTICS
TA = 0 °C to +70 °C, VCC = AVCC = AVREF = 3.3 V / 5 V, VSS = AVSS = 0 V, XIN = 7.3728 MHz
AC CHARACTERISTICS
TA = 0 °C to +70 °C, VCC = AVCC = AVREF = 3.3 V / 5 V, VSS = AVSS = 0 V
External Clock Input Timing
Figure 1.10. External Clock Waveform
Bus Interface Timing
Figure 1.11. Bus Interface Waveforms
Analog Audio Timing
Figure 1.12. Analog Audio Waveforms
Digital Audio Timing
Figure 1.13. Digital Audio Waveforms
Standby Timing
Figure 1.14. Standby Waveform
Reset Timing
Figure 1.15. Reset Waveform
PACKAGE INFORMATION
100 Pin Plastic 14 x 20 mm QFP (measured in millimeters)
48 Pin Plastic 12 x 20 mm TSOP (measured in millimeters)
Recommended PCB Layouts (measured in millimeters)
ORDERING INFORMATION
The RC8650 is available in several audio capacity and voltage ranges. The ordering part number is formed by combining several fields, as indicated below. Refer to the “Valid Combinations” table, which lists the configurations that are planned to be supported in volume. All configurations include the RC8650AFP chip; the companion chip is shown in parentheses. For example, the RC8650-1, a 5 V part with 130 seconds of recordable audio memory, is composed of the RC8650AFP and RC4651FP.
SECTION 2: PRINCIPLES OF OPERATION
This section describes the operating characteristics of the DoubleTalk RC8650 chipset.
OPERATING MODES
The RC8650 has four primary operating modes and two inactive modes designed to achieve maximum functionality and flexibility. The operating mode can be changed anytime, even on the fly, by issuing the appropriate command to the RC8650.
Note: The RC8650 will not begin speaking until it receives a CR (ASCII 13) or Null (ASCII 00) character. This ensures that a complete contextual analysis can be performed on the input text. If it is not possible for the application to send a CR or Null at the end of each text message, use the Timeout Delay command.
The RC8650 does not make any distinction between uppercase and lowercase characters; text and commands may be sent in any combination of uppercase and lowercase. All data sent to the RC8650 is buffered in an internal 2 KB input buffer, allowing additional text and commands to be queued even while the RC8650 is producing output.
Text-to-speech mode. All text sent to the RC8650 is automatically translated into speech by the integrated DoubleTalk TTS engine. TTS mode can be further subdivided into three translation modes: Text, which reads text normally; Character, which reads (spells) one character at a time; and Phoneme, which allows the TTS engine’s phonemes to be directly accessed. TTS mode is the default operating mode.
Real Time Audio Playback mode. Data sent to the RC8650 is written directly to the RC8650’s audio buffer. This results in a high data rate, but provides the capability of producing the highest quality speech, as well as sound effects. PCM and ADPCM data types are supported.
Prerecorded Audio Playback mode. This mode allows recorded messages and sound effects that have been stored in the RC8650 to be played back. PCM and ADPCM data types are supported.
Tone Generator modes. These modes activate the RC8650’s musical tone generator, sinusoidal generator, or DTMF generator. They can be used to generate audible prompts, music, and signaling tones, to dial a telephone, etc.
Idle mode. To help conserve power in battery-powered systems, the RC8650 automatically enters a reduced-power state whenever it is inactive. Data can still be read from and written to the RC8650 while in this mode. Current draw is typically 700 µA @ 3.3 V.
Standby mode. This mode powers down the RC8650, where current draw is typically only 2 µA. Standby mode can be invoked from either the STBY# pin or with the Sleep command. Data cannot be read from or written to the RC8650 in this mode.
TRANSLATION ACCURACY
Because the RC8650 must handle the highly irregular spelling system of English, as well as proper names, acronyms, technical terms, and borrowed foreign words, there inevitably will be words that it will mispronounce. If a word is mispronounced, there are three techniques for correcting it:
1. Spell the word phonetically for the desired pronunciation.
2. Redefine the way the word should be pronounced by creating an exception for it in the RC8650’s exception dictionary. This method allows words to be corrected without having to modify the original text, and it automatically corrects all instances of the word. Exception dictionaries are covered in detail in Section 4.
3. Use the RC8650’s Phoneme mode.
The first technique is the easiest way to fine-tune word pronunciations: by tricking the RC8650 into the desired pronunciation. Among the more commonly mispronounced words are compound words (baseball), proper names (Sean), and foreign loan words (chauffeur). Compound words can usually be corrected by separating the two words with a space, so that “baseball” becomes “base ball.” Proper names and foreign words may require a bit more creativity, so that “Sean” becomes “Shon,” and “chauffeur” becomes “show fur.” Heteronyms (words with identical spelling but different meanings and pronunciations) can also be modified using this technique. For example, if the word read is to be pronounced “reed” instead of “red,” it can simply be respelled as “reed.”
COMMANDS
The commands described in the following pages provide a simple yet flexible means of controlling the RC8650 under software control. They can be used to vary voice attributes, such as the volume or pitch, to suit the requirements of a particular application or listener’s preferences. Commands are also used to change operating modes.
Commands can be freely intermixed with the text that is to be spoken, allowing the voice to be dynamically controlled. Commands affect only the data that follows them in the data stream.
Command Syntax
All RC8650 commands are composed of the command character, a parameter n comprised of a one- to four-digit number string, and a single string literal that uniquely identifies the command. Some commands simply enable or disable a feature of the RC8650 and do not require a parameter. The general command format is:
<command character>[<number string>]<string literal>
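As an illustration of the general command format, the following sketch builds command strings on the host before they are sent to the RC8650. The helper names `command` and `speak` are hypothetical and not part of any RC Systems library. The sketch assumes the default command character CTRL+A (ASCII 01) and the signed relative-parameter form (such as +2V) described in the manual's discussion of command parameters. The CR terminator follows the note that the RC8650 does not begin speaking until it receives a CR or Null.

```python
CTRL_A = "\x01"  # default command character (ASCII 01)


def command(literal, n=None, relative=False, command_char=CTRL_A):
    """Build <command character>[<number string>]<string literal>.

    literal  -- the string literal identifying the command, e.g. "P" or "V"
    n        -- optional parameter; omitted for commands that take none
    relative -- if True, emit a signed displacement such as +2 or -2
    """
    if n is None:
        number = ""
    else:
        if abs(n) > 9999 or (n < 0 and not relative):
            raise ValueError("parameter must be a one- to four-digit number")
        number = f"{n:+d}" if relative else str(n)
    return f"{command_char}{number}{literal}"


def speak(text):
    """Return the bytes for a text message, terminated by CR so it is spoken."""
    return (text + "\r").encode("ascii")


# Pitch 40 and volume 7, each prefixed by its own command character,
# followed by the text to be spoken.
message = speak(command("P", 40) + command("V", 7) + "Hello")
```

Each command carries its own command character, so several commands can be concatenated ahead of the text they are meant to affect.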
21
RC SYSTEMS
RC8650 VOICE SYNTHESIZER
If two or more commands are to be used together, each must be prefaced
with the command character. This is the only way the RC8650 knows to
treat the remaining characters as a command, rather than text that
should be spoken. For example, the following commands program pitch
level 40 and volume level 7 (CTRL+A is the default command character):
CTRL+A “40P” CTRL+A “7V”
The command character
The default RC8650 command character is CTRL+A (ASCII code 01).
The command character itself can be spoken by the RC8650 by sending
it twice in a row: CTRL+A CTRL+A. This special command allows the
command character to be spoken without affecting the operation of the
RC8650, and without having to change to another command character
and then back again.
Changing the command character
The command character can be changed to another control character
(ASCII 01-26) by sending the current command character, followed by
the new character. To change the command character to CTRL+D, for
example, issue the command CTRL+A CTRL+D. To change it back,
issue the command CTRL+D CTRL+A. It’s generally a good idea to
change the command character if the text to be read contains
characters which may otherwise be interpreted as command characters
(and hence commands). The command character can be unconditionally
reset to CTRL+A by sending CTRL+^ (ASCII 30) to the RC8650.
Command parameters
Command parameters are composed of one to four digit number
strings. The RC8650 supports two types of parameters: absolute and
relative. Absolute parameters explicitly specify the parameter’s new
value, such as 9S or 3B. Relative parameters specify a displacement
from a parameter’s current value, not the actual new value itself.
Relative parameters can specify either a positive or negative
displacement from a parameter’s current value. For example, the Volume
command +2V increases the volume level by two (V+2→V). If the current
volume is 4, the volume will increase to 6 after the command has
executed. The command –2V will have a similar effect, except the
volume will be decreased by two.
If the value of a parameter falls outside the command’s range, the
value will either wrap around or saturate, depending on the setting of
the SAT bit of the Protocol Options Register. For example, if
parameters are programmed to wrap, the current volume is 7 and the
command +4V is issued, the resultant volume will be (7+4)–10 = 1, since
the volume range is 0-9. If parameters are programmed to saturate, the
resultant volume would be 9 instead.
When writing application programs for the RC8650, it is recommended
that relative parameters be used for temporarily changing voice
attributes (such as raising the pitch of a word), using
absolute-parameter commands only once in the program’s initialization
routine. This way, if the base value of an attribute needs to be
changed, it only needs to be changed in the initialization routine.
TTS COMMANDS
This section describes the software commands that affect the
text-to-speech synthesizer.
Text Mode/Delay (T/nT)
This command places the RC8650 in the Text operating mode. The
optional delay parameter n is used to create a variable pause between
words. The shortest, and default delay of 0, is used for normal speech.
For users not accustomed to synthetic speech, the synthesizer’s
intelligibility may be improved by introducing a delay. The longest
delay that can be specified is 15. If the delay parameter is omitted,
the current (last set) value will be used and the exception dictionary
will be disabled. This feature is useful for returning from another
operating mode or disabling the exception dictionary (see Enable
Exception Dictionary command).
Character Mode/Delay (C/nC)
This command puts the RC8650 in the Character operating mode.
The optional delay parameter n is used to create a variable pause
between characters. Values between 0 (the default) and 15 provide
pauses from shortest to longest, respectively. Values between 16 and
31 provide the same range of pauses, but control characters will not
be spoken. If the delay parameter is omitted, the current value will be
used and the exception dictionary will be disabled.
Phoneme Mode (D)
This command disables the text-to-phonetics translator, allowing the
RC8650’s phonemes to be accessed directly. Table 2.1 lists the
phonemes that can be produced by the RC8650.
When concatenating two or more phonemes, each phoneme must be
delimited by a space. For example, the word “computer” would be
represented phonetically as
K AX M P YY UW DX ER
Phoneme attribute tokens
The RC8650 supports a number of phoneme attribute tokens that can
be used in addition to the standard commands. These tokens do not
require the command character or any parameters, but can only be
used in Phoneme mode and exception dictionaries.
As indicated in Table 2.2, the / and \ tokens temporarily increase and
decrease the pitch by m steps. Besides being temporary, the difference
between using the pitch tokens and the Pitch command is that
the effective pitch range is extended beyond the normal 0-99 range
by approximately ±20 steps, and if the pitch should fall out of range,
it will always saturate, regardless of the Protocol Options Register SAT
setting.
All other phoneme attribute token commands remain in effect until
explicitly changed.
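As an illustration of the command syntax and the relative-parameter rules given in this section, the following Python sketch formats commands and models how a relative parameter resolves under the wrap and saturate settings. The helper names (`command`, `apply_relative`) are hypothetical, and the treatment of underflow when wrapping is an assumption, since only the overflow case is described in the text.

```python
CMD_CHAR = "\x01"  # CTRL+A, the default command character


def command(param, letter, cmd_char=CMD_CHAR):
    """Format one command: command character, optional parameter, command letter.

    param may be an int (absolute), a string such as "+2" (relative), or None.
    """
    return f"{cmd_char}{'' if param is None else param}{letter}"


def apply_relative(current, delta, lo, hi, saturate):
    """Model the value that results from a relative parameter.

    saturate=True clamps to the range; saturate=False wraps around.
    Wrapping below the range is assumed to mirror wrapping above it.
    """
    value = current + delta
    if lo <= value <= hi:
        return value
    if saturate:
        return max(lo, min(hi, value))
    span = hi - lo + 1
    return lo + (value - lo) % span


# Pitch 40 and volume 7, each prefaced with the command character.
init_string = command(40, "P") + command(7, "V")

# Volume 7 followed by +4V, volume range 0-9.
wrapped = apply_relative(7, 4, 0, 9, saturate=False)   # (7+4)-10
clamped = apply_relative(7, 4, 0, 9, saturate=True)
```

Keeping absolute commands in one initialization string and using relative commands elsewhere follows the recommendation above.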
Table 2.1. DoubleTalk Phoneme Symbols
Phoneme  Example            Phoneme  Example
Symbol   Word               Symbol   Word
A        das (Spanish)      M        me
AA       cot                N        new
AE       cat                NG       rung
AH       cut                NY       niño (Spanish)
AW       cow                O        no (Spanish)
AX       bottom             OW       boat
AY       bite               OY       boy
B        bib                P        pop
CH       church             PX       spot
D        did                R        ring
DH       either             RR       tres (Spanish)
DX       city               S        sell
E        ser (Spanish)      SH       shell
EH       bet                T        tin
EI       mesa (Spanish)     TH       thin
ER       bird               TX       stick
EW       acteur (French)    U        uno (Spanish)
EY       bake               UH       book
F        fee                UW       boot
G        gag                V        valve
H        he                 W        we
I        libro (Spanish)    WH       when
IH       bit                Y        mayo (Spanish)
IX       rabbit             YY       you
IY       beet               Z        zoo
J        age                ZH       vision
K        cute               space    variable pause *
KX       ski                ,        medium pause
L        long               .        long pause
* Normally used between words; duration determined by nT command
Table 2.2. Phoneme Attribute Tokens
Applications of Phoneme mode
Phoneme mode is useful for creating customized speech, when the
normal text-to-speech modes are inappropriate for producing the
desired voice effect. For example, Phoneme mode should be used
to change the stress or emphasis of specific words in a phrase. This
is because Phoneme mode allows voice attributes to be modified on
phoneme boundaries within each word, whereas Text mode allows
changes only at word boundaries. This is illustrated in the following
examples.
CTRL+A "d" CTRL+A "m" "//h aw -/d>/eh r +<\\yy uw s p \iy k
t uw \m iy dh ae t -\w ey .+/"
Note that expression is disabled in this example, since the pitch
variations due to the internal intonation algorithms would otherwise
interfere with the pitch tokens. Compare this with the same phrase
produced in Text mode with expression enabled:
CTRL+A "t" CTRL+A "e" "How dare you speak to me that way!"
Phoneme mode is also useful in applications that provide their own
text-to-phoneme translation, such as the front end of a custom
text-to-speech system.
Speed (nS)
The synthesizer’s speech rate can be adjusted with this command,
from 0S (slowest) through 9S (fastest). The default rate is 1S (5S if the
VC bit of the Protocol Options Register is set to 0).
Voice (nO)
The text-to-speech synthesizer has eight standard voices and a
number of individual voice parameter controls that can be used to
independently vary the voice characteristics. Voices are selected with
the commands 0O through 7O, shown in Table 2.3. Because the Voice
command alters numerous internal voice parameters (articulation,
pitch, expression, tone, etc.), it should precede any individual voice
parameter control commands.
Table 2.3. Voice Presets
Articulation (nA)
This command adjusts the articulation level, from 0A through 9A.
Excessively low articulation values tend to make the voice sound
slurred; very high values, on the other hand, can make the voice sound
choppy. The default articulation is 5A.
Expression (E/nE)
Expression, or intonation, is the variation of pitch within a sentence or
phrase. When expression is enabled (n > 0), the RC8650 attempts to
mimic the pitch patterns of human speech. For example, when a
sentence ends with a period, the pitch drops at the end of the sentence;
a question mark will cause the pitch to rise.
The optional parameter n determines the degree of intonation. 0E
provides no intonation (monotone), whereas 9E is very animated
sounding. 5E is the default setting. If the parameter is omitted, the
current (last set) value will be used. This is useful for re-enabling
intonation after a Monotone command.
Monotone (M)
This command disables all intonation (expression), causing the
RC8650 to speak in a monotonic voice. Intonation should be disabled
whenever manual intonation is applied using the Pitch command or
phoneme attribute tokens. Note that this command is equivalent to
the 0E command.
Formant Frequency (nF)
This command adjusts the synthesizer’s overall frequency response
(vocal tract formant frequencies), over the range 0F through 9F. By
varying the frequency, voice quality can be fine-tuned or voice type
changed. The default frequency is 5F.
Pitch (nP)
This command varies the synthesizer’s pitch over a wide range, which
can be used to change the average pitch during speech production,
produce manual intonation, or create sound effects (including
singing). Pitch values can range from 0P through 99P; the default is 50P.
Tone (nX)
The synthesizer supports three tone settings, bass (0X), normal (1X)
and treble (2X), which work much like the bass and treble controls on
a stereo. The best setting to use depends on the speaker being used
and personal preference. Normal (1X) is the default setting.
Reverb (nR)
This command is used to add reverberation to the voice. 0R (the
default) introduces no reverb; increasing values of n correspondingly
increase the reverb delay and effect. 9R is the maximum setting.
Punctuation Filter (nB)
Depending on the application, it may be desirable to limit the reading
of certain punctuation characters. For example, if the RC8650 is used
to proofread documents, the application may call for only unusual
punctuation to be read. On the other hand, an application that orally
echoes keyboard entries for a blind user may require that all
punctuation be spoken.
The RC8650 supports four primary levels of punctuation filtering as
shown in Table 2.4. These levels determine which punctuation
characters will be spoken and which will not. In addition to the four
base levels, the command can be expanded to control how number
strings will be read. This is done by ORing the values 04h and/or 08h to
the base parameter range, as described below.
Table 2.4. Punctuation Filter
Effect on number strings
The values of n listed in Table 2.4 cause number strings to be read one
digit at a time (e.g., 0123 = “zero one two three”). ORing 04h to the
values listed in the table (n = 4-7) forces number strings to be read as
numbers (0123 = “one hundred twenty three”). n = 6 and n = 7 also
force currency strings to be read as they are normally spoken; for
example, $11.95 will be read as “eleven dollars and ninety five cents.”
Finally, ORing 08h to these values (n = 8-15) disables leading zero
suppression; number strings beginning with zero will always be read
one digit at a time.
The default filter setting is 6B (Some punctuation, Numbers mode,
leading zero suppression enabled).
CONTROL COMMANDS
Volume (nV)
This is a global command that controls the RC8650’s output volume
level, from 0V through 9V. 0V yields the lowest possible volume;
maximum volume is attained at 9V. The default volume is 5V. The Volume
command can be used to set a new listening level, create emphasis in
speech, or change the output level of the tone generators.
Timeout Delay (nY)
The RC8650 defers translating the contents of its input buffer until a
CR or Null is received. This ensures that text is spoken smoothly from
word to word and that the proper intonation is given to the beginnings
and endings of sentences. If text is sent to the RC8650 without a CR or
Null, it will remain untranslated in the input buffer indefinitely.
The RC8650 contains a programmable timer that is able to force the
RC8650 to translate its buffer contents after a preset time interval. The
timer is enabled only if the Timeout Delay parameter n is non-zero, the
RC8650 is not active (not talking), and the input buffer contains no CR
or Null characters. Any characters sent to the RC8650 before timeout
will automatically restart the timer.
The Timeout parameter n specifies the number of 200 millisecond
periods in the delay time, which can range from 200 milliseconds to 3
seconds. The default value is 0Y, which disables the timer.
Sleep Timer (nQ)
The sleep timer is used to force the RC8650 into Standby mode after a
programmed time interval. For example, the RC8650 can power down
automatically if the user forgets to turn off the power at the end of the
day. An audible “reminder” tone can even be programmed to sound
every ten minutes to remind the user that the power was left on, before
shutdown occurs.
Table 2.5. Timeout Delays
The sleep timer is stopped and reset whenever the RC8650 is active,
and begins running when the RC8650 enters Idle mode. In this way,
the RC8650 will not shut itself down during normal use, as long as
the programmed timer interval is longer than the maximum time the
RC8650 is inactive.
The command parameter n determines when Standby mode will be
entered. You can place the RC8650 in Standby mode immediately,
program the sleep timer to any of 15 ten-minute intervals (10 to 150
minutes), or disable the sleep timer altogether (Table 2.6).
Note that the delay interval is simply n × 10 minutes for 0 < n < 16.
ORing 10h to these values (16 < n < 32) also enables the reminder tone,
which sounds at the end of each ten minute interval. Programming n
= 0 disables the sleep timer, which is the default setting. Setting n =
16 forces the RC8650 to enter Standby mode as soon as all output
has ceased.
If the sleep timer is allowed to expire, the RC8650 will emit the ASCII
character “p” from the TXD pin and the STBY status flag will be set to
1, just before entering Standby mode. This enables the host to detect
that the RC8650 has entered Standby mode.
Once the RC8650 has entered Standby mode, it can be re-awakened
only by a hardware reset or by driving the STBY# pin Low for 250 ns or
longer, then High again. All of the RC8650 handshake signals (BUSY,
CTS#, and RDY#) are forced to their “not ready” states while the
RC8650 is in Standby.
Table 2.6. Sleep Timer
Index Marker (nI)
Index markers are nonspeaking “bookmarks” that can be used to keep
track of where the RC8650 is reading within a passage of text. The
parameter n is any number between 0 and 99; thus, up to 100 unique
markers may be active at any given time.
When the RC8650 has spoken the text up to a marker, it transmits the
marker number to the host via the TXD pin. Note that this value is a
binary number between 0 and 99, not a literal ASCII number string as
was used in the command to place the marker. This allows the marker
to be transmitted as a one-byte value.
There is no limitation to how many index markers can be used in a text
string. The frequency depends on the resolution required by the
application. In Text mode, for example, one marker per sentence or one
marker per word would normally be used. In Phoneme mode, markers
can be placed before each phoneme to monitor phoneme production,
which is useful for synchronizing an animated mouth with the voice.
Baud Rate (nH)
The serial port’s baud rate can be programmed to the rates listed in
Table 2.7. If included as part of the greeting message, the command
will effectively override the baud rate set by the BRS pins.
Table 2.7. Programmable Baud Rates
TS Pin Control (nK)
The TS pins provide talk status information for each audio channel,
which can be used to activate a transmitter, take a telephone off hook,
enable an audio power amplifier, etc., at the desired time. Each pin’s
state and polarity can be configured as shown in Table 2.8. The
programming of the TS pins does not affect the Status Register TS flag
in any way. The default setting is 1K.
If a TS pin is programmed High or Low, it will remain so until changed
otherwise. This feature can be used to activate a transmitter, for
example, before speech output has begun. In the automatic mode, the
TS pin is asserted as soon as output begins; it will return to its false
state when all output has ceased. Note that because RC8650
commands work synchronously, the TS pin will not change state until all
text and commands, up to the TS Pin Control command, have been
spoken and/or executed.
Table 2.8. TS Pin Control
Reinitialize (@)
This command reinitializes the RC8650 by clearing the input buffer and
restoring the voice parameters and control registers to their factory
default settings. Neither the exception dictionary, prerecorded audio,
greeting message, baud rate, nor TS pin control setting is affected.
Stop (CTRL+X), Skip (CTRL+Y)
The Stop command stops the RC8650 and flushes its input buffer of all
text and commands. The Skip command skips to the next sentence in
the buffer. Neither command affects any of the RC8650’s settings.
Note The format of these commands is unique in that the command
character (CTRL+A) is not used with them. The CTRL+X (ASCII 24)
and CTRL+Y (ASCII 25) characters are written directly to the RC8650,
which enables the RC8650 to react immediately, even if its input
buffer is full. To be most effective, the states of the RC8650
handshaking signals should be ignored.
Zap Commands (Z)
This command prevents the RC8650 from honoring subsequent
commands, causing it to read commands as they are encountered (useful
in debugging). Any pending commands in the input buffer will still
be honored. The only way to restore command recognition after the
Zap command has been issued is to write CTRL+^ (ASCII 30) to the
RC8650 or perform a hardware reset.
Protocol Options Register (nG)
This command controls various internal RC8650 operating
parameters. The command parameter n is calculated by ORing together
the individual control bits shown in Table 2.9. For example, 193G (193 =
128 + 64 + 1) disables V8600 emulation, enables all status messages
and specifies that parameters should saturate. 128G is the default
setting.
Bit POR.7 (VC) programs the RC8650 to emulate RC Systems’ original
V8600 voice synthesizer module. When this bit is set to 0 (which V8600
application programs do, as this bit was undefined in the V8600),
the overall voice speed range is reduced and the default speed is
changed from 1S to 5S, matching the characteristics of the V8600.
The serial port status messages (see Table 1.3) are also affected by
the setting of this bit.
Note Relative parameters work differently than usual with this
command. Instead of specifying a displacement from the register’s
current value, relative parameters allow you to set (“+”) and clear
(“–”) individual register bits. For example, +65G sets bits POR.0 and
POR.6; –65G clears POR.0 and POR.6.
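The bit arithmetic behind the Protocol Options Register parameter can be sketched in Python as follows. This is only a model of the rules stated above, not RC8650 firmware behavior; the function names are hypothetical, and the register is assumed to be 8 bits wide because the highest bit named in the text is POR.7.

```python
def register_value(*bit_numbers):
    """OR together individual bit positions to form the parameter n."""
    n = 0
    for bit in bit_numbers:
        n |= 1 << bit
    return n


def apply_register_command(current, sign, n):
    """Model the effect of nG, +nG and -nG on an 8-bit register value.

    sign is "" for an absolute write, "+" to set bits, "-" to clear bits.
    """
    n &= 0xFF
    if sign == "+":
        return current | n
    if sign == "-":
        return current & ~n & 0xFF
    return n


# 193G: bits 7, 6 and 0 (128 + 64 + 1).
n_193 = register_value(7, 6, 0)

# Starting from the default 128G, +65G sets POR.0 and POR.6 ...
after_set = apply_register_command(128, "+", 65)
# ... and -65G clears them again.
after_clear = apply_register_command(after_set, "-", 65)
```

The same set/clear model applies to the other register commands that carry the identical note about relative parameters.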
Table 2.9. Protocol Options Register Bit Deﬁnitions
Audio Control Register (nN)
The Audio Control Register determines whether the RC8650’s audio
stream will be output as an analog signal on the AO pins or as serial
digital data on the DAOUT pin. See Table 2.10 for the definition of each
register bit. The default register setting is 0N.
In the digital audio modes, data is transferred from the DAOUT pin in
8-bit linear, offset binary format (midscale = 80h). The DARTS# pin can
be used to regulate the flow of data; it must be Low for transfers to
begin. In the synchronous mode, do not attempt to read the data at an
average rate faster than 10 kbytes/sec. At clock rates above 80 kHz,
the host must pause between reading each byte in order to keep the
average transfer rate from exceeding 10 kbytes/sec.
Figure 2.1 illustrates the synchronous data transfer mode. Note how
either DARTS# or DACLK can be used to regulate the flow of data
from the RC8650.
Note Relative parameters work differently than usual with this
command. Instead of specifying a displacement from the register’s
current value, relative parameters allow you to set (“+”) and clear
(“–”) individual register bits. For example, +40N sets bits ACR.3 and
ACR.5; –5N clears ACR.0 and ACR.2.
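For the digital audio output described in this section, a host would typically convert the offset binary bytes to signed samples and pace its reads. The Python sketch below shows one way to do both. The function names are hypothetical, and the pacing calculation assumes exactly 8 clock cycles per byte, which is an inference from the 80 kHz and 10 kbytes/sec figures rather than a documented timing.

```python
MAX_BYTES_PER_SEC = 10_000  # average transfer limit in synchronous mode


def offset_binary_to_signed(data: bytes) -> list[int]:
    """Convert 8-bit offset binary samples (midscale 80h) to signed values."""
    return [b - 0x80 for b in data]


def pause_per_byte(clock_hz: float) -> float:
    """Idle time in seconds to add after each byte so the average rate
    stays at or below MAX_BYTES_PER_SEC (assumes 8 clocks per byte)."""
    byte_time = 8 / clock_hz
    return max(0.0, 1 / MAX_BYTES_PER_SEC - byte_time)


samples = offset_binary_to_signed(bytes([0x00, 0x80, 0xFF]))
```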
Table 2.10. Audio Control Register Deﬁnitions
Figure 2.1. Synchronous Digital Audio Transfer Timing
Enable Exception Dictionary (U)
The exception dictionary is enabled with this command. If the RC8650
is in Phoneme mode, or if an exception dictionary has not been loaded,
the command will have no effect. The exception dictionary can be
disabled by issuing one of the mode commands D, T, or C.
TONE GENERATION COMMANDS
Musical/Sinusoidal Tone Generators (J/nJ)
The musical and sinusoidal tone generators are activated with these
commands. Refer to Section 3 for more information.
DTMF Generator (n*)
The DTMF (Touch-Tone) generator generates the 16 standard tone
pairs commonly used in telephone systems. Each tone is 100 ms in
duration, followed by a 100 ms inter-digit pause, more than satisfying
telephone signaling requirements (both durations can be extended to
500 ms by setting the DDUR bit of the Protocol Options Register). The
mapping of the command parameter n to the buttons on a standard
telephone is shown in Table 2.11.
The “pause” tone can be used to generate longer inter-digit delays
in phone number strings, or to create precise silent periods in the
RC8650’s output. The generator’s output level can be adjusted with
the Volume command (nV). DTMF commands may be intermixed with
text and other commands without restriction.
Table 2.11. DTMF Dialer Button Map
AUDIO PLAYBACK COMMANDS
Prerecorded Audio Playback Mode (n&)
Any number of prerecorded sound effects and messages can be stored
in the RC8650, limited only by the amount of available on-chip audio
memory. RCStudio, a Windows-based application available from RC
Systems, makes it easy to create, manage, and download sound
libraries composed of standard Windows wave files to the RC8650.
Sound libraries created with RCStudio can also be downloaded to the
RC8650 by simply transmitting the library file in its entirety.
Each sound file (message or sound effect) in a sound library is
automatically assigned a record number, beginning with zero. The
first file is record 0, the second is record 1, and so on. The playback
command plays records in any order, using n to specify the
desired record.
The playback level can be adjusted with the Volume (nV) command.
A volume setting of 5 will cause the files to be played back at their
original volume level.
Text and/or commands may be freely intermixed with the playback
command. For example,
CTRL+A “11*” “Hello” CTRL+A “–3V” CTRL+A “3&” CTRL+A “+3V” CTRL+A “9&”
plays the Touch-Tone “#” key and says “hello” at the current volume
setting, followed by the fourth sound file at a reduced volume level,
and finally the tenth sound file at the original volume level.
Real Time Audio Playback Mode (n#/n%)
This mode allows audio samples to be written directly to the RC8650’s
digital-to-analog converter (DAC) via the RC8650’s serial and parallel
ports. All data sent to the RC8650 is routed directly to the RC8650’s
internal audio buffer; the RC8650 then outputs samples from the buffer
to the DAC at the rate programmed by n. Because the audio data is
buffered within the RC8650, the output sampling rate is independent
of the data rate into the RC8650, as long as the input rate is equal to
or greater than the programmed sampling rate.
The RC8650 supports PCM and ADPCM audio data formats. RC
Systems’ RCStudio software can convert standard Windows wave files to
PCM and ADPCM formats for use with the RC8650. ADPCM
compression yields data files that are half the size of PCM files, thereby
reducing the required data bandwidth and storage requirements.
The output sampling rate can be programmed to any rate between 4
and 11 kHz (32,000-88,000 bps) by choosing the appropriate
parameter value. The relationship between the command parameter n and
the sampling rate fs is
n = 155 – 617/fs
fs = 617/(155 – n)
where fs is measured in kHz. For example, to program an 8 kHz
sampling rate, choose n=78. The range of n is 0–99, hence fs can range
from 4 to 11 kHz.
The following procedure should be used for sending PCM or ADPCM
audio data to the RC8650 in real time:
1) Program the desired volume level with the Volume (nV) command.
A volume setting of 5 will cause the data to be played back at its
original volume level. This step is optional.
2) Issue the Real Time Audio Playback Mode command n# if PCM
data is being sent, or n% for ADPCM data. The RC8650 expects
the audio data to immediately follow the command; therefore, be
sure not to terminate the command with a CR or NUL. The TS pin
and TS flag will be asserted at this time.
3) If the RC8650’s serial port is being used for transferring the audio
data, change the host system’s baud rate to 115,200 baud at this
time.
4) Begin transferring the audio data to the RC8650. The same
methods employed for sending any other type of data to the RC8650
should be used. Note that the DAC will not begin taking samples
from the audio buffer until at least 100 bytes have been sent or the
value 80h is sent, whichever occurs first.
5) After the last byte of audio data has been sent to the RC8650, send
the value 80h (–128). This signals the RC8650 to terminate Real
Time Audio Playback mode and return to the text-to-speech mode
of operation. Note that up to 2048 bytes of data may still be in the
audio buffer, so the RC8650 may continue producing sound for as
long as 0.5 second (at 4 kHz sampling rate) after the last byte of
data has been sent. The TS pin/TS flag will not be cleared until all
of the audio data has been output to the DAC, at which time the
RC8650 will again be able to accept data from the host.
If the host’s serial port baud rate was changed in step 3, it should
now be changed back to its original rate.
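The sampling-rate formula and steps 2, 4 and 5 of the procedure can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not a reference driver: the function names are hypothetical, PCM samples are assumed to be signed 8-bit values (suggested by the text's "80h (–128)"), and substituting –127 for any –128 sample is an assumed way to keep the terminator value out of the audio data. Baud-rate switching (step 3) and the actual port write are left to the host.

```python
def rate_parameter(fs_khz: float) -> int:
    """Nearest command parameter n for a sampling rate in kHz (n = 155 - 617/fs)."""
    n = round(155 - 617 / fs_khz)
    if not 0 <= n <= 99:
        raise ValueError("sampling rate outside the programmable range")
    return n


def actual_rate_khz(n: int) -> float:
    """Sampling rate in kHz actually produced by parameter n."""
    return 617 / (155 - n)


def realtime_pcm_stream(samples, fs_khz, cmd_char=b"\x01") -> bytes:
    """Build the n# command, the sample bytes, and the 80h terminator.

    No CR or NUL follows the command; the audio data starts immediately.
    """
    n = rate_parameter(fs_khz)
    body = bytearray()
    for s in samples:
        b = s & 0xFF
        body.append(0x81 if b == 0x80 else b)  # keep 80h reserved as terminator
    return cmd_char + str(n).encode("ascii") + b"#" + bytes(body) + b"\x80"


stream = realtime_pcm_stream([0, -128, 127], fs_khz=8)
```

Because n is an integer, the rate obtained is only approximately the rate requested; `actual_rate_khz` can be used to check the difference.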
A/D CONVERTER COMMANDS
ADC Control Register (n$)
The ADC Control Register controls the operation of the integrated
analog-to-digital converter. All ADC results are transferred via the
TXD pin.
The following is an overview of the ADC:
– Four channels, 8-bit resolution (±2 LSB precision)
– One-shot, continuous, single sweep, and continuous sweep
  modes of operation
– Selectable software or hardware triggering
– Support for external amplification/signal conditioning of all four
  ADC channels
Figure 2.2 is a functional block diagram of the ADC input stage; Figure
2.3 illustrates the ADC in operation. Table 2.12 lists the definitions of
each bit of the ADC Control Register. The default register setting is 0$.
Operation of the ADC is not mutually exclusive of other RC8650
functions. The ADC can operate concurrently with text-to-speech,
tone generation, audio playback, etc. The effective sampling rate in
continuous mode is one-tenth the serial port baud rate (e.g., 115200
baud = 11.52 ksps).
Note Relative parameters work differently than usual with this
command. Instead of specifying a displacement from the register’s
current value, relative parameters allow you to set (“+”) and clear
(“–”) individual register bits. For example, +34$ sets bits ADR.1 and
ADR.5; –16$ clears ADR.4.
Table 2.12. ADC Control Register Deﬁnitions
Figure 2.2. ADC Input Block Diagram
Figure 2.3. ADC Transfer Timing
MISCELLANEOUS COMMANDS
Interrogate (12?)
This command retrieves the current operating settings of the RC8650.
Table 2.13 lists the parameters in the order they are transmitted from
the TXD pin, the command(s) that control each parameter, and each
parameter’s range. The parameters are organized as a byte array of
one byte per parameter.
Table 2.13. Parameters Returned by Interrogate Command
Write Greeting Message (255W)
Anytime the RC8650 is reset, an optional user-defined greeting
message is automatically played. The message may consist of any
text/command sequence up to 234 characters in length. Modal
commands can be included, such as tone generator and audio playback
commands.
Caution The exception dictionary is erased whenever a new greeting
message is written to the RC8650.
To create a new greeting message, perform the following steps:
1) Write the command CTRL+A “255W”.
2) Write the exact text/command sequence you want to store, up to
234 characters. For example, the string
CTRL+A “3S” CTRL+A “2O” “ready”
will program the RC8650 to use voice speed 3, Big Bob’s voice,
and say “ready” whenever it is reset.
3) Write a Null (ASCII 00) to terminate the command and store the
greeting in the RC8650’s nonvolatile memory.
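The three steps above can be sketched in Python as a function that assembles the complete byte sequence. The function name is hypothetical, and rejecting an embedded Null is an assumption based on Null being the terminator; transmission to the RC8650 is left to the host. Note the caution above: writing a greeting erases the exception dictionary.

```python
MAX_GREETING_LENGTH = 234  # characters, per the text above


def greeting_message(payload: bytes, cmd_char: bytes = b"\x01") -> bytes:
    """Return the 255W command, the stored sequence, and the Null terminator."""
    if len(payload) > MAX_GREETING_LENGTH:
        raise ValueError("greeting exceeds 234 characters")
    if b"\x00" in payload:
        raise ValueError("a Null inside the greeting would end it early")
    return cmd_char + b"255W" + payload + b"\x00"


# The example from step 2: voice speed 3, voice 2, then the word "ready".
example = greeting_message(b"\x013S\x012Oready")
```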
The RCStudio software, available from RC Systems, can automatically
create and download greeting messages for you. Greeting messages
created with RCStudio include the commands necessary to allow the
ﬁle to be downloaded to the RC8650 by simply transmitting the ﬁle in
its entirety.
Load Exception Dictionary (L)
This command purges the RC8650’s exception dictionary and stores
subsequent output from the host in the RC8650’s nonvolatile dictionary
memory. The maximum dictionary size is 16 KB.
Exception dictionaries must be compiled into the format required by
the RC8650 before they can be used. The RCStudio software, available from RC Systems, includes a dictionary editor and compiler for
performing this task. Dictionaries that have been compiled with RCStudio include the Load command in the ﬁle header, allowing the ﬁle
to be downloaded to the RC8650 by simply transmitting the ﬁle in its
entirety.
Exception dictionaries are covered in detail in Section 4.
Chipset Identiﬁcation (6?)
This command returns RC8650 system information that is used during
factory testing. Eight bytes are transmitted via the TXD pin. The only
information that may be of relevance to an application is the internal
microcode revision number, which is conveyed in the last two bytes in
packed-BCD format. For example, 13h 01h would be returned if the
version number was 1.13.
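Decoding the revision number from the eight returned bytes might look like the following Python sketch. The function name is hypothetical, and the byte order (fractional part first, then the major number) is inferred solely from the single example given above, so it should be checked against a real device.

```python
def microcode_revision(ident: bytes) -> str:
    """Decode the packed-BCD revision held in the last two of eight bytes."""
    if len(ident) != 8:
        raise ValueError("expected the 8 bytes returned by the 6? command")

    def bcd(byte: int) -> int:
        # Each nibble of a packed-BCD byte holds one decimal digit.
        return (byte >> 4) * 10 + (byte & 0x0F)

    minor, major = bcd(ident[6]), bcd(ident[7])
    return f"{major}.{minor:02d}"


# Six placeholder bytes followed by 13h 01h, as in the example above.
revision = microcode_revision(bytes(6) + bytes([0x13, 0x01]))
```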
COMMAND SUMMARY
Table 2.14. RC8650 Command Summary
SECTION 3: MUSICAL & SINUSOIDAL TONE GENERATORS
MUSICAL TONE GENERATOR
The RC8650 contains a three-voice tone generator that can be used
for creating music and sound effects. This section explains how to
program the generator.
Note The musical tone generator output is available only from the AO
pins. Digital audio output from the DAOUT pin is not possible.
The musical tone generator is activated with the J command (no
parameter). Once activated, all data output to the RC8650 is directed to
the musical tone generator.
Note The RC8650 expects the tone generator data to immediately
follow the J command; therefore, be sure not to terminate the command
with a CR or Null.
The tone generator is controlled with four four-byte data and
command frames, called Initialize, Voice, Play, and Quit. With these, the
programmer can control the volume, duration, and frequencies of the
three voices.
Figure 3.1. Musical Tone Generator Command Formats
Initialize Command
The Initialize command sets up the tone generator’s relative amplitude and tempo (speed). The host must issue this command to initialize the tone generator before sending any Voice frames. The Initialize command may, however, be issued anytime afterward to change the volume or tempo on the fly.
Initialize command format
The Initialize command consists of a byte of zero and three parameters. The parameters are defined as follows:
KA     Voice amplitude (1-255)
KTL    Tempo, low byte (0-255)
KTH    Tempo, high byte (0-255)
The range of the tempo KT (KTL and KTH) is 1-65,535 (1–FFFFh); the larger the value, the slower the overall speed of play. The amplitude and tempo affect all three voices, and stay in effect until another Initialize command is issued. If the command is issued between Voice frames to change the volume or tempo on the fly, only the Voice frames following the command will be affected.
Voice Frame
Voice frames contain the duration and frequency (pitch) information for each voice. All Voice frames are stored in a 2 KB buffer within the RC8650, but are not played until the Play command is issued. If the number of Voice frames exceeds 2 KB in length, the RC8650 will automatically begin playing the data.
Voice frame format
Voice frames are composed of three frequency time constants (K1-K3) and a duration byte (KD), which specifies how long the three voices are to be played.
The relationship between the time constant Ki and the output frequency fi is:
fi = 16,768/Ki
where fi is in Hertz and Ki = 4-255. Setting Ki to zero will silence voice i during the frame.
KD may be programmed to any value between 1 and 255; the larger it
is made, the longer the voices will play during the frame.
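The relationship fi = 16,768/Ki can be inverted to pick a time constant for a target pitch. The following Python sketch illustrates this; the function names and the choice to round to the nearest integer and reject out-of-range results are assumptions made for illustration, not part of the RC8650 documentation.

```python
GENERATOR_CONSTANT = 16768  # numerator of fi = 16,768 / Ki

def ki_for_frequency(freq_hz: float) -> int:
    """Return the integer time constant Ki (4-255) nearest to 16768 / freq_hz.

    Rounding and range checking are assumptions of this sketch.
    Ki = 0 (silence) is left to the caller.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    ki = round(GENERATOR_CONSTANT / freq_hz)
    if not 4 <= ki <= 255:
        raise ValueError(f"{freq_hz} Hz is not reachable with Ki = 4-255")
    return ki

def frequency_for_ki(ki: int) -> float:
    """Return the output frequency in Hz produced by time constant Ki."""
    if not 4 <= ki <= 255:
        raise ValueError("Ki must be in the range 4-255")
    return GENERATOR_CONSTANT / ki
```

Because Ki is an integer, the achievable pitches are only approximations of the equal-tempered scale, and the error grows as Ki gets smaller.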
The task of finding Ki for a particular musical note is greatly simplified by using Table 3.1. The tone generator can cover a four-octave range, from C two octaves below Middle C (Ki = 255), to D two octaves above Middle C (Ki = 14). Ki values less than 14 are not recommended.
Table 3.1. Musical Note Pitch/Ki Values
For example, the Voice frame
DATA 24,64,0,0
will play Middle C using voice 1 (K1 = 64). Since K2 and K3 are zero, voices 2 and 3 will be silent during the frame. The duration of the note is a function of both the tempo KT and duration KD, which in this case is 24.
As another example,
DATA 48,64,51,43
plays a C-E-G chord, for a duration twice as long as the previous example.
Choosing note durations and tempo
Table 3.2 lists suggested KD values for each of the standard musical note durations. This convention permits shorter (1/64th note) and intermediate note values to be played, while maintaining the same degree of accuracy. This is important when, for example, a thirty-second note is to be played staccato, or a note is dotted (multiplying its length by 1.5).
Table 3.2. Musical Note Duration/KD Values
Using the suggested values, it turns out that most musical scores sound best when played at a tempo of 255 or faster (i.e., KTH = 0). Of course, the “right” tempo is the one that sounds the best.
Play Command
The Play command causes the voice data in the input buffer to begin playing. Additional Initialize commands and Voice frames may be sent to the RC8650 while the tone generator is operating. The TS pin and TS flag are asserted at this time, enabling the host to synchronize to the playing of the tone data. TS becomes inactive after all of the data has been played.
Quit Command
The Quit command marks the end of the tone data in the input buffer. The RC8650 will play the contents of the buffer up to the Quit command, then return to the text-to-speech mode that was in effect when the tone generator was activated. Once the Quit command has been issued, the RC8650 will not accept any more data until the entire buffer has been played.
Example Tune
The Basic program shown in Figure 3.2 reads tone generator data from a list of DATA statements and LPRINTs each value to the RC8650. The program assumes that the RC8650 is connected to a PC’s printer port, although output could be redirected to a COM port with the DOS MODE command.
The astute reader may have noticed some “non-standard” note durations in the DATA statements, such as the first two Voice frames in line 240. According to the original music, some voices were not to be played as long as the others during the beat. The F-C-F notes in the first frame are held for 46 counts, while the low F and C in the second frame are held for two additional counts. Adding the duration (first and fifth) bytes together, the low F and C do indeed add up to 48 counts (46 + 2), which is the standard duration of a quarter note.
100 LPRINT ' ensure serial port baud rate is locked
110 LPRINT CHR$(1);"J"; ' activate tone generator
120 READ B0,B1,B2,B3 ' read a frame (4 bytes)
130 LPRINT CHR$(B0); CHR$(B1); CHR$(B2); CHR$(B3);
140 IF B0 + B1 + B2 + B3 > 0 THEN 120 ' loop until Quit
150 END
160 '
170 '
180 ' Data Tables:
190 '
200 ' Init (volume = 255, tempo = 86)
210 DATA 0,255,86,0
220 '
230 ' Voice data
240 DATA 46,48,64,192, 2,0,64,192, 48,48,0,0, 48,40,0,0, 48,36,0,0
250 DATA 94,24,34,0, 2,24,0,0, 24,0,36,0, 24,0,40,0, 48,0,48,0
260 DATA 48,40,0,192, 46,36,0,0, 2,0,0,0, 48,36,0,0, 48,24,34,0
270 DATA 46,24,34,0, 2,0,34,0, 46,24,34,0, 2,24,0,0, 24,0,36,0
280 DATA 24,0,40,0, 48,0,48,0
290 '
300 ' Play, Quit
310 DATA 0,0,1,1, 0,0,0,0
Figure 3.2. Example Musical Tone Generator Program
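For hosts that do not run Basic, the same byte stream can be assembled in other languages. The Python sketch below builds the stream using the frame layouts shown in the listing above (Initialize = 0, KA, KTL, KTH; Voice = KD, K1, K2, K3; Play = 0,0,1,1; Quit = 0,0,0,0). The function names and the argument validation are illustrative assumptions, and the sketch omits the initial carriage return that the Basic program sends to lock the baud rate.

```python
PLAY = bytes([0, 0, 1, 1])   # Play frame, as in line 310 of the listing
QUIT = bytes([0, 0, 0, 0])   # Quit frame, as in line 310 of the listing

def init_frame(amplitude: int, tempo: int) -> bytes:
    """Initialize frame: zero byte, KA, then tempo as low byte and high byte."""
    if not 1 <= amplitude <= 255:
        raise ValueError("KA must be 1-255")
    if not 1 <= tempo <= 0xFFFF:
        raise ValueError("KT must be 1-65535")
    return bytes([0, amplitude, tempo & 0xFF, tempo >> 8])

def voice_frame(duration: int, k1: int = 0, k2: int = 0, k3: int = 0) -> bytes:
    """Voice frame: KD followed by the three time constants (0 = silent)."""
    if not 1 <= duration <= 255:
        raise ValueError("KD must be 1-255")
    for k in (k1, k2, k3):
        if k != 0 and not 4 <= k <= 255:
            raise ValueError("Ki must be 0 or 4-255")
    return bytes([duration, k1, k2, k3])

def build_tune(amplitude: int, tempo: int, frames) -> bytes:
    """Return CTRL+A, 'J', Initialize, all Voice frames, Play, and Quit."""
    stream = bytearray(b"\x01J")  # no CR or Null after the J command
    stream += init_frame(amplitude, tempo)
    for frame in frames:
        stream += voice_frame(*frame)
    stream += PLAY + QUIT
    return bytes(stream)
```

The returned bytes would then be written to whatever port object the host application uses to reach the RC8650; that part is application specific and is not shown.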
SINUSOIDAL TONE GENERATOR
The musical tone generator is capable of producing three tones simultaneously, and works well in applications which require neither precise frequencies nor a “pure” (clean) output. The output is a pulse train rich in harmonic energy, which tends to sound more interesting than pure sinusoids in music applications.
The sinusoidal tone generator enables the simultaneous generation of two sinusoidal waveforms. Applications for this generator range from generating simple tones to telephone call-progress tones (such as a dial tone or busy signal). The frequency range is 0 to 2746 Hz, with a resolution of 4 to 11 Hz.
The sinusoidal tone generator is activated with the command nJ, where n is an ASCII number between 0 and 99. Note the similarity to the musical tone generator command, J, which uses no parameter. The parameter n programs the internal sampling rate, much like the Real Time Audio Playback command does; in fact, the sampling rate fs has the same relationship to n as the Real Time Audio Playback command:
fs = 617 / (155 – n) (kHz)
Immediately following the nJ command are three binary parameter bytes:
nJ Kd K1 K2
where Kd determines the tone duration, and K1 and K2 set the output frequencies of generators 1 and 2, respectively.
The tone duration and frequencies are not only functions of these parameters, but of n as well. The output amplitude is a function of the Volume command (nV). The command and parameter values are buffered within the RC8650, and can be intermixed with text and other commands without restriction.
The tone duration Td is calculated as follows:
Td = Kd x 256 / fs (sec)
where 0 ≤ Kd ≤ 255 and fs is expressed in Hz. Substituting the relationship fs = 617,000 / (155 – n) Hz into the above equation,
Td = Kd x (155 – n) / 2410 (sec)
Setting Kd = 1 yields the shortest duration; Kd = 0 (treated as 256) the longest. Depending on the value of n, Td can range from 23 ms to 16.5 sec.
The tone frequencies F1 and F2 are computed as follows:
Fi = Ki x fs / 1024 (Hz)
where 0 ≤ Ki ≤ 255 and fs is expressed in Hz. Substituting the relationship fs = 617,000 / (155 – n) Hz into this equation,
Fi = Ki x 603 / (155 – n) (Hz)
Depending on the value of n, Fi can range from 0 Hz to 2746 Hz. If only one tone is to be generated, the other tone frequency may be set to 0 (Ki = 0), or equal in frequency. Note, however, that due to the additive nature of the tone generators, the output amplitude from both generators running at the same frequency will be twice that of just one generator running. Both K1 and K2 may be set to 0 to generate silence.
Note that the frequency step size and frequency range are strictly functions of n. In general, the larger n is, the larger the step size and range will be. The parameter Ki can be thought of as a multiplier, which when multiplied by the step size, yields the output frequency. For example, setting n = 95 (corresponding to an internal sampling rate of 10.28 kHz) results in a frequency step size of 603 / (155 – 95) Hz, or 10 Hz. Thus, the output frequency range spans 0 Hz to 255 x 10 Hz, or 2550 Hz, in 10 Hz steps.
As an example, suppose your application needed to generate the tone pair 440/350 Hz (a dial tone) for, say, 2.5 seconds. We will choose n = 95, because it yields a convenient step size of 10 Hz. The tone duration parameter Kd is calculated as follows:
Kd = 2410 x Td / (155 – n)
Substituting Td = 2.5 (sec) and n = 95,
Kd = 2410 x 2.5 / (155 – 95) = 100
K1 (440 Hz) is computed as follows:
K1 = F1 x (155 – n) / 603
= 440 x (155 – 95) / 603 = 44
In like manner, K2 (350 Hz) is computed to be 35.
In order to embed the command in a text file, the computed values must be converted into their ASCII equivalents: 100 = “d”, 44 = “,” and 35 = “#”. The complete command becomes
^A95Jd,#
which can be embedded within normal text for the synthesizer.
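The dial-tone calculation above can be automated. The following Python sketch applies the same formulas (Kd = 2410 x Td / (155 – n) and Ki = Fi x (155 – n) / 603) and assembles the command bytes. The function name, the rounding to the nearest integer, and the range checks are assumptions made for illustration.

```python
def sinusoidal_tone_command(n: int, duration_s: float,
                            f1_hz: float, f2_hz: float = 0.0) -> bytes:
    """Build CTRL+A, the nJ command, and its Kd, K1, K2 parameter bytes."""
    if not 0 <= n <= 99:
        raise ValueError("n must be 0-99")
    divisor = 155 - n
    kd = round(2410 * duration_s / divisor)   # tone duration parameter
    k1 = round(f1_hz * divisor / 603)         # generator 1 frequency parameter
    k2 = round(f2_hz * divisor / 603)         # generator 2 frequency parameter
    # Kd = 0 means 256 on the RC8650; this sketch simply does not produce it.
    if not 1 <= kd <= 255:
        raise ValueError("duration is out of range for this value of n")
    if not (0 <= k1 <= 255 and 0 <= k2 <= 255):
        raise ValueError("frequency is out of range for this value of n")
    return b"\x01" + str(n).encode("ascii") + b"J" + bytes([kd, k1, k2])
```

Called as `sinusoidal_tone_command(95, 2.5, 440, 350)`, the sketch reproduces the bytes of the ^A95Jd,# command derived above.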
SECTION 4: EXCEPTION DICTIONARIES
Exception dictionaries make it possible to alter the way the RC8650
interprets character strings it receives. This is useful for correcting
mispronounced words, triggering the generation of tones and/or
the playback of prerecorded sounds, or even speaking in a foreign
language. In some cases, an exception dictionary may even negate
the need of a text pre-processor in applications that cannot provide
standard text strings. This section describes how to create exception
dictionaries for the RC8650.
The text-to-speech modes of the RC8650 utilize an English lexicon and letter-to-sound rules to convert text the RC8650 receives into speech. The pronunciation rules determine which sounds, or phonemes, each character will receive based on its relative position within each word. The integrated DoubleTalk text-to-speech engine analyzes text by applying these rules to each word or character, depending on the operating mode in use. Exception dictionaries augment this process by defining exceptions for (or even replacing) these built in rules.
Exception dictionaries can be created and edited with a word processor or text editor that stores documents as standard text (ASCII) files. However, the dictionary must be compiled into the internal format used by the RC8650 before it can be used. The RCStudio software, available from RC Systems, includes a dictionary editor and compiler.
EXCEPTION SYNTAX
Exceptions have the general form
L(F)R=P
which means “the text fragment F, occurring with left context L and right context R, gets the pronunciation P.” All three parts of the exception to the left of the equality sign must be satisfied before the text fragment will receive the pronunciation given by the right side of the exception.
The text fragment defines the input characters that are to be translated by the exception, and may consist of any combination of letters, numbers, and symbols. Empty (null) text fragments may be used to generate sound based on a particular input pattern, without actually translating any of the input text. The text fragment (if any) must always be contained within parentheses.
Characters to the left of the text fragment specify the left context (what must come before the text fragment in the input string), and characters to the right define the right context. Both contexts are optional, so an exception may contain neither, either, or both contexts. There are also 15 special symbols, or context tokens, that can be used in an exception’s context definitions (Table 4.1).
Table 4.1. Context Tokens
Note that although context tokens are, by definition, valid only within the left and right context definitions, the wildcard token may also be used within text fragments. Any other context token appearing within a text fragment will be treated as a literal character.
The right side of an exception (P) specifies the pronunciation that the text fragment is to receive, which may consist of any combination of phonemes (Table 2.1), phoneme attribute tokens (Table 2.2), and commands (Table 2.14). Using the tone generator and prerecorded audio playback commands, virtually limitless combinations of speech, tones, and sound effects can be triggered from any input text pattern. If no pronunciation is given, no sound will be given to the text fragment; the text fragment will be silent.
A dictionary file may also contain comments, but they must be on lines by themselves (i.e., they cannot be on the same line as an exception). Comment lines must begin with a semicolon character (;), so the compiler will know to skip over them.
An example of an exception is
C(O)N=AA
which states that o after c and before n gets the pronunciation AA, the o-sound in cot. For example, the o in conference, economy, and icon would be pronounced according to this exception.
Another example is
$R(H)=
which states that h after initial r is silent, as in the word rhyme (the $ context token represents any non-alphabetic character, such as a space between words; see Table 4.1).
Punctuation, numbers, and most other characters can be redefined with exceptions as well:
(5)=S I NG K O    (Spanish five)
(CHR$)=K EH R IX K T ER    (Basic function)
THE TRANSLATION ALGORITHM
In order to better understand how an exception dictionary works, it is helpful to understand how the DoubleTalk text-to-speech engine processes text.
Algorithms within the DoubleTalk engine analyze input text a character at a time, from left to right. A list of pronunciation rules is searched sequentially for each character until a rule is found that matches the character in the correct position and context. The algorithm then passes over the input character(s) bracketed in the rule (the text fragment), and assigns the pronunciation given by the right side of the rule to them. This process continues until all of the input text has been converted to phonetic sounds.
The following example illustrates how the algorithm works by translating the word receive.
The algorithm begins with the letter r and searches the R pronunciation rules for a match. The first rule that matches is $(RE)^#=R IX, because the r in receive is an initial r and is followed by an e, a consonant (c), and a vowel (e). Consequently, the text fragment re receives the pronunciation R IX, and the scan moves past re to the next character, c. (E is not the next scan character because it occurred inside the parentheses with the r; the text fragment re as a whole receives the pronunciation R IX.)
The first match among the C rules is (C)+=S, because c is followed by an e, i, or y. C thus receives the pronunciation S, and processing continues with the second e.
(EI)=IY is the first rule to match the second e, so ei receives the sound IY. Processing resumes at the character v, which matches the default V rule, (V)=V.
The final e matches the rule #:(E)$=, which applies when e is final and follows zero or more consonants and a vowel. Consequently, e receives no sound and processing continues with the following word or punctuation, if any. Thus, the entire phoneme string for the word receive is R IX S IY V.
RULE PRECEDENCE
Since DoubleTalk uses its translation rules in a sequential manner, the position of each exception relative to the others must be carefully considered. For example, consider the following pair of exceptions:
(O)+=OW
(O)=UW
The first exception states that o followed by e, i, or y is to be pronounced OW, the o-sound in boat. The second exception does not place any restriction on what must come before or after o, so o in any context will receive the UW pronunciation. If the exceptions were reversed, the (O)+ exception would never be reached because the (O) exception will always match o in any context. In general, tightly-defined exceptions (those containing many context restrictions) should precede loosely-defined exceptions (those with little or no context definitions).
(RAT)=R AE T
(RATING)=R EY T IH NG
(R)=R
This is an example of how not to organize exceptions. The exception (RATING) will never be used because (RAT) will always match first. According to these exceptions, the word rating would be pronounced “rat-ing.”
It can be beneficial to group exceptions by the first character of the text fragments, that is, all of the A exceptions in one group, all the B exceptions in a second group, and so on. This gives an overall cleaner appearance, and can prove to be helpful if the need arises to troubleshoot any problems in your dictionary.
TEXT NOT MATCHED BY THE DICTIONARY
It is possible that some input text may not match anything in a dictionary, depending on the nature of the dictionary. For example, if a dictionary was written to handle unusual words, only those words would be included in the dictionary. On the other hand, if a dictionary defined the pronunciation for another language, it would be comprehensive enough to handle all types of input. In any case, if an exception is not found for a particular character, the English pronunciation will be given to that character according to the built in pronunciation rules.
Generally, the automatic switchover to the built in rules is desirable if the dictionary is used to correct mispronounced words, since by definition the dictionary is defining exceptions to the built in rules. If the automatic switchover is not desired, however, there are two ways to prevent it from occurring. One way is to end each group of exceptions with an unconditional exception that matches any context. For example, to ensure that the letter “a” will always be matched, end the A exception group with the exception (A)=pronunciation. This technique works well to ensure matches for specific characters, such as certain letters or numbers.
If the exception dictionary is to replace the built in rules entirely, end the dictionary with the following exception:
()=
This special exception causes unmatched characters to be ignored (receive no sound), rather than receive the pronunciation defined by the built in rules.
EFFECT ON PUNCTUATION
Punctuation defined in the exception dictionary has priority over the Punctuation Filter command. Any punctuation defined in the dictionary will be used, regardless of the Punctuation Filter setting.
Note: If the dollar sign character ($) is defined within the text fragment of any exception, currency strings will not be read as dollars and cents.
CHARACTER MODE EXCEPTIONS
Exceptions are defined independently for the Character and Text modes of operation. The beginning of the Character mode exceptions is defined by inserting the letter C just before the first Character mode exception. No exceptions prior to this marker will be used when the RC8650 is in Character mode, nor will any exceptions past the marker be used in Text mode. For example:
.
.      (Text mode exceptions)
()=    (optional; used if built in rules are not to be used in no-match situations)
C      (Character mode exceptions marker)
.
.      (Character mode exceptions)
.
()=    (optional; used if built in rules are not to be used in no-match situations)
APPLICATIONS
The following examples illustrate some ways in which the exception dictionary can be used.
Correcting Mispronounced Words
Correcting mispronounced words is the most common application for exception dictionaries.
S(EAR)CH=ER
$(OK)$=OW K EY
The first exception corrects the pronunciation of all words containing search (search, searched, research, etc.). As this exception illustrates, it is only necessary to define the problem word in its root form, and only the part of the word that is mispronounced (ear, in this case). The second exception corrects the word ok, but because of the left and right contexts, will not cause other words (joke, look, etc.) to be incorrectly translated.
No Cussing, Please
The reading of specific characters or words can be suppressed by writing exceptions in which no pronunciation is given.
(????)=    (YOU fill in the blanks!)
When Zero Isn’t Really Zero
When reading addresses or lists of numbers, the word “oh” is often substituted for the digit 0. For example, we might say 1020 North Eastlake as “one oh two oh North Eastlake.” The digit 0 can be redefined in this manner with the following exception:
(0)=OW
Acronyms and Abbreviations
Acronyms and abbreviations can be defined so the words they represent will be spoken.
$(KW)$=K IH L AH W AA T
$(DR)$=D AA K T ER
$(TV)$=T EH L AX V IH ZH IX N
String Parsing & Decryption
Sometimes the data that we would like to have read is not available in a “ready-to-read” format. For example, the output of a GPS receiver may look something like this:
$GPGGA,123456,2015.2607,N,...
The first 14 characters of the string consist of a fixed header and variable time data, which we would like to discard. The following exception ensures that the header will not be read:
($GPGGA,``````,)=
Note how wildcard tokens are used for handling the time data (8th–13th characters), since the content of this field is variable.
The 15th–16th and 17th–18th characters represent the latitudinal coordinate in degrees and minutes, respectively. The three exceptions shown below handle the latitudinal component of the GPS string. Note in the first exception how a null text fragment is used in the appropriate position to generate the word “degrees,” without actually translating any of the input characters.
,\\()\\.=D IX G R IY Z , ,
(.)=M IH N IH T S , ,
(,N,)=N OW R TH L AE T IH T UW D
The four exceptions together will translate the example string as “20 degrees, 15 minutes, north latitude.” (Additional exceptions for handling the seconds component, and digits themselves, are not shown for clarity.)
Heteronyms
Heteronyms are words that have similar spellings but are pronounced differently, depending on the context, such as read (“reed” and “red”) and wind (“the wind blew” and “wind the clock”). Exceptions can be used to fix up these ambiguities, by including non-printing (Control) characters in the text fragment of the exception.
Suppose a line of text required the word “close” to be pronounced as it is in “a close call,” instead of as in “close the window.” The following exception changes the way the s will sound:
(^DCLOSE)=K L OW S
Note the CTRL+D character (^D) in the text fragment. Although a non-printing character, the translation algorithms treat it as they would any printing character. Thus, the string “^D close” will be pronounced with the s receiving the “s” sound, wherever it appears in the text stream. Plain “close” (without the CTRL+D) will be unaffected; the s will still receive the “z” sound. It does not matter where you place the Control character in the word, as long as you use it the same way in your application’s text. You may use any non-printing character (except LF and CR) in this manner.
Foreign Languages
Dictionaries can be created that enable the RC8650 to speak in foreign languages. It’s not as difficult as it may seem: all that is required in most cases is a pronunciation guide and a bit of patience. If you don’t have a pronunciation guide for the language you’re interested in, check your local library. Most libraries have foreign language dictionaries that include pronunciation guides, which make it easy to transcribe the pronunciation rules into exception form.
Language Translation
Exception dictionaries even allow the RC8650 to read foreign language text in English! The following exceptions demonstrate how this can be done with three example Spanish/English words.
(GRANDE)=L AA R J
(BIEN)=F AY N
(USTED)=YY UW
The sense of translation can also be reversed:
(LARGE)=G RR A N D EI
(FINE)=B I EI N
(YOU)=U S T EI DH
Message Macros
Certain applications may not be able to send text strings to the RC8650. An example of such an application is one that is only able to output a four bit control word and strobe. Sixteen unique output combinations are possible, but this is scarcely enough to represent the entire ASCII character set.
You can, however, assign an entire spoken phrase to a single ASCII character with the exception dictionary. By driving four of the data bus lines of the bus interface (see Figure 1.6) and hardwiring the remaining four to the appropriate logic levels, virtually any set of 16 ASCII characters can be generated, which in turn can be interpreted by the exception dictionary.
For example, by connecting the four control bits to DB0 through DB3, DB4 and DB5 to VCC, DB6 and DB7 to ground and the strobe to PWR#, ASCII codes 30h through 3Fh (corresponding to the digits “0” through “9” and the six ASCII characters following them) can be generated by the four control bits. Message strings would then be assigned to each of these ASCII characters. For example, you could make the character “0” (corresponding to all four control bits = 0) say, “please insert quarter,” with the following dictionary entry:
(0)=P L IY Z IH N S ER T K W OW R T ER
The Timeout timer should also be activated (1Y, for example) in order for the “message” to be executed. Otherwise, the RC8650 will wait indefinitely for a CR/Null character that will never come. The timer command could be included in the greeting message.
TIPS
Make sure that your exceptions aren’t so broad in nature that they do more harm than good. Exceptions intended to fix broad classes of words, such as word endings, are particularly notorious for ruining otherwise correctly pronounced words.
Take care in how your exceptions are organized. Remember, an exception’s position relative to others is just as important as the content of the exception itself.
When Things Don’t Work as Expected
On rare occasions, an exception may not work as expected. This occurs when the built in pronunciation rules get control before the exception does. The following example illustrates how this can happen.
Suppose an exception redefined the o in the word “process” to have the long “oh” sound, the way it is pronounced in many parts of Canada. Since the word is otherwise pronounced correctly, the exception redefines only the “o:”
PR(O)CESS=OW
But much to our horror, the RC8650 simply refuses to take on the new Canadian accent.
It so happens that the RC8650 has a built in rule which looks something like this:
$(PRO)=P R AA
This rule translates a group of three characters, instead of only one as most of the built in rules do. Because the text fragment PRO is translated as a group, the o is processed along with the initial “pr,” and consequently the exception never gets a shot at the o.
If you suspect this may be happening with one of your exceptions,
include more of the left-hand side of the word in the text fragment (in
the example above, (PRO)CESS=P R OW would work).
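The interaction between sequential rule matching and multi-character text fragments can be explored with a small model. The Python sketch below is a simplified, hypothetical stand-in for the translation algorithm, not the DoubleTalk implementation: it supports only literal context characters and the $ token, skips empty text fragments, passes unmatched characters through unchanged, and assumes that at each scan position the exceptions are searched before the built in rules. All names (Rule, parse_rule, translate, BUILTIN) are illustrative.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    left: str       # left context: literal characters, or "$" for a non-letter
    fragment: str   # text consumed when the rule matches
    right: str      # right context, same conventions as the left context
    phonemes: str   # pronunciation assigned to the fragment

def parse_rule(text: str) -> Rule:
    """Split 'L(F)R=P' into its four parts (other context tokens unsupported)."""
    lhs, phonemes = text.split("=", 1)
    left, rest = lhs.split("(", 1)
    fragment, right = rest.split(")", 1)
    return Rule(left, fragment, right, phonemes.strip())

def context_ok(context: str, text: str, start: int, step: int) -> bool:
    """Check a context walking outward from start (step -1 = left, +1 = right)."""
    tokens = context[::-1] if step < 0 else context
    pos = start
    for token in tokens:
        ch = text[pos] if 0 <= pos < len(text) else " "  # text edges act as spaces
        if token == "$":
            if ch.isalpha():
                return False
        elif ch != token:
            return False
        pos += step
    return True

def translate(text: str, exceptions: list[Rule], builtin: list[Rule]) -> str:
    """Scan left to right; the first matching rule consumes its whole fragment."""
    text = text.upper()
    out: list[str] = []
    i = 0
    while i < len(text):
        for rule in (*exceptions, *builtin):
            n = len(rule.fragment)
            if (n > 0  # empty fragments are skipped in this sketch
                    and text.startswith(rule.fragment, i)
                    and context_ok(rule.left, text, i - 1, -1)
                    and context_ok(rule.right, text, i + n, +1)):
                if rule.phonemes:
                    out.append(rule.phonemes)
                i += n
                break
        else:
            out.append(text[i])  # nothing matched: pass the character through
            i += 1
    return " ".join(out)

# Stand-in for the single built in rule quoted in the text above.
BUILTIN = [parse_rule("$(PRO)=P R AA")]
```

In this model, translating "process" with the exception PR(O)CESS=OW still yields P R AA for the first three letters, because the scan never stops at the o; with (PRO)CESS=P R OW the exception matches at the p and wins. The same function also shows the ordering problem described under Rule Precedence: with (RAT) listed before (RATING), the longer exception is never reached.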
SECTION 5: RC8650 EVALUATION KIT
The RC8650 Evaluation Kit comes with everything required to evaluate and develop applications for the RC8650 chipset using a Windows-based PC. The included RCStudio™ software provides an integrated development environment with the following features:
• Read any text, either typed or from a file
• Easy access to the various RC8650 voice controls
• Manage collections of sound files and store them in the RC8650
• Exception dictionary editor/compiler, and much more...
The evaluation board can also be used in stand-alone environments by simply printing the desired text and commands to it via the onboard RS-232 serial or parallel ports.
EVALUATION KIT CONTENTS
The following components are included in the DoubleTalk RC8650 Evaluation Kit:
• Printed circuit board containing the RC8650-1 chipset
• AC power supply
• Speaker
• Serial cable
• RCStudio™ development software CD
EVAL BOARD OUTLINE
CONNECTOR PIN ASSIGNMENTS & SCHEMATICS
Table 5.1. P1 Pin Assignments (Audio Output & Control)
Table 5.2. P2 Pin Assignments (A/D Converter)
Table 5.3. JP1-JP3 Pin Assignments (Baud Rate)
Table 5.4. P101 Pin Assignments (RS-232 Serial Interface)
Table 5.5. P102 Pin Assignments (TTL Serial Interface)
Note: JP4-JP6 must be open in order to use the TTL interface.
Table 5.6. P103 Pin Assignments (Printer/Bus Interface)
Specifications written in this publication are believed to be accurate, but are not guaranteed to be entirely free of error. RC Systems reserves the right to make changes
in the devices or the device specifications described in this publication without notice. RC Systems advises its customers to obtain the latest version of device specifications to verify, before placing orders, that the information being relied upon by the customer is current.
In the absence of written agreement to the contrary, RC Systems assumes no liability relating to the sale and/or use of RC Systems products including fitness for a
particular purpose, merchantability, for RC Systems applications assistance, customer’s product design, or infringement of patents or copyrights of third parties by or
arising from use of devices described herein. Nor does RC Systems warrant or represent that any license, either express or implied, is granted under any patent right,
copyright, or other intellectual property right of RC Systems covering or relating to any combination, machine, or process in which such devices might be or are used.
RC Systems products are not intended for use in medical, life saving, or life sustaining applications.
Applications described in this publication are for illustrative purposes only, and RC Systems makes no warranties or representations that the devices described herein
will be suitable for such applications.
1609 England Avenue, Everett, WA 98203
Phone: (425) 355-3800 Fax: (425) 355-1098
http://www.rcsys.com
