Presentation

Challenges in Embedded Speaker-Independent Name-Dialing
Meir Griniasty
Hagai Aronowitz
Ezer Melzer
Ruthi Aloni-Lavi
Ruben Maislos
Adoram Erell
Speech Research Lab
Intel Wireless Communications and Computing Group
Benefits and Value
• No voice-enrollment barrier
• More functionality
• Faster phonebook search
• Fun
• Return on investment
  • Handset vendors: competitive edge
  • Cellular operators: more air-time
• Hands free, eyes free: driving, at home
Computation Resources
[Figure: block diagram of the Intel® PXA800F baseband processor in a handset]
Cores: 312 MHz Intel® XScale™ core; Intel® Micro Signal Architecture core
Memory: Intel® Flash 4 MByte and 0.5 MByte; SRAM 512 KByte and 64 KByte
Memory cost: 1$ / MB
Other blocks: RF transceiver (TX/RX), analog/mixed signal, GSM/GPRS logic, power management, I2C, I2S, peripherals, RTOS
Handset: 5x5 keypad, display, SIM, polyphonic ringer
Applications: games, phonebook, calendar, voice memo, voice dial
Languages: Hebrew, English, French, German, …
SIND (Speaker-Independent Name-Dialing) Algorithmic Challenges
• Letter-to-phoneme
  • No room for large name-lexicons
  • Non-native names
• Acoustic models
  • No room for rich models
  • Large databases are expensive
  • Language-mix, e.g.:
    1. George Harrison
    2. George Moustaki
  • Noise
  • No data-collection from pilot deployments
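The language-mix example can be made concrete with a small sketch. It assumes, hypothetically, that the recognizer's grammar holds one entry per pronunciation variant, so a name whose words may be read in more than one language expands into several entries. The `VARIANTS` table, its rough SAMPA transcriptions and the `expand` helper are illustrative assumptions, not data or code from Intel SIND.

```python
from itertools import product

# Hypothetical per-language pronunciations in rough SAMPA, for illustration only.
VARIANTS = {
    "george": {"en": "dZ O: dZ", "fr": "Z O R Z"},
    "harrison": {"en": "h { r I s @ n"},
    "moustaki": {"fr": "m u s t a k i"},
}


def expand(name: str) -> list[str]:
    """Return every pronunciation of `name` built by picking one variant per word.

    Words missing from VARIANTS raise KeyError; a real system would fall back
    to letter-to-phoneme conversion instead.
    """
    per_word = [sorted(set(VARIANTS[w].values())) for w in name.lower().split()]
    return [" ".join(choice) for choice in product(*per_word)]


# One grammar entry per variant: each name here expands to two entries.
grammar = {name: expand(name) for name in ("George Harrison", "George Moustaki")}
for name, prons in grammar.items():
    print(name, prons)
```

The number of entries grows multiplicatively with the number of variants per word, which is what makes language-mix costly on a memory-limited handset.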
User Interface
• Dual mode
  • Full VUI (voice user interface) for eyes-free use
  • Snappy for normal use
• User language selection
• Lead the user to improve their hit-rate
  • Distance to microphone
  • Wait for the prompt
Intel SIND
• Languages:
  • US-English, Mandarin, Japanese
  • UK-English, French, German, Italian, Spanish
  • …
• Stochastic L2P (letter-to-phoneme) + compressed lexicon
• Multilingual acoustic units
• Instantaneous noise adaptation
• Unsupervised speaker adaptation
• Restricted grammar
• Concatenative TTS
• Voice enrollment for “stubborn” names
• Bundled with Intel cellular processors
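The slides do not say how unsupervised speaker adaptation is done. As a generic illustration only, the sketch below uses MAP (maximum a posteriori) re-estimation of Gaussian means, a standard adaptation technique swapped in here: frames that the recognizer's own hypothesis aligned to an acoustic unit pull that unit's mean toward the speaker, and utterances below a confidence threshold are skipped because no transcript exists to check them. The prior weight `tau`, the `threshold`, and the function names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def map_mean(prior: Vector, frames: List[Vector], tau: float) -> Vector:
    """MAP mean update: (tau * prior + sum(frames)) / (tau + n)."""
    n = len(frames)
    return [
        (tau * prior[d] + sum(f[d] for f in frames)) / (tau + n)
        for d in range(len(prior))
    ]


def adapt_unsupervised(
    means: Dict[str, Vector],
    aligned: Dict[str, List[Vector]],
    confidence: float,
    threshold: float = 0.8,  # hypothetical confidence gate
    tau: float = 10.0,       # hypothetical prior weight
) -> Dict[str, Vector]:
    """Adapt unit means using frames aligned by the recognizer's own hypothesis.

    Low-confidence utterances are ignored, since a misrecognition would
    adapt the wrong units.
    """
    if confidence < threshold:
        return means
    updated = dict(means)
    for unit, frames in aligned.items():
        if unit in means and frames:
            updated[unit] = map_mean(means[unit], frames, tau)
    return updated


means = {"a": [0.0, 0.0], "b": [1.0, 1.0]}
frames = {"a": [[1.0, 2.0]] * 10}
print(adapt_unsupervised(means, frames, confidence=0.9)["a"])  # [0.5, 1.0]
print(adapt_unsupervised(means, frames, confidence=0.5)["a"])  # [0.0, 0.0]
```

A larger `tau` keeps the means closer to the speaker-independent models until more of the user's speech has been seen.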
Computing Resources
Resource                                Intel SIND                          500 names
Flash: L2P (1 language)                 7-27 KB + 4 B × #lexicon-entries
Flash: acoustic models (5 languages)    70 KB
Flash: English TTS                      200 KB
SRAM                                    40 KB + 0.2 KB × #Names             140 KB
MHz for 0.5 s response time (PXA250)    60 + 0.4 × #Names                   260 MHz
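The linear cost models in the table can be turned into a simple budget check. This sketch assumes the per-name costs stay linear beyond the 500-name column and that 1 KB = 1024 bytes for the lexicon term; `estimate`, `max_names` and `Footprint` are hypothetical helpers, not part of Intel SIND.

```python
from dataclasses import dataclass

# Coefficients read off the "Computing Resources" table.
# Per-name costs are stored in tenths so integer inputs give exact results.
SRAM_BASE_KB, SRAM_PER_NAME_TENTH_KB = 40, 2   # 40 KB + 0.2 KB x #Names
MHZ_BASE, MHZ_PER_NAME_TENTH = 60, 4           # 60 + 0.4 x #Names (PXA250)
FLASH_ACOUSTIC_KB = 70                         # acoustic models, 5 languages
FLASH_TTS_KB = 200                             # English TTS
L2P_MIN_KB, L2P_MAX_KB = 7, 27                 # L2P for 1 language
LEXICON_BYTES_PER_ENTRY = 4


@dataclass
class Footprint:
    sram_kb: float
    mhz: float
    flash_kb_range: tuple  # (min, max) in KB


def estimate(num_names: int, lexicon_entries: int, with_tts: bool = True) -> Footprint:
    sram = SRAM_BASE_KB + SRAM_PER_NAME_TENTH_KB * num_names / 10
    mhz = MHZ_BASE + MHZ_PER_NAME_TENTH * num_names / 10
    # Assumption: 1 KB = 1024 bytes for the lexicon term.
    lexicon_kb = LEXICON_BYTES_PER_ENTRY * lexicon_entries / 1024
    fixed = FLASH_ACOUSTIC_KB + (FLASH_TTS_KB if with_tts else 0) + lexicon_kb
    return Footprint(sram, mhz, (L2P_MIN_KB + fixed, L2P_MAX_KB + fixed))


def max_names(mhz_budget: int) -> int:
    """Largest phonebook that keeps the 0.5 s response time within mhz_budget."""
    if mhz_budget < MHZ_BASE:
        return 0
    return (mhz_budget - MHZ_BASE) * 10 // MHZ_PER_NAME_TENTH


fp = estimate(num_names=500, lexicon_entries=0)
print(fp.sram_kb, fp.mhz)  # 140.0 260.0
print(max_names(312))      # 630
```

With 500 names the sketch reproduces the 140 KB and 260 MHz in the table. The 630-name figure for 312 MHz holds only under the further assumption that the PXA250 cost model carries over unchanged to the 312 MHz XScale core of the PXA800F.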
Stochastic L2P
Language   Word lexical accuracy [%]   Recognition error-rate increase relative to lexicon
English    74                          25 %
German     86
French     98
Italian    99
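The table suggests why an L2P alone is not enough for English: roughly one word in four is transcribed wrongly. One simple way to pair a stochastic L2P with a small lexicon is to store only the names the L2P gets wrong. This is an assumption, since the slides do not describe how Intel's compressed lexicon works. The toy model below is a maximum-likelihood context-window L2P (previous letter, letter, next letter) trained on pre-aligned letter/phone pairs; the training words, the rough SAMPA phones and all function names are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Model = Tuple[Dict[Tuple[str, str, str], Counter], Dict[str, Counter]]


def train(aligned: List[Tuple[str, List[str]]]) -> Model:
    """Count phones per letter in a (previous, letter, next) context window.

    Each training pair gives one phone per letter; "_" marks a silent letter.
    """
    context: Dict[Tuple[str, str, str], Counter] = defaultdict(Counter)
    letter: Dict[str, Counter] = defaultdict(Counter)  # context-free back-off
    for word, phones in aligned:
        padded = "#" + word + "#"
        for i, phone in enumerate(phones):
            context[(padded[i], padded[i + 1], padded[i + 2])][phone] += 1
            letter[word[i]][phone] += 1
    return context, letter


def l2p(word: str, model: Model) -> List[str]:
    """Pick the most frequent phone for each letter, backing off to the letter alone."""
    context, letter = model
    padded = "#" + word + "#"
    phones = []
    for i, ch in enumerate(word):
        counts = context.get((padded[i], ch, padded[i + 2])) or letter.get(ch)
        if not counts:
            continue  # letter never seen in training: emit nothing
        best = counts.most_common(1)[0][0]
        if best != "_":
            phones.append(best)
    return phones


def build_exceptions(reference: Dict[str, List[str]], model: Model) -> Dict[str, List[str]]:
    """Keep only the names whose L2P output differs from the reference pronunciation."""
    return {w: p for w, p in reference.items() if l2p(w, model) != p}


def pronounce(word: str, model: Model, exceptions: Dict[str, List[str]]) -> List[str]:
    """Look the name up in the exception lexicon first, else run the L2P."""
    return exceptions.get(word) or l2p(word, model)


TRAIN = [
    ("cat", ["k", "{", "t"]),
    ("cot", ["k", "Q", "t"]),
    ("cut", ["k", "V", "t"]),
    ("city", ["s", "I", "t", "i"]),
    ("cent", ["s", "e", "n", "t"]),
]
REFERENCE = {"cot": ["k", "Q", "t"], "cello": ["tS", "e", "l", "@U"]}

model = train(TRAIN)
exceptions = build_exceptions(REFERENCE, model)
print(exceptions)                           # {'cello': ['tS', 'e', 'l', '@U']}
print(pronounce("cit", model, exceptions))  # ['s', 'I', 't']
```

Only "cello" is stored, because the L2P already handles "cot"; the lexicon therefore grows with the L2P's error rate rather than with the phonebook size, which fits the higher-accuracy languages in the table especially well.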
Contacts-activation demo