Challenges in Embedded Speaker-Independent Name-Dialing

Meir Griniasty, Hagai Aronowitz, Ezer Melzer, Ruthi Aloni-Lavi, Ruben Maislos, Adoram Erell
Speech Research Lab
Intel Wireless Communications and Computing Group

Benefits and Value
- No voice-enrollment barrier
- More functionality
- Faster phonebook search
- Fun
- Hands Free, Eyes Free
  - Driving
  - At home
- Return on investment
  - Handset vendors: Competitive edge
  - Cellular-operators: More air-time

Computation Resources
[Figure: block diagram of the PXA800F Baseband Processor. Labeled blocks: 312 MHz Intel® XScale™ core; Intel® Micro Signal Architecture core; Intel® Flash 4 MByte; Intel® Flash 0.5 MByte; SRAM 512 KByte; SRAM 64 KByte; RF Transceiver (TX, RX); Analog Mixed Signal; GSM/GPRS Logic; GSM/GPRS Power Management; Peripherals; I2C; I2S; RTOS; Polyphonic Ringer; 5x5 Keypad; Display; SIM. Applications: Games, Phonebook, Calendar, Voice memo, Voice dial. SIND languages shown: English, French, German, Hebrew, …. Annotation: 1$ / MB.]

Algorithmic Challenges
- Letter-to-phoneme
  - No room for large name-lexicons
  - Non-native names
- Acoustic models
  - No room for rich models
  - Large databases are expensive
  - Language-mix
    1. George Harrison
    2. George Moustaki
  - Noise
  - No data-collection from pilot deployments

User Interface
- Dual mode
  - Full VUI for eyes-free
  - Snappy for normal-use
- User language selection
- Lead the user to improve his hit-rate
  - Distance to microphone
  - Wait for the prompt

Intel SIND
- Languages:
  - US-English, Mandarin, Japanese
  - UK-English, French, German, Italian, Spanish …
- Stochastic L2P + compressed lexicon
- Multilingual acoustic units
- Instantaneous noise adaptation
- Unsupervised speaker adaptation
- Restricted grammar
- Concatenative TTS
- Voice enrollment for “stubborn” names
- Bundled with Intel cellular processors

Computing Resources

Resource                                  Intel SIND                            x 500 Names
Flash: L2P, 1 language                    7 - 27 KB + 4 B x #lexicon-entries
Flash: Acoustic models, 5 languages       70 KB
Flash: English TTS                        200 KB
SRAM                                      40 KB + 0.2 KB x #Names               140 KB
MHz for 0.5 sec response-time (PXA250)    60 + 0.4 x #Names                     260 MHz

Stochastic L2P

Language   Word Lexical-Accuracy [%]   Recognition Error-rate Increase relative to lexicon
English    74                          25 %
German     86
French     98
Italian    99

Contacts-activation demo
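
Worked example for the Computing Resources table: the SRAM, MHz and L2P flash rows are linear in the phonebook or lexicon size, so they can be evaluated for any number of names. The sketch below is only an illustration of that arithmetic. The function name `sind_resources`, its parameters, the dictionary keys, and the conversion 1 KB = 1024 B are assumptions made here, not part of the original slides.

```python
def sind_resources(num_names: int, lexicon_entries: int = 0) -> dict:
    """Evaluate the linear formulas from the Computing Resources table.

    num_names       -- number of phonebook names (#Names in the table)
    lexicon_entries -- number of compressed-lexicon entries (#lexicon-entries)
    """
    if num_names < 0 or lexicon_entries < 0:
        raise ValueError("counts must be non-negative")

    # Assumption: 1 KB = 1024 B when converting the 4 B per-entry cost.
    lexicon_kb = 4 * lexicon_entries / 1024

    return {
        # SRAM: 40 KB + 0.2 KB x #Names
        "sram_kb": 40 + 0.2 * num_names,
        # MHz for 0.5 sec response-time (PXA250): 60 + 0.4 x #Names
        "cpu_mhz": 60 + 0.4 * num_names,
        # L2P flash, 1 language: 7 - 27 KB + 4 B x #lexicon-entries
        "l2p_flash_kb_min": 7 + lexicon_kb,
        "l2p_flash_kb_max": 27 + lexicon_kb,
    }


if __name__ == "__main__":
    for names in (100, 500, 1000):
        print(names, sind_resources(names))
```

For 500 names the formulas give 140 KB of SRAM and 260 MHz, which agrees with the "x 500 Names" column of the table.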