Recent Advances in Speech Dereverberation

Dr.Ir. Emanuël Habets
In collaboration with Dr. Sharon Gannot and Dr. Israel Cohen
Department of Electrical Engineering, Technion - IIT
School of Electrical Engineering, Bar-Ilan University
IBM Speech Technologies Seminar 2008

Outline
1 Introduction
2 Existing Speech Dereverberation Techniques
3 Proposed Speech Dereverberation Technique
4 Experimental Results
5 Summary and Future Research

1 Introduction

What is reverberation?
Reverberation is the process of multi-path propagation of a sound from its source to a receiver.
Audio Example: Anechoic Speech. Reverberant Speech.

Motivation for Speech Dereverberation
[Figure: a desired source, interferences, a wall, and the microphone with its microphone signal.]
Signal degradation caused by reverberation and ambient noise can decrease the fidelity and intelligibility of speech and the recognition performance of automatic speech recognition systems.

Motivation for Speech Dereverberation
Desire to work hands-free and handset-free!

Applications
There is a variety of applications for speech dereverberation.
- Automotive: Hands-Free Car Phone Kits.
- Health: Hearing Aids, Home-Care.
- Home/Office: Speech and Speaker Recognition, Internet Telephony, Teleconferencing, Set-top boxes, Home Automation.
- Mobile: Mobile Phones, Smartphones, PDAs, Mobile Multimedia Systems.

Problem Formulation
Given the anechoic speech signal $s(n)$ and the acoustic impulse response $h(n)$, the reverberant speech signal can be expressed as
$$z(n) = \sum_{j=-\infty}^{n} s(j)\, h(n-j).$$
The microphone signal can be written as
$$x(n) = z(n) + v(n),$$
where $v(n)$ denotes the additive ambient noise component.

Problem Formulation
Ultimate Goal
Complete dereverberation: given the microphone signals, our objective is to estimate the anechoic speech signal $s(n)$ up to an arbitrary scale and time delay.
Sufficient Goal
Partial dereverberation: given the microphone signals, our objective is to estimate a filtered version of the anechoic speech signal $s(n)$. This filter should introduce less reverberation and spectral coloration than a reference acoustic channel.

Challenges
Speech dereverberation is a blind problem.
- Source signal: unknown; non-stationary.
- Acoustic channel: unknown; time-varying; the impulse response is very long, i.e., approx. $f_s \cdot \mathrm{RT}_{60}$ samples; the impulse response is nonminimum-phase.
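
The signal model from the Problem Formulation slides, $x(n) = z(n) + v(n)$ with $z(n)$ the convolution of $s(n)$ and $h(n)$, can be sketched in a few lines. This is only an illustration under stated assumptions: the sampling rate, the reverberation time, the three-tap toy impulse response, the white-noise stand-in for speech, and the function name `simulate_microphone_signal` are all hypothetical and do not come from the talk.

```python
import numpy as np

def simulate_microphone_signal(s, h, snr_db, rng):
    """Return (x, z, v) with z = s * h and x = z + v for white noise v at snr_db."""
    # Reverberant speech z(n) = sum_j s(j) h(n - j), truncated to len(s) samples.
    z = np.convolve(s, h)[: len(s)]
    # White Gaussian noise, scaled so that 10*log10(P_z / P_v) equals snr_db.
    noise = rng.standard_normal(len(s))
    gain = np.sqrt(np.mean(z ** 2) / (np.mean(noise ** 2) * 10.0 ** (snr_db / 10.0)))
    v = gain * noise
    return z + v, z, v

fs = 16000                      # assumed sampling rate [Hz]
rt60 = 0.5                      # assumed reverberation time [s]
air_length = int(fs * rt60)     # approx. fs * RT60 samples, as on the Challenges slide

rng = np.random.default_rng(0)
s = rng.standard_normal(fs)     # white-noise stand-in for 1 s of anechoic speech

# Toy impulse response (hypothetical): direct path plus two attenuated reflections.
h = np.zeros(air_length)
h[0], h[800], h[2400] = 1.0, 0.5, 0.25

x, z, v = simulate_microphone_signal(s, h, snr_db=20.0, rng=rng)
measured_snr_db = 10.0 * np.log10(np.mean(z ** 2) / np.mean(v ** 2))
print(air_length, round(measured_snr_db, 6))
```

Even at these modest settings the impulse response spans thousands of samples, which is one reason why identifying and inverting it blindly is hard.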

2 Existing Speech Dereverberation Techniques

In the context of automatic speech/speaker recognition, dereverberation can be integrated into the recognizer. Speech dereverberation can be performed in the
- Feature domain: Cepstral Mean Normalization, Cepstral Mean and Variance Normalization, Reverberation Models [Sehr and Kellermann, 2006-2007].
- Signal domain.

Classification
[Figure: techniques arranged by the required knowledge of the source characteristics (none, little, exact) and of the channel (none, little, exact), and grouped into reverberation suppression and reverberation cancellation. Techniques shown: Explicit Speech Modelling, LP Residual Enhancement, Spectral Enhancement, HERB, Temporal Envelope Filtering, Blind Deconvolution, Spatial Processing, Homomorphic Deconvolution.]
Figure: Overview of different speech dereverberation techniques.

Class I: Reverberation Cancellation
[Block diagram: $s(t)$ → Linear System $L(t)$ → $x(t)$ → Inverse System $L^{-1}(t)$ → $\hat{s}(t)$, with the label "Unknown Environment".]
Two distinct approaches:
- Estimate $s(t)$ directly, or estimate the parameters of the signal model and the excitation signal, i.e., by treating the parameters of the system $L(t)$ as nuisance parameters.
- Firstly, model the linear system $L(t)$.
  Secondly, estimate the parameters of the system $L(t)$. Finally, apply the inverse system $L^{-1}(t)$ to $x(t)$ (deconvolution) to recover $s(t)$.

Class I: Reverberation Cancellation
Examples (non-exhaustive list)
- Blind Deconvolution (i.i.d. assumption) [Haykin, 1994].
- Null-space of the spatial correlation matrix [Gannot, 2003].
- Bayesian parameter estimation techniques to estimate the unknown parameters of the speech and the channel model [Hopgood, 2000].
Problems and Limitations
- Insufficiently robust to small changes in the acoustic impulse response (AIR) [Radlovic, 2000].
- Channels cannot be identified uniquely when they contain common zeros.
- Observation noise causes severe problems.
- Some methods require knowledge of the order of the unknown system.

Class II: Reverberation Suppression
Examples (non-exhaustive list)
- Cepstrum techniques (e.g., liftering in the cepstrum domain).
- Spatial processing (e.g., delay-and-sum beamformer).
- Linear Prediction Residual Enhancement [Gaubitch et al., 2004-2007].
- Spectral Enhancement [Lebart, 2001; Habets, 2004-2007].
Problems and Limitations
- A priori knowledge of the source and/or the channel is required.
- Only partial dereverberation is possible.
- Tendency to introduce speech distortions.
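
Of the Class II examples, the delay-and-sum beamformer is the simplest to write down. The sketch below assumes that the propagation delays are known and are whole numbers of samples; in practice they would have to be estimated. The toy setup uses uncorrelated noise as a stand-in for any disturbance that differs across microphones. The function name `delay_and_sum` and all values are hypothetical.

```python
import numpy as np

def delay_and_sum(mic_signals, delays):
    """Advance each microphone signal by its integer delay, then average."""
    mic_signals = np.asarray(mic_signals, dtype=float)
    num_mics, num_samples = mic_signals.shape
    output = np.zeros(num_samples)
    for signal, delay in zip(mic_signals, delays):
        aligned = np.zeros(num_samples)
        # Shift left by `delay` samples; the tail is zero-padded.
        aligned[: num_samples - delay] = signal[delay:]
        output += aligned
    return output / num_mics

rng = np.random.default_rng(0)
source = rng.standard_normal(100)
delays = [0, 3, 5, 8]           # hypothetical known propagation delays in samples

# Each microphone observes the source after its own delay.
clean = np.zeros((len(delays), 120))
for m, d in enumerate(delays):
    clean[m, d : d + 100] = source
noisy = clean + 0.5 * rng.standard_normal(clean.shape)

enhanced = delay_and_sum(noisy, delays)
mse_reference = np.mean((noisy[0, :100] - source) ** 2)
mse_enhanced = np.mean((enhanced[:100] - source) ** 2)
print(mse_reference, mse_enhanced)
```

Averaging the aligned channels leaves the source untouched and attenuates components that are uncorrelated between microphones, which matches the limitation listed above that only partial dereverberation is possible.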

3 Proposed Speech Dereverberation Technique

Proposed Approach
Reflections affect the desired signal in two distinct ways:
- Early reflections introduce spectral coloration.
- Late reflections change the waveform's temporal envelope, as exponentially decaying tails are added at sound offsets.
Independent research has shown that speech fidelity and intelligibility are mainly degraded by late reverberation.

Proposed Approach
Let us split the AIR into two components:
[Figure: the AIR $h(n)$ plotted against the time index $n$ and split at $N_\ell$ into an early part $h_e(n)$ and a late part $h_\ell(n)$.]

Proposed Approach
The received microphone signal $x(n)$ can then be expressed as
$$x(n) = \underbrace{\sum_{j=n-N_\ell+1}^{n} s(j)\, h_e(n-j)}_{z_e(n)} + \underbrace{\sum_{j=-\infty}^{n-N_\ell} s(j)\, h_\ell(n-j)}_{z_\ell(n)} + v(n).$$
In the short-time Fourier transform (STFT) domain:
$$X(\ell,k) = Z_e(\ell,k) + Z_\ell(\ell,k) + V(\ell,k),$$
where $\ell$ is the time-frame index and $k$ the frequency-bin index.
We aim at the suppression of late reverberation and noise, i.e., at the estimation of the early speech component $Z_e(\ell,k)$.

Polack's Statistical Reverberation Model
Polack developed a time-domain model where an AIR is described as a realization of a non-stationary stochastic process:
$$h(n) = \begin{cases} b(n)\, e^{-\alpha n} & \text{for } n \ge 0; \\ 0 & \text{otherwise,} \end{cases}$$
where $b(n)$ is a white zero-mean Gaussian stationary noise sequence and $\alpha$ is linked to the reverberation time $T_{60}$ through
$$\alpha \triangleq \frac{3 \ln(10)}{T_{60}\, f_s}.$$

Generalized Statistical Reverberation Model
To model the contribution of the direct path, the AIR $h(n)$ is divided into two segments:
$$h(n) = \begin{cases} h_d(n) = b_d(n)\, e^{-\alpha n} & \text{for } 0 \le n < N_r; \\ h_r(n) = b_r(n)\, e^{-\alpha n} & \text{for } n \ge N_r; \\ 0 & \text{otherwise.} \end{cases}$$
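
Both statistical models are easy to sample from, which helps to build intuition for the decay rate $\alpha$. The sketch below draws one realization of each. The slides do not give the variances of $b_d(n)$ and $b_r(n)$, so the parameters `sigma_d` and `sigma_r`, the value of `n_r`, the sampling rate, and all function names are illustrative assumptions.

```python
import numpy as np

def decay_rate(t60, fs):
    """Per-sample decay rate alpha = 3 ln(10) / (T60 fs)."""
    return 3.0 * np.log(10.0) / (t60 * fs)

def polack_air(t60, fs, length, rng):
    """One realization of h(n) = b(n) exp(-alpha n) for 0 <= n < length."""
    n = np.arange(length)
    return rng.standard_normal(length) * np.exp(-decay_rate(t60, fs) * n)

def generalized_air(t60, fs, length, n_r, sigma_d, sigma_r, rng):
    """Two-segment variant: b_d(n) for n < n_r and b_r(n) for n >= n_r.

    sigma_d and sigma_r are assumed standard deviations of b_d and b_r.
    """
    n = np.arange(length)
    sigma = np.where(n < n_r, sigma_d, sigma_r)
    return sigma * rng.standard_normal(length) * np.exp(-decay_rate(t60, fs) * n)

fs = 8000                       # assumed sampling rate [Hz]
t60 = 0.5                       # assumed reverberation time [s]
length = int(t60 * fs)

rng = np.random.default_rng(1)
h = polack_air(t60, fs, length, rng)
h_general = generalized_air(t60, fs, length, n_r=32, sigma_d=4.0, sigma_r=1.0, rng=rng)

# Drop of the amplitude envelope exp(-alpha n) after T60 seconds, in dB.
envelope_drop_db = 20.0 * np.log10(np.exp(-decay_rate(t60, fs) * t60 * fs))
print(envelope_drop_db)
```

The last lines check the definition of $\alpha$: after $T_{60} f_s$ samples the envelope has decayed by 60 dB, which is what the reverberation time means.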
The value $N_r$ is chosen such that $h_d(n)$ contains the direct path and $h_r(n)$ consists of all later reflections.

Late Reverberant Spectral Variance Estimation
1. The spectral variance of the reverberant signal component $z_r(n)$ is given by
$$\hat{\lambda}_{z_r}(\ell,k) = e^{-2\alpha(k)R}\,\bigl(1-\kappa(k)\bigr)\,\hat{\lambda}_{z_r}(\ell-1,k) + \kappa(k)\, e^{-2\alpha(k)R}\, \lambda_z(\ell-1,k),$$
where $\lambda_z(\ell,k) = E\{|Z(\ell,k)|^2\}$ and $\kappa$ is inversely proportional to the Direct to Reverberation Ratio.
2. The spectral variance of the late reverberant signal component $z_\ell(n)$ is given by
$$\lambda_{z_\ell}(\ell,k) = e^{-2\alpha(k)(N_\ell - R)}\, \hat{\lambda}_{z_r}\!\left(\ell - \frac{N_\ell}{R} + 1,\, k\right).$$

Single-Microphone Spectral Enhancement
[Block diagram: TF analysis of $x(n)$, noise estimator, late reverberant energy estimator, post-filter, and TF synthesis producing $\hat{z}_e(n)$.]
Figure: Block diagram of the developed single-microphone speech enhancement system.

Post-Filter
Various spectral enhancement methods can be used, e.g., spectral subtraction and statistical methods.
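
Whichever post-filter is chosen, it needs the late reverberant spectral variance as input. The two-step estimator from the Late Reverberant Spectral Variance Estimation slide can be sketched as follows. The sketch assumes that $R$ is the frame shift in samples, that $N_\ell$ is an integer multiple of $R$, that the recursion starts from zero, and that $\lambda_z(\ell,k)$ is already available (here it is filled with random placeholder values). The function name and all numerical settings are hypothetical.

```python
import numpy as np

def late_reverberant_variance(lambda_z, alpha, R, N_l, kappa):
    """Apply the two-step estimator to lambda_z of shape (num_frames, num_bins).

    alpha and kappa may be scalars or arrays with one value per frequency bin.
    Returns (lambda_zr, lambda_zl), both with the shape of lambda_z.
    """
    lambda_z = np.asarray(lambda_z, dtype=float)
    num_frames = lambda_z.shape[0]

    # Step 1: first-order recursion for the reverberant component z_r.
    decay = np.exp(-2.0 * alpha * R)
    lambda_zr = np.zeros_like(lambda_z)
    for l in range(1, num_frames):
        lambda_zr[l] = decay * ((1.0 - kappa) * lambda_zr[l - 1] + kappa * lambda_z[l - 1])

    # Step 2: late component, read from frame l - N_l/R + 1 and attenuated.
    shift = N_l // R - 1
    late_decay = np.exp(-2.0 * alpha * (N_l - R))
    lambda_zl = np.zeros_like(lambda_z)
    if shift < num_frames:
        lambda_zl[shift:] = late_decay * lambda_zr[: num_frames - shift]
    return lambda_zr, lambda_zl

fs = 8000                                   # assumed sampling rate [Hz]
t60 = 0.5                                   # assumed reverberation time [s]
R = 64                                      # assumed frame shift in samples
N_l = 320                                   # N_l / fs = 40 ms at 8 kHz
alpha = 3.0 * np.log(10.0) / (t60 * fs)     # frequency-independent decay rate
kappa = 0.5                                 # hypothetical value

rng = np.random.default_rng(0)
lambda_z = rng.random((50, 129))            # placeholder spectral variances
lambda_zr, lambda_zl = late_reverberant_variance(lambda_z, alpha, R, N_l, kappa)
print(lambda_zl.shape)
```

Only one multiply-add per frame and frequency bin is needed in step 1, and step 2 is a scaled lookup into past frames.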
We used a statistical method that is based on a Mean Squared Error distortion measure and a Log Spectral Amplitude fidelity criterion.
The STFT coefficients of the speech and interference are assumed to be complex Gaussian random variables.
The resulting gain function depends on the a priori and a posteriori Signal to Interference Ratios and on the speech presence probability.
We developed several modifications to improve the joint suppression of ambient noise and late reverberation.

Multi-Microphone Spectral Enhancement
Until now we exploited time diversity and spectral diversity. However, reverberation induces spatial diversity, which can be exploited by using multiple microphones.
- The late reverberant spectral variance estimate can be improved using multiple microphones.
- The speech presence probability estimation can be improved using spatial information (Mean Squared Coherence).

Multi-Microphone Spectral Enhancement
The multi-microphone Minimum Mean Squared Error (MMSE) estimator can be divided into a Minimum Variance Distortionless Response (MVDR) beamformer and a single-microphone MMSE estimator.
[Block diagrams: $X_1(\ell,k), \ldots, X_M(\ell,k)$ → Multi-Channel MMSE Estimator → $Y(\ell,k)$; equivalently, $X_1(\ell,k), \ldots, X_M(\ell,k)$ → MVDR Beamformer → $Y_{\mathrm{MVDR}}(\ell,k)$ → Single-Channel MMSE Estimator → $Y(\ell,k)$.]
Figure: Multi-microphone MMSE estimator and the equivalent MVDR beamformer and single-microphone MMSE estimator.

4 Experimental Results

Virtual Room
- Dimensions: 5 m × 4 m × 6 m
- Volume: 120 m³
- Reverberation time: 0.2 - 1 s
- Source-microphone distance: 0.25 - 4 m
- Signal to Noise Ratio: 10 - 30 dB
- Reverberation starts at $N_\ell / f_s = 40$ ms.
[Figure: microphone signals $x_1(n), \ldots, x_M(n)$ and the distances $D_i$ and $D$.]
Figure: Virtual room setup.

Performance Evaluation Measures
Quantity: Segmental Signal to Interference Ratio (SIR):
$$\mathrm{SIR}_{\mathrm{seg}} = \frac{1}{|L|} \sum_{\ell \in L} 10 \log_{10}\!\left( \frac{\sum_{n=\ell R}^{\ell R+N-1} s(n)^2}{\sum_{n=\ell R}^{\ell R+N-1} \bigl(s(n) - \hat{s}(n)\bigr)^2} \right) \ [\mathrm{dB}], \tag{1}$$
Quality: Bark Spectral Distortion (BSD) score:
$$\mathrm{BSD} = \frac{1}{|L|} \sum_{\ell \in L} \frac{\sum_{k_s=1}^{K_s} \bigl(L_s(\ell,k_s) - L_{\hat{s}}(\ell,k_s)\bigr)^2}{\sum_{k_s=1}^{K_s} \bigl(L_s(\ell,k_s)\bigr)^2}, \tag{2}$$
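
The segmental SIR of the Performance Evaluation Measures slide can be computed directly from its definition. In the sketch below, `frame_length` plays the role of $N$ and `frame_shift` the role of $R$. The slide does not say how the frame set $L$ is selected, so by default all complete frames are used; that choice, the function name, and the test signal are assumptions. Frames with zero signal or zero error energy would need to be excluded through the `frames` argument.

```python
import numpy as np

def segmental_sir(s, s_hat, frame_length, frame_shift, frames=None):
    """Mean over frames of 10 log10(signal energy / error energy), in dB."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    if frames is None:
        # Assumption: L contains every complete frame.
        num_frames = (len(s) - frame_length) // frame_shift + 1
        frames = range(num_frames)
    ratios_db = []
    for l in frames:
        segment = slice(l * frame_shift, l * frame_shift + frame_length)
        signal_energy = np.sum(s[segment] ** 2)
        error_energy = np.sum((s[segment] - s_hat[segment]) ** 2)
        ratios_db.append(10.0 * np.log10(signal_energy / error_energy))
    return float(np.mean(ratios_db))

rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
s_hat = 0.9 * s                 # estimate with a 10 % amplitude error in every frame

sir_db = segmental_sir(s, s_hat, frame_length=256, frame_shift=128)
print(round(sir_db, 6))
```

Because the ratio is taken per frame before averaging, short segments with a poor estimate weigh as much as loud segments with a good one, unlike a single global ratio.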

Joint Late Reverberation and Noise Suppression
[Plots: segmental SIR [dB] and Bark Spectral Distortion versus reverberation time [s], with curves for Microphone, Processed NS, and Processed RS+NS.]
Figure: Segmental SIRs and BSDs of the unprocessed microphone signal, the processed signal after noise suppression (NS), and the processed signal after joint reverberation and noise suppression (RS+NS). The reverberation time varies between 0.2 and 1 s (SNR = 30 dB, $D = 1$ m, and $N_\ell / f_s = 40$ ms).

Joint Late Reverberation and Noise Suppression
[Plots: segmental SIR [dB] and Bark Spectral Distortion versus source-microphone distance [m], with curves for Microphone, Processed NS, and Processed RS+NS.]
Figure: Segmental SIRs and BSDs of the unprocessed microphone signal, the processed signal after noise suppression (NS), and the processed signal after joint reverberation and noise suppression (RS+NS). The source-microphone distance varies between 0.25 and 4 m (SNR = 30 dB, $T_{60} = 500$ ms, and $N_\ell / f_s = 40$ ms).

Joint Late Reverberation and Noise Suppression
[Plots: segmental SIR [dB] and Bark Spectral Distortion versus SNR [dB], with curves for Microphone, Processed NS, and Processed RS+NS.]
Figure: Segmental SIRs and BSDs of the unprocessed microphone signal, the processed signal after noise suppression (NS), and the processed signal after joint reverberation and noise suppression (RS+NS). The SNR of the received signal varies between 10 and 30 dB ($D = 1$ m, $T_{60} = 500$ ms, and $N_\ell / f_s = 40$ ms).

Joint Late Reverberation and Noise Suppression
[Plots: segmental SRR [dB] and Bark Spectral Distortion versus number of microphones, with curves for Microphone, DSB, and DSB-PF.]
Figure: Segmental SIRs and BSDs of the reference microphone signal, the DSB signal, and the DSB-PF signal. The number of microphones ranges from 1 to 9 ($D = 1.5$ m, $T_{60} = 0.5$ s, SNR = 30 dB, and $N_\ell / f_s = 40$ ms).

Audio Demonstration
[Plots: spectrograms (frequency [kHz] versus time [sec]) and waveforms (amplitude versus time [sec]) of the Microphone and Processed signals.]
Figure: Spectrograms and waveforms of the microphone signal and processed signal ($M = 4$, $D = 1.5$ m, $T_{60} = 0.7$ s, SNR = 20 dB, and $N_\ell / f_s = 48$ ms).

5 Summary and Future Research

Summary
- We developed an effective and computationally efficient estimator for the late reverberant spectral variance.
- Suppression of late reverberation and ambient noise is possible using spectral enhancement.
Future Research
- Optimal fidelity criteria for speech dereverberation?
- A suitable technique to equalize the spectral coloration caused by the early reflections needs to be developed. Together with the developed spectral enhancement technique, it can provide a practical solution for speech dereverberation.

Thank you for your attention.
For more information visit www.dereverberation.com and ehabets.dereverberation.com.