TY - CONF
AB - The parametric Bayesian Feature Enhancement (BFE) and a data-driven Denoising Autoencoder (DA) both bring performance gains in severe single-channel speech recognition conditions. The former can be adjusted to different conditions by an appropriate parameter setting, while the latter needs to be trained on conditions similar to the ones expected at decoding time, making it vulnerable to a mismatch between training and test conditions. We use a DNN backend and study reverberant ASR under three types of mismatch conditions: different room reverberation times, different speaker-to-microphone distances and the difference between artificially reverberated data and recordings made in a reverberant environment. We show that for these mismatch conditions BFE can provide the targets for a DA. This unsupervised adaptation provides a performance gain over the direct use of BFE and even makes it possible to compensate for the mismatch between real and simulated reverberant data.
AU - Heymann, Jahn
AU - Haeb-Umbach, Reinhold
AU - Golik, P.
AU - Schlueter, R.
ID - 11813
KW - codecs
KW - signal denoising
KW - speech recognition
KW - Bayesian feature enhancement
KW - denoising autoencoder
KW - reverberant ASR
KW - single-channel speech recognition
KW - speaker to microphone distances
KW - unsupervised adaptation
KW - Adaptation models
KW - Noise reduction
KW - Reverberation
KW - Speech
KW - Speech recognition
KW - Training
KW - deep neuronal networks
KW - feature enhancement
KW - robust speech recognition
T2 - Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
TI - Unsupervised adaptation of a denoising autoencoder by Bayesian Feature Enhancement for reverberant ASR under mismatch conditions
ER -
TY - JOUR
AB - In this contribution we present a theoretical and experimental investigation into the effects of reverberation and noise on features in the logarithmic mel power spectral domain, an intermediate stage in the computation of the mel frequency cepstral coefficients, prevalent in automatic speech recognition (ASR). Gaining insight into the complex interaction between clean speech, noise, and noisy reverberant speech features is essential for any ASR system to be robust against noise and reverberation present in distant microphone input signals. The findings are gathered in a probabilistic formulation of an observation model which may be used in model-based feature compensation schemes. The proposed observation model extends previous models in three major directions: First, the contribution of additive background noise to the observation error is explicitly taken into account. Second, an energy compensation constant is introduced which ensures an unbiased estimate of the reverberant speech features, and, third, a recursive variant of the observation model is developed resulting in reduced computational complexity when used in model-based feature compensation.
The experimental section is used to evaluate the accuracy of the model and to describe how its parameters can be determined from test data.
AU - Leutnant, Volker
AU - Krueger, Alexander
AU - Haeb-Umbach, Reinhold
ID - 11861
IS - 1
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
KW - computational complexity
KW - reverberation
KW - speech recognition
KW - automatic speech recognition
KW - background noise
KW - clean speech
KW - energy compensation
KW - logarithmic mel power spectral domain
KW - mel frequency cepstral coefficients
KW - microphone input signals
KW - model-based feature compensation schemes
KW - noisy reverberant speech automatic recognition
KW - noisy reverberant speech features
KW - Atmospheric modeling
KW - Computational modeling
KW - Noise
KW - Noise measurement
KW - Reverberation
KW - Speech
KW - Vectors
KW - Model-based feature compensation
KW - observation model for reverberant and noisy speech
KW - recursive observation model
KW - robust automatic speech recognition
SN - 2329-9290
TI - A New Observation Model in the Logarithmic Mel Power Spectral Domain for the Automatic Recognition of Noisy Reverberant Speech
VL - 22
ER -
TY - CONF
AB - The accuracy of automatic speech recognition systems in noisy and reverberant environments can be improved notably by exploiting the uncertainty of the estimated speech features using so-called uncertainty-of-observation techniques. In this paper, we introduce a new Bayesian decision rule that can serve as a mathematical framework from which both known and new uncertainty-of-observation techniques can be either derived or approximated. The new decision rule in its direct form leads to the new significance decoding approach for Gaussian mixture models, which results in better performance compared to standard uncertainty-of-observation techniques in different additive and convolutive noise scenarios.
AU - Abdelaziz, Ahmed H.
AU - Zeiler, Steffen
AU - Kolossa, Dorothea
AU - Leutnant, Volker
AU - Haeb-Umbach, Reinhold
ID - 11716
KW - Bayes methods
KW - Gaussian processes
KW - convolution
KW - decision theory
KW - decoding
KW - noise
KW - reverberation
KW - speech coding
KW - speech recognition
KW - Bayesian decision rule
KW - GMM
KW - Gaussian mixture models
KW - additive noise scenarios
KW - automatic speech recognition systems
KW - convolutive noise scenarios
KW - decoding approach
KW - mathematical framework
KW - reverberant environments
KW - significance decoding
KW - speech feature estimation
KW - uncertainty-of-observation techniques
KW - Hidden Markov models
KW - Maximum likelihood decoding
KW - Noise
KW - Speech
KW - Speech recognition
KW - Uncertainty
KW - Uncertainty-of-observation
KW - modified imputation
KW - noise robust speech recognition
KW - uncertainty decoding
SN - 1520-6149
T2 - Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
TI - GMM-based significance decoding
ER -
TY - JOUR
AB - In this contribution we extend a previously proposed Bayesian approach for the enhancement of reverberant logarithmic mel power spectral coefficients for robust automatic speech recognition to the additional compensation of background noise. A recently proposed observation model is employed whose time-variant observation error statistics are obtained as a side product of the inference of the a posteriori probability density function of the clean speech feature vectors. Further, a reduction of the computational effort and the memory requirements is achieved by using a recursive formulation of the observation model. The performance of the proposed algorithms is first experimentally studied on a connected digits recognition task with artificially created noisy reverberant data.
It is shown that the use of the time-variant observation error model leads to a significant error rate reduction at low signal-to-noise ratios compared to a time-invariant model. Further experiments were conducted on a 5000-word task recorded in a reverberant and noisy environment. A significant word error rate reduction was obtained, demonstrating the effectiveness of the approach on real-world data.
AU - Leutnant, Volker
AU - Krueger, Alexander
AU - Haeb-Umbach, Reinhold
ID - 11862
IS - 8
JF - IEEE Transactions on Audio, Speech, and Language Processing
KW - Bayes methods
KW - compensation
KW - error statistics
KW - reverberation
KW - speech recognition
KW - Bayesian feature enhancement
KW - background noise
KW - clean speech feature vectors
KW - connected digits recognition task
KW - memory requirements
KW - noisy reverberant data
KW - posteriori probability density function
KW - recursive formulation
KW - reverberant logarithmic mel power spectral coefficients
KW - robust automatic speech recognition
KW - signal-to-noise ratios
KW - time-variant observation
KW - word error rate reduction
KW - Robust automatic speech recognition
KW - model-based Bayesian feature enhancement
KW - observation model for reverberant and noisy speech
KW - recursive observation model
TI - Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition
VL - 21
ER -
TY - JOUR
AB - In this paper, we present a novel blocking matrix and fixed beamformer design for a generalized sidelobe canceler for speech enhancement in a reverberant enclosure. They are based on a new method for estimating the acoustical transfer function ratios in the presence of stationary noise. The estimation method relies on solving a generalized eigenvalue problem in each frequency bin. An adaptive eigenvector tracking utilizing the power iteration method is employed and shown to achieve a high convergence speed.
Simulation results demonstrate that the proposed beamformer leads to better noise and interference reduction and reduced speech distortions compared to other blocking matrix designs from the literature.
AU - Krueger, Alexander
AU - Warsitz, Ernst
AU - Haeb-Umbach, Reinhold
ID - 11850
IS - 1
JF - IEEE Transactions on Audio, Speech, and Language Processing
KW - acoustical transfer function ratio
KW - adaptive eigenvector tracking
KW - array signal processing
KW - beamformer design
KW - blocking matrix
KW - eigenvalues and eigenfunctions
KW - eigenvector-based transfer function ratios estimation
KW - generalized sidelobe canceler
KW - interference reduction
KW - iterative methods
KW - power iteration method
KW - reduced speech distortions
KW - reverberant enclosure
KW - reverberation
KW - speech enhancement
KW - stationary noise
TI - Speech Enhancement With a GSC-Like Structure Employing Eigenvector-Based Transfer Function Ratios Estimation
VL - 19
ER -
TY - JOUR
AB - In this paper, we present a new technique for automatic speech recognition (ASR) in reverberant environments. Our approach is aimed at the enhancement of the logarithmic Mel power spectrum, which is computed at an intermediate stage to obtain the widely used Mel frequency cepstral coefficients (MFCCs). Given the reverberant logarithmic Mel power spectral coefficients (LMPSCs), a minimum mean square error estimate of the clean LMPSCs is computed by carrying out Bayesian inference. We employ switching linear dynamical models as an a priori model for the dynamics of the clean LMPSCs. Further, we derive a stochastic observation model which relates the clean to the reverberant LMPSCs through a simplified model of the room impulse response (RIR). This model requires only two parameters, namely RIR energy and reverberation time, which can be estimated from the captured microphone signal.
The performance of the proposed enhancement technique is studied on the AURORA5 database and compared to that of constrained maximum-likelihood linear regression (CMLLR). It is shown by experimental results that our approach significantly outperforms CMLLR and that up to 80% of the errors caused by the reverberation are recovered. In addition to the fact that the approach is compatible with the standard MFCC feature vectors, it leaves the ASR back-end unchanged. It is of moderate computational complexity and suitable for real-time applications.
AU - Krueger, Alexander
AU - Haeb-Umbach, Reinhold
ID - 11846
IS - 7
JF - IEEE Transactions on Audio, Speech, and Language Processing
KW - ASR
KW - AURORA5 database
KW - automatic speech recognition
KW - Bayesian inference
KW - belief networks
KW - CMLLR
KW - computational complexity
KW - constrained maximum likelihood linear regression
KW - least mean squares methods
KW - LMPSC computation
KW - logarithmic Mel power spectrum
KW - maximum likelihood estimation
KW - Mel frequency cepstral coefficients
KW - MFCC feature vectors
KW - microphone signal
KW - minimum mean square error estimation
KW - model-based feature enhancement
KW - regression analysis
KW - reverberant speech recognition
KW - reverberation
KW - RIR energy
KW - room impulse response
KW - speech recognition
KW - stochastic observation model
KW - stochastic processes
TI - Model-Based Feature Enhancement for Reverberant Speech Recognition
VL - 18
ER -