TY  - CONF
AB  - This contribution describes a step-wise source counting algorithm to determine the number of speakers in an offline scenario. Each speaker is identified by a variational expectation maximization (VEM) algorithm for complex Watson mixture models and therefore directly yields beamforming vectors for a subsequent speech separation process. An observation selection criterion is proposed which improves the robustness of the source counting in noise. The algorithm is compared to an alternative VEM approach with Gaussian mixture models based on directions of arrival and shown to deliver improved source counting accuracy. The article concludes by extending the offline algorithm towards a low-latency online estimation of the number of active sources from the streaming input data.
AU  - Drude, Lukas
AU  - Chinaev, Aleksej
AU  - Tran Vu, Dang Hai
AU  - Haeb-Umbach, Reinhold
ID  - 11753
KW  - Accuracy
KW  - Acoustics
KW  - Estimation
KW  - Mathematical model
KW  - Source separation
KW  - Speech
KW  - Vectors
KW  - Bayes methods
KW  - Blind source separation
KW  - Directional statistics
KW  - Number of speakers
KW  - Speaker diarization
T2  - 14th International Workshop on Acoustic Signal Enhancement (IWAENC 2014)
TI  - Towards Online Source Counting in Speech Mixtures Applying a Variational EM for Complex Watson Mixture Models
ER  - 

TY  - CONF
AB  - In this paper we propose to employ directional statistics in a complex vector space to approach the problem of blind speech separation in the presence of spatially correlated noise. We interpret the values of the short time Fourier transform of the microphone signals to be draws from a mixture of complex Watson distributions, a probabilistic model which naturally accounts for spatial aliasing. The parameters of the density are related to the a priori source probabilities, the power of the sources and the transfer function ratios from sources to sensors. Estimation formulas are derived for these parameters by employing the Expectation Maximization (EM) algorithm. The E-step corresponds to the estimation of the source presence probabilities for each time-frequency bin, while the M-step leads to a maximum signal-to-noise ratio (MaxSNR) beamformer in the presence of uncertainty about the source activity. Experimental results are reported for an implementation in a generalized sidelobe canceller (GSC) like spatial beamforming configuration for 3 speech sources with significant coherent noise in reverberant environments, demonstrating the usefulness of the novel modeling framework.
AU  - Tran Vu, Dang Hai
AU  - Haeb-Umbach, Reinhold
ID  - 11913
KW  - array signal processing
KW  - blind source separation
KW  - blind speech separation
KW  - complex vector space
KW  - complex Watson distribution
KW  - directional statistics
KW  - expectation-maximisation algorithm
KW  - expectation maximization algorithm
KW  - Fourier transform
KW  - Fourier transforms
KW  - generalized sidelobe canceller
KW  - interference suppression
KW  - maximum signal-to-noise ratio beamformer
KW  - microphone signal
KW  - probabilistic model
KW  - spatial aliasing
KW  - spatial beamforming configuration
KW  - speech enhancement
KW  - statistical distributions
T2  - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2010)
TI  - Blind speech separation employing directional statistics in an Expectation Maximization framework
ER  - 